# Lesson 5. Automate Workflows Using Loops in R


## Learning objectives

At the end of this activity, you will be able to:

• Use for-loops to handle repetitive tasks
• Bind multiple data frames together by row

## Don’t Repeat Yourself (DRY)

The DRY (Don’t Repeat Yourself) principle says to avoid repeating the same code over and over in a script. When you notice yourself doing this, it’s a good time to consider whether another approach would be more efficient.

A snippet of the code that we examined at the beginning of this workshop is below. Notice that our colleague is building a data.frame manually, one element per line.

finalSUMMARYmean <- data.frame(jan_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "01"], na.rm = TRUE),
feb_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "02"], na.rm = TRUE),
march_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "03"], na.rm = TRUE),
apr_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "04"], na.rm = TRUE),
may_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "05"], na.rm = TRUE),
june_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "05"], na.rm = TRUE),
may_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "06"], na.rm = TRUE),
july_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "07"], na.rm = TRUE),
aug_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "08"], na.rm = TRUE),
sept_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "09"], na.rm = TRUE),
oct_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "09"], na.rm = TRUE),
nov_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "11"], na.rm = TRUE),
dec_mean_2003 = mean(myFinalData$HPCP[myFinalData$month == "12"], na.rm = TRUE))

finalSUMMARYmean


Similarly, our colleague may opt to open a set of .csv files one at a time, repeating the same call for each file.

myDATA1 <- read.csv("https://s3-us-west-2.amazonaws.com/earthlab-teaching/vchm/My_Data2004.csv",
na.strings = c("999.99"))

na.strings = c("999.99"))

na.strings = c("999.99"))


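We refer to this as copy pasta: repeating the same code over and over. Copy pasta is tedious to maintain and easy to get wrong. In the first snippet above, for example, june_mean_2003 is computed from month "05" and oct_mean_2003 from month "09". The DRY principle supports automating these types of tasks using for-loops, functions and other approaches. In this lesson we will review using for-loops to automate opening and aggregating a set of .csv files.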

## For-loops in R

For-loops provide a way to iterate over objects in R. For example, suppose you want to print each number in a sequence of numbers: 1:10. You can do that with a for-loop as follows:

numbers <- 1:10
for (i in numbers) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10


In the above for-loop, the object i will take on the values in numbers sequentially: first i will be set to the first element in numbers (1), then it will be set to the second element (2), and so on, until finally i = 10. Everything contained within the curly braces {...} is considered the body of the for-loop, and this will be executed for every iteration of the loop.

IMPORTANT: The variable i, and any other variable created in the for-loop, will persist as an object in your R environment after the for-loop is done executing. This is the opposite of what you may have learned working with functions, where variables created inside the function body are not available once the function returns.

i
## [1] 10
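
To make the contrast with functions concrete, here is a minimal sketch. The function add_one and the names local_result, loop_result and j are made up for this illustration, and the exists() check assumes you have no other object called local_result in your workspace.

```r
# Hypothetical helper, used only for illustration
add_one <- function(x) {
  local_result <- x + 1  # created inside the function body
  local_result
}

add_one(1)              # returns 2
exists("local_result")  # FALSE: the function's variable did not persist

for (j in 1:3) {
  loop_result <- j + 1  # created inside the loop body
}

j            # 3: the loop variable persists
loop_result  # 4: so does anything assigned in the loop body
```

Because loop variables persist, it is worth choosing names inside a for-loop that will not silently overwrite objects you still need.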


You can use a for-loop in the same way with a character vector:

charvec <- c('first element', 'second element', 'third element')
for (i in charvec) {
  print(i)
}
## [1] "first element"
## [1] "second element"
## [1] "third element"


Notice here that i takes on the values of the elements of charvec.

When iterating over objects with loops, it is often useful to use the seq_along function to create a numeric sequence of element indices. For example, here’s what seq_along returns when given our charvec as an input:

seq_along(charvec)
## [1] 1 2 3


Protip: Using seq_along in a for-loop allows you to get numeric indices for the object that you want to iterate over:

for (i in seq_along(charvec)) {
  print(paste('i =', i))
}
## [1] "i = 1"
## [1] "i = 2"
## [1] "i = 3"


Of course, if you want to iterate over charvec and still get the character elements, you can use these i values as indices:

for (i in seq_along(charvec)) {
  print(paste('charvec[i] =', charvec[i]))
}
## [1] "charvec[i] = first element"
## [1] "charvec[i] = second element"
## [1] "charvec[i] = third element"
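
One practical reason to reach for seq_along rather than the also common 1:length(x) is how each behaves when the object is empty. The sketch below uses a made-up empty vector, empty_vec, and two made-up counters to show the difference.

```r
empty_vec <- character(0)  # hypothetical empty input

seq_along(empty_vec)       # integer(0): nothing to iterate over
1:length(empty_vec)        # 1 0: the sequence counts down from 1 to 0

# Count how many times each loop body runs
n_safe <- 0
for (i in seq_along(empty_vec)) {
  n_safe <- n_safe + 1
}

n_risky <- 0
for (i in 1:length(empty_vec)) {
  n_risky <- n_risky + 1
}

n_safe   # 0: the loop is skipped, as you would expect
n_risky  # 2: the loop ran for i = 1 and i = 0
```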


### Populate Objects with For-loops

Suppose you wanted to create a list, and have each element in that list be one of the elements of charvec. You could create an empty list, then populate each element in that list using a for-loop.

my_list <- list()
for (i in seq_along(charvec)) {
  # [[i]] sets the i-th element itself, which also works when the element is a data.frame
  my_list[[i]] <- charvec[i]
}


Let’s dissect this a bit.

1. First, you create an empty list using my_list <- list(). This is the object that you will populate in the loop.
2. Then, you use seq_along(charvec) as the thing to iterate over with our for-loop, so that first i=1, then i=2, then i=3, because there are 3 elements in charvec.
3. Finally, within the body of the for-loop (between the curly braces), you assign charvec[i] to be the i-th element in my_list.

• What will my_list be after this for-loop?
  • What is its class?
  • What is its length?
  • What is the first element?

Let’s have a look:

class(my_list)
## [1] "list"

length(my_list)
## [1] 3

my_list
## [[1]]
## [1] "first element"
##
## [[2]]
## [1] "second element"
##
## [[3]]
## [1] "third element"
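
The same create-then-populate pattern could stand in for the manual monthly means from the start of this lesson. What follows is only a sketch under stated assumptions: precip is a small made-up data.frame standing in for myFinalData, and in_month and monthly_means are names chosen for this illustration.

```r
# Made-up stand-in for myFinalData: one month code and one HPCP value per row
precip <- data.frame(
  month = c("01", "01", "02", "02", "03"),
  HPCP  = c(1, 3, NA, 2, 5),
  stringsAsFactors = FALSE
)

months <- unique(precip$month)

# Create the object to populate, with one named slot per month
monthly_means <- numeric(length(months))
names(monthly_means) <- months

for (i in seq_along(months)) {
  in_month <- precip$month == months[i]  # rows belonging to this month
  monthly_means[i] <- mean(precip$HPCP[in_month], na.rm = TRUE)
}

monthly_means
```

Because the month code is written once, inside the loop, there is no chance of pairing one month's name with another month's filter.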


## Challenge

There are multiple URLs in the data/data_urls.csv file provided for this workshop. Your challenge is to combine all of the .csv files into one data.frame in R.

Your list of URLs looks something like the code below:

urls <- c(
'https://s3-us-west-2.amazonaws.com/earthlab-teaching/vchm/My_Data2003-boulder.csv',
'https://s3-us-west-2.amazonaws.com/earthlab-teaching/vchm/My_Data2003-denver.csv',
'https://s3-us-west-2.amazonaws.com/earthlab-teaching/vchm/My_Data2003-lyons.csv'
)


• You might find it useful to populate a list, so that each element in your list is a data.frame.
• You can create one data.frame from a list of data.frames using bind_rows(list_object_here). bind_rows is a dplyr function that combines the data.frames in a list row-wise (it stacks them on top of each other).
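
To see what bind_rows does without giving away the challenge, here is a minimal sketch on two made-up data.frames. The names df_list and combined, and the site and value columns, are hypothetical, and the example assumes the dplyr package is installed.

```r
library(dplyr)  # bind_rows() comes from dplyr

# Two made-up data.frames with the same columns,
# standing in for the files you will read in the challenge
df_list <- list(
  data.frame(site = "a", value = c(1, 2)),
  data.frame(site = "b", value = c(3, 4))
)

# Stack every data.frame in the list on top of the next
combined <- bind_rows(df_list)

nrow(combined)  # 4: two rows from each data.frame
```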