Lesson 3. Automate Tasks With Loops


In this lesson, you will learn how to write Python code to automate tasks using loops.

Learning Objectives

After completing this lesson, you will be able to:

  • Start automating your tasks with loops in Python

What You Need

Be sure that you have completed the lessons on Intro to DRY Code and Intro to Loops.

Example: Create New List of Calculated Values

Recall that in the lessons on variables and lists, you learned how to run calculations on individual variables to convert the units, and then you created a list that contained the recalculated values.

Average monthly precipitation for Boulder, Colorado, provided by the U.S. National Oceanic and Atmospheric Administration (NOAA).

Month    Precipitation (inches)
Jan      0.70
Feb      0.75
Mar      1.85
Apr      2.93
May      3.05
June     2.02
July     1.93
Aug      1.62
Sept     1.84
Oct      1.31
Nov      1.39
Dec      0.84

In this lesson, you will build a loop to automate these tasks, so that you recalculate each value in a list, and then add the new value to a new list.

Create Variable For Loop

Begin by creating the variable upon which your loop will execute. In this case, you will multiply each item in a list of average monthly precipitation values by 25.4 to convert it from inches to millimeters (1 inch = 25.4 millimeters).

# create a list of average monthly precipitation in inches
avg_monthly_precip_in = [0.70, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84]

# print list
print(avg_monthly_precip_in)
[0.7, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84]

Select Type of Loop

Next, select and compose a structure for your loop. Think about whether a while or for loop would work better for your task. You have a fixed list of values upon which you want to iterate a calculation.

Which type of loop structure is used in the code below?

# use loop to convert values in `avg_monthly_precip_in`
for month in avg_monthly_precip_in:
    
    # multiply each item in the list by 25.4 to convert from inches to mm
    month = month * 25.4
    
    # print the new value of each item within the loop, so you see results after each iteration
    print(month)    
17.779999999999998
19.049999999999997
46.99
74.422
77.46999999999998
51.308
49.022
41.148
46.736
33.274
35.306
21.336
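For comparison, here is a minimal sketch of how the same conversion might look as a while loop. The names `index` and `month_mm` are introduced only for this illustration. Notice that you have to create and update the counter yourself, which is one reason a for loop is usually the simpler choice for a fixed list of values.

```python
# same list of average monthly precipitation in inches as above
avg_monthly_precip_in = [0.70, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84]

# a while loop needs a counter that you create and update yourself
# (`index` is a name chosen for this sketch)
index = 0

while index < len(avg_monthly_precip_in):

    # look up the item at the current position and convert from inches to mm
    month_mm = avg_monthly_precip_in[index] * 25.4
    print(month_mm)

    # move to the next position; without this line, the loop would never end
    index += 1
```

The for loop handles the counting and the stopping condition for you, so there is less code to write and less that can go wrong.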

Expand Loop To Include More Tasks

In previous lessons, you learned how to append items to the end of an existing list using listname += [value], which employs an assignment operator to add the new values to an existing list.

You can do this within your loop with only two new lines of code.

First, create an empty Python list that will receive new values using listname = [].

Then, you can add a new line of code to your loop that will use listname += [value] to append each value after it is calculated.

# create a new list that is empty, so that you can add the values calculated in the loop
avg_monthly_precip_mm = []

# use loop to convert values in `avg_monthly_precip_in` and add values to new list
for month in avg_monthly_precip_in:
    
    # multiply each item in the list by 25.4 to convert from inches to mm
    month = month * 25.4
    
    # add each item to the new list 
    avg_monthly_precip_mm += [month]
    
# print the new list after the loop has completed
print(avg_monthly_precip_mm) 
[17.779999999999998, 19.049999999999997, 46.99, 74.422, 77.46999999999998, 51.308, 49.022, 41.148, 46.736, 33.274, 35.306, 21.336]
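As an aside, Python offers other ways to build the same list. The sketch below is not part of the lesson's approach; it shows the list method `.append()` and a list comprehension, using the hypothetical names `precip_mm_append` and `precip_mm_comprehension`.

```python
# same list of average monthly precipitation in inches as above
avg_monthly_precip_in = [0.70, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84]

# option 1: the same loop, written with .append() instead of += [value]
precip_mm_append = []
for month in avg_monthly_precip_in:
    precip_mm_append.append(month * 25.4)

# option 2: a list comprehension that builds the new list in a single line
precip_mm_comprehension = [month * 25.4 for month in avg_monthly_precip_in]

# check that both options produce the same list
print(precip_mm_append == precip_mm_comprehension)
```

All three forms give the same result, so you can use whichever reads most clearly to you.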

Look carefully at how the variables avg_monthly_precip_mm and month are created.

The list variable avg_monthly_precip_mm was explicitly created, meaning that you initialized the variable and assigned its value manually. In this case, you manually created the variable avg_monthly_precip_mm as an empty list.

The variable month is an implicit variable, meaning that it was not explicitly created by you, but rather it is created as part of the loop and serves as a placeholder to receive data in each iteration of the loop. At the end of the loop, an implicit variable is equal to the last value that it was assigned.
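The short sketch below illustrates this behavior with the same data. It also shows that reassigning month inside the loop does not change the items in the original list.

```python
# same list of average monthly precipitation in inches as above
avg_monthly_precip_in = [0.70, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84]

for month in avg_monthly_precip_in:
    month = month * 25.4

# after the loop, `month` still exists and holds the last value it was assigned
# (the December value converted to mm)
print(month)

# the original list still contains the values in inches
print(avg_monthly_precip_in[-1])
```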

Be mindful of the differences between implicit and explicit variables, as sometimes you may have to employ a slightly different syntax when trying to use implicit variables to access data within data structures.

In this course, syntax differences will be noted in the lessons and in the assignments.

Example: Run a Summary Statistic on Multiple Numpy Arrays

By now, you may be excited that you can automate these kinds of tasks, but you may also be thinking that you would prefer to iterate on numpy arrays or pandas dataframes, instead of working with data values in lists.

You can do that, too! For example, you can build a loop that will calculate summary statistics (such as the sum or median values) of multiple data structures.

Note that the describe() method of pandas dataframes does not report the sum, and it reports the median only as the 50% percentile, so this is a good time to use numpy arrays.

You can use the functions np.sum() and np.median() to calculate sum and median values of a numpy array.
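As a quick illustration before building the loop, the sketch below applies both functions to a single numpy array. It reuses the average monthly precipitation values from the first example; the names `avg_monthly_precip`, `precip_sum` and `precip_median` are chosen only for this sketch.

```python
import numpy as np

# average monthly precipitation in inches (values from the first example)
avg_monthly_precip = np.array([0.70, 0.75, 1.85, 2.93, 3.05, 2.02,
                               1.93, 1.62, 1.84, 1.31, 1.39, 0.84])

# np.sum() adds up all of the values in the array
precip_sum = np.sum(avg_monthly_precip)

# np.median() returns the middle value of the sorted array
# (with 12 values, it is the mean of the two middle values)
precip_median = np.median(avg_monthly_precip)

print("sum:", precip_sum)
print("median:", precip_median)
```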

Create Variables For Loop

Just like before, create the variables containing the values you want to iterate upon.

Begin with two new numpy arrays containing the monthly precipitation values in 2002 and 2013 for Boulder, Colorado, provided by the U.S. National Oceanic and Atmospheric Administration (NOAA).

Month    Precipitation (inches) in 2002    Precipitation (inches) in 2013
Jan      1.07                              0.27
Feb      0.44                              1.13
Mar      1.50                              1.72
Apr      0.20                              4.14
May      3.20                              2.66
June     1.18                              0.61
July     0.09                              1.03
Aug      1.44                              1.40
Sept     1.52                              18.16
Oct      2.44                              2.24
Nov      0.78                              0.29
Dec      0.02                              0.50

# import necessary packages
import numpy as np

# manually create a new numpy array for 2002
avg_monthly_precip_2002 = np.array([1.07, 0.44, 1.50, 0.20, 3.20, 1.18, 0.09, 1.44, 1.52, 2.44, 0.78, 0.02])

# manually create a new numpy array for 2013
avg_monthly_precip_2013 = np.array([0.27, 1.13, 1.72, 4.14, 2.66, 0.61, 1.03, 1.40, 18.16, 2.24, 0.29, 0.50])

# create list of numpy arrays for iteration
arraylist = [avg_monthly_precip_2002, avg_monthly_precip_2013]

Select Type of Loop

Again, think about what type of loop would work best for this data.

This time you have two objects upon which you want to iterate a calculation: the numpy array for 2002 and the numpy array for 2013.

Use the np.sum() and np.median() functions to calculate these statistics on the numpy arrays.

# use loop to calculate sum and median values for each array in arraylist
for array in arraylist:
    
    array_sum = np.sum(array)
    array_median = np.median(array)
    
    # print the calculated sum and median values within the loop, so you see result for each array
    print("sum:", array_sum)
    print("median:", array_median)
    print("")
    
sum: 13.879999999999999
median: 1.125

sum: 34.15
median: 1.265
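If you want to keep the results rather than only print them, one option is to combine this loop with the empty-list pattern from the first example. The sketch below assumes that approach; the list names `array_sums` and `array_medians` are hypothetical, and round() is used only to tidy small floating point differences such as 13.879999999999999.

```python
import numpy as np

# monthly precipitation in inches for 2002 and 2013 (same values as above)
avg_monthly_precip_2002 = np.array([1.07, 0.44, 1.50, 0.20, 3.20, 1.18, 0.09, 1.44, 1.52, 2.44, 0.78, 0.02])
avg_monthly_precip_2013 = np.array([0.27, 1.13, 1.72, 4.14, 2.66, 0.61, 1.03, 1.40, 18.16, 2.24, 0.29, 0.50])
arraylist = [avg_monthly_precip_2002, avg_monthly_precip_2013]

# empty lists that will receive one result per array
array_sums = []
array_medians = []

for array in arraylist:

    # round each statistic and add it to the matching list
    array_sums += [round(float(np.sum(array)), 2)]
    array_medians += [round(float(np.median(array)), 3)]

# print the lists after the loop has completed
print(array_sums)
print(array_medians)
```

Each list ends up with one value per array, in the same order as arraylist.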

Example: Run a Calculation on Multiple Columns in Pandas Dataframe

A loop can also run on multiple columns in a single pandas dataframe, rather than on multiple data structures.

Create Variables For Loop

Just like before, create the variable containing the values you want to iterate upon.

Begin by creating a new pandas dataframe. Download and import precip-2002-2013-months-seasons.csv from https://ndownloader.figshare.com/files/12710621.

# import necessary Python packages
import os
import urllib.request
import pandas as pd

# replace `jpalomino` with your username here and all paths in this lesson
os.chdir("/home/jpalomino/earth-analytics-bootcamp/")

# download .csv containing monthly precipitation for Boulder, CO in 2002 and 2013
urllib.request.urlretrieve(url = "https://ndownloader.figshare.com/files/12710621", 
                           filename = "data/precip-2002-2013-months-seasons.csv")

# import the monthly precipitation values in 2002 and 2013 as a pandas dataframe
precip_2002_2013 = pd.read_csv("/home/jpalomino/earth-analytics-bootcamp/data/precip-2002-2013-months-seasons.csv")

# create a list of columns for iteration
columnlist = ["precip_2002", "precip_2013"]

# print data
precip_2002_2013
   months  precip_2002  precip_2013 seasons
0     Jan         1.07         0.27  Winter
1     Feb         0.44         1.13  Winter
2     Mar         1.50         1.72  Spring
3     Apr         0.20         4.14  Spring
4     May         3.20         2.66  Spring
5    June         1.18         0.61  Summer
6    July         0.09         1.03  Summer
7     Aug         1.44         1.40  Summer
8    Sept         1.52        18.16    Fall
9     Oct         2.44         2.24    Fall
10    Nov         0.78         0.29    Fall
11    Dec         0.02         0.50  Winter
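The paths in the code above contain a specific username. As one possible alternative, the sketch below builds the same paths from the current user's home directory with os.path.expanduser(), so the username does not need to be typed by hand. It assumes that the earth-analytics-bootcamp directory sits directly in your home directory; the names `working_dir` and `csv_path` are chosen only for this illustration.

```python
import os

# build the path to the working directory from the current user's home directory
# (assumes `earth-analytics-bootcamp` is located directly in the home directory)
working_dir = os.path.join(os.path.expanduser("~"), "earth-analytics-bootcamp")

# build the full path to the .csv file inside the `data` directory
csv_path = os.path.join(working_dir, "data", "precip-2002-2013-months-seasons.csv")

print(csv_path)
```

You could then pass `csv_path` to pd.read_csv() in place of the hand-typed path.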

Select Type of Loop

Again, think about what type of loop would work best for this data.

This time you have two columns in one pandas dataframe upon which you want to iterate a calculation: one for 2002 and one for 2013. Recall how to recalculate columns in pandas dataframes (e.g. dataframe.column = dataframe.column + 4).

Is that the syntax used below?

# use loop to recalculate each column in `columnlist`
for column in columnlist:
    
    precip_2002_2013[[column]] = precip_2002_2013[[column]] * 25.4
        
precip_2002_2013
   months  precip_2002  precip_2013 seasons
0     Jan       27.178        6.858  Winter
1     Feb       11.176       28.702  Winter
2     Mar       38.100       43.688  Spring
3     Apr        5.080      105.156  Spring
4     May       81.280       67.564  Spring
5    June       29.972       15.494  Summer
6    July        2.286       26.162  Summer
7     Aug       36.576       35.560  Summer
8    Sept       38.608      461.264    Fall
9     Oct       61.976       56.896    Fall
10    Nov       19.812        7.366    Fall
11    Dec        0.508       12.700  Winter

Note that this is an instance of when you need a slightly different syntax to use implicit variables to access data within data structures (e.g. dataframe[[column]] = dataframe[[column]] * 25.4).

You know you are using an implicit variable because the column name will change with each iteration.

In the first iteration, column would contain the column name precip_2002 (as text), while in the second iteration, column would contain the column name precip_2013. The dataframe then uses that name to select the values in the matching column.

Also, notice the placement of the dataframe name (e.g. precip_2002_2013) after the loop to display the results. It is not contained within the loop, so you do not see the dataframe each time that the loop iterates. You only see the dataframe when the loop is completed.
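To see what the implicit variable holds, the sketch below prints column in each iteration. It uses a small stand-in dataframe, `precip_sample` (a hypothetical name), built from the first three months of the data so that it runs without the downloaded file. It also uses single brackets, dataframe[column], which select one column by name. The form dataframe.column would not work here, because pandas would look for a column literally named column.

```python
import pandas as pd

# small stand-in dataframe with the first three months of the data above
precip_sample = pd.DataFrame({
    "months": ["Jan", "Feb", "Mar"],
    "precip_2002": [1.07, 0.44, 1.50],
    "precip_2013": [0.27, 1.13, 1.72],
})

# list of column names for iteration
columnlist = ["precip_2002", "precip_2013"]

for column in columnlist:

    # `column` holds the column name as text, not the column's values
    print(column)

    # use the name to select the column and convert from inches to mm
    precip_sample[column] = precip_sample[column] * 25.4

# print the dataframe after the loop has completed
print(precip_sample)
```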

Congratulations - you have automated your first tasks in this course using Python!

Optional Challenge

Test your Python skills to:

  1. Expand the loop for the numpy array example above to convert the values in each numpy array from inches to millimeters (1 inch = 25.4 millimeters), before calculating the summary statistics (hint: you only need to add one line!).

It can also help to think about how these types of calculations are completed on numpy arrays. Recall how you previously converted the values in a numpy array in the numpy array lessons.

sum: 352.55199999999996
median: 28.575

sum: 867.4099999999999
median: 32.13099999999999
