Lesson 1. Introduction to Using Loops to Automate Workflows in Open Source Python
Introduction to Loops in Python - Intro to earth data science textbook course moduleWelcome to the first lesson in the Introduction to Loops in Python module. Loops can help reduce repetition in code by iteratively executing the same code on a range or list of values. Learn how to write loops in Python to write Do Not Repeat Yourself, or DRY, code in Python.
Chapter Eighteen - Loops
In this chapter, you will learn about the basic types of loops in Python and how you can use them to remove repetition in your code by iteratively executing code on a range or list of values.
After completing this chapter, you will be able to:
- Explain how using loops help you to write DRY (Don’t Repeat Yourself) code in Python.
- Describe the syntax for two basic looping structures used in Python:
- Remove repetition in code by using loops to automate data tasks.
What You Need
You should have Conda setup on your computer and the Earth Analytics Python Conda environment. Follow the Set up Git, Bash, and Conda on your computer to install these tools.
Be sure that you have completed the chapters on Jupyter Notebook, Numpy Arrays, and Pandas Dataframes.
Why Use Loops – Make Your Code DRY
A loop is a sequence of operations that are performed over and over in some specified order. Following the DRY (Don’t Repeat Yourself) principle of clean coding, loops can help you to eliminate repetition in code by replacing duplicate lines of code with an iteration, meaning that you can iteratively execute the same code line or block until it reaches a specified end point.
Instead of copying and pasting the same code over and over to run it multiple times (say, on multiple variables or files), you can create a loop that contains one instance of the code and that executes that code block on a range or list of values that you provide.
For example, the following lines could be replaced by a loop that iterates over a list of variable names and executes the
print() function until it reaches the end of the list:
# Create 5 objects list_of_values = [1, 2, 3, 4, 5] # Print each object separately print(list_of_values)
[1, 2, 3, 4, 5]
You can create a loop that will iterate through each value and print it.
# Cycle through each object and print for avalue in list_of_values: print(avalue)
1 2 3 4 5
Once you have your list of values, you can enhance the print statement by adding text.
# Cycle through each object and print with enchancement for avalue in list_of_values: print("the current value is:", avalue)
the current value is: 1 the current value is: 2 the current value is: 3 the current value is: 4 the current value is: 5
You can also use a loop to iteratively adjust each value. Below you add 1 to each value in the list, using your loop.
# Cycle through each object and print for avalue in list_of_values: print("the current value is:", avalue+1)
the current value is: 2 the current value is: 3 the current value is: 4 the current value is: 5 the current value is: 6
You can create lists of variables, filenames, or other objects like data structures upon which you want to execute the same code. These lists can then be used to execute the code iteratively on each item in the list. You can also provide a numerical range of values to control how many times the code is executed.
In Python, there are two primary structures for loops:
for. Below you will learn about each one and how they can help you to write DRY (Don’t Repeat Yourself) code.
Create For Loops in Python
for loop will iteratively execute code for each item in a pre-defined list. This list can be composed of numeric values, filenames, individual characters in a text string and objects such as data structures. Similar to the
while (discussed below) loop, the syntax of a
for loop includes rule (followed by a colon
:) and indentations of the code lines that will be iterated.
The main differences are that the loop begins with the word
for and that a pre-defined list is explicitly referenced in the loop (e.g.
item_list in the example below). The list to iterate upon is defined before the code for the
item_list = [item_1, item_2, item_3] for i in item_list: execute some code here
There is also a placeholder in the loop (e.g.
i) that represents the items of the list. The initial value of
i is equal to the first item in the list; as the loop iterates through the list, the value of
i changes to hold the value of the next item in the list.
In the first example below, a list with 4 non-sequential values is created. The loop executes the provided code (e.g.
print()) on each item in the list.
# Create list of integers num_list = [12, 5, 136, 47] # For each item in list, add 10 and print new value for i in num_list: i += 10 print(i)
22 15 146 57
Note that in this example, the values are not in sequential order or follow any kind of pattern. However, the loop is executed on each item, in the same order in which they are defined in the list. This is a unique characteristic of
for loops; the placeholder represents the value of whichever item is being accessed from the list in that iteration. Thus,
i is not a pre-defined variable, but rather a placeholder for the current item that the loop is working with in each iteration of the code.
This means that you could use any word or character to indicate the placeholder, with the exception of numeric values. You simply need to reuse that same word or character in the code lines that are being executed, in order to use the placeholder to access the items in the list. For example, the placeholder could be called
x, or even something completely unrelated like
# Create list of integers num_list = [12, 5, 136, 47] # For each item in list, add 10 and print new value for x in num_list: x += 10 print("The value of the variable 'x' is:", x)
The value of the variable 'x' is: 22 The value of the variable 'x' is: 15 The value of the variable 'x' is: 146 The value of the variable 'x' is: 57
# Create list of integers num_list = [12, 5, 136, 47] # For each item in list, add 10 and print new value for banana in num_list: banana += 10 print("The value of the variable 'banana' is:", banana)
The value of the variable 'banana' is: 22 The value of the variable 'banana' is: 15 The value of the variable 'banana' is: 146 The value of the variable 'banana' is: 57
In this first example,
num_list contains only numeric values, but you can also iterate on lists that contain other types, such as text strings for the names of files or data structures (including even the names of other lists!).
Create For Loops with Text Strings
In the example below, a list called
files is defined with two text strings that represent two filenames. The
for loop will run iteratively on each text string (represented by the placeholder
fname in each iteration).
In the first iteration of the loop,
fname is equal to the text string
"months.txt"; in the second iteration,
fname represents the text string
# List of filenames files = ["months.txt", "avg-monthly-precip.txt"] # For each item in list, print value for fname in files: print("The value of the variable 'fname' is:", fname)
The value of the variable 'fname' is: months.txt The value of the variable 'fname' is: avg-monthly-precip.txt
print() returns the text string that is the filename, but not the values in the file. This is because the list only contains text strings for the filenames, but does not actually contain the contents of the file. In fact, Python does not know that these text strings are filenames; it simply treats these as text string items in a list.
Taking another look at the definition of the list, you can notice that the items are just text strings identified by quotes
"". The list does not contain variables or objects that have been defined (e.g. numpy arrays, pandas dataframes, or even other lists). Thus, it is important to remember that the objects that you use in the loop will determine what output you will receive.
For Loops on Data Structures
For example, you can define a list that contains multiple objects (say, other lists). In the example below, the
print() returns the actual values in each item in the list because the items are defined objects (in this example, each item is its own list of data values).
In each iteration of the loop,
dlist represents the data lists that you defined: in first iteration,
avg_monthly_precip in the second iteration.
# Create list of abbreviated month names months = ["Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"] # Create list of average monthly precip (inches) in Boulder, CO avg_monthly_precip = [0.70, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84] # List of list names lists = [months, avg_monthly_precip] # For each item in list, print value for dlist in lists: print("The value of the variable 'dlist' is:", dlist)
The value of the variable 'dlist' is: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'July', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec'] The value of the variable 'dlist' is: [0.7, 0.75, 1.85, 2.93, 3.05, 2.02, 1.93, 1.62, 1.84, 1.31, 1.39, 0.84]
When working with data structures such as lists, numpy arrays, or pandas dataframes, the code that is executed in the loop is still subject to the object type that is provided. For example, because the items in
lists are lists, then you can execute any code that can run on the list object type, such as querying the length of the list or for a specific index value.
# For each list in lists, print the length for dlist in lists: print("The length of the variable 'dlist' is:", len(dlist))
The length of the variable 'dlist' is: 12 The length of the variable 'dlist' is: 12
# For each list in lists, print the value at last index for dlist in lists: print(dlist[-1])
However, you would not be able to call a method or attribute that does not exist for that data structure. For example, while
.shape is an attribute of numpy arrays that provides the number of elements, or rows and columns,
.shape is not an attribute of lists. An attempt to call
.shape on a list would result in an error, and thus, a failure of the loop as it cannot execute that code.
# For each list in lists, print attribute shape for dlist in lists: print(dlist.shape)
AttributeError: 'list' object has no attribute 'shape'
while loop is used to iteratively execute code until a pre-defined condition is no longer satisfied (i.e. results in a value of
False). That condition could be a limit on how many times you want the code to run, or that results of the code reach a certain value (e.g. code will iteratively execute as long as the current value of the results is less than 5). After the pre-defined condition is no longer
True, the loop will not execute another iteration.
while x < 5: execute some code here
Notice that the loop begins with
while followed by a condition that ends with a colon
:. Also, notice that the code below the
while statement is indented. This indentation is important, as it indicates that the code will be executed as part of the loop within it is contained, not after the loop is completed.
Check out the examples below to see the
while loop in action. The first example uses a
while loop to iteratively add a value of
1 to a variable
x, as long as the current value of
x does not exceed a specified value. In this example, a comparison operator (e.g.
<) is used to compare the value of
x (which begins with value of
0) to the value
10, the designated end point. Within the loop, an assignment operator (
+=) is used to add a value of
x, and the current value of
x is printed. This process repeats each time that the code iterates, until the current value of
x is no longer less than
# Set x equal to 0 x = 0 # Add 1 to x until x is no longer less than 10 while x < 10: x += 1 print(x)
1 2 3 4 5 6 7 8 9 10
x reaches 10, the condition is no longer true (as
x is no longer less than
10), so the loop ends and does not execute another iteration of the code. Note that to use the value of the variable
x as a condition for the loop, you must use the correct variable name in the loop (e.g.
x), so that the condition can check the status of
x as the loop iterates.
Also, note that the code within the loop is executed in order, meaning that the
print() function executes after the value of
1 is added to
x. You can change that order if you want to see the value of
x, before the value of
1 has been added.
# Reset x equal to 0 x = 0 # Add 1 to x until x is no longer less than 10 while x < 10: print(x) x += 1 print("Final value:", x)
0 1 2 3 4 5 6 7 8 9 Final value: 10
In this version of the loop, the value of
x is printed first, and then, the assignment operator is executed to add a value of
x. Note that any code provided after the loop ends will be executed at that point (e.g. the printing of the final value).
Compare the first
while loop to the code below that you would have to write to accomplish the same task without a loop. The
while loop not only replaces many lines of code to remove repetition, but it also makes the code easier to read and write, making the workflow easier for everyone involved.
# Reset x equal to 0 x = 0 # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x) # Add 1 to x x += 1 print(x)
1 2 3 4 5 6 7 8 9 10
In addition to using comparison operators to compare values, you can also specify a range of values to limit the duration of the
while loop. The range is inclusive of the starting value, but not of the ending value.
# Set x equal to 1 x = 1 # Add 1 to x, while x is between 1 and 5 while x in range(1, 5): x += 1 print(x)
2 3 4 5
Note that the structure of the
while loop remains the same, regarding the use of a condition, colon, and indentations of the code that will be iterated. In this example, rather than using an explicit comparison operator, you are stating that the code below can continue to execute as long as the value of
x remains within a certain range.
Specifically, the code will execute as long as the value of
x exists within a numeric range that begins at
1 to and ends with
5, but is not inclusive of
5. Thus, the loop ends when
5 and does not execute another iteration of the code.
In the rest of this chapter, you will continue to apply loops to data structures, including lists, numpy arrays and pandas dataframes and learn how to automate data tasks using loops.
Leave a Comment