Lesson 1. Activity Data Structures


Practice Working With Data Structures in Python - Earth analytics bootcamp course module

Welcome to the first lesson in the Practice Working With Data Structures in Python module. This tutorial provides an opportunity to practice working with commonly used Python data structures for scientific data: lists, numpy arrays, and pandas dataframes.

Hands-on Practice With Data Structures

This hands-on activity gives you an opportunity to practice working with the data structures used in this course: lists, numpy arrays, and pandas dataframes. You will also practice submitting pull requests to GitHub repositories.

While this activity will not be formally graded, you can earn participation points for submitting your completed Jupyter Notebook for this activity.

What You Need

Be sure that you have completed all of the lessons from Days 1-5 for the Earth Analytics Bootcamp. Completing the challenges at the end of the lessons will also help you with this assignment.

You will need to fork and clone a GitHub repository for this activity:

https://github.com/earthlab-education/ea-bootcamp-practice-data-structures

Part I: Create and Modify a Jupyter Notebook

Begin by creating a new Jupyter Notebook in your forked repository (ea-bootcamp-practice-data-structures).

Rename the file to firstinitial-lastname-practice-data-structures.ipynb (e.g. jpalomino-practice-data-structures.ipynb).

Note that Git will recognize this new Jupyter Notebook as a new file that can be added, committed, and pushed back to your forked repository on GitHub.com.

Practice Documentation

Add a Markdown cell before each code cell you create to describe the purpose of your code (e.g. what are you accomplishing by executing this code?).

Within code cells, also add Python comments to document each code block, and use variable names that are concise but clearly indicate the kind of data the variable contains. Review the variable names that you have seen throughout the lessons.

Question 1: Markdown Titles

Use Markdown to add a title and author for your new Jupyter Notebook (e.g. Earth Analytics Bootcamp - Practice Activity on Data Structures and Author: Jenny Palomino). Bold the word Author.

Question 2: Import Python Packages

You will be creating lists, numpy arrays, and pandas dataframes. You will also be creating plots and downloading data from Figshare.com.

Import the necessary Python packages to accomplish these tasks. Review the lessons as needed to figure out which packages you need to import.

Question 3: Create List of Data Values

Create and print a Python list of the average monthly temperature (Celsius) in Boulder, CO:

Month    Temperature (Celsius)
Jan      0.0
Feb      2.00
Mar      5.0
Apr      9.56
May      14.39
Aug      21.72
Sept     16.72
Oct      11.61
Nov      4.89
Dec      0.99

Notice anything unusual about this table?

[0.0, 2.0, 5.0, 9.56, 14.39, 21.72, 16.72, 11.61, 4.89, 0.99]

Question 4: Insert Values Into a Python List

Month    Temperature (Celsius)
June     19.56
July     22.78

Insert missing values for June and July into your Python list with the following syntax:

listname.insert(index, value)

This means that you need to determine the index location at which you want to insert the value. For example, if you want to add a new value at the second place in a list, then you would use an index of 1, as in listname.insert(1, value).

It can also be helpful to identify the index of the existing value in front of which you want to add a new value.

Remember that Python indexing begins at [0] and that when you add a new item to the list, the index of the original items will update as well.

Print your Python list after each addition.

[0.0, 2.0, 5.0, 9.56, 14.39, 19.56, 21.72, 16.72, 11.61, 4.89, 0.99]
[0.0, 2.0, 5.0, 9.56, 14.39, 19.56, 22.78, 21.72, 16.72, 11.61, 4.89, 0.99]
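The index shift described above can be seen in a minimal sketch. The list of letters below is hypothetical example data, not the temperature values for this activity, and `letters` is an illustrative variable name.

```python
# Hypothetical example data: a list of letters with "c" and "d" missing
letters = ["a", "b", "e"]

# "e" currently sits at index 2, so inserting at index 2
# places the new value in front of it
letters.insert(2, "c")
print(letters)

# After the first insert, "e" has shifted to index 3;
# list.index() looks up its current position
letters.insert(letters.index("e"), "d")
print(letters)
```

Looking up the position with `list.index()` is one way to avoid counting by hand after each insertion; typing the index number directly works just as well.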

Question 5: Manually Create Numpy Arrays

Using the average monthly temperature values, manually create and print a one-dimensional numpy array using the following syntax:

arrayname = np.array([value, value, value, etc])

[ 0.    2.    5.    9.56 14.39 19.56 22.78 21.72 16.72 11.61  4.89  0.99]

Using the completed Python list from the previous question, create and print another numpy array using the following syntax:

arrayname = np.array(listname)

[ 0.    2.    5.    9.56 14.39 19.56 22.78 21.72 16.72 11.61  4.89  0.99]
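As a rough illustration of the two syntaxes above, the sketch below builds the same array both ways. The precipitation values and variable names are hypothetical and stand in for the activity data.

```python
import numpy as np

# Hypothetical monthly precipitation values (inches), not the activity data
precip_list = [0.70, 0.75, 1.85]

# Option 1: type the values directly into np.array()
precip_manual = np.array([0.70, 0.75, 1.85])

# Option 2: convert an existing Python list
precip_from_list = np.array(precip_list)

print(precip_manual)
print(precip_from_list)

# Both approaches produce a one-dimensional array with the same values
print(np.array_equal(precip_manual, precip_from_list))
print(precip_from_list.shape)
```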

Question 6: Download Text File and Import Into Numpy Arrays

Use urllib.request to download the following file of average monthly temperature (Celsius) for Boulder, Colorado, to your data directory:

avg-monthly-temp.txt from https://ndownloader.figshare.com/files/12732467

Recall that you need to set your working directory (e.g. /home/jpalomino/earth-analytics-bootcamp/) before running the commands to download data.

Use the appropriate function to import avg-monthly-temp.txt into a numpy array.

datasets downloaded successfully
[ 0.    2.    5.    9.56 14.39 19.56 22.78 21.72 16.72 11.61  4.89  0.99]
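The download-then-import pattern might look like the sketch below. So that it runs offline, it uses a temporary directory and a local stand-in file served through a file:// URL; the file names and values are hypothetical, and the comma delimiter is an assumption you should check against the actual file. In the activity, you would use your own working directory and the Figshare URL instead.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

import numpy as np

# Throwaway directory standing in for your working directory
work_dir = tempfile.mkdtemp()
data_dir = os.path.join(work_dir, "data")
os.makedirs(data_dir, exist_ok=True)

# Hypothetical stand-in source file so the sketch runs without a network
source = Path(work_dir) / "source-values.txt"
source.write_text("1.5,2.5,4.0\n")
url = source.as_uri()  # in the activity, this would be the Figshare URL

# Download the file into the data directory
download_path = os.path.join(data_dir, "example-values.txt")
urllib.request.urlretrieve(url=url, filename=download_path)
print("datasets downloaded successfully")

# Import the text file into a numpy array (assumes comma-separated values)
values = np.loadtxt(download_path, delimiter=",")
print(values)
```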

Question 7: Select and Summarize Data From Numpy Arrays

Using selections, create two new numpy arrays containing the data values for:

  1. Mar, Apr, May
  2. Sept, Oct, Nov

Run the appropriate function to calculate and print the mean of each new numpy array.

Mean of Spring Average Monthly Temperatures: 9.65
Mean of Fall Average Monthly Temperatures: 11.073333333333332
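A selection from a one-dimensional numpy array can be sketched with slicing, as below. The array is hypothetical example data, and the variable names are illustrative only. Note that a slice `[start:stop]` includes the start index but excludes the stop index.

```python
import numpy as np

# Hypothetical example values, not the activity data
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Select indices 0, 1, and 2 (the stop index 3 is excluded)
first_three = values[0:3]

# Select indices 4 and 5
last_two = values[4:6]

# np.mean() and the .mean() method give the same result
print("Mean of first three values:", np.mean(first_three))
print("Mean of last two values:", last_two.mean())
```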

Question 8: Manually Create Pandas Dataframes

Manually create and print a pandas dataframe of average monthly temperature (Celsius) for Boulder, Colorado, with the following syntax:

dataframe_name = pd.DataFrame(
    columns=["column_name_textstring", "column_name_numeric"],
    data=[
        ["Text", value],
        ["Text", value],
        ["Text", value],
        ["Text", value]
    ]
)

Note that you do not need to include the line spaces displayed above. They are simply there to help you see the appropriate syntax.

        Month   Temp
0     January   0.00
1    February   2.00
2       March   5.00
3       April   9.56
4         May  14.39
5        June  19.56
6        July  22.78
7      August  21.72
8   September  16.72
9     October  11.61
10   November   4.89
11   December   0.99
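The syntax above, filled in with values, could look like the following sketch. The column names and precipitation values are hypothetical and are not the data for this activity.

```python
import pandas as pd

# Hypothetical example data: one text column and one numeric column
precip_df = pd.DataFrame(
    columns=["month", "precip_in"],
    data=[
        ["Jan", 0.70],
        ["Feb", 0.75],
        ["Mar", 1.85],
    ],
)

print(precip_df)

# Each inner list becomes one row; pandas adds the integer index
print(precip_df.shape)
```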

Question 9: Download CSV File and Import Into Pandas Dataframes

Use urllib.request to download the following file of average monthly temperature (Celsius) for Boulder, Colorado, to your data directory:

avg-temp-months-seasons.csv from https://ndownloader.figshare.com/files/12739457

Recall that you need to set your working directory (e.g. /home/jpalomino/earth-analytics-bootcamp/) before running the commands to download data.

Use the appropriate function to import avg-temp-months-seasons.csv into a pandas dataframe.

datasets downloaded successfully
   months   temp seasons
0     Jan   0.00  Winter
1     Feb   2.00  Winter
2     Mar   5.00  Spring
3     Apr   9.56  Spring
4     May  14.39  Spring
5    June  19.56  Summer
6    July  22.78  Summer
7     Aug  21.72  Summer
8    Sept  16.72    Fall
9     Oct  11.61    Fall
10    Nov   4.89    Fall
11    Dec   0.99  Winter
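Once a CSV file is on disk, importing it might look like the sketch below. To stay self-contained, it writes a small hypothetical CSV to a temporary directory first; in the activity, the file would already exist in your data directory after the download step.

```python
import os
import tempfile

import pandas as pd

# Write a small hypothetical CSV so the sketch runs on its own
csv_path = os.path.join(tempfile.mkdtemp(), "example-months-seasons.csv")
with open(csv_path, "w") as csv_file:
    csv_file.write("months,precip,seasons\n")
    csv_file.write("Jan,0.70,Winter\n")
    csv_file.write("Feb,0.75,Winter\n")
    csv_file.write("Mar,1.85,Spring\n")

# The first row of the file is used as the column names by default
example_df = pd.read_csv(csv_path)

print(example_df)
print(example_df.dtypes)
```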

Question 10: Select and Summarize Data From Pandas Dataframes

Select the data for each season (e.g. Winter) and assign the results to a new pandas dataframe for each season.

Run the appropriate function to summarize each new pandas dataframe (e.g. Winter).

            temp
count   3.000000
mean    0.996667
std     1.000017
min     0.000000
25%     0.495000
50%     0.990000
75%     1.495000
max     2.000000

            temp
count   3.000000
mean    9.650000
std     4.695647
min     5.000000
25%     7.280000
50%     9.560000
75%    11.975000
max    14.390000

            temp
count   3.000000
mean   21.353333
std     1.641016
min    19.560000
25%    20.640000
50%    21.720000
75%    22.250000
max    22.780000

            temp
count   3.000000
mean   11.073333
std     5.933231
min     4.890000
25%     8.250000
50%    11.610000
75%    14.165000
max    16.720000
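One way to select rows by a text value and summarize them is sketched below, using a boolean filter and `.describe()`. The dataframe is hypothetical example data with only two seasons, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical example data, not the activity file
precip_df = pd.DataFrame(
    columns=["months", "precip", "seasons"],
    data=[
        ["Jan", 0.70, "Winter"],
        ["Feb", 0.75, "Winter"],
        ["Mar", 1.85, "Spring"],
        ["Apr", 2.93, "Spring"],
    ],
)

# Keep only the rows where the seasons column matches the given value
winter_df = precip_df[precip_df["seasons"] == "Winter"]
spring_df = precip_df[precip_df["seasons"] == "Spring"]

# .describe() summarizes the numeric columns of each selection
print(winter_df.describe())
print(spring_df.describe())
```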

Question 11: Plot Data From Pandas Dataframes

Manually create a new pandas dataframe containing the calculated mean value for each season and the name of the season.

Plot this new pandas dataframe using the plot type and colors of your choosing.
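A bar plot of seasonal means could be sketched as follows, using matplotlib directly. The mean values, column names, and color here are hypothetical placeholders; substitute the means you calculated and any plot type and colors you prefer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical seasonal means, not the values from this activity
season_means = pd.DataFrame(
    columns=["season", "mean_value"],
    data=[
        ["Winter", 1.0],
        ["Spring", 2.5],
        ["Summer", 4.0],
        ["Fall", 2.0],
    ],
)

# Create the figure and draw one bar per season
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(season_means["season"], season_means["mean_value"], color="teal")

# Label the plot so the units and variables are clear
ax.set(
    title="Mean Value by Season (Example Data)",
    xlabel="Season",
    ylabel="Mean value",
)
```

In a Jupyter Notebook the figure typically renders below the cell; in a script you would call `plt.show()` or `fig.savefig()` to view or save it.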

Part II: Submit Your Jupyter Notebook to GitHub

To submit your Jupyter Notebook for this activity, follow the Git/GitHub workflow from:

  1. Guided Activity on Version Control with Git/GitHub to add, commit, and push your Jupyter Notebook for this activity to your forked repository (https://github.com/yourusername/ea-bootcamp-practice-data-structures).

  2. Guided Activity to Submit Pull Request to submit a pull request of your Jupyter Notebook for this activity to the Earth Lab repository (https://github.com/earthlab-education/ea-bootcamp-practice-data-structures).
