GEOG 4463 & 5463 - Earth Analytics Bootcamp: Homework 3


Homework 3

For this assignment, you will create a Jupyter Notebook with your answers to the questions below and submit it to a GitHub repository for Homework 3, following the instructions in Part II: Submit Your Jupyter Notebook to GitHub.

You need to complete this assignment (Homework 3) by Friday, August 17th at 8:00 AM (U.S. Mountain Daylight Time). See this link to convert the due date/time to your local time.

This assignment will test your skills with data structures, loops, and conditional statements from Days 6, 7, and 8.

You will be asked to work with familiar data: temperature and precipitation data for various months and years for Boulder, Colorado, provided by the U.S. National Oceanic and Atmospheric Administration (NOAA).

What You Need

Be sure that you have completed all of the lessons from Days 6, 7, and 8 for the Earth Analytics Bootcamp. Completing the challenges at the end of the lessons will also help you with this assignment. Review the lessons as needed to answer the questions.

You will need to fork and clone a Github repository for Homework 3 from https://github.com/earthlab-education/ea-bootcamp-hw-3-yourusername. You will receive an invitation to the Github repository for Homework 3 via CANVAS.

Note: the repository will be empty, as you will add a new Jupyter Notebook containing your answers to the questions below.

Part I: Create and Modify a Jupyter Notebook

Begin by creating a new Jupyter Notebook in your forked repository from https://github.com/yourusername/ea-bootcamp-hw-3-yourusername.

Rename the file to firstinitial-lastname-ea-bootcamp-hw-3.ipynb (e.g. jpalomino-ea-bootcamp-hw-3.ipynb).

Note that Git will recognize this new Jupyter Notebook as a new file that can be added, committed, and pushed back to your forked repository on Github.com.

Be Sure to Add Documentation to Your Notebook (8 pts)

Start with a Markdown cell containing a Markdown title for this assignment, plus an author name and date in list form. Bold the words for author and date, but do not bold your name and today’s date.

Add a Markdown cell before each code cell you create to describe the purpose of your code (e.g. what are you accomplishing by executing this code?). Think carefully about how many cells you should have to best organize your data (hint: review lessons for examples of how code can be grouped into cells).

Within code cells, be sure to also add Python comments to document each code block and use the PEP 8 guidelines to assign appropriate variable names that are short and concise but also clearly indicate the kind of data contained in the variable.

Question 1: Import Python Packages (2 pts)

In the questions below, you will be working with numpy arrays and pandas dataframes.

You will also be downloading files using urllib.request, accessing directories and files on your computer using os, and retrieving filenames using glob. Last, you will also be creating plots of your data.

Import all of the necessary Python packages to accomplish these tasks.

Question 2: Use Glob and Conditional Statements to Check for Directories (3 pts)

Use glob.glob to get a list of all items in your earth-analytics-bootcamp directory.

Write and execute a conditional statement that prints a message to proceed if both of the following directories exist:

  1. the data directory
  2. the directory for your git repository for Homework 3 (e.g. ea-bootcamp-hw-3-yourusername).

Both directories exist. Proceed!
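
The check described above can be sketched as follows. This is only an illustration under stated assumptions: the base directory is a temporary folder standing in for earth-analytics-bootcamp, and the two subdirectories are created by the example itself so that it runs anywhere. In your notebook you would point glob.glob at your own directory instead.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for the earth-analytics-bootcamp directory.
base_dir = tempfile.mkdtemp()
data_dir = os.path.join(base_dir, "data")
repo_dir = os.path.join(base_dir, "ea-bootcamp-hw-3-yourusername")
os.mkdir(data_dir)
os.mkdir(repo_dir)

# List every item directly inside the base directory.
items = glob.glob(os.path.join(base_dir, "*"))

# Proceed only if both expected directories appear in the list.
if data_dir in items and repo_dir in items:
    message = "Both directories exist. Proceed!"
else:
    message = "At least one directory is missing."

print(message)
```

Using `and` means the message to proceed is chosen only when both membership tests are true.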

Question 3: Download Text Files and Import Into Numpy Arrays (5 pts)

Use urllib.request to download the following .csv and .txt files of monthly temperature (Fahrenheit) between 2004 and 2017 for Boulder, Colorado, to your data directory:

  1. boulder-temp-2004-to-2009.csv from https://ndownloader.figshare.com/files/12767972

  2. boulder-temp-2010-to-2014.csv from https://ndownloader.figshare.com/files/12767960

  3. boulder-temp-2015.txt from https://ndownloader.figshare.com/files/12767963

  4. boulder-temp-2016.txt from https://ndownloader.figshare.com/files/12767969

  5. boulder-temp-2017.txt from https://ndownloader.figshare.com/files/12767966

Each dataset contains a row for each year specified in the dataset name and contains a column for each month (starting with January through December).

Use the appropriate function to import each file into a new numpy array.

Print your numpy arrays.

[[35.4 33.6 48.2 49.2 59.9 62.7 69.2 66.4 62.9 51.9 39.7 36.5]
 [35.5 37.9 42.  48.4 57.6 65.4 75.1 69.7 66.2 52.6 45.  33.3]
 [40.7 33.7 39.4 53.9 61.  71.6 74.4 71.6 58.4 51.  43.4 35.3]
 [27.2 34.6 47.6 47.9 58.  67.7 74.8 73.7 64.5 55.2 44.9 30.2]
 [31.6 36.1 40.8 47.8 57.1 66.1 75.  69.6 60.9 46.  46.  31.1]
 [38.2 39.7 44.3 47.3 59.3 63.  69.6 69.6 63.1 44.5 43.8 26.7]]

[[33.  30.1 42.7 48.8 53.9 66.9 72.5 72.4 66.6 55.  39.8 37.2]
 [33.1 32.  45.2 48.9 53.7 67.6 73.5 75.1 63.4 52.9 42.3 32.2]
 [38.9 32.3 50.8 54.3 60.2 74.2 74.8 73.2 66.  50.8 46.1 33.7]
 [33.  32.1 40.5 43.8 57.7 69.9 72.2 72.2 65.1 47.8 43.2 31.5]
 [34.6 32.  43.6 49.8 56.6 66.2 72.2 69.2 63.8 55.3 38.3 33.9]]


[36.5 36.6 46.1 50.1 52.4 68.3 70.3 70.5 69.4 56.2 40.8 33.1]

[34.1 40.9 43.  49.  54.1 70.5 74.  70.4 64.9 58.8 47.5 32. ]

[32.2 42.3 50.3 48.9 55.7 68.7 73.9 69.  63.5 51.5 47.5 36.4]
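
As a rough sketch of the download and import steps, the example below defines a small helper around urllib.request.urlretrieve and then loads two tiny local files with np.loadtxt. The helper name `download_file` and the example filenames are hypothetical. The example also assumes that the .txt files are whitespace-delimited and the .csv files are comma-delimited, which you should confirm by opening the files you download.

```python
import os
import tempfile
import urllib.request

import numpy as np


def download_file(url, path):
    """Hypothetical helper: save the file at url to path (needs network access)."""
    urllib.request.urlretrieve(url=url, filename=path)


# Example call, left as a comment because it needs an internet connection:
# download_file("https://ndownloader.figshare.com/files/12767963",
#               "data/boulder-temp-2015.txt")

# Hypothetical local files standing in for the downloads.
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "example-temp.txt")
csv_path = os.path.join(tmp_dir, "example-temp.csv")

with open(txt_path, "w") as txt_file:
    txt_file.write("36.5 36.6 46.1\n")  # assumed whitespace-delimited
with open(csv_path, "w") as csv_file:
    csv_file.write("35.4,33.6,48.2\n35.5,37.9,42.0\n")  # assumed comma-delimited

# np.loadtxt splits on whitespace by default; pass delimiter="," for CSV.
temp_txt = np.loadtxt(txt_path)
temp_csv = np.loadtxt(csv_path, delimiter=",")

print(temp_txt)
print(temp_csv)
```

A file with one row loads as a one-dimensional array, while a file with several rows loads as a two-dimensional array.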

Question 4: Write Loop to Recalculate Numpy Arrays (10 pts)

Manually create a list of the numpy arrays imported from the .txt files only.

Write and execute a loop to recalculate the values in these numpy arrays from Fahrenheit to Celsius.

Recall that Celsius = (Fahrenheit - 32) / 1.8, and that you can print an empty line using print("") within your loop to create spaces between the results.

Be sure to print each numpy array as part of the loop.

[ 2.5         2.55555556  7.83333333 10.05555556 11.33333333 20.16666667
 21.27777778 21.38888889 20.77777778 13.44444444  4.88888889  0.61111111]

[ 1.16666667  4.94444444  6.11111111  9.44444444 12.27777778 21.38888889
 23.33333333 21.33333333 18.27777778 14.88888889  8.61111111  0.        ]

[ 0.11111111  5.72222222 10.16666667  9.38888889 13.16666667 20.38888889
 23.27777778 20.55555556 17.5        10.83333333  8.61111111  2.44444444]
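
The conversion formula can be checked on a couple of made-up arrays. This is a minimal sketch with hypothetical values and variable names, not the homework data.

```python
import numpy as np

# Hypothetical Fahrenheit arrays (not the homework data).
temp_a = np.array([32.0, 50.0, 68.0])
temp_b = np.array([41.0, 59.0, 212.0])
fahrenheit_arrays = [temp_a, temp_b]

for fahrenheit in fahrenheit_arrays:
    # Arithmetic on a numpy array is applied to every element at once.
    celsius = (fahrenheit - 32) / 1.8
    print(celsius)
    print("")
```

In this sketch `celsius` is a new array on each pass through the loop; `temp_a` and `temp_b` themselves still hold Fahrenheit values afterwards.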

Question 5: Write Loop to Summarize Numpy Array (10 pts)

Write and execute a loop to calculate and print the median values of each numpy array from the previous question.

Hints:

  1. Note what you did in the previous question before writing the loop.
  2. Review how to calculate summary statistics of numpy arrays.

median: 51.25

median: 51.55

median: 50.9
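
For reference, np.median can be called inside a loop like this. The arrays below are hypothetical and only illustrate the function call.

```python
import numpy as np

# Hypothetical arrays (not the homework data).
arrays = [np.array([1.0, 3.0, 2.0]), np.array([10.0, 40.0, 20.0, 30.0])]

for values in arrays:
    # For an even number of values, the median is the mean of the two middle values.
    print("median:", np.median(values))
```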

Question 6: Expand Your Loop to Capture Summary Statistics (10 pts)

Expand on your loop from the previous question to add each numpy array median value to a new list.

Print your final list of median values.

Hints:

  1. Create an empty list to receive the median values.
  2. Review how to add values to an existing list.
  3. Recall that the location of the print() function matters (i.e. what you receive will depend on where print() is placed in relationship to the loop).

[51.25, 51.55, 50.9]
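
The pattern in these hints (an empty list, append inside the loop, print after the loop) can be sketched with plain Python lists. To keep the example free of dependencies it uses statistics.median from the standard library in place of the numpy function, and the values are hypothetical.

```python
import statistics

# Hypothetical groups of values (not the homework data).
groups = [[1.0, 3.0, 2.0], [10.0, 40.0, 20.0, 30.0]]

medians = []  # empty list created before the loop
for group in groups:
    medians.append(statistics.median(group))
    # A print(medians) placed here would show the list growing on every iteration.

# Placed after the loop, print() runs once and shows the final list.
print(medians)
```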

Question 7: Use Glob To Create Lists of Filenames (10 pts)

Use glob.glob to create a list that contains the names of the .csv files you downloaded for temperature and a second list that contains the names of the .txt files you downloaded for temperature.

Think about how you can distinguish these files from the others in your data directory. If you find it helpful, feel free to include conditional statements.

Print these lists of filenames.

Hint:

  1. Review how to use glob.glob to search by keywords and by file types.

['data/boulder-temp-2016.txt', 'data/boulder-temp-2017.txt', 'data/boulder-temp-2015.txt']
['data/boulder-temp-2004-to-2009.csv', 'data/boulder-temp-2010-to-2014.csv']
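
Searching by keyword and by file type comes down to the wildcard pattern passed to glob.glob. The sketch below assumes a temporary directory with hypothetical filenames so that it runs anywhere.

```python
import glob
import os
import tempfile

# Hypothetical data directory with made-up filenames.
data_dir = tempfile.mkdtemp()
for name in ["site-temp-2015.txt", "site-temp-2010.csv", "site-precip-2010.csv"]:
    open(os.path.join(data_dir, name), "w").close()

# "*temp*" matches the keyword anywhere in the name; the extension filters by type.
temp_txt_files = glob.glob(os.path.join(data_dir, "*temp*.txt"))
temp_csv_files = glob.glob(os.path.join(data_dir, "*temp*.csv"))

print(temp_txt_files)
print(temp_csv_files)
```

glob.glob does not guarantee any particular order, so wrap the call in sorted() if the order of filenames matters.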

Question 8: Download CSV Files and Import Into Pandas Dataframes (2 pts)

Use urllib.request to download the following .csv files of monthly precipitation (already in millimeters) between 1996 and 2017 for Boulder, Colorado, to your data directory:

  1. boulder-precip-1996-to-2006-months.csv from https://ndownloader.figshare.com/files/12767930
    • This dataset contains a row for each year (1996 to 2006) and contains a column for each month (starting with January through December).
  2. boulder-precip-2007-to-2017-months-seasons.csv from https://ndownloader.figshare.com/files/12767933
    • This dataset contains a row for each month (starting with January through December) and contains a column for each year (2007 to 2017).

Use the appropriate function to import each file into a new pandas dataframe.

Print your pandas dataframes. Notice the structures of your pandas dataframes.

    Year  January  February    March    April      May     June     July   August  September  October  November  December
 0  1996   48.006     7.366   54.864   37.846  117.602   70.358   49.784   16.002     88.392    7.112    36.322     9.398
 1  1997   22.098    46.482   23.114  146.558   55.626   93.726   28.956  133.858     48.768   68.580    38.608    17.272
 2  1998   27.178     5.842   86.614  115.824   46.228   46.990  102.108   24.638     16.764   28.448    38.862    26.670
 3  1999   16.510     2.032   27.686  191.770   46.736   20.828   64.516  140.716     66.548   33.782    20.574    25.654
 4  2000    7.366    13.970   65.024   38.100   40.640   38.862   53.086   18.288     63.754   32.512    22.606    11.176
 5  2001   18.542    21.844   51.054   76.708   91.948   27.686   44.704   41.656     44.958   10.160    25.908     9.144
 6  2002   27.178    11.176   38.100    5.080   81.280   29.972    2.286   36.576     38.608   61.976    19.812     0.508
 7  2003    2.286    38.608  138.176   75.946   66.548   68.326   18.034   89.408      8.890   11.430    20.320    21.336
 8  2004   20.828    33.274   27.686  143.764   32.512  100.584   87.376   73.152     52.578   58.928    50.546     8.890
 9  2005   35.560     7.874   30.988   98.044   48.514   68.072   10.668   41.402     13.208   71.120     8.636    10.922
10  2006   11.176    17.272   52.832   26.416   28.956   33.528   66.802   31.242     31.750   94.234    18.796    77.470

       months    y2007    y2008    y2009    y2010    y2011    y2012    y2013    y2014    y2015    y2016    y2017  seasons
 0    January   42.672   11.684   15.748    7.112   24.384    9.652    6.858   42.418    9.652    9.398   35.814   Winter
 1   February   21.844   16.002    6.858   34.798   25.908   49.276   28.702   17.272   93.726   36.576   18.542   Winter
 2      March   42.926   37.338   48.006   83.820    8.382    0.254   43.688   41.148    9.652   97.536   36.830   Spring
 3      April   56.896   28.702  149.352   92.202   61.214   33.274  105.156   47.498  114.300   84.836   80.010   Spring
 4        May   45.466  106.934   78.232   68.834  131.064   45.212   67.564  112.522  198.628   51.054  159.766   Spring
 5       June    9.652   40.132   68.580   85.344   34.290    9.652   15.494   21.336   44.704   60.198   11.430   Summer
 6       July   20.320    2.286   36.068   58.674   72.898  126.746   26.162  116.078   75.692   15.494   33.020   Summer
 7        Aug   46.228   75.438    8.382   27.178   27.432    9.144   35.560   40.640    7.874   26.924   41.148   Summer
 8  September   48.768   46.736   10.668    6.350   65.024   57.658  461.264   73.152    3.556   11.430   48.768     Fall
 9    October   35.052   29.972   82.804   24.130   41.910   36.576   56.896   29.464   51.308    9.652   61.468     Fall
10   November   11.938    3.302   23.622   15.494   24.892    7.112    7.366   22.352   46.482   11.938   14.478     Fall
11   December   53.340   33.782   35.306   12.192   48.768   12.954   12.700   34.798   28.194   23.114   17.272   Winter
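
A minimal sketch of importing comma-separated text into a pandas dataframe is shown below. It reads from an in-memory string rather than a downloaded file, and it assumes the files are comma-delimited with a header row. The two data rows are copied from the first expected output and trimmed to three columns.

```python
import io

import pandas as pd

# In-memory stand-in for a downloaded .csv file (assumed comma-delimited with a header).
csv_text = "Year,January,February\n1996,48.006,7.366\n1997,22.098,46.482\n"

# pd.read_csv accepts a file path or a file-like object.
precip_df = pd.read_csv(io.StringIO(csv_text))

print(precip_df)
print(precip_df.shape)
```

Checking the shape is a quick way to notice whether years are stored as rows or as columns.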

Question 9: Create Index in Pandas Dataframe (5 pts)

Using your pandas dataframe for boulder-precip-1996-to-2006-months.csv, create a new label index based on Year.

      January  February    March    April      May     June     July   August  September  October  November  December
Year
1996   48.006     7.366   54.864   37.846  117.602   70.358   49.784   16.002     88.392    7.112    36.322     9.398
1997   22.098    46.482   23.114  146.558   55.626   93.726   28.956  133.858     48.768   68.580    38.608    17.272
1998   27.178     5.842   86.614  115.824   46.228   46.990  102.108   24.638     16.764   28.448    38.862    26.670
1999   16.510     2.032   27.686  191.770   46.736   20.828   64.516  140.716     66.548   33.782    20.574    25.654
2000    7.366    13.970   65.024   38.100   40.640   38.862   53.086   18.288     63.754   32.512    22.606    11.176
2001   18.542    21.844   51.054   76.708   91.948   27.686   44.704   41.656     44.958   10.160    25.908     9.144
2002   27.178    11.176   38.100    5.080   81.280   29.972    2.286   36.576     38.608   61.976    19.812     0.508
2003    2.286    38.608  138.176   75.946   66.548   68.326   18.034   89.408      8.890   11.430    20.320    21.336
2004   20.828    33.274   27.686  143.764   32.512  100.584   87.376   73.152     52.578   58.928    50.546     8.890
2005   35.560     7.874   30.988   98.044   48.514   68.072   10.668   41.402     13.208   71.120     8.636    10.922
2006   11.176    17.272   52.832   26.416   28.956   33.528   66.802   31.242     31.750   94.234    18.796    77.470
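
Setting a column as the label index can be sketched with set_index on a small dataframe. The variable names are hypothetical, and the values are the first two rows of the expected output trimmed to two months.

```python
import pandas as pd

# Small example dataframe built by hand.
precip_df = pd.DataFrame({
    "Year": [1996, 1997],
    "January": [48.006, 22.098],
    "February": [7.366, 46.482],
})

# set_index returns a new dataframe whose row labels come from the Year column.
precip_by_year = precip_df.set_index("Year")

print(precip_by_year)

# Rows can now be selected by year label instead of by position.
print(precip_by_year.loc[1997, "January"])
```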

Question 10: Write Loop to Summarize Pandas Dataframe (10 pts)

Write and execute a loop to summarize and print each month’s data in the pandas dataframe for boulder-precip-1996-to-2006-months.csv.

Hints:

  1. It can help to create a list of month names to iterate upon.
  2. Recall the appropriate function to calculate summary statistics of pandas dataframes.
  3. Think about the placement of print() in order to display the results of each iteration of your loop.

         January
count  11.000000
mean   21.520727
std    12.941090
min     2.286000
25%    13.843000
50%    20.828000
75%    27.178000
max    48.006000

        February
count  11.000000
mean   18.703636
std    14.697921
min     2.032000
25%     7.620000
50%    13.970000
75%    27.559000
max    46.482000

            March
count   11.000000
mean    54.194364
std     33.767344
min     23.114000
25%     29.337000
50%     51.054000
75%     59.944000
max    138.176000

            April
count   11.000000
mean    86.914182
std     58.408494
min      5.080000
25%     37.973000
50%     76.708000
75%    129.794000
max    191.770000

              May
count   11.000000
mean    59.690000
std     27.283904
min     28.956000
25%     43.434000
50%     48.514000
75%     73.914000
max    117.602000

             June
count   11.000000
mean    54.448364
std     27.357645
min     20.828000
25%     31.750000
50%     46.990000
75%     69.342000
max    100.584000

             July
count   11.000000
mean    48.029091
std     31.445868
min      2.286000
25%     23.495000
50%     49.784000
75%     65.659000
max    102.108000

           August
count   11.000000
mean    58.812545
std     44.695248
min     16.002000
25%     27.940000
50%     41.402000
75%     81.280000
max    140.716000

       September
count  11.000000
mean   43.110727
std    24.616280
min     8.890000
25%    24.257000
50%    44.958000
75%    58.166000
max    88.392000

         October
count  11.000000
mean   43.480182
std    29.070701
min     7.112000
25%    19.939000
50%    33.782000
75%    65.278000
max    94.234000

        November
count  11.000000
mean   27.362727
std    12.157073
min     8.636000
25%    20.066000
50%    22.606000
75%    37.465000
max    50.546000

        December
count  11.000000
mean   19.858182
std    20.693384
min     0.508000
25%     9.271000
50%    11.176000
75%    23.495000
max    77.470000
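
One way to produce summaries in this shape is to call describe() on one column at a time inside a loop. The sketch below uses a hypothetical three-row dataframe with two months; selecting with double brackets keeps the result as a dataframe, so the month name prints as a header.

```python
import pandas as pd

# Hypothetical dataframe with two month columns (values from 1996 to 1998).
precip_df = pd.DataFrame({
    "January": [48.006, 22.098, 27.178],
    "February": [7.366, 46.482, 5.842],
})

months = ["January", "February"]

for month in months:
    # precip_df[[month]] is a one-column dataframe; describe() summarizes it.
    print(precip_df[[month]].describe())
    print("")
```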

Question 11: Create Plots for Specific Years (15 pts)

Write and execute a loop or conditional statement that will produce two plots from your pandas dataframe for boulder-precip-2007-to-2017-months-seasons.csv: one for 2007 and one for 2013.

Choose plot type and colors to help you tell the story of these two years, and be sure to label each plot with the appropriate title including the year.

Hints:

  1. To select/plot from pandas dataframes using variable names that are implicit (i.e. not explicitly created by you), change the syntax from dataframe.column_name to dataframe[column_name] because you are no longer selecting/plotting an explicit column name in the dataframe.
  2. Recall that you can build a text string using the syntax "text here" + variable + "more text here".
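
The two hints above can be illustrated without drawing anything. The sketch below selects columns through a variable using bracket syntax and builds a title string by concatenation. The dataframe holds only two months of values, and the title wording is a hypothetical choice.

```python
import pandas as pd

# Hypothetical two-row excerpt of the months-by-years dataframe.
precip_df = pd.DataFrame({
    "months": ["January", "February"],
    "y2007": [42.672, 21.844],
    "y2013": [6.858, 28.702],
})

titles = []
for column_name in ["y2007", "y2013"]:
    # Bracket syntax uses the value stored in column_name; dot syntax
    # (precip_df.column_name) would look for a column literally named "column_name".
    values = precip_df[column_name]

    # Drop the leading "y" and build the title by concatenating strings.
    year = column_name[1:]
    title = "Monthly Precipitation in " + year + " (mm)"
    titles.append(title)

    print(title, "max:", values.max())
    # A plotting call using values and title would go here.
```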

Question 12: Discuss Plots (10 pts)

  1. Why did you choose the method you used to create the plots (i.e. loops or conditional statement)?

  2. Compare your plots. What do you notice about their y-axes and their precipitation values? Which year experienced the highest peak and when did it occur?

  3. Instead of creating two separate plots, what else could you do with these data to best highlight the differences between the years and to best highlight the peak of the data?

Part II: Submit Your Jupyter Notebook to GitHub

To submit your Jupyter Notebook for Homework 3, follow the Git/Github workflow from:

  1. Guided Activity on Version Control with Git/GitHub to add, commit, and push your Jupyter Notebook for Homework 3 to your forked repository for Homework 3 (https://github.com/yourusername/ea-bootcamp-hw-3-yourusername).

  2. Guided Activity to Submit Pull Request to submit a pull request of your Jupyter Notebook for Homework 3 to the Earth Lab repository for Homework 3 (https://github.com/earthlab-education/ea-bootcamp-hw-3-yourusername).
