GEOG 4463 & 5463 - Earth Analytics Bootcamp: Homework 2


Homework 2

For this assignment, you will create a Jupyter Notebook with your answers to the questions below and submit it to a GitHub repository for Homework 2, following the instructions in Part III: Submit Your Jupyter Notebook to GitHub.

You need to complete this assignment (Homework 2) by Tuesday, August 14th at 8:00 AM (U.S. Mountain Daylight Time). See this link to convert the due date/time to your local time.

This assignment will test your skills with numpy arrays and pandas dataframes from Days 4 and 5.

You will be asked to download and use data from Figshare.com on monthly snowfall (inches) between the years 2007 and 2017 for Boulder, Colorado, provided by the U.S. National Oceanic and Atmospheric Administration (NOAA).

What You Need

Be sure that you have completed all of the lessons from Days 4 and 5 for the Earth Analytics Bootcamp. Completing the challenges at the end of the lessons will also help you with this assignment. Review the lessons as needed to answer the questions.

You will need to fork and clone a GitHub repository for Homework 2 from https://github.com/earthlab-education/ea-bootcamp-hw-2-yourusername. You will receive an invitation to the GitHub repository for Homework 2 via CANVAS.

Note: the repository will be empty, as you will add a new Jupyter Notebook containing your answers to the questions below.

Part I: Review PEP 8 Style Guide for Python

Review the Earth Analytics Bootcamp reference page on the PEP 8 Style Guide, which provides more information on naming conventions within the Python community.

Reflect on the PEP 8 style guide as you write your code for this assignment and assign variable names. The last question in this assignment asks you to discuss the use of the PEP 8 style guide.

Part II: Create and Modify a Jupyter Notebook

Begin by creating a new Jupyter Notebook in your forked repository from https://github.com/yourusername/ea-bootcamp-hw-2.

Rename the file to firstinitial-lastname-ea-bootcamp-hw-2.ipynb (e.g. jpalomino-ea-bootcamp-hw-2.ipynb).

Note that Git will recognize this new Jupyter Notebook as a new file that can be added, committed, and pushed back to your forked repository on GitHub.com.

Be Sure to Add Documentation to Your Notebook (12 pts)

Add a Markdown cell before each code cell you create to describe the purpose of your code (e.g. what are you accomplishing by executing this code?). Think carefully about how many cells you should have to best organize your data (hint: review lessons for examples of how code can be grouped into cells).

Within code cells, be sure to also add Python comments to document each code block and use the PEP 8 guidelines to assign appropriate variable names that are short and concise but also clearly indicate the kind of data contained in the variable.

Question 1: Markdown Styling (1 pt)

Use Markdown to add:

  • A title for the notebook (e.g. Earth Analytics Bootcamp - Homework 2)
  • A bullet list with:
    • A bold word for Author: and then add text for your name.
    • A bold word for Date: and then add text for today's date.

Question 2: Import Python Packages (5 pts)

In the questions below, you will be creating numpy arrays and pandas dataframes. You will also be creating plots and downloading data from Figshare.com after setting the working directory.

Import the necessary Python packages to accomplish these tasks.

Question 3: Download CSV File and Import Into Numpy Arrays (6 pts)

Use urllib.request to download the following CSV file of monthly snowfall (inches) between 2007 and 2017 for Boulder, Colorado, to your data directory:

snow-2007-to-2017.csv from https://ndownloader.figshare.com/files/12746039

This snowfall dataset (inches) contains a row for each year (starting with 2007 through 2017) and contains a column for each month (starting with January through December).

Use the appropriate function to import snow-2007-to-2017.csv into a numpy array.
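The download-and-import pattern can be sketched as follows. This is a minimal illustration, not the assignment solution: the file names, the tiny CSV contents, and the temporary directory are hypothetical, and a local file:// URL stands in for the Figshare link so that the sketch runs offline. It also assumes a comma-delimited file without a header row.

```python
import tempfile
import urllib.request
from pathlib import Path

import numpy as np

# Hypothetical stand-in for the remote file: a tiny CSV written locally.
# In the assignment, the URL would be the Figshare link given above.
work_dir = Path(tempfile.mkdtemp())
source_file = work_dir / "source.csv"
source_file.write_text("27.5,15.3,4.5\n10.3,10.4,17.6\n")
demo_url = source_file.as_uri()  # file:// URL, an assumption for this demo

# Download the file into a data directory, creating the directory if needed.
data_dir = work_dir / "data"
data_dir.mkdir(exist_ok=True)
local_path = data_dir / "snow-demo.csv"
urllib.request.urlretrieve(url=demo_url, filename=str(local_path))

# Import the comma-delimited values into a two-dimensional numpy array.
snow_demo_inches = np.loadtxt(fname=str(local_path), delimiter=",")
print(snow_demo_inches.shape)
```

Keeping the download path in a variable (here `local_path`) means the same path can be reused for the import step without retyping it.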

Question 4: Print Data From Numpy Arrays Without Scientific Notation (2 pts)

Print your imported numpy array after setting the appropriate options to suppress scientific notation.

[[27.5    15.3     4.5     2.2     0.0001  0.      0.      0.      0.
   0.1     5.9    30.    ]
 [10.3    10.4    17.6     7.9     0.7     0.      0.      0.      0.
   0.2     1.3    20.9   ]
 [13.      3.9    21.4    20.4     0.      0.      0.      0.      0.
  30.1     8.9    27.8   ]
 [ 4.6    22.9    28.7     5.8     5.6     3.5     0.      0.      0.
   0.0001  2.      9.5   ]
 [18.2    13.2     0.7     3.5     0.2     0.      0.      0.      0.
  11.5     8.6    33.1   ]
 [ 7.8    32.1     0.0001  1.6     0.0001  0.      0.      0.      0.
   7.9     0.8    11.7   ]
 [ 3.7    18.5    22.8    47.6    12.3     0.      0.      0.      0.
   5.4     6.3     9.    ]
 [27.2    11.7    11.2    12.2     6.8     0.      0.      0.      0.5
   0.     16.9    19.8   ]
 [ 6.     54.6     8.      7.4     3.9     0.0001  0.      0.      0.
   0.     11.5    17.4   ]
 [ 4.1    21.8    32.5    21.4     1.      0.      0.      0.      0.
   0.      4.4    13.    ]
 [18.7     9.9     0.     19.4     6.1     0.      0.      0.      0.
   8.      4.1    10.2   ]]
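As a small illustration of the print option involved, the sketch below uses a hypothetical array (`demo_inches`) whose values span several orders of magnitude, which is the situation in which numpy would otherwise switch to scientific notation.

```python
import numpy as np

# Hypothetical values: the mix of 27.5 and 0.0001 is what normally
# triggers scientific notation when the array is printed.
demo_inches = np.array([[27.5, 0.0001, 0.0], [10.3, 0.7, 20.9]])

# suppress=True asks numpy to print small floats in fixed-point notation.
np.set_printoptions(suppress=True)
print(demo_inches)
```

Note that `np.set_printoptions` changes a session-wide setting, so it only needs to be called once per notebook.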

Question 5: Run Calculations on Numpy Arrays (4 pts)

Convert the values in your numpy array from inches to millimeters. Recall that one inch is equal to 25.4 millimeters.

Print your new numpy array, again suppressing scientific notation.

[[ 698.5      388.62     114.3       55.88       0.00254    0.
     0.         0.         0.         2.54     149.86     762.     ]
 [ 261.62     264.16     447.04     200.66      17.78       0.
     0.         0.         0.         5.08      33.02     530.86   ]
 [ 330.2       99.06     543.56     518.16       0.         0.
     0.         0.         0.       764.54     226.06     706.12   ]
 [ 116.84     581.66     728.98     147.32     142.24      88.9
     0.         0.         0.         0.00254   50.8      241.3    ]
 [ 462.28     335.28      17.78      88.9        5.08       0.
     0.         0.         0.       292.1      218.44     840.74   ]
 [ 198.12     815.34       0.00254   40.64       0.00254    0.
     0.         0.         0.       200.66      20.32     297.18   ]
 [  93.98     469.9      579.12    1209.04     312.42       0.
     0.         0.         0.       137.16     160.02     228.6    ]
 [ 690.88     297.18     284.48     309.88     172.72       0.
     0.         0.        12.7        0.       429.26     502.92   ]
 [ 152.4     1386.84     203.2      187.96      99.06       0.00254
     0.         0.         0.         0.       292.1      441.96   ]
 [ 104.14     553.72     825.5      543.56      25.4        0.
     0.         0.         0.         0.       111.76     330.2    ]
 [ 474.98     251.46       0.       492.76     154.94       0.
     0.         0.         0.       203.2      104.14     259.08   ]]
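Unit conversion on a numpy array is a single element-wise multiplication. The sketch below uses hypothetical values and a hypothetical constant name (`INCHES_TO_MM`) to show the idea.

```python
import numpy as np

INCHES_TO_MM = 25.4  # one inch is equal to 25.4 millimeters

# Hypothetical snowfall values in inches (2 rows x 3 columns).
demo_inches = np.array([[1.0, 2.0, 0.0], [10.0, 0.5, 4.0]])

# Multiplying an array by a scalar applies the operation to every element
# and returns a new array; the original array is left unchanged.
demo_mm = demo_inches * INCHES_TO_MM
print(demo_mm)
```

Assigning the result to a new variable, rather than overwriting the original, keeps both the inches and the millimeter versions available for later questions.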

Question 6: Select Data From Numpy Arrays (6 pts)

Create (and print) a new numpy array containing all data values for the first year (2007). Be sure this new numpy array contains the converted values (mm).

array([[698.5    , 388.62   , 114.3    ,  55.88   ,   0.00254,   0.     ,
          0.     ,   0.     ,   0.     ,   2.54   , 149.86   , 762.     ]])

Question 7: Select Data From Numpy Arrays (6 pts)

Create (and print) a new numpy array containing all data values for January across all years. Be sure this new numpy array contains the converted values (mm).

array([[698.5 ],
       [261.62],
       [330.2 ],
       [116.84],
       [462.28],
       [198.12],
       [ 93.98],
       [690.88],
       [152.4 ],
       [104.14],
       [474.98]])
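Selecting a full row or a full column can be sketched with slices, as below. The array and variable names are hypothetical. Using a range such as `0:1` keeps the result two-dimensional, which matches the doubled brackets in the printed outputs above, while a single integer index returns a one-dimensional array.

```python
import numpy as np

# Hypothetical converted values (2 rows x 3 columns).
demo_mm = np.array([[25.4, 50.8, 0.0], [254.0, 12.7, 101.6]])

# All columns for the first row; the slice 0:1 keeps two dimensions.
first_row = demo_mm[0:1, :]

# All rows for the first column; again a slice, so the result is 2-D.
first_col = demo_mm[:, 0:1]

# A single integer index drops a dimension and returns a 1-D array.
first_row_1d = demo_mm[0]

print(first_row.shape, first_col.shape, first_row_1d.shape)
```

Checking `.shape` or `.ndim` on each result is a quick way to confirm how many dimensions a selection has.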

Question 8: Calculate Summary Statistics of Numpy Arrays (4 pts)

Calculate (and print) the maximum value of your numpy array for 2007.

Add a text string to your print to label your result.

maximum snowfall in 2007: 762.0

Question 9: Calculate Summary Statistics of Numpy Arrays (4 pts)

Calculate (and print) the mean (i.e. average) value for your numpy array for January.

Add a text string to your print to label your result.

mean of snow in January across all years: 325.8127272727273
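Summary statistics on a numpy array, printed with a text label, might look like the sketch below. The arrays and the label text are hypothetical.

```python
import numpy as np

# Hypothetical selections: one row (a year) and one column (a month).
demo_year_mm = np.array([[25.4, 50.8, 0.0]])
demo_month_mm = np.array([[25.4], [254.0]])

# np.max and np.mean reduce over all elements when no axis is given.
year_max = np.max(demo_year_mm)
month_mean = np.mean(demo_month_mm)

# Passing a string and a value to print() labels the result.
print("maximum value in the demo year:", year_max)
print("mean value in the demo month:", month_mean)
```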

Question 10: Download CSV File and Import Into Pandas Dataframe (6 pts)

Use urllib.request to download the following CSV file of monthly snowfall (inches) between 2007 and 2017 for Boulder, Colorado, which includes month and season names, to your data directory:

snow-2007-to-2017-months-seasons.csv from https://ndownloader.figshare.com/files/12746042

Use the appropriate function to import snow-2007-to-2017-months-seasons.csv into a pandas dataframe.

Print your pandas dataframe.

    months     y2007  y2008  y2009     y2010  y2011     y2012  y2013  y2014     y2015  y2016  y2017  seasons
 0     Jan  27.50000   10.3   13.0   4.60000   18.2   7.80000    3.7   27.2   6.00000    4.1   18.7   Winter
 1     Feb  15.30000   10.4    3.9  22.90000   13.2  32.10000   18.5   11.7  54.60000   21.8    9.9   Winter
 2     Mar   4.50000   17.6   21.4  28.70000    0.7   0.00001   22.8   11.2   8.00000   32.5    0.0   Spring
 3     Apr   2.20000    7.9   20.4   5.80000    3.5   1.60000   47.6   12.2   7.40000   21.4   19.4   Spring
 4     May   0.00001    0.7    0.0   5.60000    0.2   0.00001   12.3    6.8   3.90000    1.0    6.1   Spring
 5    June   0.00000    0.0    0.0   3.50000    0.0   0.00000    0.0    0.0   0.00001    0.0    0.0   Summer
 6    July   0.00000    0.0    0.0   0.00000    0.0   0.00000    0.0    0.0   0.00000    0.0    0.0   Summer
 7     Aug   0.00000    0.0    0.0   0.00000    0.0   0.00000    0.0    0.0   0.00000    0.0    0.0   Summer
 8    Sept   0.00000    0.0    0.0   0.00000    0.0   0.00000    0.0    0.5   0.00000    0.0    0.0     Fall
 9     Oct   0.10000    0.2   30.1   0.00001   11.5   7.90000    5.4    0.0   0.00000    0.0    8.0     Fall
10     Nov   5.90000    1.3    8.9   2.00000    8.6   0.80000    6.3   16.9  11.50000    4.4    4.1     Fall
11     Dec  30.00000   20.9   27.8   9.50000   33.1  11.70000    9.0   19.8  17.40000   13.0   10.2   Winter
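Importing a CSV file with a header row and text columns into a pandas dataframe can be sketched as below. To stay self-contained, the sketch reads hypothetical CSV text from an in-memory buffer; in the assignment, the path to the downloaded file would be passed to the same function instead.

```python
import io

import pandas as pd

# Hypothetical CSV text with the same kind of layout as the dataset:
# a month column, numeric year columns, and a season column.
demo_csv = io.StringIO(
    "months,y2007,y2008,seasons\n"
    "Jan,27.5,10.3,Winter\n"
    "Feb,15.3,10.4,Winter\n"
)

# read_csv uses the first row as column names and infers column types.
demo_df = pd.read_csv(demo_csv)
print(demo_df)
```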

Question 11: Run Calculations on Pandas Dataframes (6 pts)

Convert the values in your pandas dataframe from inches to millimeters. Recall that one inch is equal to 25.4 millimeters.

Print your new pandas dataframe.

    months       y2007   y2008   y2009       y2010   y2011       y2012    y2013   y2014        y2015   y2016   y2017  seasons
 0     Jan  698.500000  261.62  330.20  116.840000  462.28  198.120000    93.98  690.88   152.400000  104.14  474.98   Winter
 1     Feb  388.620000  264.16   99.06  581.660000  335.28  815.340000   469.90  297.18  1386.840000  553.72  251.46   Winter
 2     Mar  114.300000  447.04  543.56  728.980000   17.78    0.000254   579.12  284.48   203.200000  825.50    0.00   Spring
 3     Apr   55.880000  200.66  518.16  147.320000   88.90   40.640000  1209.04  309.88   187.960000  543.56  492.76   Spring
 4     May    0.000254   17.78    0.00  142.240000    5.08    0.000254   312.42  172.72    99.060000   25.40  154.94   Spring
 5    June    0.000000    0.00    0.00   88.900000    0.00    0.000000     0.00    0.00     0.000254    0.00    0.00   Summer
 6    July    0.000000    0.00    0.00    0.000000    0.00    0.000000     0.00    0.00     0.000000    0.00    0.00   Summer
 7     Aug    0.000000    0.00    0.00    0.000000    0.00    0.000000     0.00    0.00     0.000000    0.00    0.00   Summer
 8    Sept    0.000000    0.00    0.00    0.000000    0.00    0.000000     0.00   12.70     0.000000    0.00    0.00     Fall
 9     Oct    2.540000    5.08  764.54    0.000254  292.10  200.660000   137.16    0.00     0.000000    0.00  203.20     Fall
10     Nov  149.860000   33.02  226.06   50.800000  218.44   20.320000   160.02  429.26   292.100000  111.76  104.14     Fall
11     Dec  762.000000  530.86  706.12  241.300000  840.74  297.180000   228.60  502.92   441.960000  330.20  259.08   Winter
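Because the dataframe mixes text columns (months, seasons) with numeric columns, the conversion has to be applied to the numeric columns only. One possible approach, shown below on a hypothetical dataframe, is to select columns by data type; updating each year column individually would work as well.

```python
import pandas as pd

# Hypothetical dataframe in inches with text and numeric columns.
demo_df = pd.DataFrame({
    "months": ["Jan", "Feb"],
    "y2007": [1.0, 2.0],
    "y2008": [10.0, 0.5],
    "seasons": ["Winter", "Winter"],
})

# Find the numeric columns, then convert only those from inches to mm.
numeric_cols = demo_df.select_dtypes(include="number").columns
demo_df_mm = demo_df.copy()  # keep the original values in inches
demo_df_mm[numeric_cols] = demo_df_mm[numeric_cols] * 25.4
print(demo_df_mm)
```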

Question 12: Calculate Summary Statistics of Pandas Dataframes (4 pts)

Calculate (and print) the summary statistics of your pandas dataframe with the converted values.

            y2007       y2008       y2009       y2010       y2011       y2012        y2013       y2014        y2015       y2016       y2017
count   12.000000   12.000000   12.000000   12.000000   12.000000   12.000000    12.000000   12.000000    12.000000   12.000000   12.000000
mean   180.975021  146.685000  265.641667  174.836688  188.383333  131.021709   265.853333  225.001667   230.293354  207.856667  161.713333
std    280.454143  191.496843  296.990393  238.789050  260.834927  239.065509   353.288897  233.989661   390.479162  285.732467  181.183070
min      0.000000    0.000000    0.000000    0.000000    0.000000    0.000000     0.000000    0.000000     0.000000    0.000000    0.000000
25%      0.000000    0.000000    0.000000    0.000190    0.000000    0.000000     0.000000    0.000000     0.000000    0.000000    0.000000
50%     29.210000   25.400000  162.560000  102.870000   53.340000   10.160127   148.590000  228.600000   125.730000   64.770000  129.540000
75%    209.550000  262.255000  524.510000  170.815000  302.895000  198.755000   351.790000  339.725000   225.425000  383.540000  253.365000
max    762.000000  530.860000  764.540000  728.980000  840.740000  815.340000  1209.040000  690.880000  1386.840000  825.500000  492.760000
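A summary table like the one above is the kind of output that the `describe()` method of a pandas dataframe returns. The sketch below applies it to a small hypothetical dataframe; by default, text columns are left out of the summary when numeric columns are present.

```python
import pandas as pd

# Hypothetical dataframe in millimeters with one text column.
demo_df_mm = pd.DataFrame({
    "months": ["Jan", "Feb", "Mar"],
    "y2007": [25.4, 50.8, 0.0],
    "y2008": [254.0, 12.7, 101.6],
})

# describe() returns count, mean, std, min, quartiles, and max per column.
demo_summary = demo_df_mm.describe()
print(demo_summary)
```

The result is itself a dataframe, so individual statistics can be looked up by label, for example `demo_summary.loc["max", "y2007"]`.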

Question 13: Discuss Your Analysis of Numpy Arrays and Pandas Dataframe (12 pts)

Use Markdown to write a few paragraphs (1-2) addressing the following questions:

  1. Are any of your numpy arrays for the snowfall data one-dimensional arrays? How do you know? Explain your answer.

  2. Was one data structure easier to use than the other (i.e. numpy array vs pandas dataframe) to convert the units from inches to millimeters? Explain your answer.

  3. Was one data structure easier to use than the other (i.e. numpy array vs pandas dataframe) to calculate the maximum and mean values (i.e. averages)? Imagine that you completed the mean calculation for all years in the original numpy array. Explain your answer.

  4. How could you imagine using these two data structures together in the same analysis workflow?

Question 14: Plot Pandas Dataframes (6 pts)

Create a plot of your choosing for the monthly snowfall in 2017. Be sure to add a title and label the axes with the appropriate units.
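One possible way to build a titled, labeled plot from dataframe columns is sketched below with matplotlib. The data values, the bar chart type, the color, and the label text are all hypothetical choices made for illustration. The `matplotlib.use("Agg")` line is an assumption that lets the sketch run outside a notebook; it is not needed inside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly values in millimeters for a single year.
demo_df = pd.DataFrame({
    "months": ["Jan", "Feb", "Mar"],
    "y2017": [100.0, 250.0, 0.0],
})

# Create a figure and axes, then draw one bar per month.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(demo_df["months"], demo_df["y2017"], color="steelblue")

# Add a title and axis labels that state the units.
ax.set(title="Demo Monthly Snowfall", xlabel="Month", ylabel="Snowfall (mm)")
```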

Question 15: Discuss Your Plot of the Pandas Dataframe (8 pts)

Use Markdown to write a few sentences (3-4) addressing the following questions:

  1. What motivated your plot choices for the snowfall data in the pandas dataframe (e.g. plot type, color, etc.)?

  2. What else could you do to improve on your plot? Think about the data values and brainstorm options for a better display of the overall dataset. (You do not have to implement your suggestions.)

Question 16: Discuss Use of PEP8 Naming Conventions (8 pts)

Use Markdown to write a few sentences (3-4) on the PEP 8 naming conventions:

  1. How did your review of PEP 8 naming conventions influence your choice of variable names in this assignment?

  2. What are some ways in which these conventions are promoted and/or enforced within the Python community?

Part III: Submit Your Jupyter Notebook to GitHub

To submit your Jupyter Notebook for Homework 2, follow the Git/GitHub workflow from:

  1. Guided Activity on Version Control with Git/GitHub to add, commit, and push your Jupyter Notebook for Homework 2 to your forked repository for Homework 2 (https://github.com/yourusername/ea-bootcamp-hw-2-yourusername).

  2. Guided Activity to Submit Pull Request to submit a pull request of your Jupyter Notebook for Homework 2 to the Earth Lab repository for Homework 2 (https://github.com/earthlab-education/ea-bootcamp-hw-2-yourusername).
