Lesson 2. Import CSV Files Into Pandas Dataframes

Learning Objectives

Import tabular data from .csv files into pandas dataframes.

CSV Files of Tabular Data as Inputs to Pandas Dataframes

Recall that scientific data can come in a variety of file formats and types, including comma-separated values files (.csv), which use delimiters such as commas (or some other delimiter like tab spaces or semi-colons) to indicate separate values.

CSV files also support labeled names for the columns, referred to as headers. This means that CSV files can easily support multiple columns of related data, and thus, are very useful for collecting and organizing datasets across multiple locations and/or timeframes.

As you learned previously in this chapter, you can manually define pandas dataframes as needed using the pandas.DataFrame() function. However, when working with larger datasets, you will want to import data directly into pandas dataframes from .csv files.

Get Data to Import Into Pandas Dataframes

To import data into pandas dataframes, you will need to import the pandas package, and you will use the earthpy package to download the data files from the Earth Lab data repository on Figshare.com.

# Import packages
import os

import matplotlib.pyplot as plt
import pandas as pd
import earthpy as et

Recall from the previous chapter on numpy arrays that you can use the function data.get_data() from the earthpy package (which you imported with the alias et) to download data from online sources such as the Figshare.com data repository. To use the function et.data.get_data(), you need to provide a parameter value for the url, which you define by providing a text string of the URL to the dataset.

Begin by downloading a .csv file for average monthly precipitation for Boulder, CO from the following URL:

https://ndownloader.figshare.com/files/12710618

# URL for .csv with avg monthly precip data
avg_monthly_precip_url = "https://ndownloader.figshare.com/files/12710618"

# Download file from URL
et.data.get_data(url=avg_monthly_precip_url)

Downloading from https://ndownloader.figshare.com/files/12710618

'/root/earth-analytics/data/earthpy-downloads/avg-precip-months-seasons.csv'

# Set working directory to earth-analytics
os.chdir(os.path.join(et.io.HOME, 
                      "earth-analytics", 
                      "data"))

Import Tabular Data from CSV Files into Pandas Dataframes

Using the read_csv() function from the pandas package, you can import tabular data from CSV files into pandas dataframe by specifying a parameter value for the file name (e.g. pd.read_csv("filename.csv")). Remember that you gave pandas an alias (pd), so you will use pd to call pandas functions.

# Import data from .csv file
fname = os.path.join("earthpy-downloads",
                     "avg-precip-months-seasons.csv")

avg_monthly_precip = pd.read_csv(fname)

avg_monthly_precip

	months	precip	seasons
0	Jan	0.70	Winter
1	Feb	0.75	Winter
2	Mar	1.85	Spring
3	Apr	2.93	Spring
4	May	3.05	Spring
5	June	2.02	Summer
6	July	1.93	Summer
7	Aug	1.62	Summer
8	Sept	1.84	Fall
9	Oct	1.31	Fall
10	Nov	1.39	Fall
11	Dec	0.84	Winter

As you can see, the months and precip data can exist together in the same pandas dataframe, which differs from numpy arrays. You can see that there is also a column for seasons containing text strings.

Once again, you can also see that the indexing still begins with [0], as it does for Python lists and numpy arrays, and that you did not have to use the print() function to see a nicely formatted version of the pandas dataframe.

Challenge 1: Plot a DataFrame

In the previous lesson you learned how to plot a pandas dataframe using matplotlib. Test your your skills by plotting the data frame that you opened above. Instead of using ax.plot(), use ax.bar() to create your plot.

HINT: to make this plot work, you will have to specify the x axis variable and then height instead of y. Your code will look something like this:

ax.bar(x=dataframe.column,
       height=dataframe.column)

Bar plot of average monthly precipitation using matplotlib.

You now know how to import data from .csv files into pandas dataframes, which will come in very handy as you begin to work with scientific data. On the next pages of this chapter, you will learn how to work with pandas dataframes to run calculations, summarize data, and more.

Recalculate and Summarize Pandas Dataframes Intro

Share on

Twitter Facebook Google+ LinkedIn

Earth Data Analytics Online Certificate

Intro to earth data science textbook

intro-to-earth-data-science-textbook Home

Lesson 2. Import CSV Files Into Pandas Dataframes

Learning Objectives

CSV Files of Tabular Data as Inputs to Pandas Dataframes

Get Data to Import Into Pandas Dataframes

Import Tabular Data from CSV Files into Pandas Dataframes

Challenge 1: Plot a DataFrame

Share on

Leave a Comment

You May Also Enjoy

Plot Data With Matplotlib

Calculate Seasonal Summary Values from Climate Data Variables Stored in NetCDF 4 Format: Work With MACA v2 Climate Data in Python

Calculate Summary Values Using Spatial Areas of Interest (AOIs) including Shapefiles for Climate Data Variables Stored in NetCDF 4 Format: Work With MACA v2 Climate Data in Python

How to Open and Process NetCDF 4 Data Format in Open Source Python