Lesson 2. Import CSV Files Into Pandas Dataframes


Learning Objectives

  • Import tabular data from .csv files into pandas dataframes.

CSV Files of Tabular Data as Inputs to Pandas Dataframes

Recall that scientific data can come in a variety of file formats and types, including comma-separated values (.csv) files, which use a delimiter (most often a comma, but sometimes a tab or semicolon) to separate individual values.
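As a minimal sketch of how the delimiter matters when reading such files with pandas, the example below parses the same three rows written once with commas and once with semicolons. The inline text strings and variable names are illustrative stand-ins for real files, and io.StringIO is used only so the example runs without any file on disk.

```python
import io
import pandas as pd

# The same three rows written with two different delimiters
# (illustrative text standing in for the contents of real files)
comma_text = "months,precip\nJan,0.70\nFeb,0.75\nMar,1.85\n"
semicolon_text = "months;precip\nJan;0.70\nFeb;0.75\nMar;1.85\n"

# A comma is the default delimiter for read_csv()
comma_df = pd.read_csv(io.StringIO(comma_text))

# Any other delimiter is specified with the sep parameter
semicolon_df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# Both calls should produce identical dataframes
print(comma_df.equals(semicolon_df))
```

If the sep parameter were left out for the semicolon version, pandas would treat each line as a single value, so it is worth checking the delimiter of a file before importing it.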

CSV files also support labeled names for the columns, referred to as headers. This means that CSV files can easily support multiple columns of related data, and thus, are very useful for collecting and organizing datasets across multiple locations and/or timeframes.
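To illustrate how headers are used, the sketch below reads one small text sample that includes a header row and one that does not. The sample text and variable names are assumptions made for illustration; the header and names parameters are standard read_csv() options.

```python
import io
import pandas as pd

# Illustrative samples: one with a header row, one without
with_header = "months,precip\nJan,0.70\nFeb,0.75\n"
no_header = "Jan,0.70\nFeb,0.75\n"

# By default, the first row is used as the column names
df_with_header = pd.read_csv(io.StringIO(with_header))

# Without a header row, column names can be supplied explicitly
df_no_header = pd.read_csv(
    io.StringIO(no_header), header=None, names=["months", "precip"]
)

print(list(df_with_header.columns))
print(df_with_header.equals(df_no_header))
```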

As you learned previously in this chapter, you can manually define pandas dataframes as needed using the pandas.DataFrame() function. However, when working with larger datasets, you will want to import data directly into pandas dataframes from .csv files.
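As a brief reminder of the manual approach, here is a minimal sketch that builds a three-row dataframe by hand. The variable name manual_precip is hypothetical, and the values are the first three rows of the precipitation data used later in this lesson.

```python
import pandas as pd

# Manually define a small dataframe from a dictionary of columns
manual_precip = pd.DataFrame({
    "months": ["Jan", "Feb", "Mar"],
    "precip": [0.70, 0.75, 1.85],
})

# Number of rows and columns
print(manual_precip.shape)
```

Typing values in this way works for a handful of rows, but it quickly becomes impractical and error-prone for larger datasets.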

Get Data to Import Into Pandas Dataframes

To import data into pandas dataframes, you will need to import the pandas package, and you will use the earthpy package to download the data files from the Earth Lab data repository on Figshare.com.

# Import necessary packages
import os
import pandas as pd
import earthpy as et

Recall from the previous chapter on numpy arrays that you can use the function data.get_data() from the earthpy package (which you imported with the alias et) to download data from online sources such as the Figshare.com data repository.

To use the function et.data.get_data(), provide a value for the url parameter: a text string containing the URL of the dataset.

Begin by downloading a .csv file for average monthly precipitation for Boulder, CO from the following URL:

https://ndownloader.figshare.com/files/12710618

# URL for .csv with avg monthly precip data
avg_monthly_precip_url = "https://ndownloader.figshare.com/files/12710618"

# Download file from URL
et.data.get_data(url=avg_monthly_precip_url)

Downloading from https://ndownloader.figshare.com/files/12710618
'/root/earth-analytics/data/earthpy-downloads/avg-precip-months-seasons.csv'

# Set working directory to earth-analytics
os.chdir(os.path.join(et.io.HOME, "earth-analytics"))

Import Tabular Data from CSV Files into Pandas Dataframes

Using the read_csv() function from the pandas package, you can import tabular data from CSV files into a pandas dataframe by specifying a parameter value for the file name (e.g. pd.read_csv("filename.csv")).

Remember that you gave pandas an alias (pd), so you will use pd to call pandas functions.

# Import data from .csv file
fname = os.path.join("data", "earthpy-downloads", 
                     "avg-precip-months-seasons.csv")

avg_monthly_precip = pd.read_csv(fname)

avg_monthly_precip

    months  precip  seasons
0      Jan    0.70   Winter
1      Feb    0.75   Winter
2      Mar    1.85   Spring
3      Apr    2.93   Spring
4      May    3.05   Spring
5     June    2.02   Summer
6     July    1.93   Summer
7      Aug    1.62   Summer
8     Sept    1.84     Fall
9      Oct    1.31     Fall
10     Nov    1.39     Fall
11     Dec    0.84   Winter

As you can see, the months and precip data can exist together in the same pandas dataframe. This differs from numpy arrays, in which all values share a single data type. The dataframe also contains a seasons column of text strings.
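One way to confirm that each column keeps its own data type is to inspect the dtypes attribute. The sketch below uses the first three rows of the data as inline text (an illustrative stand-in for the downloaded file), and precip_sample is a hypothetical variable name.

```python
import io
import pandas as pd

# First three rows of the precipitation data as inline text
csv_text = (
    "months,precip,seasons\n"
    "Jan,0.70,Winter\n"
    "Feb,0.75,Winter\n"
    "Mar,1.85,Spring\n"
)
precip_sample = pd.read_csv(io.StringIO(csv_text))

# Each column has its own data type
print(precip_sample.dtypes)

# The precip column is numeric, while months holds text
print(pd.api.types.is_float_dtype(precip_sample["precip"]))
print(pd.api.types.is_numeric_dtype(precip_sample["months"]))
```

The exact label that pandas prints for the text columns can vary between pandas versions, but the precip column should be reported as a floating-point type.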

Once again, you can see that the row index begins at 0, as it does for Python lists and numpy arrays, and that you did not have to use the print() function to see a nicely formatted version of the pandas dataframe.
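The zero-based row index can be checked directly, as in the following minimal sketch. It again uses a three-row inline sample in place of the downloaded file, and the variable names are hypothetical.

```python
import io
import pandas as pd

# First three rows of the precipitation data as inline text
csv_text = (
    "months,precip,seasons\n"
    "Jan,0.70,Winter\n"
    "Feb,0.75,Winter\n"
    "Mar,1.85,Spring\n"
)
precip_sample = pd.read_csv(io.StringIO(csv_text))

# The default row labels start at 0
print(list(precip_sample.index))

# Select the first row by its position (0)
first_row = precip_sample.iloc[0]
print(first_row["months"])
```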

You now know how to import data from .csv files into pandas dataframes, which will come in very handy as you begin to work with scientific data.

On the next pages of this chapter, you will learn how to work with pandas dataframes to run calculations, summarize data, and more.
