Lesson 2. Import Text Files Into Numpy Arrays


Learning Objectives

  • List two common text file formats for importing data into numpy arrays.
  • Import data from text files (.txt, .csv) into numpy arrays.

Common Text File Formats For Importing Data into Numpy Arrays

Scientific data can come in a variety of file formats and types. In this textbook, you will import data into numpy arrays from two commonly used text file formats for scientific data:

  • Plain text files (.txt)
  • Comma-separated values files (.csv)

Plain Text Files

Plain text files simply list out the values on separate lines without any symbols or delimiters to indicate separate values.

For example, average monthly precipitation (inches) for Boulder, CO, collected by the U.S. National Oceanic and Atmospheric Administration (NOAA), can be stored as a plain text file (.txt), with a separate line for each month’s value.

0.70
0.75
1.85
2.93
3.05
2.02
1.93
1.62
1.84
1.31
1.39
0.84

Due to their simplicity, text files (.txt) can be very useful for collecting very large datasets that are all the same type of observation or data type.

CSV Files

Unlike plain-text files which simply list out the values on separate lines without any symbols or delimiters, files containing comma-separated values (.csv) use commas (or some other delimiter like tab spaces or semi-colons) to indicate separate values.

This means that .csv files can easily support multiple rows and columns of related data.

For example, the monthly precipitation values for Boulder, CO for the years 2002 and 2013 can be stored together in a comma-separated values (.csv) file, with each year of data on a separate line and each month of data within a specific year separated by commas:

1.07, 0.44, 1.50, 0.20, 3.20, 1.18, 0.09, 1.44, 1.52, 2.44, 0.78, 0.02
0.27, 1.13, 1.72, 4.14, 2.66, 0.61, 1.03, 1.40, 18.16, 2.24, 0.29, 0.5

As you learned previously in this chapter, you can manually define numpy arrays as needed using the numpy.array() function. However, when working with larger datasets, you will want to import data directly into numpy arrays from data files (such as .txt and .csv).

Get Data To Import Into Numpy Arrays

Import Python Packages and Set Working Directory

In previous chapters, you learned how to import Python packages.

To import data into numpy arrays, you will need to import the numpy package, and you will use the earthpy package to download the data files from the Earth Lab data repository on Figshare.com.

# Import necessary packages
import os

import numpy as np
import earthpy as et

Download Data from URL Using EarthPy

You can use the function data.get_data() from the earthpy package (which you imported with the alias et) to download data from online sources such as the Figshare.com data repository.

To use the function et.data.get_data(), you need to provide a parameter value for the url, which you define by providing a text string of the URL to the dataset.

Begin by downloading a .txt file for average monthly precipitation (inches) for Boulder, CO collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from the following URL:

https://ndownloader.figshare.com/files/12565616

# Define variable for URL to .txt with avg monthly precip data
monthly_precip_url = 'https://ndownloader.figshare.com/files/12565616'

# Provide variable as parameter value for `url`
et.data.get_data(url=monthly_precip_url)
Downloading from https://ndownloader.figshare.com/files/12565616
'/root/earth-analytics/data/earthpy-downloads/avg-monthly-precip.txt'

Take a close look at the path to this file. By default, the data.get_data() function downloads data from URLs to the following directory:

/home/your-username/earth-analytics/data/earthpy-downloads/

If the directory does not already exist, the function will create it for you.

The month names are stored in a different .txt file, which you can download from the following URL:

https://ndownloader.figshare.com/files/12565619

# Download data from URL to .txt with month names
month_names_url = 'https://ndownloader.figshare.com/files/12565619'
et.data.get_data(url=month_names_url)
Downloading from https://ndownloader.figshare.com/files/12565619
'/root/earth-analytics/data/earthpy-downloads/months.txt'

Next, download a .csv file that contains the monthly precipitation (inches) for Boulder, CO for the years 2002 and 2013, collected by the U.S. National Oceanic and Atmospheric Administration (NOAA).

# Download data from URL to .csv of precip data for 2002 and 2013
precip_2002_2013_url = 'https://ndownloader.figshare.com/files/12707792'
et.data.get_data(url=precip_2002_2013_url)
Downloading from https://ndownloader.figshare.com/files/12707792
'/root/earth-analytics/data/earthpy-downloads/monthly-precip-2002-2013.csv'

Now that you have downloaded these files, you can take a look at them by opening the files from your file explorer. Recall that these files have been downloaded to:

/home/your-username/earth-analytics/data/earthpy-downloads/

Notice the structure of each file. While avg-monthly-precip.txt contains numeric values and months.txt contains text string values, both files are plain text files with a separate line for each month’s value.

On the other hand, monthly-precip-2002-2013.csv contains rows and columns of data, with each year of data on a separate line and each month of data within a specific year separated by commas.

Import Numeric Data from Text Files Into Numpy Arrays

You can easily create new numpy arrays by importing numeric data from text files (.txt and .csv) using the loadtxt() function from numpy (which you imported with the alias np) .

Begin by setting the working directory to your earth-analytics directory using the os package and the HOME attribute of the earthpy package.

As you learned in the chapter on working with paths and directories, this will provide you with the flexibility to specify files to import from various subdirectories that you might have within the earth-analytics directory.

# Set working directory to earth-analytics
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

Import Data From TXT File

To import data from a .txt file, you simply need to specify a value for the parameter called fname for the file name:

np.loadtxt(fname)

Recall from the chapter on working with paths and directories that you can use os.path.join() to create paths that will work on any operating system.

In the example below, the fname is defined using os.path.join() with a relative path to the avg-monthly-precip.txt file because you previously set the working directory to earth-analytics.

# Define path to file using os.path.join
fname = os.path.join("data", "earthpy-downloads",
                     "avg-monthly-precip.txt")

# Import average monthly precip to numpy array
avg_monthly_precip = np.loadtxt(fname)

print(avg_monthly_precip)
[0.7  0.75 1.85 2.93 3.05 2.02 1.93 1.62 1.84 1.31 1.39 0.84]

Notice that the data from the .txt file has been imported as a one-dimensional array (avg_monthly_precip), contained within a single set of brackets [].

Recall that you can use the type() function to check the type for variable. In this case, you can check that avg_monthly_precip is indeed a numpy array.

# Check type
type(avg_monthly_precip)
numpy.ndarray

Import Data From CSV File

You can also use np.loadtxt(fname) to import data from .csv files that contain rows and columns of data.

You will need to specify both the fname parameter as well as the delimiter parameter to indicate the character that is being used to separate values in the file (e.g. commas, semi-colons):

np.loadtxt(fname, delimiter = ",")

# Import monthly precip for 2002 and 2013 to numpy array
fname = os.path.join("data", "earthpy-downloads",
                     "monthly-precip-2002-2013.csv")
precip_2002_2013 = np.loadtxt(fname, delimiter=",")
# Check type
type(precip_2002_2013)
numpy.ndarray
print(precip_2002_2013)
[[ 1.07  0.44  1.5   0.2   3.2   1.18  0.09  1.44  1.52  2.44  0.78  0.02]
 [ 0.27  1.13  1.72  4.14  2.66  0.61  1.03  1.4  18.16  2.24  0.29  0.5 ]]

Notice that the data from the .csv file has been imported as a two-dimensional array (precip_2002_2013), contained within two set of brackets [].

Import Text String Data from Text Files Into Numpy Arrays

As needed, you can also import text files with text string values (such as month names) to numpy arrays using the genfromtxt() function from numpy.

You need to specify a parameter value for fname as well as a parameter value for the data type as dtype='str':

np.genfromtxt(fname, dtype='str')

# Import month names
fname = os.path.join("data", "earthpy-downloads", "months.txt")
months = np.genfromtxt(fname, dtype='str')

type(months)
numpy.ndarray
print(months)
['Jan' 'Feb' 'Mar' 'Apr' 'May' 'June' 'July' 'Aug' 'Sept' 'Oct' 'Nov'
 'Dec']

You now know how to import data from text files into numpy arrays, which will come in very handy as you begin to work with scientific data.

On the next pages of this chapter, you will learn how to work with numpy arrays to run calculations, summarize data, and more.

Leave a Comment