Lesson 4. Slice (or Select) Data From Numpy Arrays


Learning Objectives

After completing this page, you will be able to:

  • Explain the difference in indexing between one-dimensional and two-dimensional numpy arrays.
  • Use indexing to slice (i.e. select) data from one-dimensional and two-dimensional numpy arrays.

Indexing on Numpy Arrays

In a previous chapter that introduced Python lists, you learned that Python indexing begins with [0], and that you can use indexing to query the value of items within Python lists.

You can also access elements (i.e. values) in numpy arrays using indexing.

Indexing on One-dimensional Numpy Arrays

For one-dimensional numpy arrays, you only need to specify one index value, which is the position of the element in the numpy array (e.g. arrayname[index]).

As an example, take a look at the one-dimensional array below which has 3 elements.

avg_monthly_precip = np.array([0.70, 0.75, 1.85])

You can use avg_monthly_precip[2] to select the third element (1.85) from this one-dimensional numpy array.

Recall that you use the index [2] for the third element because Python indexing begins with [0], not with [1].
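The short sketch below puts this into runnable form. It assumes numpy is imported as np and rebuilds the three-element array by hand; the variable names first_value and third_value are only illustrative.

```python
import numpy as np

# Same three values as the example array above
avg_monthly_precip = np.array([0.70, 0.75, 1.85])

# Index 0 is the first element, index 2 is the third element
first_value = avg_monthly_precip[0]
third_value = avg_monthly_precip[2]

print(first_value)
print(third_value)
```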

Indexing on Two-dimensional Numpy Arrays

For two-dimensional numpy arrays, you need to specify both a row index and a column index for the element (or range of elements) that you want to access.

For example, review the two-dimensional array below with 2 rows and 3 columns.

precip_2002_2013 = np.array([[1.07, 0.44, 1.5],
                             [0.27, 1.13, 1.72]])

To select the element in the second row, third column (1.72), you can use:

precip_2002_2013[1, 2]

which specifies that you want the element at index [1] for the row and index [2] for the column.

Just like for the one-dimensional numpy array, you use the index [1, 2] for the second row, third column because Python indexing begins with [0], not with [1].
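As a minimal sketch of the row-then-column order, the code below rebuilds the small 2-by-3 array by hand (assuming numpy is imported as np) and selects two elements. The names second_row_third_col and first_row_first_col are only illustrative.

```python
import numpy as np

# Small 2-row, 3-column array from the example above
precip_2002_2013 = np.array([[1.07, 0.44, 1.5],
                             [0.27, 1.13, 1.72]])

# The row index comes first, then the column index
second_row_third_col = precip_2002_2013[1, 2]
first_row_first_col = precip_2002_2013[0, 0]

print(second_row_third_col)
print(first_row_first_col)
```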

On this page, you will use indexing to select elements within one-dimensional and two-dimensional numpy arrays, a selection process referred to as slicing.

Import Python Packages and Get Data

Begin by importing the necessary Python packages and downloading and importing the data into numpy arrays.

As you learned previously in this chapter, you will use the earthpy package to download the data files, os to set the working directory, and numpy to import the data files into numpy arrays.

# Import necessary packages
import os
import numpy as np
import earthpy as et
# Download data from URL to .txt with avg monthly precip data
monthly_precip_url = 'https://ndownloader.figshare.com/files/12565616'
et.data.get_data(url=monthly_precip_url)

# Download data from URL to .csv of precip data for 2002 and 2013
precip_2002_2013_url = 'https://ndownloader.figshare.com/files/12707792'
et.data.get_data(url=precip_2002_2013_url)
'/root/earth-analytics/data/earthpy-downloads/monthly-precip-2002-2013.csv'
# Set working directory to earth-analytics
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))
# Import average monthly precip
fname = "data/earthpy-downloads/avg-monthly-precip.txt"
avg_monthly_precip = np.loadtxt(fname)

print(avg_monthly_precip)
[0.7  0.75 1.85 2.93 3.05 2.02 1.93 1.62 1.84 1.31 1.39 0.84]
# Import monthly precip for 2002 and 2013
fname = "data/earthpy-downloads/monthly-precip-2002-2013.csv"
precip_2002_2013 = np.loadtxt(fname, delimiter=",")

print(precip_2002_2013)
[[ 1.07  0.44  1.5   0.2   3.2   1.18  0.09  1.44  1.52  2.44  0.78  0.02]
 [ 0.27  1.13  1.72  4.14  2.66  0.61  1.03  1.4  18.16  2.24  0.29  0.5 ]]
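If you want to see how np.loadtxt turns comma-delimited text into a two-dimensional array without downloading anything, the sketch below reads from an in-memory string instead of a file. The string csv_text is a stand-in that contains only the first three columns of the values printed above; it is not the real data file.

```python
import io

import numpy as np

# Stand-in for a small .csv file: two rows, three comma-separated values each
csv_text = "1.07,0.44,1.5\n0.27,1.13,1.72\n"

# np.loadtxt accepts a file-like object as well as a file name
precip_subset = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(precip_subset.shape)
print(precip_subset)
```

Each line of text becomes one row of the array, and the delimiter determines where the columns are split.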

Slice One-dimensional Numpy Arrays

Checking the shape of avg_monthly_precip with .shape confirms that it contains 12 elements along one dimension, i.e. (12,).

# Check shape
avg_monthly_precip.shape
(12,)

If you want to select the last element of the array, you can use index [11], as you know that indexing in Python begins with [0].

# Select the last element of 12 elements
avg_monthly_precip[11]
0.84

Check out what happens when you query for an index location that does not exist in the array, say the index [12].

# This code results in the error below
avg_monthly_precip[12]

IndexError: index 12 is out of bounds for axis 0 with size 12

The error message tells you explicitly that the array has 12 elements along axis 0, so the valid indices run from [0] to [11] and the index [12] is not within the bounds of the data.
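One way to see the relationship between the size of the array and its last valid index is sketched below. It assumes numpy is imported as np and rebuilds the 12 monthly values by hand; n_elements, last_valid_index and error_raised are illustrative names.

```python
import numpy as np

# The 12 average monthly precipitation values printed earlier
avg_monthly_precip = np.array([0.70, 0.75, 1.85, 2.93, 3.05, 2.02,
                               1.93, 1.62, 1.84, 1.31, 1.39, 0.84])

# The number of elements along the first (and only) dimension
n_elements = avg_monthly_precip.shape[0]

# Because indexing starts at 0, the last valid index is one less than the size
last_valid_index = n_elements - 1
print(avg_monthly_precip[last_valid_index])

# Asking for an index equal to the size raises an IndexError
try:
    avg_monthly_precip[n_elements]
    error_raised = False
except IndexError as err:
    error_raised = True
    print(err)
```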

One way to avoid having to explicitly know the number of elements is to use shortcuts such as -1, which identifies the last index for you:

# Select the last element of the array
avg_monthly_precip[-1]
0.84
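Negative indices count backwards from the end of the array, so -1 is the last element and -2 is the second-to-last. The sketch below checks this against the explicit indices, again assuming the 12 values are rebuilt by hand with numpy imported as np.

```python
import numpy as np

avg_monthly_precip = np.array([0.70, 0.75, 1.85, 2.93, 3.05, 2.02,
                               1.93, 1.62, 1.84, 1.31, 1.39, 0.84])

# -1 refers to the same element as index 11 in a 12-element array
last_value = avg_monthly_precip[-1]
same_as_explicit = last_value == avg_monthly_precip[11]

# -2 refers to the second-to-last element (index 10)
second_to_last = avg_monthly_precip[-2]

print(last_value, same_as_explicit, second_to_last)
```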

Slice a Range of Values from One-dimensional Numpy Arrays

You can slice a range of elements from one-dimensional numpy arrays, such as the third, fourth and fifth elements, by specifying an index range: [starting_value:ending_value].

Note that the index structure is inclusive of the first index value, but not the second index value. So you provide a starting index value for the selection and an ending index value that is not included in the selection.

Thus, to select the third, fourth and fifth elements, you need to specify the index value for the third element [2] as the starting value and the index value for the sixth element [5] as the ending value (which will not be included in the output).

# Slice range from 3rd to 5th elements
print(avg_monthly_precip[2:5])
[1.85 2.93 3.05]
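A quick way to check a range is that the number of selected elements equals the ending value minus the starting value. The sketch below shows this, along with the standard Python shortcut of leaving out the starting or ending value to slice from the beginning or to the end of the array. The array is rebuilt by hand, and the variable names are illustrative.

```python
import numpy as np

avg_monthly_precip = np.array([0.70, 0.75, 1.85, 2.93, 3.05, 2.02,
                               1.93, 1.62, 1.84, 1.31, 1.39, 0.84])

# Third to fifth elements: starts at index 2, stops before index 5
third_to_fifth = avg_monthly_precip[2:5]
print(third_to_fifth, len(third_to_fifth))  # 5 - 2 = 3 elements

# Omitting the starting value begins the slice at index 0
first_three = avg_monthly_precip[:3]

# Omitting the ending value continues the slice through the last element
last_three = avg_monthly_precip[9:]

print(first_three)
print(last_three)
```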

Slice Two-dimensional Numpy Arrays

Using .shape, you can confirm that precip_2002_2013 is a two-dimensional array with 2 rows and 12 columns.

# Check shape
precip_2002_2013.shape
(2, 12)

To slice elements from two-dimensional arrays, you need to specify both a row index and a column index as [row_index, column_index].

For example, you can use the index [1,2] to query the element at the second row, third column in precip_2002_2013.

# Select element in 2nd row, 3rd column
precip_2002_2013[1, 2]
1.72

If you want to select the last element in the array, you need to select the element at the last row, last column.

For precip_2002_2013 which has 2 rows and 12 columns, the last row index is [1], while the last column index is [11].

# Select element in 2nd row, 12th column
precip_2002_2013[1, 11]
0.5

As you become more familiar with slicing, you can start to apply shortcuts, such as -1 introduced earlier, which can be used to identify the last index for the row and/or column:

# Select element in last row, last column
precip_2002_2013[-1, -1]
0.5
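You can also mix a regular index with -1 to reach the last row or the last column only. The sketch below assumes numpy is imported as np and rebuilds the 2-by-12 array by hand from the values printed earlier; the variable names are illustrative.

```python
import numpy as np

precip_2002_2013 = np.array(
    [[1.07, 0.44, 1.5, 0.2, 3.2, 1.18, 0.09, 1.44, 1.52, 2.44, 0.78, 0.02],
     [0.27, 1.13, 1.72, 4.14, 2.66, 0.61, 1.03, 1.4, 18.16, 2.24, 0.29, 0.5]])

# First row, last column
first_row_last_col = precip_2002_2013[0, -1]

# Last row, first column
last_row_first_col = precip_2002_2013[-1, 0]

# [-1, -1] selects the same element as [1, 11] in this 2-by-12 array
same_element = precip_2002_2013[-1, -1] == precip_2002_2013[1, 11]

print(first_row_last_col, last_row_first_col, same_element)
```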

Slice a Range of Values from Two-dimensional Numpy Arrays

You can also use a range for the row index and/or column index to slice multiple elements using:

[start_row_index:end_row_index, start_column_index:end_column_index]

Recall that the index structure for both the row and column range is inclusive of the first index, but not the second index.

For example, you can use the index [0:1, 0:2] to select the elements in the first row, first two columns.

# Slice first row, first two columns
print(precip_2002_2013[0:1, 0:2])
[[1.07 0.44]]

You can flip these index values to select elements in the first two rows, first column.

# Slice first two rows, first column
print(precip_2002_2013[0:2, 0:1])
[[1.07]
 [0.27]]

If you wanted to slice the second row, second to third columns, you would need to use the index [1:2, 1:3], in which the ending value of each range is again specified but not included in the output.

# Slice 2nd row, 2nd and 3rd columns
print(precip_2002_2013[1:2, 1:3])
[[1.13 1.72]]

As you become more familiar with slicing, you can start to use shortcuts, such as omitting the first index value 0 to start a slice at the beginning of an index range:

# Slice first two rows, first two columns
print(precip_2002_2013[:2, :2])
[[1.07 0.44]
 [0.27 1.13]]

Notice that the slices in the examples above provide output as two-dimensional arrays, because the original array is two-dimensional and a range was provided for both the row index and the column index.

precip_2002_2013[:2, :2].shape
(2, 2)
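The same shortcut works at the other end of a range: omitting the ending value continues the slice through the last row or column, and a lone : selects everything along that dimension. The sketch below is one way to try this, with the 2-by-12 array rebuilt by hand and illustrative variable names.

```python
import numpy as np

precip_2002_2013 = np.array(
    [[1.07, 0.44, 1.5, 0.2, 3.2, 1.18, 0.09, 1.44, 1.52, 2.44, 0.78, 0.02],
     [0.27, 1.13, 1.72, 4.14, 2.66, 0.61, 1.03, 1.4, 18.16, 2.24, 0.29, 0.5]])

# All rows, columns from index 10 through the last column
last_two_columns = precip_2002_2013[:, 10:]
print(last_two_columns)
print(last_two_columns.shape)

# First row only (as a range), first six columns
first_row_first_half = precip_2002_2013[:1, :6]
print(first_row_first_half.shape)
```

Because both indices are ranges, each result is still a two-dimensional array.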

Use Shortcuts to Create New One-dimensional Array From Row or Column Slice

Recall that precip_2002_2013 contains two rows (or years) of data for average monthly precipitation (one row for 2002 and one row for 2013) and twelve columns (one for each month).

You can use shortcuts to easily select an entire row or column by specifying the index of the row or column (e.g. 0 for the first, 1 for the second, etc.) and providing : for the other index (meaning all of the values along that row or column).

The output of these shortcuts will be one-dimensional arrays, which is very useful if you want to easily plot the data.

For example, you can use [0, :] to select the entire first row of precip_2002_2013, which are all of the monthly values for 2002.

# Select 1st row
print(precip_2002_2013[0, :])
[1.07 0.44 1.5  0.2  3.2  1.18 0.09 1.44 1.52 2.44 0.78 0.02]

Or conversely, you can use [:, 0] to select the entire first column of precip_2002_2013, which contains all of the values for January (in this case, for 2002 and 2013).

# Select 1st column
print(precip_2002_2013[:, 0])
[1.07 0.27]

This means that you can create a new numpy array of the average monthly precipitation data in 2002 by slicing the first row of values from precip_2002_2013.

Note that the result is a one-dimensional array, which you can use to plot the average monthly precipitation data for 2002.

# Select 1st row of data for 2002
precip_2002 = precip_2002_2013[0, :]

print(precip_2002.shape)
print(precip_2002)
(12,)
[1.07 0.44 1.5  0.2  3.2  1.18 0.09 1.44 1.52 2.44 0.78 0.02]
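To see why the single-index shortcut matters, you can compare its shape with the shape you get from a range that covers the same row. This is a minimal sketch that rebuilds the 2-by-12 array by hand; row_by_index, row_by_range and column_by_index are illustrative names.

```python
import numpy as np

precip_2002_2013 = np.array(
    [[1.07, 0.44, 1.5, 0.2, 3.2, 1.18, 0.09, 1.44, 1.52, 2.44, 0.78, 0.02],
     [0.27, 1.13, 1.72, 4.14, 2.66, 0.61, 1.03, 1.4, 18.16, 2.24, 0.29, 0.5]])

# A single row index drops the row dimension: one-dimensional result
row_by_index = precip_2002_2013[0, :]
print(row_by_index.shape)

# A range for the row index keeps both dimensions: two-dimensional result
row_by_range = precip_2002_2013[0:1, :]
print(row_by_range.shape)

# A single column index also gives a one-dimensional result
column_by_index = precip_2002_2013[:, 0]
print(column_by_index.shape)
```

Both row selections hold the same 12 values; only the number of dimensions differs.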

Practice Your Numpy Array Skills

Test your Python skills to:

  1. Review how to download and import data files into numpy arrays to create an array of month names from months.txt which is available for download at “https://ndownloader.figshare.com/files/12565619”.

  2. Create a new numpy array for the average monthly precipitation in 2013 by selecting all data values in the last row in precip_2002_2013 (i.e. data for the year 2013).

  3. Convert the values in the numpy array from inches to millimeters (1 inch = 25.4 millimeters).

  4. Use the converted numpy array for 2013 and the numpy array of month names to create a plot of Average Monthly Precipitation in 2013 for Boulder, CO.
