# Lesson 1. Intro to Numpy Arrays

# Work with Scientific Data Using Numpy Arrays - Intro to earth data science textbook course module

Welcome to the first lesson in the **Work with Scientific Data Using Numpy Arrays** module. Numpy arrays are a commonly used scientific data structure in Python that store data as a grid, or a matrix. Learn how to import data into numpy arrays and how to run calculations on, summarize, and select data from numpy arrays.

## Chapter Fourteen - Numpy Arrays

In this chapter, you will learn about a commonly used data structure in Python for scientific data: **numpy** arrays. You will write **Python** code to import text data (.txt and .csv) as **numpy** arrays and to run calculations and summarize data in **numpy** arrays.

After completing this chapter, you will be able to:

- Describe the key characteristics of **numpy** arrays.
- Import data from text files (.txt, .csv) into **numpy** arrays.
- Run calculations and summarize data in **numpy** arrays.
- Use indexing to slice (i.e. select) data from **numpy** arrays.

## What You Need

You should have Conda set up on your computer, along with the Earth Analytics Python Conda environment. Follow the instructions in *Set up Git, Bash, and Conda on your computer* to install these tools.

Be sure that you have completed the chapters on Jupyter Notebook, working with packages in Python, and working with paths and directories in Python.

## What are Numpy Arrays

**Numpy** arrays are a commonly used scientific data structure in **Python** that store data as a grid, or a matrix.

In **Python**, data structures are objects that organize and manipulate data: they define the relationships between the values stored within them and provide functionality that can be run on those values.

Recall that in the previous chapters, you used lists (another data structure in **Python**) to store values of monthly precipitation for Boulder, CO.

Like **Python** lists, **numpy** arrays are also composed of ordered values (called elements) and also use indexing to organize and manipulate the elements in the **numpy** arrays.
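As a quick illustration of that shared indexing, here is a minimal sketch that stores the same monthly precipitation values (taken from the examples later in this lesson) as a list and as an array, then reads elements from each. The variable names are illustrative only.

```python
# Import numpy with alias np
import numpy as np

# The same monthly precipitation values stored as a list and as a numpy array
precip_list = [0.70, 0.75, 1.85]
precip_array = np.array(precip_list)

# Both structures use zero-based indexing: index 0 is the first element
first_from_list = precip_list[0]
first_from_array = precip_array[0]

# Negative indexes count back from the end in both structures
last_from_list = precip_list[-1]
last_from_array = precip_array[-1]

print(first_from_list, first_from_array)
print(last_from_list, last_from_array)
```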

A key characteristic of **numpy** arrays is that all elements in the array must be the same type of data (e.g. all integers, floats, text strings, etc.).
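The sketch below shows what this requirement means in practice. When you pass values of different types to `np.array()`, numpy does not store them as mixed types; it converts every element to one common type, which you can check with the `dtype` attribute. The example values are illustrative.

```python
import numpy as np

# A Python list can hold mixed types: an integer, a float, and a string
mixed_list = [1, 2.5, "three"]

# Integers mixed with floats are converted (upcast) so every element is a float
nums = np.array([1, 2, 3.5])
print(nums)
print(nums.dtype)

# Mixing numbers with text makes numpy store every element as a text string
mixed_array = np.array([1, 2.5, "three"])
print(mixed_array.dtype)
```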

Unlike lists, which do not require a specific **Python** package to be defined (or worked with), **numpy** arrays are defined using the `array()` function from the **numpy** package.

To this function, you can provide a list of values (i.e. the elements) as the input parameter:

`array = numpy.array([0.7, 0.75, 1.85])`
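The one-line example above assumes that **numpy** has already been imported. A self-contained version looks like the sketch below; it also confirms that the result is a `numpy.ndarray` object rather than a list.

```python
import numpy

# Create a numpy array by passing a list of values to numpy.array()
array = numpy.array([0.7, 0.75, 1.85])

# The result is a numpy.ndarray object, not a Python list
print(type(array))

# len() returns the number of elements along the first dimension
print(len(array))
```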

The example above creates a **numpy** array with a simple grid structure along one dimension. However, the grid structure of **numpy** arrays allows them to store data along multiple dimensions (e.g. rows, columns) that are relative to each other. This dimensionality makes **numpy** arrays very efficient for storing large amounts of data of the same type.

## Key Differences Between Python Lists and Numpy Arrays

While **Python** lists and **numpy** arrays have similarities in that they are both collections of values that use indexing to help you store and access data, there are a few key differences between these two data structures:

- Unlike a **Python** list, all elements in a **numpy** array must be the same data type (e.g. all integers, decimals, text strings, etc.).
- Because of this requirement, **numpy** arrays support arithmetic and other mathematical operations that run on each element of the array (e.g. element-by-element multiplication). Recall that lists cannot have these numeric calculations applied directly to them.
- Unlike a **Python** list, a **numpy** array cannot grow or shrink in place. Individual values can be reassigned by index, but operations that add or remove elements (e.g. `np.append()`, `np.delete()`) create and return a new array instead of editing the original one.
- **Numpy** arrays can store data along multiple dimensions (e.g. rows, columns) that are relative to each other. This makes **numpy** arrays a very efficient data structure for large datasets.
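To make these differences concrete, here is a minimal sketch that compares multiplying a list with multiplying an array, adds two arrays element by element, and shows that `np.append()` returns a new array. The precipitation values come from this lesson's examples; the appended `0.0` is only a placeholder.

```python
import numpy as np

# Monthly avg precip for Jan through Mar in Boulder, CO, as a list and as an array
precip_list = [0.70, 0.75, 1.85]
precip_array = np.array(precip_list)

# Multiplying a list by 2 repeats the list; it does not double the values
doubled_list = precip_list * 2

# Multiplying an array by 2 doubles each element (element-by-element)
doubled_array = precip_array * 2

# Adding two arrays of the same length adds the matching elements
precip_2002 = np.array([1.07, 0.44, 1.50])
precip_2013 = np.array([0.27, 1.13, 1.72])
precip_total = precip_2002 + precip_2013

# np.append() leaves precip_array unchanged and returns a new, longer array
# (0.0 is only a placeholder value for illustration)
extended = np.append(precip_array, 0.0)

print(doubled_list)
print(doubled_array)
print(precip_total)
print(len(precip_array), len(extended))
```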

## Dimensionality of Numpy Arrays

**Numpy** arrays can be:

- one-dimensional: composed of values along one dimension (resembling a **Python** list).
- two-dimensional: composed of rows of individual arrays with one or more columns.
- multi-dimensional: composed of nested arrays with one or more dimensions.
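To see how nesting maps to dimensions, here is a sketch that builds a one-, two-, and three-dimensional array and checks each one's `ndim` attribute (the number of dimensions). The one- and two-dimensional values come from this lesson; the three-dimensional values are hypothetical.

```python
import numpy as np

# One level of brackets: one dimension
one_d = np.array([0.70, 0.75, 1.85])

# Two levels of brackets: rows and columns
two_d = np.array([[1.07, 0.44, 1.50],
                  [0.27, 1.13, 1.72]])

# Three levels of brackets (hypothetical values): 2 layers of 2 rows x 2 columns
three_d = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])

# ndim reports the number of dimensions of each array
print(one_d.ndim, two_d.ndim, three_d.ndim)  # prints: 1 2 3
```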

In this chapter, you will work with one-dimensional and two-dimensional **numpy** arrays.

For **numpy** arrays, brackets `[]` are used to assign and identify the dimensions of the **numpy** arrays.

This first example below shows how a single set of brackets `[]` is used to define a one-dimensional array.

```
# Import numpy with alias np
import numpy as np
```

```
# Monthly avg precip for Jan through Mar in Boulder, CO
avg_monthly_precip = np.array([0.70, 0.75, 1.85])
print(avg_monthly_precip)
```

```
[0.7 0.75 1.85]
```

Notice that the output of the one-dimensional **numpy** array is also contained within a single set of brackets `[]`.

To create a two-dimensional array, you need to specify two sets of brackets `[]`: an outer set that defines the entire array structure and inner sets that define the rows of the individual arrays.

```
# Monthly precip for Jan through Mar in 2002 and 2013
precip_2002_2013 = np.array([
    [1.07, 0.44, 1.50],
    [0.27, 1.13, 1.72]
])
print(precip_2002_2013)
```

```
[[1.07 0.44 1.5 ]
 [0.27 1.13 1.72]]
```

Notice again that the output of the two-dimensional **numpy** array is contained within two sets of brackets `[]`, which is an easy, visual way to identify whether the **numpy** array is two-dimensional.
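Beyond counting brackets, you can also ask the array for its structure directly. This sketch rebuilds the 2002 and 2013 precipitation array and reads its `shape` (one entry per dimension, here rows and columns), its length (the number of rows), and its `size` (the total number of elements).

```python
import numpy as np

# Monthly precip for Jan through Mar in 2002 and 2013
precip_2002_2013 = np.array([
    [1.07, 0.44, 1.50],
    [0.27, 1.13, 1.72]
])

# shape is a tuple with one entry per dimension: (rows, columns)
print(precip_2002_2013.shape)  # prints: (2, 3)

# len() of a two-dimensional array counts its rows (the inner arrays)
print(len(precip_2002_2013))  # prints: 2

# size is the total number of elements across all dimensions
print(precip_2002_2013.size)  # prints: 6
```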

Dimensionality will remain a key concept for working with **numpy** arrays as you learn more throughout this chapter, including how to use attributes of **numpy** arrays to identify the number of dimensions and how to use indexing to slice (i.e. select) data from **numpy** arrays.
