Chapter Three - Fundamentals of Raster Data in Python
In this chapter, you will learn fundamental concepts related to working with raster data in Python, including understanding the spatial attributes of raster data, how to open raster data and access its metadata, and how to explore the distribution of values in a raster dataset.
After completing this chapter, you will be able to:
- Open raster data using Python.
- Be able to list and identify 3 spatial attributes of a raster dataset: extent, crs and resolution.
- Explore and plot the distribution of values within a raster using histograms.
- Access metadata stored within a GeoTIFF raster file via TIF tags in Python.
What You Need
You will need a computer with internet access to complete this lesson.
The data story on Lidar data reviews the basic principles behind what a Lidar raster dataset is and how point clouds are used to derive the raster.
In this chapter, you will learn how to open a plot a lidar raster dataset in Python. You will also learn about key attributes of a raster dataset:
- Spatial resolution
- Spatial extent and
- Coordinate reference systems
Open Raster Data in Python
You can use the rasterio library combined with numpy and matplotlib to open, manipulate and plot raster data in Python.
# Import necessary packages import os import matplotlib.pyplot as plt import seaborn as sns import numpy as np from shapely.geometry import Polygon, box import geopandas as gpd import rasterio as rio from rasterio.plot import show from rasterio.mask import mask # Package created for the earth analytics program import earthpy as et import earthpy.plot as ep # Prettier plotting with seaborn sns.set(font_scale=1.5, style="white")
# Get data and set working directory et.data.get_data("colorado-flood") os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))
Downloading from https://ndownloader.figshare.com/files/16371473 Extracted output to /root/earth-analytics/data/colorado-flood/.
Note that you imported the rasterio library using the shortname
Now, you can use the
rio.open("path-to-raster-here") function to open a raster dataset.
# Define relative path to file lidar_dem_path = os.path.join("data", "colorado-flood", "spatial", "boulder-leehill-rd", "pre-flood", "lidar", "pre_DTM.tif") # Open raster data lidar_dem = rio.open(lidar_dem_path)
To check your data, you can query the spatial extent of the data using the attribute
You can also quickly plot the raster using the rasterio function called
show(). The function argument
title = "Plot title here" adds a title to the plot.
# Query the spatial extent of the data lidar_dem.bounds
BoundingBox(left=472000.0, bottom=4434000.0, right=476000.0, top=4436000.0)
# Plot the dem using raster.io fig, ax = plt.subplots(figsize = (8,3)) show(lidar_dem, title="Lidar Digital Elevation Model (DEM) \n Boulder Flood 2013", ax=ax) ax.set_axis_off()
Opening and Closing File Connections
The rasterio library is efficient as it establishes a connection with the raster file rather than directly reading it into memory. Because it creates a connection, it is important that you close the connection after it is opened AND after you’ve finished working with the data!
# Close the connection lidar_dem.close()
# this returns an error as you have closed the connection to the file. show(lidar_dem)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-7-dad244dfd7d3> in <module>() 1 # this returns an error as you have closed the connection to the file. ----> 2 show(lidar_dem) ~/anaconda3/envs/earth-analytics-python/lib/python3.6/site-packages/rasterio/plot.py in show(source, with_bounds, contour, contour_label_kws, ax, title, **kwargs) 80 elif isinstance(source, RasterReader): 81 if source.count == 1: ---> 82 arr = source.read(1, masked=True) 83 else: 84 try: rasterio/_io.pyx in rasterio._io.RasterReader.read (rasterio/_io.c:10647)() rasterio/_io.pyx in rasterio._io.RasterReader._read (rasterio/_io.c:15124)() ValueError: can't read closed raster file
Once the connection is closed, you can no longer work with the data. You’ll need to re-open the connection. Like this:
# Open raster data connection - again lidar_dem = rio.open(lidar_dem_path) fig, ax = plt.subplots(figsize = (8,3)) show(lidar_dem, title="Once the connection is re-opened \nyou can work with the raster data", ax=ax) ax.set_axis_off()
Context Manager to Open/Close Raster Data
A better way to work with raster data in rasterio is to use the context manager. This will handle opening and closing the raster file for you.
with rio.open(path-to-file') as src: src.rasteriofunctionname
with rio.open(lidar_dem_path) as src: print(src.bounds)
BoundingBox(left=472000.0, bottom=4434000.0, right=476000.0, top=4436000.0)
With a context manager, you create a connection to the file that you’d like to open. However, once your are outside of the
with statement, that connection closes. Thus you don’t have to worry about opening and closing files using this syntax.
# Note that the src object is now closed src
<closed DatasetReader name='data/colorado-flood/spatial/boulder-leehill-rd/pre-flood/lidar/pre_DTM.tif' mode='r'>
Raster Plots with Matplotlib
Above you used the
show() function to plot a rasterio object. Show “wraps” around the matplotlib plotting library to produce a plot.
However, you will explore plotting a numpy array with matplotlib directly. Using matplotlib allows you to fully customize your plots. Alongside matplotlib, you will also be exploring using another “wrapper” function to aide in the plotting,
To plot using matplotlib and earthpy directly you:
- open the raster
create a spatial_extentobject that contains the boundary information needed to plot your raster in space using
- Read in the raster data itself into a numpy array using
with rio.open(lidar_dem_path) as src: # Convert / read the data into a numpy array: lidar_dem_im = src.read() # Create a spatial extent object using rio.plot.plotting spatial_extent = rio.plot.plotting_extent(src) # Get bounds of object bounds = src.bounds
You can use the
rio.plot.plotting_extent function to create a spatial extent in the format that matplotlib needs to plot your raster.
Spatial Extents and Plotting
The bounding box output - which represents the spatial extent of your raster, is provided to use in a rasterio specific format. To plot with matplotlib, you need to provide a vector that contains the spatial extent in the following format:
[left, right, bottom, top]
However, if you just use the
.bounds object that rasterio provides, the numbers are not in the correct order. You can use
rio.plot.plotting_extent(rasterio-object-name-here) function to get a spatial extent in the format that matplotlib requires
# This is the format that matplotlib wants print("spatial extent:", spatial_extent) # This is the format that rasterio provides with the bounds attribute print("rasterio bounds:", bounds)
spatial extent: (472000.0, 476000.0, 4434000.0, 4436000.0) rasterio bounds: BoundingBox(left=472000.0, bottom=4434000.0, right=476000.0, top=4436000.0)
Read Files with Rasterio into Numpy
Next let’s explore how you read in a raster using rasterio. When you use
.read(), rasterio imports the data from your raster into a numpy array.
Remember that a numpy array is simply a matrix of values with no particular spatial attributes associated with them. Numpy arrays are, however, a very efficient structure for working with large and potentially multi-dimensional (layered) matrices.
with rio.open(lidar_dem_path) as src: # Convert / read the data into a numpy array # masked = True turns `nodata` values to nan lidar_dem_im = src.read(1, masked=True) # Create a spatial extent object using rio.plot.plotting spatial_extent = rio.plot.plotting_extent(src) print("object shape:", lidar_dem_im.shape) print("object type:", type(lidar_dem_im))
object shape: (2000, 4000) object type: <class 'numpy.ma.core.MaskedArray'>
Below you read in the data using
src is the name of the object that you defined within the context manager and
read(1) reads in just the first layer in your raster. Specifying the
1 is important as it will force rasterio to import the raster into a 2 dimensional vs a 3 dimensional array.
See the example below
with rio.open(lidar_dem_path) as src: # Convert / read the data into a numpy array: lidar_dem_im2 = src.read(1) with rio.open(lidar_dem_path) as src: # Convert / read the data into a numpy array: lidar_dem_im3 = src.read() print("Array Shape Using read(1):", lidar_dem_im2.shape) # Notice that without the (1), your numpy array has a third dimension print("Array Shape Using read():", lidar_dem_im3.shape)
Array Shape Using read(1): (2000, 4000) Array Shape Using read(): (1, 2000, 4000)
Also notice that you used the argument
masked=True in your
.read() statement. This sets all
nodata values in your data to
nan which you will want for plotting!
Plot Numpy Array
Finally, you can plot your data using
ep.plot_bands(). Notice that you provide
ep.plot_bands() with the
spatial_extent object that you created above to ensure that the x and y axis represent the pixel locations of your raster data.
ep.plot_bands(lidar_dem_im, cmap='Greys', extent=spatial_extent, title="Digital Elevation Model - Pre 2013 Flood", cbar=False) plt.show()
Let’s plot again but this time you will:
- add a colorbar legend
- increase the title font size using the
as.set_titlefunction and the
plot_bands() function adds a colorbar to your plot automatically. In the last plot, you’ll notice the arguement called
cbar is set to
False. This turns off the colorbar. The default value for the
cbar arguement is
True. This means if you don’t modify that arguement, the colorbar will automatically appear! However, you may also notice a new arguement in this plot,
scale=False. By default,
plot_bands() will scale values in a raster from 0 to 255. Since this is elevation data, you can avoid this by setting
Additionally, you will be using matplotlib and
earthpy.plot together in this plot, in order to modify the title font size.
plot_bands() can be added into any normal matplotlib plot by just giving it an axis object in the
fig, ax = plt.subplots(figsize=(12, 10)) ep.plot_bands(lidar_dem_im, cmap='Greys', extent=spatial_extent, scale=False, ax=ax) ax.set_title("Lidar Digital Elevation Model \n Pre 2013 Boulder Flood | Lee Hill Road", fontsize=24) plt.show()
To plot you can select pre-determined color ramps from matplotlib, you can reverse a color ramp by adding
_r at the end of the color ramps name, for example
cmap = 'viridis_r'.
Explore Raster Data Values with Histograms
Next, you will explore a histogram of your data. A histogram is useful to help you better understand the distribution of values within your data. In this case given you are looking at elevation data, if there are all small elevation values and the histogram looks uniform (not too much variation in values) you can assume that your study area is relative “flat” - not too hilly. If there is a different distribution of elevation values you can begin to understand the range of elevation values in your study area and the degree of difference between low and high regions (i.e. is it flat or hilly?). Is it high elevation vs low elevation?
masked_array( data=[[--, --, --, ..., 1695.6300048828125, 1695.419921875, 1695.429931640625], [--, --, --, ..., 1695.5999755859375, 1695.5399169921875, 1695.3599853515625], [--, --, --, ..., 1695.3800048828125, 1695.43994140625, 1695.3699951171875], ..., [--, --, --, ..., 1681.449951171875, 1681.3900146484375, 1681.25], [--, --, --, ..., 1681.719970703125, 1681.5699462890625, 1681.5599365234375], [--, --, --, ..., 1681.8900146484375, 1681.8099365234375, 1681.739990234375]], mask=[[ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], ..., [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False]], fill_value=-3.4028235e+38, dtype=float32)
# Plot histogram ep.hist(lidar_dem_im[~lidar_dem_im.mask].ravel(), bins=100, title="Lee Hill Road - Digital elevation (terrain) model - \nDistribution of elevation values") plt.show()
Adjust Plot Extent to “Zoom in” on Your Raster Data
If you want to quickly zoom in on a portion of your raster data, you can adjust the x and y spatial extents of your matplotlib plot. To do this, you will create a new spatial extent that is smaller than the original spatial extent of the data.
# Define a spatial extent that is "smaller" # minx, miny, maxx, maxy, ccw=True zoomed_extent = [472500, 4434000, 473030, 4435030]
Next you’ll define a box which you’ll focus on. You’ve provided a small helper function that lets you give the x and y limits of a box, and it returns the
x,y points corresponding to four corners of this box. It then returns a
shapely polygon object.
# Turn extent into geodataframe zoom_ext_gdf = gpd.GeoDataFrame() zoom_ext_gdf.loc[0, 'geometry'] = box(*zoomed_extent)
# Plot the original data with the boundary box fig, ax = plt.subplots(figsize=(8, 3)) ep.plot_bands(lidar_dem_im, extent=spatial_extent, title="Lidar Raster Full Spatial Extent w Zoom Box Overlayed", ax=ax, scale=False) zoom_ext_gdf.plot(ax=ax) ax.set_axis_off()
# Plot the data but set the x and y lim fig, ax = plt.subplots(figsize=(8, 3)) ep.plot_bands(lidar_dem_im, extent=spatial_extent, title="Lidar Raster Zoomed on a Smaller Spatial Extent", ax=ax, scale=False) # Set x and y limits of the plot ax.set_xlim(zoomed_extent, zoomed_extent) ax.set_ylim(zoomed_extent, zoomed_extent) ax.set_axis_off() plt.show()