Lesson 7. Crop a Spatial Raster Dataset Using a Shapefile in Python Spatial data open source python Workshop


Learning Objectives

After completing this lesson, you will be able to:

  • Crop a raster dataset in Python using a vector extent object derived from a shapefile.
  • Open a shapefile in Python.

What You Need

Download spatial-vector-lidar data subset (~172 MB)

You will need a computer with internet access to complete this lesson. If you are following along online and not using our cloud environment:

Get data and software setup instructions here

You will need the Python 3.x Anaconda distribution, git and bash to set things up.

In this lesson, you will learn how to crop a raster dataset in Python.

What Does Crop a Raster Mean?

Cropping (sometimes also referred to as clipping), is when you subset or make a dataset smaller, by removing all data outside of the crop area or spatial extent. In this case you have a large raster - but let’s pretend that you only need to work with a smaller subset of the raster.

You can use the crop_extent function to remove all of the data outside of your study area. This is useful as it:

  1. Makes the data smaller and
  2. Makes processing and plotting faster

In general when you can, it’s often a good idea to crop your raster data!

To begin let’s load the libraries that you will need in this lesson.

Load Libraries

Be sure to set your working directory

os.chdir("path-to-you-dir-here/earth-analytics/data")

import os
import numpy as np
import rasterio as rio
from rasterio.plot import show
from rasterio.mask import mask
from shapely.geometry import mapping
import matplotlib.pyplot as plt
import geopandas as gpd
import earthpy as et
import earthpy.plot as ep
import earthpy.spatial as es
import cartopy as cp

# set home directory and download data
et.data.get_data("spatial-vector-lidar")
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

# optional - turn off warnings
import warnings
warnings.filterwarnings('ignore')

Open Raster and Vector Layers

Next, you will use rio.open() to open a raster layer. Open and plot the canopy height model (CHM) that you created in the previous lesson. Or you can use the CHM provided to you in the data directory here:

data/spatial-vector-lidar/california/neon-soap-site/2013/lidar/SOAP_lidarCHM.tif

soap_chm_path = 'data/spatial-vector-lidar/california/neon-soap-site/2013/lidar/SOAP_lidarCHM.tif'
# open the lidar chm
with rio.open(soap_chm_path) as src:
    lidar_chm_im = src.read(masked=True)[0]
    extent = rio.plot.plotting_extent(src)
    soap_profile = src.profile

ep.plot_bands(lidar_chm_im,
               cmap='terrain',
               extent=extent,
               title="Lidar Canopy Height Model (CHM)\n NEON SOAP Field Site",
               cbar=False);
A canopy height model plotted using the plot_bands earthpy function.
A canopy height model plotted using the plot_bands earthpy function.

Open Vector Layer

Next, open up a vector layer that contains the crop extent that you want to use to crop your data. To open a shapefile you use the gpd.read_file() function from geopandas.

# open crop extent
crop_extent_soap = gpd.read_file('data/spatial-vector-lidar/california/neon-soap-site/vector_data/SOAP_crop2.shp')

Next, explore the coordinate reference system (CRS) of both of your datasets. Remember that in order to perform any analysis with these two datasets together, they will need to be in the same CRS.

print('crop extent crs: ', crop_extent_soap.crs)
print('lidar crs: ', soap_profile['crs'])
crop extent crs:  {'init': 'epsg:32611'}
lidar crs:  EPSG:32611
# plot the data
fig, ax = plt.subplots(figsize = (6, 6))
crop_extent_soap.plot(ax=ax)
ax.set_title("Shapefile Imported into Python \nCrop Extent for Soaproot Saddle Field Site", 
             fontsize = 16)
ax.set_axis_off();
A plot of the clipping extent layer that you will use to crop your raster data.
A plot of the clipping extent layer that you will use to crop your raster data.
The spatial extent of a shapefile the geographic edge or location that is the furthest north, south east and west.
The spatial extent of a shapefile represents the geographic "edge" or location that is the furthest north, south east and west. Thus is represents the overall geographic coverage of the spatial object. Image Source: Colin Williams, NEON.

Now that you have imported the shapefile. Plot the two layers together to ensure the overlap each other. If the shapefile does not overlap the raster, then you can not use it to crop!

fig, ax = plt.subplots(figsize=(10, 10))
ep.plot_bands(lidar_chm_im,
              cmap='terrain',
              extent=extent,
              ax=ax,
              cbar=False)
crop_extent_soap.plot(ax=ax, alpha=.6, color='g');
The clipping extent overlayed on top of your raster. When you crop the raster, all of the data outside of the clipping extent will be removed.
The clipping extent overlayed on top of your raster. When you crop the raster, all of the data outside of the clipping extent will be removed.

To crop the data,use the crop_image function in earthpy.spatial.

with rio.open(soap_chm_path) as src:
    lidar_chm_crop, soap_lidar_meta = es.crop_image(src, crop_extent_soap)
# Update the metadata to have the new shape (x and y and affine information)
soap_lidar_meta.update({"driver": "GTiff",
                 "height": lidar_chm_crop.shape[0],
                 "width": lidar_chm_crop.shape[1],
                 "transform": soap_lidar_meta["transform"]})

# generate an extent for the newly cropped object for plotting
cr_ext = rio.transform.array_bounds(soap_lidar_meta['height'], 
                                            soap_lidar_meta['width'], 
                                            soap_lidar_meta['transform'])

bound_order = [0,2,1,3]
cr_extent = [cr_ext[b] for b in bound_order]
cr_extent, crop_extent_soap.total_bounds
([297349.0, 298062.0, 4101114.0, 4101115.0],
 array([ 297349.82145413, 4100402.84616318,  297923.2856659 ,
        4101114.9500745 ]))
# mask the nodata and plot the newly cropped raster layer
lidar_chm_crop_ma = np.ma.masked_equal(lidar_chm_crop[0], -9999.0) 
ep.plot_bands(lidar_chm_crop_ma, cmap='terrain', cbar=False);
Here you can see the results of your crop function.
Here you can see the results of your crop function.

# Save to disk so you can use the file later.
path_out = "data/spatial-vector-lidar/outputs/soap_lidar_chm_crop.tif"
with rio.open(path_out, 'w', **soap_lidar_meta) as ff:
    ff.write(lidar_chm_crop[0], 1)

Optional Challenge: Crop the SJER Lidar Data

Above your cropped the data for the SOaproot Saddle fieldsite. Crop the data using the same approach for the sjer field site located in this folder: data/spatial-vector-lidar/california/neon-sjer-site.

Leave a Comment