Lesson 7. Crop a Spatial Raster Dataset Using a Shapefile in Python

Spatial data open source python Workshop

Learning Objectives

After completing this lesson, you will be able to:

  • Crop a raster dataset in Python using a vector extent object derived from a shapefile.
  • Open a shapefile in Python.

What You Need

Download spatial-vector-lidar data subset (~172 MB)

You will need a computer with internet access to complete this lesson. If you are following along online and not using our cloud environment:

Get data and software setup instructions here

You will need the Python 3.x Anaconda distribution, git and bash to set things up.

In this lesson, you will learn how to crop a raster dataset in Python.

What Does Crop a Raster Mean?

Cropping (sometimes also referred to as clipping), is when you subset or make a dataset smaller, by removing all data outside of the crop area or spatial extent. In this case you have a large raster - but let’s pretend that you only need to work with a smaller subset of the raster.

You can use the crop_extent function to remove all of the data outside of your study area. This is useful as it:

  1. Makes the data smaller and
  2. Makes processing and plotting faster

In general when you can, it’s often a good idea to crop your raster data!

To begin let’s load the libraries that you will need in this lesson.

Load Libraries

Be sure to set your working directory


import rasterio as rio
from rasterio.plot import show
from rasterio.mask import mask
from shapely.geometry import mapping
import numpy as np
import os
import matplotlib.pyplot as plt
import geopandas as gpd
import earthpy as et
import cartopy as cp

os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))
# optional - turn off warnings
import warnings

Open Raster and Vector Layers

Next, you will use rio.open() to open a raster layer. Open and plot the canopy height model (CHM) that you created in the previous lesson. Or you can use the CHM provided to you in the data directory here:


soap_chm_path = 'data/spatial-vector-lidar/california/neon-soap-site/2013/lidar/SOAP_lidarCHM.tif'
# open the lidar chm
with rio.open(soap_chm_path) as src:
    lidar_chm_im = src.read(masked = True)[0]
    extent = rio.plot.plotting_extent(src)
    soap_profile = src.profile

fig, ax = plt.subplots(figsize = (10,10))
      extent = extent)
ax.set_title("Lidar Canopy Height Model (CHM)\n NEON SOAP Field Site", 
             fontsize = 16);


Open Vector Layer

Next, open up a vector layer that contains the crop extent that you want to use to crop your data. To open a shapefile you use the gpd.read_file() function from geopandas.

# open crop extent
crop_extent_soap = gpd.read_file('data/spatial-vector-lidar/california/neon-soap-site/vector_data/SOAP_crop2.shp')

Next, explore the coordinate reference system (CRS) of both of your datasets. Remember that in order to perform any analysis with these two datasets together, they will need to be in the same CRS.

print('crop extent crs: ', crop_extent_soap.crs)
print('lidar crs: ', soap_profile['crs'])
crop extent crs:  {'init': 'epsg:32611'}
lidar crs:  +init=epsg:32611
# plot the data
fig, ax = plt.subplots(figsize = (6, 6))
ax.set_title("Shapefile Imported into Python \nCrop Extent for Soaproot Saddle Field Site", 
             fontsize = 16)


The spatial extent of a shapefile the geographic edge or location that is the furthest north, south east and west.
The spatial extent of a shapefile represents the geographic "edge" or location that is the furthest north, south east and west. Thus is represents the overall geographic coverage of the spatial object. Image Source: Colin Williams, NEON.

Now that you have imported the shapefile. Plot the two layers together to ensure the overlap each other. If the shapefile does not overlap the raster, then you can not use it to crop!

fig, ax = plt.subplots(figsize = (10,10))
crop_extent_soap.plot(ax=ax, alpha=.6, color='g');


To crop the data,use the mask function in rasterio.

from rasterio.mask import mask
from shapely.geometry import mapping
with rio.open(soap_chm_path) as src:
    extent_geojson = mapping(crop_extent_soap['geometry'][0])
    lidar_chm_crop, crop_affine = mask(src, 
    # metadata for writing or exporting the data
    soap_lidar_meta = src.meta.copy()
# Update the metadata to have the new shape (x and y and affine information)
soap_lidar_meta.update({"driver": "GTiff",
                 "height": lidar_chm_crop.shape[0],
                 "width": lidar_chm_crop.shape[1],
                 "transform": crop_affine})

# generate an extent for the newly cropped object for plotting
cr_ext = rio.transform.array_bounds(soap_lidar_meta['height'], 

bound_order = [0,2,1,3]
cr_extent = [cr_ext[b] for b in bound_order]
cr_extent, crop_extent_soap.total_bounds
([297349.0, 298062.0, 4101114.0, 4101115.0],
 array([ 297349.82145413, 4100402.84616318,  297923.2856659 ,
        4101114.9500745 ]))
# mask the nodata and plot the newly cropped raster layer
lidar_chm_crop_ma = np.ma.masked_equal(lidar_chm_crop[0], -9999.0) 
fig, ax = plt.subplots(figsize = (8,8))
ax.imshow(lidar_chm_crop_ma, extent = cr_extent)
#crop_extent_soap.plot(ax=ax, alpha=.6, color='g');

<matplotlib.image.AxesImage at 0x12c18fb38>


# Save to disk so you can use the file later.
path_out = "data/spatial-vector-lidar/outputs/soap_lidar_chm_crop.tif"
with rio.open(path_out, 'w', **soap_lidar_meta) as ff:
    ff.write(lidar_chm_crop[0], 1)

Optional Challenge: Crop the SJER Lidar Data

Above your cropped the data for the SOaproot Saddle fieldsite. Crop the data using the same approach for the sjer field site located in this folder: data/spatial-vector-lidar/california/neon-sjer-site.

Leave a Comment