Lesson 6. Clip a spatial vector layer in python using shapely & geopandas: GIS in Python


Learning Objectives

After completing this tutorial, you will be able to:

  • Clip a spatial vector point and line layer to the spatial extent of a polygon layer in Python using geopandas.
  • Plot data with custom legends

What You Need

You will need a computer with internet access to complete this lesson and the spatial-vector-lidar data subset created for the course.

Download Spatial Lidar Teaching Data Subset data

or using the earthpy package:

et.data.get_data("spatial-vector-lidar")

In this lesson, you will learn how to spatially clip data for easier plotting and analysis of smaller spatial areas. You will use the geopandas library and the box module from the shapely library.

How to Clip Vector Data in Python

What Is Clipping or Cropping Data?

When you clip or crop spatial data you are removing the data outside of the clip extent. This means that your clipped dataset will be SMALLER (have a smaller spatial extent) than the original dataset. This also means that objects in the data such as polygons or lines will be CUT based on the boundary of the clip object.

the spatial extent represents the spatial area that a particular dataset covers.
The spatial extent of a shapefile or `Python` spatial object like a `geopandas` `geodataframe` represents the geographic "edge" or location that is the furthest north, south east and west. Thus is represents the overall geographic coverage of the spatial object. Image Source: National Ecological Observatory Network (NEON)

When Do You Want to Clip Data?

You may want to clip your data for several reasons:

  1. You have more data than you need. For example you have data outside of your study area that you don’t need to process. Clipping it to the study area boundary will make it smaller and easier to manage!
  2. If you have data outside of your study area and you clip it, you can perform analysis on only that region - thus you won’t need to subset the data further when you perform summary statistics for example.
  3. When you plot the data you will only see the study region.

You will learn how to both crop your data and zoom in on an extent below.
Get started by loading your libraries. And be sure that your working directory is set.

In this lesson you will find examples of how to clip point and line data using geopandas. At the bottom of the lesson you will see a set of functions that can be used to clip the data in just one line of code. This lesson explains how those functions are built.

# Import libraries
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib.colors import ListedColormap
import geopandas as gpd
# Load the box module from shapely to create box objects
from shapely.geometry import box
import earthpy as et
# Plot figures inline
plt.ion()

os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))
# Import clip_data.py as a module so you can access the clip data functions
import clip_data as cl

How to Clip Shapefiles in Python

In your dataset for this week you have 3 layers:

  1. A country boundary for the USA and
  2. A state boundary layer for the USA and
  3. Populated places in the USA

The data are imported and plotted below. Notice that there are points outside of your study area which is the continental USA. Your goal is the clip the points out that you NEED for your project - the points that overlay on top of the continental United States.

# Import all of your data at the top of your notebook to keep things organized.
country_boundary_us = gpd.read_file(
    'data/spatial-vector-lidar/usa/usa-boundary-dissolved.shp')
state_boundary_us = gpd.read_file(
    'data/spatial-vector-lidar/usa/usa-states-census-2014.shp')
pop_places = gpd.read_file(
    'data/spatial-vector-lidar/global/ne_110m_populated_places_simple/ne_110m_populated_places_simple.shp')

# Are the data all in the same crs?
print("country_boundary_us", country_boundary_us.crs)
print("state_boundary_us", state_boundary_us.crs)
print("pop_places", pop_places.crs)
country_boundary_us {'init': 'epsg:4326'}
state_boundary_us {'init': 'epsg:4326'}
pop_places {'init': 'epsg:4326'}
# Plot the data
fig, ax = plt.subplots(figsize=(12, 8))
country_boundary_us.plot(alpha=.5,
                         ax=ax)
state_boundary_us.plot(cmap='Greys',
                       ax=ax,
                       alpha=.5)
pop_places.plot(ax=ax)

plt.axis('equal')
ax.set_axis_off()
plt.show()
Plot showing populated places for the entire globe overlayed on top of the Continental United States. If your study area is the USA, then you might not need all of the additional points. In this instance, you can clip or crop your data.
Plot showing populated places for the entire globe overlayed on top of the Continental United States. If your study area is the USA, then you might not need all of the additional points. In this instance, you can clip or crop your data.

Clip The Points Shapefile in Python Using Geopandas

To remove the points that are outside of your study area, you can clip the data. Removing or clipping data can make the data smaller and inturn plotting and analysis faster.

Clip vector data.
When you clip a vector data set with another layer, you remove points, lines or polygons that are outside of the spatial extent of the area that you use to clip the data. This images shows a circular clip region - you will be using a rectangular region in this example. Image Source: ESRI

One way to clip a points layer is to:

  1. Create a mask where every point that overlaps the polygon that you wish to clip to is set to true
  2. Apply that mask to filter the geopandas dataframe.

To clip the data you first create a unified polygon object that represents the total area covered by your clip layer. If your study area contains only one polygon you can use boundary.geometry[0] to select the first (and only) polygon n the layer. You can also use .unary_union if you have many polygons in your clip boundary. unary.union will combine all of the polygons in your boundary layer into on vector object to use for clipping. Next you can use the .intersects() method to select just the points within the pop_places object that fall within the geometry in the poly object.

The .intersects() method returns a boolean mask. Every point that is within the poly object is set to True. Points that do not fall within the boundary are set to False. Finally, you subset the pop_places object

pop_places[pop_places.geometry.intersects(poly)]

What will be returned are just the points that fall within the polygon region.

The process for clipping points, lines and polygons is different. However, to streamline things, for this class, your instructor has created a clip_shp() function that you imported above as a module to use in this lesson.

import clip_data

If you wanted to clip data using geopandas you use the .intersection() method as follows:

# "clip" a points layer to the boundary of a polygon
poly = country_boundary_us.geometry.unary_union
points_clip = pop_places[pop_places.geometry.intersects(poly)]

However if you use the clip_shp() shape function, it will take care of all of these steps for you. Clip shape takes two arguments:

  • shp: the vector layer (point, line or polygon) that you wish to clip and
  • clip_obj: the polygon that you wish to use to clip your data

clip_shp() will clip the data to the boundary of the polygon layer that ou select. If there are multiple polygons in your clip_obj object, clip_shp() will clip the data to the total boundary of all polygons in the layer.

# Clip the data using the clip_data module
points_clip = cl.clip_shp(pop_places, country_boundary_us)
# View the first 6 rows and a few select columns
points_clip[['name', 'geometry', 'scalerank', 'natscale', ]].head()
namegeometryscaleranknatscale
175San FranciscoPOINT (-122.4171687735522 37.76919562968743)1300
176DenverPOINT (-104.9859618109682 39.7411339069655)1300
177HoustonPOINT (-95.34192514914599 29.82192024318886)1300
178MiamiPOINT (-80.22605193945003 25.78955655502153)1300
179AtlantaPOINT (-84.40189524187565 33.83195971260585)1300

Now you can plot the data to see the newly “clipped” points layer.

# Plot the data
fig, ax = plt.subplots(figsize=(12, 8))
country_boundary_us.plot(alpha=1,
                         color="white",
                         edgecolor="black",
                         ax=ax)
state_boundary_us.plot(cmap='Greys',
                       ax=ax,
                       alpha=.5)
points_clip.plot(ax=ax,
                 column='name')
ax.set_axis_off()
plt.axis('equal')

# Label each point - note this is just shown here optionally but is not required for your homework
points_clip.apply(lambda x: ax.annotate(s=x['name'],
                                        xy=x.geometry.coords[0],
                                        xytext=(6, 6), textcoords="offset points",
                                        backgroundcolor="white"),
                  axis=1)
plt.show()
Once you have clipped the points layer to your USA extent, you have fewer points to work with. This will make processing your data more efficient.
Once you have clipped the points layer to your USA extent, you have fewer points to work with. This will make processing your data more efficient.

Crop a Line or Polygon Layer to An Extent

The process for clipping a line or polygon layer is slightly different than clipping a set of points. To clip a line of polygon feature you will do the following:

  1. Ensure that your polygon and line layer are in the same coordinate reference system
  2. Identify what features in the lines layer fall WITHIN the boundary of the polygon layer
  3. Subset the features within the geometry and reset the geometry of the newly clipped layer to be equal to the clipped data.

This last step may seem unusual. When you clip data using shapely and geopandas the default behaviour is for it to only return the clipped geometry. However you may with to also retain the attributes associated with the geometry. This is where the set_geometry() methods comes into play.

For this example you will use the country_boundary layer and a clipped version of the natural earth 10m roads layer. * Import ne_10m_n_america_roads.shp into python.

  • Next, check to ensure that the roads and country boundary are in the same CRS. You may need to reproject the data.
  • Because spatial operations take time, it’s best if you subset your data as much as possible prior to clipping.
# Open the roads layer
ne_roads = gpd.read_file(
    'data/spatial-vector-lidar/global/ne_10m_roads/ne_10m_roads.shp')
# Are both layers in the same CRS?

if (ne_roads.crs == country_boundary_us.crs):
    print("Both layers are in the same crs!",
          ne_roads.crs, country_boundary_us.crs)
Both layers are in the same crs! {'init': 'epsg:4326'} {'init': 'epsg:4326'}

How to Clip Lines and Polygons in Python

In your dataset for this week you have 2 layers.

  1. A global, natural earth roads layer and
  2. A boundary for the United States.

The roads data are imported below. You imported the boundary layer above.

If both layers are in the same CRS, you are ready to clip your data. Because the clip functions are new and little testing has been done with them, you will see all of the lines of code required to clip your data. However you can use the clip_shp() function to clip your data for this week’s class!

Be patient while your clip code below runs.

To make your code run faster, you can simplify the geometry of your country boundary. Simplify should be used with caution as it does modify your data.

# Simplify the geometry of the clip extent for faster processing
# Use this with caution as it modifies your data.
country_boundary_us_sim = country_boundary_us.simplify(
    .2, preserve_topology=True)

Clip and plot the data. Be patient. It may take up to a minute to clip the data.

ne_roads_clip = cl.clip_shp(ne_roads, country_boundary_us_sim)
print("The clipped data have fewer line objects (represented by rows):",
      ne_roads_clip.shape, ne_roads.shape)
The clipped data have fewer line objects (represented by rows): (7346, 32) (56601, 32)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
ne_roads.plot(ax=ax1)
ne_roads_clip.plot(ax=ax2)

ax1.set_title("Unclipped roads")
ax2.set_title("Clipped roads")

ax1.set_axis_off()
ax2.set_axis_off()

plt.axis('equal')
plt.show()
Clipped vs unclipped roads layer.
Clipped vs unclipped roads layer.

Plot the cropped data.

# plot the data
fig, ax = plt.subplots(figsize=(12, 8))
country_boundary_us.plot(alpha=1,
                         color="white",
                         edgecolor="black",
                         ax=ax)
ne_roads_clip.plot(ax=ax)
ax.set_axis_off()
plt.axis('equal')
plt.show()
Final plot showing the clipped roads drawn on top of the country boundary. Notice that there are no road segments outside of the country boundary.
Final plot showing the clipped roads drawn on top of the country boundary. Notice that there are no road segments outside of the country boundary.

How Clip_shp() works:

Here are the steps involved with clipping data in geopandas - these steps are completed when you use the clip_shp() function which is provided to you as a .py script that you can import into this lesson as a module. They are simply described below just in case you ever need to clip data in python and that function doesn’t work for you.

  1. Subset the roads data using a spatial index.
  2. Clip the geometry using .intersection()
  3. Remove all rows in the geodataframe that have no geometry (this is explained below).
  4. Update the original roads layer to contained only the clipped geometry
# Create a single polygon object for clipping
poly = clip_obj.geometry.unary_union
spatial_index = shp.sindex

# Create a box for the initial intersection
bbox = poly.bounds
# Get a list of id's for each road line that overlaps the bounding box and subset the data to just those lines
sidx = list(spatial_index.intersection(bbox))
shp_sub = shp.iloc[sidx]

# Clip the data - with these data
clipped = shp_sub.copy()
clipped['geometry'] = shp_sub.intersection(poly)

# Return the clipped layer with no null geometry values
final_clipped = clipped[clipped.geometry.notnull()]

What Does Simplify Do?

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))

# a larger tolerance yields a blockier polygon
country_boundary_us.simplify(2, preserve_topology=True).plot(ax=ax1)

# a larger tolerance yields a blockier polygon
country_boundary_us.simplify(.2, preserve_topology=True).plot(ax=ax2)

ax1.set_title(
    "Data with a higher tolerance value will become visually blockier as there are fewer vertices")
ax2.set_title(
    "Data with a very low tolerance value will look smoother but will take longer to process")

ax1.set_axis_off()
ax2.set_axis_off()
plt.show()
Simplify removes vertices or points from a complex object effectively smoothing the data. Use this method with caution as it will impact any outcome statistics of your data.
Simplify removes vertices or points from a complex object effectively smoothing the data. Use this method with caution as it will impact any outcome statistics of your data.
# Plot the data by attribute
fig, ax = plt.subplots(figsize=(12, 8))
country_boundary_us.plot(alpha=1, 
                         color="white", 
                         edgecolor="black", 
                         ax=ax)
ne_roads_clip.plot(ax=ax, 
                   column='type', 
                   legend=True)
ax.set_axis_off()
plt.axis('equal')
plt.show()
The final clipped roads layer. When you use the default pandas plotting, your legend will have circles for each attribute type. If you want to represent line data using lines in the legend, then you will need to create a custom plot and legend as demonstrated below.
The final clipped roads layer. When you use the default pandas plotting, your legend will have circles for each attribute type. If you want to represent line data using lines in the legend, then you will need to create a custom plot and legend as demonstrated below.

Below, you create a custom legend. There are many different approaches to this. One is below.

To begin you create a python dictionary for each attribute value in your legend. Below you will see each road type has a dictionary entry and two associated values - a color and a number representing the width of the line in your legend.

'Beltway': ['black', 2] Color the line for beltway black with a line width of 2.

Next you loop through the dictionary to plot the data. In the loop below, you select each attribute value, and plot it using the color and line width assigned in the dictionary above.

for ctype, data in ne_roads_clip.groupby('type'):
    data.plot(color=road_attrs[ctype][0],
              label=ctype,
              linewidth=road_attrs[ctype][1],
              ax=ax)

Finally, a call to ax.legend() renders the legend using the colors applied in the loop above.

# Plot with a custom legend

# First, create a dictionary with the attributes of each legend item
road_attrs = {'Beltway': ['black', 2],
              'Secondary Highway': ['grey', .5],
              'Road': ['grey', .5],
              'Bypass': ['grey', .5],
              'Ferry Route': ['grey', .5],
              'Major Highway': ['black', 1]}

# plot the data
fig, ax = plt.subplots(figsize=(12, 8))

for ctype, data in ne_roads_clip.groupby('type'):
    data.plot(color=road_attrs[ctype][0],
              label=ctype,
              ax=ax,
              linewidth=road_attrs[ctype][1])

country_boundary_us.plot(alpha=1, color="white", edgecolor="black", ax=ax)
ax.legend(frameon=False)
ax.set_title("United States Roads by Type", fontsize=25)
ax.set_axis_off()
plt.axis('equal')
plt.show()
Currently geopandas does not support too much customization of plot legends. Thus if you want to create a custom legend, you will need to create a dictionary of colors and other legend attributes that you'll need to create the custom legend.
Currently geopandas does not support too much customization of plot legends. Thus if you want to create a custom legend, you will need to create a dictionary of colors and other legend attributes that you'll need to create the custom legend.
# Create function to clip point data using geopandas


def clip_points(shp, clip_obj):
    '''
    Docs Here
    '''

    poly = clip_obj.geometry.unary_union
    return(shp[shp.geometry.intersects(poly)])

# Create function to clip line and polygon data using geopandas


def clip_line_poly(shp, clip_obj):
    '''
    docs
    '''

    # Create a single polygon object for clipping
    poly = clip_obj.geometry.unary_union
    spatial_index = shp.sindex

    # Create a box for the initial intersection
    bbox = poly.bounds
    # Get a list of id's for each road line that overlaps the bounding box and subset the data to just those lines
    sidx = list(spatial_index.intersection(bbox))
    shp_sub = shp.iloc[sidx]

    # Clip the data - with these data
    clipped = shp_sub.copy()
    clipped['geometry'] = shp_sub.intersection(poly)

    # Return the clipped layer with no null geometry values
    return(clipped[clipped.geometry.notnull()])


# Final clip function that handles points, lines and polygons


def clip_shp(shp, clip_obj):
    '''
    '''
    if shp["geometry"].iloc[0].type == "Point":
        return(clip_points(shp, clip_obj))
    else:
        return(clip_line_poly(shp, clip_obj))

Updated:

Leave a Comment