Lesson 8. How to Join Attributes From One Shapefile to Another in Open Source Python Using Geopandas: GIS in Python


Learning Objectives

After completing this tutorial, you will be able to:

  • Join spatial attributes from one shapefile to another using geopandas in python.
  • Calculate line segment length using geopandas

What You Need

You will need a computer with internet access to complete this lesson and the spatial-vector-lidar data subset created for the course.

Download Spatial Lidar Teaching Data Subset data

or using the earthpy package:

et.data.get_data("spatial-vector-lidar")

Get started by loading your libraries and setting your working directory.

# Import libraries
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib.colors import ListedColormap
import pandas as pd
import geopandas as gpd
import earthpy as et 
# Plot figures  inline
plt.ion()
from shapely.geometry import box

os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

# Import clip_data.py as a module so you can access the clip data functions
import clip_data as cl
# Import data
country_bound_us = gpd.read_file('data/spatial-vector-lidar/usa/usa-boundary-dissolved.shp')
state_bound_us = gpd.read_file('data/spatial-vector-lidar/usa/usa-states-census-2014.shp')
pop_places = gpd.read_file('data/spatial-vector-lidar/global/ne_110m_populated_places_simple/ne_110m_populated_places_simple.shp')
ne_roads = gpd.read_file('data/spatial-vector-lidar/global/ne_10m_roads/ne_10m_roads.shp')

Next dissolve the state data by region like you did in the previous lesson.

# Simplify the country boundary just a little bit to make this run faster
country_bound_us_simp = country_bound_us.simplify(.2, preserve_topology=True)
# Clip the roads to the US boundary - this will take about a minute to execute
roads_cl = cl.clip_shp(ne_roads, country_bound_us_simp)
# Dissolve states by region
regions_agg = state_bound_us.dissolve(by="region")

Spatial Joins in Python

Just like you might do in ArcMap or QGIS you can perform spatial joins in Python too. A spatial join is when you append the attributes of one layer to another based upon its spatial relationship.

So - for example if you have a roads layer for the United States, and you want to apply the “region” attribute to every road that is spatially in a particular region, you would use a spatial join. To apply a join you can use the geopandas.sjoin() function as following:

.sjoin(layer-to-add-region-to, region-polygon-layer)

Sjoin Arguments:

The op argument specifies the type of join that will be applied

  • intersects: Returns True if the boundary and interior of the object intersect in any way with those of the other.
  • within: Returns True if the object’s boundary and interior intersect only with the interior of the other (not its boundary or exterior).
  • contains: Returns True if the object’s interior contains the boundary and interior of the other object and their boundaries do not touch at all.

You can read more about each type here.

How allows the following options: (this is taken directly from the geopandas code on github!

  • ‘left’: use keys from left_df; retain only left_df geometry column
  • ‘right’: use keys from right_df; retain only right_df geometry column
  • ‘inner’: use intersection of keys from both dfs; retain only left_df geometry column
# Roads within region
roads_region = gpd.sjoin(roads_cl, 
                         regions_agg, 
                         how="inner", 
                         op='intersects')
# Notice once you have joins the data - you have attributes from the regions_object (ie the region) 
# attached to each road feature
roads_region[["featurecla", "index_right", "ALAND"]].head()
featureclaindex_rightALAND
1453RoadWest403483823181
1434RoadWest403483823181
1492RoadWest403483823181
50512RoadWest403483823181
49677RoadWest403483823181

Plot the data.

# Reproject to Albers for plotting
country_albers = country_bound_us.to_crs({'init': 'epsg:5070'})
roads_albers = roads_region.to_crs({'init': 'epsg:5070'})
# Plot the data
fig, ax = plt.subplots(figsize=(12, 8))
country_albers.plot(alpha=1,
                    facecolor="none",
                    edgecolor="black",
                    zorder=10,
                    ax=ax)
roads_albers.plot(column='index_right',
                  ax=ax,
                  legend=True)
ax.set_axis_off()
plt.axis('equal')
plt.show()
Plot of roads colored by region with a standard geopandas legend.
Plot of roads colored by region with a standard geopandas legend.

If you want to customize your legend even further, you can once again use loops to do so.

# First, create a dictionary with the attributes of each legend item
road_attrs = {'Midwest': ['black'],
              'Northeast': ['grey'],
              'Southeast': ['m'],
              'Southwest': ['purple'],
              'West': ['green']}

# Plot the data
fig, ax = plt.subplots(figsize=(12, 8))
regions_agg.plot(edgecolor="black",
                 ax=ax)
country_albers.plot(alpha=1,
                    facecolor="none",
                    edgecolor="black",
                    zorder=10,
                    ax=ax)

for ctype, data in roads_albers.groupby('index_right'):
    data.plot(color=road_attrs[ctype][0],
              label=ctype,
              ax=ax)
plt.legend(bbox_to_anchor=(1.0, 1), loc=2)
ax.set_axis_off()
plt.axis('equal')
plt.show()
Plot of roads colored by region with a custom legend.
Plot of roads colored by region with a custom legend.

Calculate Line Segment Length

# Turn off scientific notation
pd.options.display.float_format = '{:.4f}'.format

# Calculate the total length of road 
road_albers_length = roads_albers[['index_right', 'length_km']]
# Sum existing columns
roads_albers.groupby('index_right').sum()

roads_albers['rdlength'] = roads_albers.length
sub = roads_albers[['rdlength', 'index_right']].groupby('index_right').sum()
sub
rdlength
index_right
Midwest86575020.6373
Northeast33786036.8608
Southeast84343077.8904
Southwest49373104.8209
West61379830.5534

Leave a Comment