Lesson 7. How to Dissolve Polygons Using Geopandas: GIS in Python


Learning Objectives

After completing this tutorial, you will be able to:

  • Dissolve polygons based upon an attribute using geopandas in Python.

What You Need

You will need a computer with internet access to complete this lesson and the spatial-vector-lidar data subset created for the course.

Download Spatial Lidar Teaching Data Subset data

or using the earthpy package:

et.data.get_data("spatial-vector-lidar")

Get started by loading your libraries. And be sure that your working directory is set.

# import libraries
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib.colors import ListedColormap
import geopandas as gpd
import earthpy as et
from shapely.geometry import box

plt.ion()

os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

Import Data

To begin, import the following foud shapefiles using geopandas.

# Define base path as it is repeated below
base_path = os.path.join("data", "spatial-vector-lidar")

# Define file paths
country_boundary_path = base_path + "/usa/usa-boundary-dissolved.shp"
state_boundary_path = base_path + "/usa/usa-states-census-2014.shp"
pop_places_path = base_path + \
    "/global/ne_110m_populated_places_simple/ne_110m_populated_places_simple.shp"

# Import the data
country_boundary_us = gpd.read_file(country_boundary_path)
state_boundary_us = gpd.read_file(state_boundary_path)
pop_places = gpd.read_file(pop_places_path)

Dissolve Polygons Based On an Attribute with Geopandas

Next, you will learn how to dissolve polygon data. Dissolving polygons entails combining polygons based upon a unique attribute value and removing the interior geometry.

When you dissolve polygons you remove interior boundaries of a set of polygons with the same attribute value and create one new merged or combined polygon for each attribute value. In the case above US states are dissolved to regions in the United States. Source: ESRI
When you dissolve polygons you remove interior boundaries of a set of polygons with the same attribute value and create one new "merged" or combined polygon for each attribute value. In the case above US states are dissolved to regions in the United States. Source: ESRI

Below you will dissolve the US states polygons by the region that each state is in. When you dissolve, you will create a new set polygons - one for each regions in the United States.

To begin, explore your data. Using .geom_type you can see that you have a mix of single and multi polygons in your data. Sometimes multi-polygons can cause problems when processing. It’s always good to check your geometry before you begin to better know what you are working with.

state_boundary_us.head()
STATEFPSTATENSAFFGEOIDGEOIDSTUSPSNAMELSADALANDAWATERregiongeometry
006017797780400000US0606CACalifornia0040348382318120483271881West(POLYGON Z ((-118.593969 33.467198 0, -118.484...
111017023820400000US1111DCDistrict of Columbia0015835057818633500NortheastPOLYGON Z ((-77.119759 38.934343 0, -77.041017...
212002944780400000US1212FLFlorida0013890320085531407883551Southeast(POLYGON Z ((-81.81169299999999 24.568745 0, -...
313017053170400000US1313GAGeorgia001489635033994947080103SoutheastPOLYGON Z ((-85.605165 34.984678 0, -85.474338...
416017797830400000US1616IDIdaho002140454255492397728105WestPOLYGON Z ((-117.243027 44.390974 0, -117.2150...
state_boundary_us.geom_type.head()
0    MultiPolygon
1         Polygon
2    MultiPolygon
3         Polygon
4         Polygon
dtype: object

Next, select the columns that you with to use for the dissolve and keep. In this case we want to retain the:

  • LSAD
  • geometry

columns.

state_boundary = state_boundary_us[['LSAD', 'geometry']]
cont_usa = state_boundary.dissolve(by='LSAD')
# View the resulting geodataframe
cont_usa
geometry
LSAD
00(POLYGON Z ((-81.81169299999999 24.568745 0, -...

And finally, plot the data. Note that when you dissolve, the column used to perform the dissolve becomes an index for the resultant geodataframe. Thus you will have to use the reset_index() method when you plot, to access the region “column”.

So this will work:

us_regions.reset_index().plot(column = 'region', ax=ax)

This will return an error as region is no longer a column, it is an index! us_regions.plot(column = 'region', ax=ax)

cont_usa.reset_index()
LSADgeometry
000(POLYGON Z ((-81.81169299999999 24.568745 0, -...
# plot the data
fig, ax = plt.subplots(figsize=(10, 6))
cont_usa.reset_index().plot(column='LSAD',
                            ax=ax)
ax.set_axis_off()
plt.axis('equal')
plt.show() 
The LSAD attribute value for every polygon in the data is 00. Thus when you dissolve by that attribute, you get one resulting polygon.
The LSAD attribute value for every polygon in the data is 00. Thus when you dissolve by that attribute, you get one resulting polygon.

Dissolve and Aggregate Data

In the example above, you dissolved the state level polygons to a region level. However you did not aggregate or summarize the attributes associated with each polygon. Next, you will learn how to aggregate quantitative values in your attribute table when you perform a dissolve. To do this, you will add

aggfunc = 'fun-here'

to your dissolve call. You can choice a suite of different summary functions including:

  • first
  • last
  • mean
  • max

And more. Read more about the dissolve function here.

Below the data are aggregated by the ‘sum’ method. this means that the values for ALAND are added up for all of the states in a region. That summary sum value will be returned in the new dataframe.

# Select the columns that you wish to retain in the data
state_boundary = state_boundary_us[['region', 'geometry', 'ALAND', 'AWATER']]
# Then summarize the quantative columns by 'sum'
regions_agg = state_boundary.dissolve(by='region', aggfunc='sum')
regions_agg
geometryALANDAWATER
region
Midwest(POLYGON Z ((-82.863342 41.693693 0, -82.82571...1943869253244184383393833
Northeast(POLYGON Z ((-76.04621299999999 38.025533 0, -...869066138232108922434345
Southeast(POLYGON Z ((-81.81169299999999 24.568745 0, -...1364632039655103876652998
SouthwestPOLYGON Z ((-94.48587499999999 33.637867 0, -9...146263153099724217682268
West(POLYGON Z ((-118.594033 33.035951 0, -118.540...243233644473057568049509
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
regions_agg.plot(column='ALAND',
                 legend=True,
                 scheme='quantiles',
                 ax=ax1)
regions_agg.plot(column='AWATER',
                 scheme='quantiles',
                 legend=True,
                 ax=ax2)

plt.suptitle('Census Data - Total Land and Water by Region', fontsize=16)
ax1.set_axis_off()
ax2.set_axis_off()

plt.axis('equal')

plt.show()
/Users/lewa8222/anaconda3/envs/earth-analytics-python/lib/python3.6/site-packages/pysal/__init__.py:65: VisibleDeprecationWarning: PySAL's API will be changed on 2018-12-31. The last release made with this API is version 1.14.4. A preview of the next API version is provided in the `pysal` 2.0 prelease candidate. The API changes and a guide on how to change imports is provided at https://pysal.org/about
  ), VisibleDeprecationWarning)
In this example, you dissolved by region. There are 5 unique region values in the attributes. Thus you end up with 5 polygons. You also summarized attributes by ALAND and AWATER calculating the total value for each.
In this example, you dissolved by region. There are 5 unique region values in the attributes. Thus you end up with 5 polygons. You also summarized attributes by ALAND and AWATER calculating the total value for each.

Leave a Comment