# Lesson 4. How to Dissolve Polygons Using Geopandas: GIS in Python

## Learning Objectives

• Dissolve polygons based upon an attribute using geopandas in Python.

### Import Packages and Set Working Directory

# Import libraries
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib.colors import ListedColormap
import seaborn as sns
from shapely.geometry import box
import geopandas as gpd
import earthpy as et

sns.set(font_scale=1.5)
sns.set_style("white")

# Set working dir & get data
data = et.data.get_data('spatial-vector-lidar')
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))


### Import Data

To begin, import the following foud shapefiles using geopandas.

# Define base path as it is repeated below
base_path = os.path.join("data", "spatial-vector-lidar")

# Define file paths
country_boundary_path = os.path.join(base_path, "usa",
"usa-boundary-dissolved.shp")

state_boundary_path = os.path.join(base_path, "usa",
"usa-states-census-2014.shp")

pop_places_path = os.path.join(base_path, "global", "ne_110m_populated_places_simple",
"ne_110m_populated_places_simple.shp")

# Import the data


## Dissolve Polygons Based On an Attribute with Geopandas

Next, you will learn how to dissolve polygon data. Dissolving polygons entails combining polygons based upon a unique attribute value and removing the interior geometry.

Below you will dissolve the US states polygons by the region that each state is in. When you dissolve, you will create a new set polygons - one for each regions in the United States.

To begin, explore your data. Using .geom_type you can see that you have a mix of single and multi polygons in your data. Sometimes multi-polygons can cause problems when processing. It’s always good to check your geometry before you begin to better know what you are working with.

state_boundary_us.head()

006017797780400000US0606CACalifornia0040348382318120483271881West(POLYGON Z ((-118.593969 33.467198 0, -118.484...
111017023820400000US1111DCDistrict of Columbia0015835057818633500NortheastPOLYGON Z ((-77.119759 38.934343 0, -77.041017...
212002944780400000US1212FLFlorida0013890320085531407883551Southeast(POLYGON Z ((-81.81169299999999 24.568745 0, -...
313017053170400000US1313GAGeorgia001489635033994947080103SoutheastPOLYGON Z ((-85.605165 34.984678 0, -85.474338...
416017797830400000US1616IDIdaho002140454255492397728105WestPOLYGON Z ((-117.243027 44.390974 0, -117.2150...
state_boundary_us.geom_type.head()

0    MultiPolygon
1         Polygon
2    MultiPolygon
3         Polygon
4         Polygon
dtype: object


Next, select the columns that you with to use for the dissolve and keep. In this case we want to retain the:

• geometry

columns.

state_boundary = state_boundary_us[['LSAD', 'geometry']]

# View the resulting geodataframe
cont_usa

geometry
00(POLYGON Z ((-81.81169299999999 24.568745 0, -...

And finally, plot the data. Note that when you dissolve, the column used to perform the dissolve becomes an index for the resultant geodataframe. Thus you will have to use the reset_index() method when you plot, to access the region “column”.

So this will work:

us_regions.reset_index().plot(column = 'region', ax=ax)

This will return an error as region is no longer a column, it is an index! us_regions.plot(column = 'region', ax=ax)

cont_usa.reset_index()

000(POLYGON Z ((-81.81169299999999 24.568745 0, -...
# Plot the data
fig, ax = plt.subplots(figsize=(10, 6))
ax=ax)
ax.set_axis_off()
plt.axis('equal')
plt.show()


## Dissolve and Aggregate Data

In the example above, you dissolved the state level polygons to a region level. However you did not aggregate or summarize the attributes associated with each polygon. Next, you will learn how to aggregate quantitative values in your attribute table when you perform a dissolve. To do this, you will add

aggfunc = 'fun-here'

to your dissolve call. You can choice a suite of different summary functions including:

• first
• last
• mean
• max

Below the data are aggregated by the ‘sum’ method. this means that the values for ALAND are added up for all of the states in a region. That summary sum value will be returned in the new dataframe.

# Select the columns that you wish to retain in the data
state_boundary = state_boundary_us[['region', 'geometry', 'ALAND', 'AWATER']]

# Then summarize the quantative columns by 'sum'
regions_agg = state_boundary.dissolve(by='region', aggfunc='sum')
regions_agg

geometryALANDAWATER
region
Midwest(POLYGON Z ((-82.863342 41.693693 0, -82.82571...1943869253244184383393833
Northeast(POLYGON Z ((-76.04621299999999 38.025533 0, -...869066138232108922434345
Southeast(POLYGON Z ((-81.81169299999999 24.568745 0, -...1364632039655103876652998
SouthwestPOLYGON Z ((-94.48587499999999 33.637867 0, -9...146263153099724217682268
West(POLYGON Z ((-118.594033 33.035951 0, -118.540...243233644473057568049509
# Convert area units from square meters to hectares (divide by 10,000)
regions_agg["land_ha"] = regions_agg["ALAND"] / 10000
regions_agg["water_ha"] = regions_agg["AWATER"] / 10000
regions_agg

geometryALANDAWATERland_hawater_ha
region
Midwest(POLYGON Z ((-82.863342 41.693693 0, -82.82571...19438692532441843833938331.943869e+081.843834e+07
Northeast(POLYGON Z ((-76.04621299999999 38.025533 0, -...8690661382321089224343458.690661e+071.089224e+07
Southeast(POLYGON Z ((-81.81169299999999 24.568745 0, -...13646320396551038766529981.364632e+081.038767e+07
SouthwestPOLYGON Z ((-94.48587499999999 33.637867 0, -9...1462631530997242176822681.462632e+082.421768e+06
West(POLYGON Z ((-118.594033 33.035951 0, -118.540...2432336444730575680495092.432336e+085.756805e+06
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))

regions_agg.plot(column='land_ha',
legend=True,
scheme="quantiles",
ax=ax1)

regions_agg.plot(column='water_ha',
scheme="quantiles",
legend=True,
ax=ax2)

plt.suptitle('Census Data - Total Land and Water by Region (Hectares)', fontsize=16)
leg = ax1.get_legend()
leg.set_bbox_to_anchor((1.5,1))

leg = ax2.get_legend()
leg.set_bbox_to_anchor((1.5,1))
ax1.set_axis_off()
ax2.set_axis_off()

plt.axis('equal')
plt.show()