Lesson 9. Handle missing spatial attribute data in Python: GIS in Python


Learning Objectives

After completing this tutorial, you will be able to:

  • work with data sets that have missing data

  • replace missing data values

What You Need

You will need a computer with internet access to complete this lesson and the spatial-vector-lidar data subset created for the course.

Download the Spatial Lidar Teaching Data Subset

or download it using the earthpy package:

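import earthpy as et
# Download the lesson data (earthpy saves it under ~/earth-analytics/data by default)
et.data.get_data("spatial-vector-lidar")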

import os
import pandas as pd
import numpy as np
import geopandas as gpd
import earthpy as et

# Set the working directory to ~/earth-analytics so the relative data paths below work
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

This lesson covers how to find, replace, and remove missing values in attribute data using geopandas.

# Import roads shapefile
sjer_roads = gpd.read_file("data/spatial-vector-lidar/california/madera-county-roads/tl_2013_06039_roads.shp")
type(sjer_roads)
geopandas.geodataframe.GeoDataFrame

Explore Data Values

There are several ways to use pandas to explore your data and determine if you have any missing values.

  • To find the number of missing values in each column of a DataFrame, run dfname.isnull().sum()
  • To look at the unique values in a specific column of a DataFrame, run dfname['column'].unique()
sjer_roads.isnull().sum()
LINEARID       0
FULLNAME    5149
RTTYP       5149
MTFCC          0
geometry       0
dtype: int64

Based on this output, the FULLNAME and RTTYP columns each contain 5149 missing values (NaN or None objects), while the other columns have none. Double check the unique values in the road type column.

# View data type 
print(type(sjer_roads['RTTYP']))
# View unique attributes for each road in the data
print(sjer_roads['RTTYP'].unique())
<class 'pandas.core.series.Series'>
['M' None 'S' 'C']
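Notice that .unique() lists None as if it were a road type, while .isnull() counted those same entries as missing. The minimal sketch below, using a small hypothetical Series instead of the real shapefile (a GeoDataFrame is a pandas DataFrame subclass, so the same methods apply), shows that pandas treats both None and np.nan as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the RTTYP column: an object column that mixes
# real road type codes with both None and np.nan
road_types = pd.Series(["M", None, "S", np.nan, "C"])

# isnull() flags both None and NaN as missing
missing_mask = road_types.isnull()
print(missing_mask.tolist())  # [False, True, False, True, False]
print(missing_mask.sum())     # 2

# Dropping the missing entries first leaves only the real codes
print(road_types.dropna().unique())  # ['M' 'S' 'C']
```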

Replacing Values

  • If the value you want to replace is NaN or None, you can use dfname.loc[dfname['column'].isnull(), 'column'] = 'newvalue'

  • Or you can use the pandas .fillna() method, which takes the value that you want to put in place of the missing values.

Hmm, one of the road types is None, which means no road type was recorded for those roads. It would be helpful to fix this before doing more analysis or mapping with this dataset.

There are several ways to deal with this issue. One is to use the .fillna() method to replace all instances of None (or NaN) in the attribute data with a new value. In this case, you will use ‘Unknown’.

# Replace missing (None/NaN) road types with "Unknown"
sjer_roads["RTTYP"] = sjer_roads["RTTYP"].fillna("Unknown")
print(sjer_roads['RTTYP'].unique())
['M' 'Unknown' 'S' 'C']
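Because FULLNAME and RTTYP are both missing for 5149 roads, it can be convenient to fill several columns at once. DataFrame.fillna() also accepts a dict that maps column names to fill values. The sketch below uses a few hypothetical rows rather than the real shapefile, and ‘Unnamed’ is an illustrative placeholder, not a value from the lesson.

```python
import pandas as pd

# Hypothetical rows mimicking the roads attribute table, where a road
# with no name also has no road type
roads = pd.DataFrame({
    "FULLNAME": ["N 14th St", None, "N 11th St"],
    "RTTYP": ["M", None, "M"],
    "MTFCC": ["S1400", "S1400", "S1400"],
})

# Map each column name to its own fill value; columns not listed are untouched
filled = roads.fillna({"FULLNAME": "Unnamed", "RTTYP": "Unknown"})
print(filled)
print(filled.isnull().sum().sum())  # 0
```

Like the single-column version, fillna() returns a new object, so assign the result back (as done with sjer_roads["RTTYP"] above) if you want to keep the change.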

Alternatively, you can use the .isnull() method together with .loc to select all attribute cells whose value is null and set those to ‘Unknown’.
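A minimal sketch of the .loc approach follows, run on a small hypothetical DataFrame with an RTTYP column rather than the real roads data. The boolean mask from .isnull() picks the rows, and the column label restricts the assignment to RTTYP only.

```python
import pandas as pd

# Hypothetical attribute table with two missing road types
roads = pd.DataFrame({"RTTYP": ["M", None, "S", "C", None]})

# Boolean mask: True for rows whose RTTYP is missing
is_missing = roads["RTTYP"].isnull()

# Assign a new value only to the selected rows of the RTTYP column (in place)
roads.loc[is_missing, "RTTYP"] = "Unknown"

print(roads["RTTYP"].unique())  # ['M' 'Unknown' 'S' 'C']
```

Unlike fillna(), this modifies the DataFrame in place, and the same mask could be reused to inspect or flag those rows before changing them.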

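If the value you want to change is not NaN or None (for example, an empty string or a placeholder code), .isnull() and .fillna() will not detect it. In that case you have to specify the original value that you want to change, for example with the .replace() method.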
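The sketch below assumes a hypothetical column in which missing road types were stored as an empty string or the text ‘none’ instead of a true null. These values are invisible to .isnull(), so they are named explicitly in .replace().

```python
import pandas as pd

# Hypothetical column where missing road types were recorded as text
roads = pd.DataFrame({"RTTYP": ["M", "", "S", "none", "C"]})

# These placeholders are ordinary strings, so pandas does not count them as missing
print(roads["RTTYP"].isnull().sum())  # 0

# List the original values to change and the single value to change them to
roads["RTTYP"] = roads["RTTYP"].replace(["", "none"], "Unknown")
print(roads["RTTYP"].tolist())  # ['M', 'Unknown', 'S', 'Unknown', 'C']
```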

sjer_roads.head()
       LINEARID          FULLNAME  RTTYP  MTFCC  geometry
0  110454239066         N 14th St      M  S1400  LINESTRING (-120.272267 37.116151, -120.27244 ...
1  110454239052         N 11th St      M  S1400  LINESTRING (-120.267877 37.116672, -120.268072...
2  110454239056         N 12th St      M  S1400  LINESTRING (-120.27053 37.117494, -120.270448 ...
3  110454239047         N 10th St      M  S1400  LINESTRING (-120.267028 37.11734599999999, -12...
4  110454243091  N Westberry Blvd      M  S1400  LINESTRING (-120.101219 36.96524099999999, -12...

Removing Values

In some cases you will want to remove NaN values from your DataFrame. To do this, use the pandas .dropna() method. Note that by default this method removes every row that has a NaN value in any of the columns.
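To see how much the default behavior removes, the minimal sketch below compares .dropna() with and without its subset parameter on a few hypothetical rows (not the real roads data). With subset, only the listed columns are checked for missing values.

```python
import pandas as pd

# Hypothetical rows: road 2 is missing only its name,
# road 3 is missing both its name and its road type
roads = pd.DataFrame({
    "LINEARID": [1, 2, 3],
    "FULLNAME": ["N 14th St", None, None],
    "RTTYP": ["M", "S", None],
})

# Default: drop every row with a missing value in ANY column
print(len(roads.dropna()))  # 1

# subset= restricts the check to the listed columns
print(len(roads.dropna(subset=["RTTYP"])))  # 2
```

Because .dropna() returns a new DataFrame, the original still has all three rows; assign the result to a variable to keep the filtered version.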
