After completing this tutorial, you will be able to:
- Work with data sets that have missing data
- Replace missing data values
What You Need
You will need a computer with internet access to complete this lesson and the spatial-vector-lidar data subset created for the course.
import os
import pandas as pd
import numpy as np
import geopandas as gpd
import earthpy as et

os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))
This lesson covers how to rename and clean up the attribute data of a GeoDataFrame.
# Import roads shapefile
sjer_roads = gpd.read_file("data/spatial-vector-lidar/california/madera-county-roads/tl_2013_06039_roads.shp")
type(sjer_roads)
Explore Data Values
There are several ways to use pandas to explore your data and determine if you have any missing values.
- To find the number of missing values per column in a DataFrame you can run dfname.isnull().sum()
- To look at the unique values for a specific column of a DataFrame you can run dfname['column'].unique()
LINEARID       0
FULLNAME    5149
RTTYP       5149
MTFCC          0
geometry       0
dtype: int64
Based on this method, the FULLNAME and RTTYP columns of the geodataframe each contain 5149 missing values. Double check the unique values in the road type column.
# View data type
print(type(sjer_roads['RTTYP']))
# View unique attributes for each road in the data
print(sjer_roads['RTTYP'].unique())
<class 'pandas.core.series.Series'>
['M' None 'S' 'C']
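The two checks described above can be tried on a small table before running them on the full roads layer. This is a minimal sketch: the `roads` DataFrame and its values are a hypothetical stand-in for the shapefile attributes, not the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the roads attribute table (not the course data)
roads = pd.DataFrame({
    "LINEARID": ["1", "2", "3", "4"],
    "FULLNAME": ["N 14th St", None, "N 12th St", None],
    "RTTYP": ["M", None, "S", np.nan],
})

# Count missing values (None or NaN) in each column
missing_per_column = roads.isnull().sum()
print(missing_per_column)

# List the distinct values in one column, missing values included
road_types = roads["RTTYP"].unique()
print(road_types)
```

Both None and NaN are counted as missing by .isnull(), so the column totals cover either kind of gap.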
If the value you want to replace is a NoneType you can use
dfname.loc[dfname['column'].isnull(), 'column'] = 'newvalue'
Or you can use the .fillna() method, which takes in the value that you want to use in place of the missing values.
Hmmmm, there’s a road type that has no value at all (None). It would be helpful to fix this before doing more analysis or mapping with this dataset.
There are several ways to deal with this issue. One is to use the .fillna() method to replace all instances of None in the attribute data with some new value. In this case, you will use ‘Unknown’.
# Replace missing road types with 'Unknown'
sjer_roads["RTTYP"] = sjer_roads["RTTYP"].fillna("Unknown")
print(sjer_roads['RTTYP'].unique())
['M' 'Unknown' 'S' 'C']
Alternatively you can use the .isnull() method to select all attribute cells with a missing (null) value and set those to ‘Unknown’.
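The .isnull() approach can be sketched as a two-step selection and assignment. The `roads` DataFrame below is a hypothetical stand-in with made-up rows; on the real layer the same pattern would be applied to the road type column.

```python
import pandas as pd

# Hypothetical stand-in for the roads attribute table (not the course data)
roads = pd.DataFrame({"RTTYP": ["M", None, "S", "C", None]})

# Boolean mask that is True wherever the road type is missing
is_missing = roads["RTTYP"].isnull()

# Assign the new value only to the rows selected by the mask
roads.loc[is_missing, "RTTYP"] = "Unknown"

print(roads["RTTYP"].tolist())
```

Using .loc with a mask changes only the selected cells, which makes it easy to extend the condition later, for example to combine it with a test on another column.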
If the value you want to change is not NaN or a NoneType then you will have to specify the original value that you want to change, as shown below.
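One way to specify the original value is the pandas .replace() method, which maps existing values to new ones. This is a sketch under assumptions: the `roads` DataFrame is a hypothetical stand-in and the new labels are invented for illustration, not official road type names.

```python
import pandas as pd

# Hypothetical stand-in for the roads attribute table (not the course data)
roads = pd.DataFrame({"RTTYP": ["M", "Unknown", "S", "C"]})

# Hypothetical labels, chosen only for illustration
new_labels = {"M": "Type M", "S": "Type S", "C": "Type C"}

# Each key is an original value to look for; values not listed are left alone
roads["RTTYP"] = roads["RTTYP"].replace(new_labels)

print(roads["RTTYP"].tolist())
```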
| | LINEARID | FULLNAME | RTTYP | MTFCC | geometry |
|---|---|---|---|---|---|
| 0 | 110454239066 | N 14th St | M | S1400 | LINESTRING (-120.272267 37.116151, -120.27244 ... |
| 1 | 110454239052 | N 11th St | M | S1400 | LINESTRING (-120.267877 37.116672, -120.268072... |
| 2 | 110454239056 | N 12th St | M | S1400 | LINESTRING (-120.27053 37.117494, -120.270448 ... |
| 3 | 110454239047 | N 10th St | M | S1400 | LINESTRING (-120.267028 37.11734599999999, -12... |
| 4 | 110454243091 | N Westberry Blvd | M | S1400 | LINESTRING (-120.101219 36.96524099999999, -12... |
In some specific instances you will want to remove NaN values from your DataFrame. To do this you can use the .dropna() method. Note that by default this method removes every row of the DataFrame that has a NaN value in any of its columns.
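Because the default behavior can discard more rows than intended, it may help to compare it with a narrower call. The sketch below uses a hypothetical `roads` DataFrame with made-up rows, and the `subset` argument of .dropna() to limit the check to one column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the roads attribute table (not the course data)
roads = pd.DataFrame({
    "FULLNAME": ["N 14th St", None, "N 12th St"],
    "RTTYP": ["M", "S", np.nan],
    "MTFCC": ["S1400", "S1400", "S1400"],
})

# Default: drop every row that has a missing value in any column
complete_rows = roads.dropna()

# Only check FULLNAME, so rows missing just a road type are kept
named_roads = roads.dropna(subset=["FULLNAME"])

print(len(roads), len(complete_rows), len(named_roads))
```

Neither call modifies `roads` itself. Each returns a new DataFrame, so the result has to be assigned to a name to be kept.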