Lesson 4. Programmatically Accessing Geospatial Data Using APIs

Leah Wasser, Max Joseph, Martha Morrissey, Jenny Palomino, Carson Farmer

Learning Objectives

After completing this tutorial, you will be able to:

Extract geospatial (x,y) coordinate information embedded within a JSON hierarchical data structure.
Convert data imported in JSON format into a Geopandas DataFrame.
Create a map of geospatial data.

What You Need

You will need a computer with internet access to complete this lesson.

In this lesson, you will work with JSON data accessed via the Colorado information warehouse. The data contain geospatial information nested within them, which you will use to create a map of the data.

Working with Geospatial Data

Check out the Colorado DWR Current Surface Water Conditions map.

Remember from the previous lesson that APIs can be used for many different things. Web developers (people who program and create websites and applications) can use APIs to create user-friendly interfaces, like the map linked above, that allow you to look at and interact with data. These APIs are similar to, if not the same as, the ones that you often use to access data in Python.

In this lesson, you will access the data used to create the map at the link above using Python.

The data that you will use are located here: View JSON format data used to create surface water map.
You can learn more about the data here: View CO Current water surface.

import requests
import folium
import pandas as pd
# json_normalize is imported from the top level of pandas; the pandas.io.json path is deprecated
from pandas import json_normalize
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Get URL
water_base_url = "https://data.colorado.gov/resource/j5pc-4t32.json?"
water_full_url = water_base_url + "station_status=Active" + "&county=BOULDER"

ATTENTION WINDOWS USERS: On some Windows machines, the https URL does not work. If you have problems with the API, try the same URL as above without the s, like this: water_base_url = "http://data.colorado.gov/resource/j5pc-4t32.json?" This change has resolved many issues on Windows machines.

water_full_url

'https://data.colorado.gov/resource/j5pc-4t32.json?station_status=Active&county=BOULDER'
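The URL above joins its query parameters by hand with + and &. As a minimal sketch of an alternative, the standard library function urllib.parse.urlencode can build the same query string from a dictionary and also encodes characters such as spaces. The dictionary name query is an assumption for illustration and is not part of the original lesson.

```python
from urllib.parse import urlencode

water_base_url = "https://data.colorado.gov/resource/j5pc-4t32.json?"

# Hypothetical dictionary holding the same filters used in the lesson
query = {"station_status": "Active", "county": "BOULDER"}

# urlencode joins the key=value pairs with & and escapes special characters
water_full_url = water_base_url + urlencode(query)
print(water_full_url)
```

This should produce the same URL as the string concatenation above, and it keeps each filter in one place if you want to change the county or status later.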

data = requests.get(water_full_url)

type(data.json())

list

Remember that the JSON structure supports hierarchical data and can be NESTED. If you look at the structure of the JSON data below, you can see that the location object contains three nested sub-objects:

latitude
longitude
needs_recoding

Since data.json() is a list, you can print just the first few items of the list to look at your data as a sanity check.

data.json()[:2]

[{'station_name': 'SAINT VRAIN CREEK AT HYGIENE, CO',
  'div': '1',
  'location': {'latitude': '40.177423',
   'needs_recoding': False,
   'longitude': '-105.178145'},
  'dwr_abbrev': 'SVCHGICO',
  'data_source': 'Co. Division of Water Resources',
  'amount': '29.80',
  'station_type': 'Stream',
  'wd': '5',
  'http_linkage': {'url': 'https://dwr.state.co.us/Tools/Stations/SVCHGICO'},
  'date_time': '2020-09-11T10:45:00.000',
  'county': 'BOULDER',
  'variable': 'DISCHRG',
  'stage': '2.18',
  'station_status': 'Active'},
 {'station_name': 'HIGHLAND DITCH AT LYONS, CO',
  'div': '1',
  'location': {'latitude': '40.215043',
   'needs_recoding': False,
   'longitude': '-105.256017'},
  'dwr_abbrev': 'HIGHLDCO',
  'data_source': 'Co. Division of Water Resources',
  'amount': '82.00',
  'station_type': 'Diversion',
  'wd': '5',
  'http_linkage': {'url': 'https://dwr.state.co.us/Tools/Stations/HIGHLDCO'},
  'date_time': '2020-09-11T10:30:00.000',
  'county': 'BOULDER',
  'variable': 'DISCHRG',
  'stage': '1.18',
  'station_status': 'Active'}]
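Because each record is a dictionary and location is a dictionary nested inside it, you can reach the coordinates by chaining keys. The sketch below uses two records trimmed by hand from the output above (not a live API call) and collects (x, y) pairs, where x is longitude and y is latitude. The names records and coords are assumptions for illustration.

```python
# Two records trimmed from the output above; only the fields used here are kept
records = [
    {
        "station_name": "SAINT VRAIN CREEK AT HYGIENE, CO",
        "location": {"latitude": "40.177423", "needs_recoding": False, "longitude": "-105.178145"},
    },
    {
        "station_name": "HIGHLAND DITCH AT LYONS, CO",
        "location": {"latitude": "40.215043", "needs_recoding": False, "longitude": "-105.256017"},
    },
]

coords = []
for record in records:
    # .get() avoids a KeyError if a record has no location object
    location = record.get("location", {})
    lat = location.get("latitude")
    lon = location.get("longitude")
    if lat is None or lon is None:
        continue  # skip records without coordinates
    # The values arrive as strings, so convert them to floats
    coords.append((float(lon), float(lat)))

print(coords)
```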

Convert JSON to Pandas DataFrame

Now that you have pulled down the data from the website, you have it in the JSON format. For the next step, you will use the json_normalize() function from the Pandas library to convert this data into a Pandas DataFrame.

This function helps organize and flatten data into a semi-structured table. To learn more, check out the documentation!

# pandas.io.json.json_normalize is deprecated, so import json_normalize from the top level of pandas
from pandas import json_normalize

result = json_normalize(data.json())

result.head()

	station_name	div	dwr_abbrev	data_source	amount	station_type	wd	date_time	county	variable	stage	station_status	location.latitude	location.needs_recoding	location.longitude	http_linkage.url	usgs_station_id
0	SAINT VRAIN CREEK AT HYGIENE, CO	1	SVCHGICO	Co. Division of Water Resources	29.80	Stream	5	2020-09-11T10:45:00.000	BOULDER	DISCHRG	2.18	Active	40.177423	False	-105.178145	https://dwr.state.co.us/Tools/Stations/SVCHGICO	NaN
1	HIGHLAND DITCH AT LYONS, CO	1	HIGHLDCO	Co. Division of Water Resources	82.00	Diversion	5	2020-09-11T10:30:00.000	BOULDER	DISCHRG	1.18	Active	40.215043	False	-105.256017	https://dwr.state.co.us/Tools/Stations/HIGHLDCO	NaN
2	SOUTH BOULDER CREEK BELOW GROSS RESERVOIR	1	BOCBGRCO	Co. Division of Water Resources	87.80	Stream	6	2020-09-11T11:15:00.000	BOULDER	DISCHRG	0.94	Active	39.938324	False	-105.347953	https://dwr.state.co.us/Tools/Stations/BOCBGRCO	06729450
3	LEYNER COTTONWOOD DITCH	1	LCWDITCO	Co. Division of Water Resources	0.00	Diversion	6	2020-09-11T11:00:00.000	BOULDER	DISCHRG	NaN	Active	40.02168	False	-105.166113	https://dwr.state.co.us/Tools/Stations/LCWDITCO	NaN
4	BOULDER CREEK SUPPLY CANAL TO BOULDER CREEK NE...	1	BCSCBCCO	Northern Water	104.35	Diversion	6	2020-09-11T10:00:00.000	BOULDER	DISCHRG	1.81	Active	40.053035	False	-105.193048	https://dwr.state.co.us/Tools/Stations/BCSCBCCO	ES1917

type(result)

pandas.core.frame.DataFrame

result.columns

Index(['station_name', 'div', 'dwr_abbrev', 'data_source', 'amount',
       'station_type', 'wd', 'date_time', 'county', 'variable', 'stage',
       'station_status', 'location.latitude', 'location.needs_recoding',
       'location.longitude', 'http_linkage.url', 'usgs_station_id'],
      dtype='object')
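The dotted column names above, such as location.latitude and http_linkage.url, come from flattening the nested objects: the parent key and the child key are joined with a dot. The following is a minimal pure-Python sketch of that idea, not the pandas implementation. The helper name flatten_record is hypothetical, and the record is trimmed from the output shown earlier.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, joining keys with sep."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key along
            flat.update(flatten_record(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


record = {
    "station_name": "SAINT VRAIN CREEK AT HYGIENE, CO",
    "location": {"latitude": "40.177423", "needs_recoding": False, "longitude": "-105.178145"},
    "http_linkage": {"url": "https://dwr.state.co.us/Tools/Stations/SVCHGICO"},
}

flat = flatten_record(record)
for key, value in flat.items():
    print(key, "=", value)
```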

Data Cleaning for Visualization

Now you can clean up the data. Notice that your longitude and latitude values are stored as strings. Do you think you can create a map if these values are stored as strings?

result['location.latitude'][0]

'40.177423'

You can convert the strings to type float as follows.

result['location.latitude'] = result['location.latitude'].astype(float)

result['location.latitude'][0]

40.177423

result['location.longitude'] = result['location.longitude'].astype(float)

result['location.longitude'][0]

-105.178145

Now that you have numeric values for mapping, make sure that there are no missing values.

result.shape

(72, 17)

result['location.longitude'].isna().any()

False

result['location.latitude'].isna().any()

False

There are no NaN values in this data. However, if there were, you could remove the rows that have a NaN value in specific columns with the following: result_nonan = result.dropna(subset=['location.longitude', 'location.latitude'])
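To see the cleaning steps work on data that does have problems, here is a minimal sketch using a small made-up table. The station names and the bad values are invented for illustration. It uses pd.to_numeric with errors="coerce", a swapped-in alternative to astype(float) that turns values it cannot parse into NaN instead of raising an error, followed by dropna on the coordinate columns.

```python
import pandas as pd

# Made-up rows: one unparseable longitude and one missing latitude
sample = pd.DataFrame({
    "station_name": ["STATION A", "STATION B", "STATION C"],
    "location.latitude": ["40.177423", "40.215043", None],
    "location.longitude": ["-105.178145", "not available", "-105.347953"],
})

coord_cols = ["location.longitude", "location.latitude"]
for col in coord_cols:
    # errors="coerce" converts values that cannot be parsed into NaN
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Keep only the rows where both coordinates are present
sample_nonan = sample.dropna(subset=coord_cols)
print(sample_nonan)
```

Only the first row has two valid coordinates, so it is the only row that remains.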

Data Visualization

You will use the folium package to visualize the data. One approach you could take would be to convert your Pandas DataFrame to a Geopandas GeoDataFrame for easy mapping.

# Build a point for each station; x is longitude and y is latitude
geometry = [Point(xy) for xy in zip(result['location.longitude'], result['location.latitude'])]
# Use the 'EPSG:4326' string; the {'init': 'epsg:4326'} syntax is deprecated
crs = 'EPSG:4326'
gdf = GeoDataFrame(result, crs=crs, geometry=geometry)

gdf.head()

	station_name	div	dwr_abbrev	data_source	amount	station_type	wd	date_time	county	variable	stage	station_status	location.latitude	location.needs_recoding	location.longitude	http_linkage.url	usgs_station_id	geometry
0	SAINT VRAIN CREEK AT HYGIENE, CO	1	SVCHGICO	Co. Division of Water Resources	29.80	Stream	5	2020-09-11T10:45:00.000	BOULDER	DISCHRG	2.18	Active	40.177423	False	-105.178145	https://dwr.state.co.us/Tools/Stations/SVCHGICO	NaN	POINT (-105.17815 40.17742)
1	HIGHLAND DITCH AT LYONS, CO	1	HIGHLDCO	Co. Division of Water Resources	82.00	Diversion	5	2020-09-11T10:30:00.000	BOULDER	DISCHRG	1.18	Active	40.215043	False	-105.256017	https://dwr.state.co.us/Tools/Stations/HIGHLDCO	NaN	POINT (-105.25602 40.21504)
2	SOUTH BOULDER CREEK BELOW GROSS RESERVOIR	1	BOCBGRCO	Co. Division of Water Resources	87.80	Stream	6	2020-09-11T11:15:00.000	BOULDER	DISCHRG	0.94	Active	39.938324	False	-105.347953	https://dwr.state.co.us/Tools/Stations/BOCBGRCO	06729450	POINT (-105.34795 39.93832)
3	LEYNER COTTONWOOD DITCH	1	LCWDITCO	Co. Division of Water Resources	0.00	Diversion	6	2020-09-11T11:00:00.000	BOULDER	DISCHRG	NaN	Active	40.021680	False	-105.166113	https://dwr.state.co.us/Tools/Stations/LCWDITCO	NaN	POINT (-105.16611 40.02168)
4	BOULDER CREEK SUPPLY CANAL TO BOULDER CREEK NE...	1	BCSCBCCO	Northern Water	104.35	Diversion	6	2020-09-11T10:00:00.000	BOULDER	DISCHRG	1.81	Active	40.053035	False	-105.193048	https://dwr.state.co.us/Tools/Stations/BCSCBCCO	ES1917	POINT (-105.19305 40.05304)

Then, you can plot the data using the folium functions GeoJson() and add_to() to add the data from the Geopandas DataFrame to the map object.

m = folium.Map([40.01, -105.27], zoom_start=10, tiles='cartodbpositron')
folium.GeoJson(gdf).add_to(m)

m

Great! You now have an interactive map in your notebook!
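One detail worth keeping in mind: GeoJSON stores point coordinates in [longitude, latitude] order, which is the reverse of the [latitude, longitude] order passed to folium.Map. The sketch below builds a small GeoJSON FeatureCollection by hand from two stations shown earlier to make that order visible. The names stations, to_feature, and feature_collection are assumptions for illustration.

```python
import json

# Two stations from the table above, with coordinates already converted to floats
stations = [
    {"station_name": "SAINT VRAIN CREEK AT HYGIENE, CO", "latitude": 40.177423, "longitude": -105.178145},
    {"station_name": "HIGHLAND DITCH AT LYONS, CO", "latitude": 40.215043, "longitude": -105.256017},
]


def to_feature(station):
    """Build a GeoJSON Feature; note the [longitude, latitude] coordinate order."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [station["longitude"], station["latitude"]],
        },
        "properties": {"station_name": station["station_name"]},
    }


feature_collection = {
    "type": "FeatureCollection",
    "features": [to_feature(station) for station in stations],
}

print(json.dumps(feature_collection["features"][0]["geometry"]))
```

folium.GeoJson() also accepts a GeoJSON dictionary, so a structure like feature_collection could be added to a map in the same way as gdf.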

You can also cluster the markers and add a popup to each marker to give your viewers more information about each station, such as its name and the amount measured. In the records shown earlier, the variable column is DISCHRG (discharge) rather than precipitation, so the popup below labels the amount with the value from the variable column.

For the example below, you will work with the Pandas DataFrame that you originally created from the JSON, instead of the Geopandas GeoDataFrame.

# Get the latitude and longitude from result as a list of [latitude, longitude] pairs
locations = result[['location.latitude', 'location.longitude']]
coords = locations.values.tolist()

from folium.plugins import MarkerCluster

m = folium.Map([40.01, -105.27], zoom_start=10, tiles='cartodbpositron')

marker_cluster = MarkerCluster().add_to(m)

# Add one marker per station with a popup showing its name, variable, and amount
for i, coord in enumerate(coords):
    popup_text = f"Name: {result['station_name'][i]} {result['variable'][i]}: {result['amount'][i]}"
    folium.Marker(location=coord, popup=popup_text).add_to(marker_cluster)

m

Additional Resources

JSON Data in Python
