Lesson 3. How to Open and Process NetCDF 4 Data Format in Open Source Python


Open NETCDF 4 Climate Data in Open Source Python Using Xarray

In this chapter, you will learn how to work with Climate Data Sets (MACA v2 for the United states) stored in netcdf 4 format using open source Python.

Learning Objectives

After completing this chapter, you will be able to:

  • Download MACA v2 climate data in netcdf 4 format
  • Open and process netcdf4 data using xarray
  • Export climate data in tabular format to .csv format

What You Need

OPTIONAL: If you want to explore the netcdf 4 files in a graphics based tool, you can download th free HDF viewer from the HDF Group website.

import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# netcdf4 needs to be installed in your environment for this to work
import xarray as xr
import rioxarray
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import seaborn as sns
import geopandas as gpd
import earthpy as et

# Plotting options
sns.set(font_scale=1.3)
sns.set_style("white")

# Optional - set your working directory if you wish to use the data
# accessed lower down in this notebook (the USA state boundary data)
os.chdir(os.path.join(et.io.HOME,
                      'earth-analytics',
                      'data'))

Get Started wtih MACA Version 2 Data Using Open Source Python

In this lesson you will work with historic projected MACA 2 data that represents maximum monthly temperature for the Continental United States (CONUS).

agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_historical_1950_2005_CONUS_monthly

The file name itself tells you a lot about the data.

  1. macav2metdata: the data are the MACA version two data which are downsampled to the extent of the continental United S tates
  2. tasmax: Max temperature is the parameter contained within the data
  3. BNU-ESM: This is the climate (CLM) model used to generate the data
  4. historical: these data are the modeled historical values for the years 1950 - 2005
  5. CONUS: These data are for the CONtinental United States boundary
  6. monthly: These data are aggregated monthly (rather than daily)

Below, you will learn how to open and work with MACA 2 data using open source Python tools. You will use the xarray package which requires the netcdf4 package to work with netcdf data. The most current earth-analytics-python environment contains all of the packages that you need to complete this tutorial.

To begin, you open up the data using xarray.open_dataset.

# The (online) url for the MACAv2 data
data_path = "http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_historical_1950_2005_CONUS_monthly.nc"

# Open the data using a context manager
with xr.open_dataset(data_path) as file_nc:
    # You can open the data using the code below, however it you use rio.write_crs
    # it will ensure that crs for your data persist throughout your analysis
    # max_temp_xr = file_nc
    max_temp_xr = file_nc.rio.write_crs(file_nc.rio.crs, inplace=True)

# View xarray object
max_temp_xr
<xarray.Dataset>
Dimensions:          (lat: 585, lon: 1386, time: 672)
Coordinates:
  * lat              (lat) float64 25.06 25.1 25.15 25.19 ... 49.31 49.35 49.4
  * lon              (lon) float64 235.2 235.3 235.3 235.4 ... 292.9 292.9 292.9
  * time             (time) object 1950-01-15 00:00:00 ... 2005-12-15 00:00:00
    crs              int64 0
Data variables:
    air_temperature  (time, lat, lon) float32 ...
Attributes:
    description:                     Multivariate Adaptive Constructed Analog...
    id:                              MACAv2-METDATA
    naming_authority:                edu.uidaho.reacch
    Metadata_Conventions:            Unidata Dataset Discovery v1.0
    Metadata_Link:                   
    cdm_data_type:                   FLOAT
    title:                           Monthly aggregation of downscaled daily ...
    summary:                         This archive contains monthly downscaled...
    keywords:                        monthly, precipitation, maximum temperat...
    keywords_vocabulary:             
    standard_name_vocabulary:        CF-1.0
    history:                         No revisions.
    comment:                         
    geospatial_bounds:               POLYGON((-124.7722 25.0631,-124.7722 49....
    geospatial_lat_min:              25.0631
    geospatial_lat_max:              49.3960
    geospatial_lon_min:              -124.7722
    geospatial_lon_max:              -67.0648
    geospatial_lat_units:            decimal degrees north
    geospatial_lon_units:            decimal degrees east
    geospatial_lat_resolution:       0.0417
    geospatial_lon_resolution:       0.0417
    geospatial_vertical_min:         0.0
    geospatial_vertical_max:         0.0
    geospatial_vertical_resolution:  0.0
    geospatial_vertical_positive:    up
    time_coverage_start:             2000-01-01T00:0
    time_coverage_end:               2004-12-31T00:00
    time_coverage_duration:          P5Y
    time_coverage_resolution:        P1M
    date_created:                    2014-05-15
    date_modified:                   2014-05-15
    date_issued:                     2014-05-15
    creator_name:                    John Abatzoglou
    creator_url:                     http://maca.northwestknowledge.net
    creator_email:                   jabatzoglou@uidaho.edu
    institution:                     University of Idaho
    processing_level:                GRID
    project:                         
    contributor_name:                Katherine C. Hegewisch
    contributor_role:                Postdoctoral Fellow
    publisher_name:                  REACCH
    publisher_email:                 reacch@uidaho.edu
    publisher_url:                   http://www.reacchpna.org/
    license:                         Creative Commons CC0 1.0 Universal Dedic...
    coordinate_system:               WGS84,EPSG:4326
    grid_mapping:                    crs

By default xarray does not handle spatial operations. However, if you load rioxarray it adds additional spatial functionality (supported by rasterio that supports handling:

  • coordinate reference systems
  • reprojection
  • clipping

and more. Below you grab the coordinate reference system of the climate data using the .rio.crs method that is available because you have rioxarray loaded in this notebook.

# For later - grab the crs of the data using rioxarray
climate_crs = max_temp_xr.rio.crs
climate_crs
CRS.from_wkt('GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6378137,298.257223563]],PRIMEM["undefined",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Longitude",EAST],AXIS["Latitude",NORTH]]')

Work With the NetCDF Data Structure - A Hierarchical Data Format

The object above is hierarchical and contains metadata making it self-describing. There are three dimensions to consider when working with this data which represent the x,y and z dimensions of the data:

  1. latitude
  2. longitude
  3. time

The latitude and longitude dimension values reprensent an arrya containing the point location value of each pixel in decimal degrees. These are location of each pixel in your data. The time array similarly represents the time location for each array in the data cube. This particular dataset contains historical modeleled monthly max temperature values for the Continental United States.

# View first 5 latitude values
max_temp_xr["air_temperature"]["lat"].values[:5]

print("The min and max latitude values in the data is:", 
      max_temp_xr["air_temperature"]["lat"].values.min(), 
      max_temp_xr["air_temperature"]["lat"].values.max())
print("The min and max longitude values in the data is:", 
      max_temp_xr["air_temperature"]["lon"].values.min(), 
      max_temp_xr["air_temperature"]["lon"].values.max())

The min and max latitude values in the data is: 25.063077926635742 49.39602279663086
The min and max longitude values in the data is: 235.22784423828125 292.93524169921875
# View first 5 and last. 5 time values - notice the span of
# dates range from 1950 to 2005
print("The earliest date in the data is:", max_temp_xr["air_temperature"]["time"].values.min())
print("The latest date in the data is:", max_temp_xr["air_temperature"]["time"].values.max())    

The earliest date in the data is: 1950-01-15 00:00:00
The latest date in the data is: 2005-12-15 00:00:00
max_temp_xr["air_temperature"]["time"].values.shape
(672,)

Time is the z dimension. In this dataset you have 672 months worth of data.

These data are spatial and thus have a coordinate reference system associated with them. You can access this information using .crs

HINT max_temp_xr["crs"] should also work!

Below, you use .rio.crs to access the CRS information rather than .crs. Rioxarray generally does a better job of accessing data attributes

# The data are lat/long (not projected)
max_temp_xr.rio.crs
CRS.from_wkt('GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6378137,298.257223563]],PRIMEM["undefined",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Longitude",EAST],AXIS["Latitude",NORTH]]')

Hierarchical Formats Are Self Describing

Hierarchical data formats are self-describing. This means that the metadata for the data are contained within the file itself. xarray stores metadata in the .attrs part of the file structure using a Python dictionary structure. You can view the metadata using .attrs.

# View metadata
metadata = max_temp_xr.attrs
metadata
{'description': 'Multivariate Adaptive Constructed Analogs (MACA) method, version 2.3,Dec 2013.',
 'id': 'MACAv2-METDATA',
 'naming_authority': 'edu.uidaho.reacch',
 'Metadata_Conventions': 'Unidata Dataset Discovery v1.0',
 'Metadata_Link': '',
 'cdm_data_type': 'FLOAT',
 'title': 'Monthly aggregation of downscaled daily meteorological data of Monthly Average of Daily Maximum Near-Surface Air Temperature from College of Global Change and Earth System Science, Beijing Normal University (BNU-ESM) using the run r1i1p1 of the historical scenario.',
 'summary': 'This archive contains monthly downscaled meteorological and hydrological projections for the Conterminous United States at 1/24-deg resolution. These monthly values are obtained by aggregating the daily values obtained from the downscaling using the Multivariate Adaptive Constructed Analogs (MACA, Abatzoglou, 2012) statistical downscaling method with the METDATA (Abatzoglou,2013) training dataset. The downscaled meteorological variables are maximum/minimum temperature(tasmax/tasmin), maximum/minimum relative humidity (rhsmax/rhsmin),precipitation amount(pr), downward shortwave solar radiation(rsds), eastward wind(uas), northward wind(vas), and specific humidity(huss). The downscaling is based on the 365-day model outputs from different global climate models (GCMs) from Phase 5 of the Coupled Model Inter-comparison Project (CMIP3) utlizing the historical (1950-2005) and future RCP4.5/8.5(2006-2099) scenarios. ',
 'keywords': 'monthly, precipitation, maximum temperature, minimum temperature, downward shortwave solar radiation, specific humidity, wind velocity, CMIP5, Gridded Meteorological Data',
 'keywords_vocabulary': '',
 'standard_name_vocabulary': 'CF-1.0',
 'history': 'No revisions.',
 'comment': '',
 'geospatial_bounds': 'POLYGON((-124.7722 25.0631,-124.7722 49.3960, -67.0648 49.3960,-67.0648, 25.0631, -124.7722,25.0631))',
 'geospatial_lat_min': '25.0631',
 'geospatial_lat_max': '49.3960',
 'geospatial_lon_min': '-124.7722',
 'geospatial_lon_max': '-67.0648',
 'geospatial_lat_units': 'decimal degrees north',
 'geospatial_lon_units': 'decimal degrees east',
 'geospatial_lat_resolution': '0.0417',
 'geospatial_lon_resolution': '0.0417',
 'geospatial_vertical_min': 0.0,
 'geospatial_vertical_max': 0.0,
 'geospatial_vertical_resolution': 0.0,
 'geospatial_vertical_positive': 'up',
 'time_coverage_start': '2000-01-01T00:0',
 'time_coverage_end': '2004-12-31T00:00',
 'time_coverage_duration': 'P5Y',
 'time_coverage_resolution': 'P1M',
 'date_created': '2014-05-15',
 'date_modified': '2014-05-15',
 'date_issued': '2014-05-15',
 'creator_name': 'John Abatzoglou',
 'creator_url': 'http://maca.northwestknowledge.net',
 'creator_email': 'jabatzoglou@uidaho.edu',
 'institution': 'University of Idaho',
 'processing_level': 'GRID',
 'project': '',
 'contributor_name': 'Katherine C. Hegewisch',
 'contributor_role': 'Postdoctoral Fellow',
 'publisher_name': 'REACCH',
 'publisher_email': 'reacch@uidaho.edu',
 'publisher_url': 'http://www.reacchpna.org/',
 'license': 'Creative Commons CC0 1.0 Universal Dedication(http://creativecommons.org/publicdomain/zero/1.0/legalcode)',
 'coordinate_system': 'WGS84,EPSG:4326',
 'grid_mapping': 'crs'}
# Above you grabbed the metadata (which is a dictionary)
# Here you can print any part of the dictionary that you wish to
metadata["title"]
'Monthly aggregation of downscaled daily meteorological data of Monthly Average of Daily Maximum Near-Surface Air Temperature from College of Global Change and Earth System Science, Beijing Normal University (BNU-ESM) using the run r1i1p1 of the historical scenario.'

Subsetting or “Slicing” Your Data

You can quickly and efficiently slice and subset your data using xarray. Below, you will learn how to slice the data using the .sel() method.

This will return exactly two data points at the location - the temperature value for each day in your temporal slice.

# Select a single x,y combination from the data
key=400
longitude = max_temp_xr["air_temperature"]["lon"].values[key]
latitude = max_temp_xr["air_temperature"]["lat"].values[key]

print("Long, Lat values:", longitude, latitude)
Long, Lat values: 251.89422607421875 41.72947692871094

Below, you can see the x, y location that for the latitude/longitude location that you will use to slice the data above plotted on a map. The code below is an example of creating a spatial plot using the cartopy package that shows the locatio nthat you selected. This step is optional.

# Create a spatial map of your selected location with cartopy
extent = [-120, -70, 24, 50.5]
central_lon = np.mean(extent[:2])
central_lat = np.mean(extent[2:])

f, ax = plt.subplots(figsize=(12, 6),
                     subplot_kw={'projection': ccrs.AlbersEqualArea(central_lon, central_lat)})
ax.coastlines()
# Plot the selected location 
ax.plot(longitude-360, latitude, 
        'r*', 
        transform=ccrs.PlateCarree(),
       color="purple", markersize=10)

ax.set_extent(extent)
ax.set(title="Location of the Latitude / Longitude Being Used To to Slice Your netcdf Climate Data File")

# Adds continent boundaries to the map
ax.add_feature(cfeature.LAND, edgecolor='black')

ax.gridlines()
plt.show()
/opt/conda/lib/python3.8/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/50m/physical/ne_50m_land.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/opt/conda/lib/python3.8/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/50m/physical/ne_50m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)

Now it is time to subset the data for the point location that you are interested in (the lat / lon value that you plotted above). You can slice the data for one single latitude, longitude location using the .sel() method.

When you slice the data using one single point, your output for every (monthly in this case) time step in the data will be a single pixel value representing max temperature.

# Slice the data spatially using a single lat/lon point
one_point = max_temp_xr["air_temperature"].sel(lat=latitude,
                                               lon=longitude)
one_point
<xarray.DataArray 'air_temperature' (time: 672)>
array([271.11615, 274.05585, 279.538  , ..., 286.85074, 279.6746 , 271.4411 ],
      dtype=float32)
Coordinates:
    lat      float64 41.73
    lon      float64 251.9
  * time     (time) object 1950-01-15 00:00:00 ... 2005-12-15 00:00:00
    crs      int64 0
Attributes:
    long_name:      Monthly Average of Daily Maximum Near-Surface Air Tempera...
    units:          K
    grid_mapping:   crs
    standard_name:  air_temperature
    height:         2 m
    cell_methods:   time: maximum(interval: 24 hours);mean over days
    _ChunkSizes:    [ 10  44 107]

When you slice the data by a single point, notice that output data only has a single array of values. In this case these values represent air temperature (in K) over time.

# Notice the shape of the output array
one_point.shape
(672,)

The data stored in the xarray object is a numpy array. You can process the data in the same way you would process any other numpy array.

# View the first 5 values for that single point
one_point.values[:5]
array([271.11615, 274.05585, 279.538  , 284.42365, 294.1337 ],
      dtype=float32)

Plot A Time Series For a Single Location

Above you used a single point location to slice your data. Because the data are for only one location, but over time, you can quickly create a scatterplot of the data using objectname.plot().

# Use xarray to create a quick time series plot
one_point.plot.line()
plt.show()
Standard plot output from xarray .plot() with no plot elements customized. You can make this plot look much nicer by customizing the line and point markers and colors.
Standard plot output from xarray .plot() with no plot elements customized. You can make this plot look much nicer by customizing the line and point markers and colors.

You can make the plot a bit prettier if you’d like using the standard Python matplotlib plot parameters. Below you change the marker color to purple and the lines to grey. figsize is used to adjust the size of the plot.

# You can clean up your plot as you wish using standard matplotlib approaches
f, ax = plt.subplots(figsize=(12, 6))
one_point.plot.line(hue='lat',
                    marker="o",
                    ax=ax,
                    color="grey",
                    markerfacecolor="purple",
                    markeredgecolor="purple")
ax.set(title="Time Series For a Single Lat / Lon Location")

# Uncomment the line below if you wish to export the figure as a .png file
# plt.savefig("single_point_timeseries.png")
plt.show()
Plot showing historic max temperature climate data plotted using xarray .plot().
Plot showing historic max temperature climate data plotted using xarray .plot().

Convert Climate Time Series Data to a Pandas DataFrame & Export to a .csv File

You can convert your data to a DataFrame and export to a .csv file using to_dataframe() and to_csv().

# Convert to dataframe -- then this can easily be exported to a csv
one_point_df = one_point.to_dataframe()
# View just the first 5 rows of the data
one_point_df.head()
latloncrsair_temperature
time
1950-01-15 00:00:0041.729477251.8942260271.116150
1950-02-15 00:00:0041.729477251.8942260274.055847
1950-03-15 00:00:0041.729477251.8942260279.537994
1950-04-15 00:00:0041.729477251.8942260284.423645
1950-05-15 00:00:0041.729477251.8942260294.133698

Once you have a dataframe object, you can export it directly to a .csv file format.

# Export data to .csv file
one_point_df.to_csv("one-location.csv")

Slice Climate MACAv2 Data By Time and Location

Above you sliced the data by spatial location, selecting only one point location at a specific latitude and longitude. You can also slice the data by time. Below, you slice the data at the selected lat / long location and for a 5-year time period in the year 2000.

Notice that the output shape is 60. 60 represents 12 months a year over 5 years for the selected lat / lon location. Again because you are slicing out data for a single point location, you have a total of 60 data points in the output numpy array.

start_date = "2000-01-01"
end_date = "2005-01-01"
temp_2000_2005 = max_temp_xr["air_temperature"].sel(time=slice(start_date, end_date),
                                                    lat=45.02109146118164,
                                                    lon=243.01937866210938)
temp_2000_2005
<xarray.DataArray 'air_temperature' (time: 60)>
array([273.74884, 277.1434 , 280.0702 , 280.6042 , 289.23993, 290.86853,
       297.70242, 301.36722, 296.4382 , 291.29434, 275.8434 , 273.0037 ,
       275.154  , 278.77863, 282.29105, 283.87787, 290.53387, 296.701  ,
       297.4607 , 301.16116, 295.5421 , 287.3863 , 276.85397, 274.35953,
       278.0117 , 275.68024, 282.12674, 286.41693, 292.4094 , 297.30606,
       300.84628, 302.10593, 288.83698, 288.44   , 276.38992, 276.5348 ,
       275.86145, 276.5708 , 280.8476 , 284.35712, 292.14087, 293.66147,
       301.09534, 302.7656 , 294.99573, 287.5425 , 277.80093, 273.25894,
       274.09232, 276.94736, 280.25446, 283.47614, 289.63776, 290.5187 ,
       301.8519 , 300.40402, 290.73633, 289.60886, 277.12424, 275.7836 ],
      dtype=float32)
Coordinates:
    lat      float64 45.02
    lon      float64 243.0
  * time     (time) object 2000-01-15 00:00:00 ... 2004-12-15 00:00:00
    crs      int64 0
Attributes:
    long_name:      Monthly Average of Daily Maximum Near-Surface Air Tempera...
    units:          K
    grid_mapping:   crs
    standard_name:  air_temperature
    height:         2 m
    cell_methods:   time: maximum(interval: 24 hours);mean over days
    _ChunkSizes:    [ 10  44 107]
temp_2000_2005.shape
(60,)
# View the 60 data points (raster cell values) associated with the spatial and temporal subset
temp_2000_2005.values
array([273.74884, 277.1434 , 280.0702 , 280.6042 , 289.23993, 290.86853,
       297.70242, 301.36722, 296.4382 , 291.29434, 275.8434 , 273.0037 ,
       275.154  , 278.77863, 282.29105, 283.87787, 290.53387, 296.701  ,
       297.4607 , 301.16116, 295.5421 , 287.3863 , 276.85397, 274.35953,
       278.0117 , 275.68024, 282.12674, 286.41693, 292.4094 , 297.30606,
       300.84628, 302.10593, 288.83698, 288.44   , 276.38992, 276.5348 ,
       275.86145, 276.5708 , 280.8476 , 284.35712, 292.14087, 293.66147,
       301.09534, 302.7656 , 294.99573, 287.5425 , 277.80093, 273.25894,
       274.09232, 276.94736, 280.25446, 283.47614, 289.63776, 290.5187 ,
       301.8519 , 300.40402, 290.73633, 289.60886, 277.12424, 275.7836 ],
      dtype=float32)

Notice that in this instance you have much less data for that specific point (5 years worth of data to be exact).

# Plot the data just like you did above
f, ax = plt.subplots(figsize=(12, 6))
temp_2000_2005.plot.line(hue='lat',
                         marker="o",
                         ax=ax,
                         color="grey",
                         markerfacecolor="purple",
                         markeredgecolor="purple")
ax.set(title="A 5 Year Time Series of Temperature Data For A Single Location")
plt.show()
Plot showing 5 years of monthly max temperature data. Here the data are cleaned up a bit with marker colors and lines adjusted.
Plot showing 5 years of monthly max temperature data. Here the data are cleaned up a bit with marker colors and lines adjusted.

Convert Subsetted to a DataFrame & Export to a .csv File

Similar to what you did above, you can also convert your data to a DataFrame. Once your data are in a DataFrame format, you can quickly export a .csv file.

# Convert to dataframe -- then this can be exported to a csv if you want that
temp_2000_2005_df = temp_2000_2005.to_dataframe()
# View just the first 5 rows of the data
temp_2000_2005_df.head()
latloncrsair_temperature
time
2000-01-15 00:00:0045.021091243.0193790273.748840
2000-02-15 00:00:0045.021091243.0193790277.143402
2000-03-15 00:00:0045.021091243.0193790280.070190
2000-04-15 00:00:0045.021091243.0193790280.604187
2000-05-15 00:00:0045.021091243.0193790289.239929
# Create filename from subset info
file_name = "monthly-temp-" + start_date + "-" + end_date + ".csv"
file_name
'monthly-temp-2000-01-01-2005-01-01.csv'
# Export to a csv file to share with your friends!
temp_2000_2005_df.to_csv(file_name)

Slice The Data Across a Spatial Extent For A Specific Time Period

In the above example you sliced out a subset of the data for a specific point location over time. Below you select data for the entire CONUS (Continental United States) study area and analyze it as a spatial raster object.

Once again you use .sel() combined with slice() subset the data.

Notice below that if you don’t specify the lat and lon extent, it will by default return all of the pixels available in the data for the specified time period.

Notice that you are plotting the entire spatial extent of the data which is in this case called CONUS (CONtinental United States)

start_date = "1950-01-15"
end_date = "1950-02-15"

two_months_conus = max_temp_xr["air_temperature"].sel(
    time=slice(start_date, end_date))
# Notice that time has a value of **2** below representing two time steps or months worth of data
two_months_conus
<xarray.DataArray 'air_temperature' (time: 2, lat: 585, lon: 1386)>
[1621620 values with dtype=float32]
Coordinates:
  * lat      (lat) float64 25.06 25.1 25.15 25.19 ... 49.27 49.31 49.35 49.4
  * lon      (lon) float64 235.2 235.3 235.3 235.4 ... 292.8 292.9 292.9 292.9
  * time     (time) object 1950-01-15 00:00:00 1950-02-15 00:00:00
    crs      int64 0
Attributes:
    long_name:      Monthly Average of Daily Maximum Near-Surface Air Tempera...
    units:          K
    grid_mapping:   crs
    standard_name:  air_temperature
    height:         2 m
    cell_methods:   time: maximum(interval: 24 hours);mean over days
    _ChunkSizes:    [ 10  44 107]
two_months_conus.shape
(2, 585, 1386)

The default return when you plot an xarray object with multipple years worth of data across a spatial extent is a histogram.

two_months_conus.shape
(2, 585, 1386)

When you call .plot() on the data, the default plot is a histogram representing the range of raster pixel values in your data for all time periods (2 months in this case).

# Directly plot just a single day using time in "years"
two_months_conus.plot()
plt.show()
The default output of .plot() when you have more than a single point location selected will be a histogram. Here the histogram represents the spread of raster data values in the 2 months of modeled historical data.
The default output of .plot() when you have more than a single point location selected will be a histogram. Here the histogram represents the spread of raster data values in the 2 months of modeled historical data.

Spatial Raster Plots of MACA v2 Climate Data

If you want to plot the data spatially as a raster, you can use .plot() but specify the lon and lat values as the x and y dimensions to plot. You can add the following parameters to your .plot() call to make sure each time step in your data plots spatially:

col_wrap=2: adjust how how many columns the each subplot is spread across col=: what dimension is being plotted in each subplot.

In this case, you want a single raster for each month (time step) in the data so you specify col='time'. col_wrap=1 forces the plots to stack on top of each other in your matplotlib figure.

# Quickly plot the data using xarray.plot()
two_months_conus.plot(x="lon",
                      y="lat",
                      col="time",
                      col_wrap=1)
plt.suptitle("Two Time Steps of Monthly Average Temp", y=1.03)
plt.show()
Plot showing two months of the historic max temperature climate data plotted using xarray .plot(). If you set col_wrap to 1, then you end up with one column and two rows of subplots.
Plot showing two months of the historic max temperature climate data plotted using xarray .plot(). If you set col_wrap to 1, then you end up with one column and two rows of subplots.

If you set col_wrap to 2 you end up with two columns and one subplot in each column.

# Plot the data using 2 columns
two_months_conus.plot(x="lon",
                      y="lat",
                      col="time",
                      col_wrap=2)
plt.show()
Plot showing two months of the historic max temperature climate data plotted using xarray .plot(). If you set col_wrap to 2, then you end up with two columns.
Plot showing two months of the historic max temperature climate data plotted using xarray .plot(). If you set col_wrap to 2, then you end up with two columns.

Plot Multiple MACA v2 Climate Data Raster Files With a Spatial Projection

Below you plot the same data using cartopy which support spatial projectsion. The coastlines() basemap is also added to the plot.

central_lat = 37.5
central_long = 96
extent = [-120, -70, 20, 55.5]  # CONUS

map_proj = ccrs.AlbersEqualArea(central_longitude=central_lon,
                                central_latitude=central_lat)

aspect = two_months_conus.shape[2] / two_months_conus.shape[1]
p = two_months_conus.plot(transform=ccrs.PlateCarree(),  # the data's projection
                          col='time', col_wrap=1,
                          aspect=aspect,
                          figsize=(10, 10),
                          subplot_kws={'projection': map_proj})  # the plot's projection

plt.suptitle("Two Time Steps of CONUS Historic Temperature Data", y=1)
# Add the coastlines to each axis object and set extent
for ax in p.axes.flat:
    ax.coastlines()
    ax.set_extent(extent)
/opt/conda/lib/python3.8/site-packages/xarray/plot/facetgrid.py:373: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations. 
  self.fig.tight_layout()
Map showing air temperature data on a map using the PlateCarree CRS for two months.
Map showing air temperature data on a map using the PlateCarree CRS for two months.

Export Raster to Geotiff File

In the above workflows you converted your data into a DataFrame and exported it to a .csv file. This approach works well if you only need the summary values and don’t need any spatial information. However sometimes you may need to export spatial raster files.

You can export your data to a geotiff file format using rioxarray. To do this you will need to:

  1. ensure that your xarray object has a crs defined and define it if it’s missing.
  2. call xarray to export your data

Notice below that your two month subset no long contains CRS information.

# Double check the crs still exist
two_months_conus.rio.crs
CRS.from_wkt('GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6378137,298.257223563]],PRIMEM["undefined",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Longitude",EAST],AXIS["Latitude",NORTH]]')

At the very beginning of this lesson, you saved the crs information from the original xarray object. You can use that to export your geotiff data below.

Once the crs is set, you can export to a geotiff file format.

# Export to geotiff
file_path = "two_months_temp_data.tif"
two_months_conus.rio.to_raster(file_path)

Now test our your data. Reimport it using xarray.open_rasterio.

# Open the data up as a geotiff
two_months_tiff = xr.open_rasterio(file_path)
two_months_tiff
<xarray.DataArray (band: 2, y: 585, x: 1386)>
[1621620 values with dtype=float32]
Coordinates:
  * band     (band) int64 1 2
  * y        (y) float64 25.06 25.1 25.15 25.19 25.23 ... 49.27 49.31 49.35 49.4
  * x        (x) float64 235.2 235.3 235.3 235.4 ... 292.8 292.9 292.9 292.9
Attributes:
    transform:      (0.04166599094652527, 0.0, 235.207011242808, 0.0, 0.04166...
    crs:            +proj=longlat +ellps=WGS84 +no_defs=True
    res:            (0.04166599094652527, -0.04166600148971766)
    is_tiled:       0
    nodatavals:     (-9999.0, -9999.0)
    scales:         (1.0, 1.0)
    offsets:        (0.0, 0.0)
    descriptions:   ('Monthly Average of Daily Maximum Near-Surface Air Tempe...
    AREA_OR_POINT:  Area
    cell_methods:   time: maximum(interval: 24 hours);mean over days
    grid_mapping:   crs
    height:         2 m
    long_name:      Monthly Average of Daily Maximum Near-Surface Air Tempera...
    standard_name:  air_temperature
    units:          K
    _ChunkSizes:    [ 10  44 107]

Plot the data. Do you notice anything about the values in the histogram below?

two_months_tiff.plot()
plt.show()
Histogram showing temperature values for the Continental United States. What do you notice about the range of values?
Histogram showing temperature values for the Continental United States. What do you notice about the range of values?
# Plot the data - this doesn't look right!
two_months_tiff.plot(col="band")
plt.show()
Plot showing unmasked temperature data (containing nodata values) for the Continental United States.
Plot showing unmasked temperature data (containing nodata values) for the Continental United States.

The data above are plotting oddly because there are no data values in your array. You can mask those values using xarray.where().

two_months_tiff.rio.nodata
-9999.0

Below you use .where() to mask all values in your data that are equal to -9999, the nodata value for this data.

# Remove no data values and try to plot again
two_months_tiff = xr.open_rasterio(file_path)

two_months_clean = two_months_tiff.where(
    two_months_tiff != two_months_tiff.rio.nodata)

two_months_clean.plot(col="band")
plt.show()
Plot showing masked temperature data for the Continental United States.
Plot showing masked temperature data for the Continental United States.

Above you learned how to begin to work with NETCDF4 format files containing climate focused data. You learned how to open and subset the data by time and location. In the next lesson you will learn how to implement spatial subsets using shapefiles of your data.

Leave a Comment