Lesson 3. How to Download MACA2 Climate Data Using Python

Open MACA v2 Climate data Programmatically using Open Source Python and Xarray

In this lesson, you will learn how to work with climate data sets (MACA v2 for the Continental United States, CONUS) stored in netCDF-4 format using open source Python.

Learning Objectives

After completing this chapter, you will be able to:

Download different types of MACA v2 climate data in netCDF-4 format
Open and process netCDF-4 data using xarray

Get Started Downloading MACA v2 Climate Data in Python

To begin, load the libraries below.

# Import packages
import numpy as np
import netCDF4
import matplotlib.pyplot as plt
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import seaborn as sns

# Plotting options
sns.set(font_scale=1.3)
sns.set_style("white")

Get Started With Downloading Data

The data you will use in this lesson are “Monthly aggregation of downscaled daily meteorological data of Monthly Precipitation Amount from College of Global Change and Earth System Science, Beijing Normal University”. In short, the data contain monthly summaries of many meteorological variables, such as precipitation and air temperature. The data are derived from climate model output, including projections of future trends in these variables over time.

Below, you will create and assign three Python variables that allow you to programmatically select which data you wish to download in this notebook. This workflow could then be converted into an automated workflow that accesses and slices MACA v2 data for an analysis.

These variables are:

Select a Climate Model

model: This Python variable can be set to any integer from 0 to 19, which selects one of the 20 climate models available for MACA v2 data. The model determines how (with which methods) the climate data were created. You can learn more about each model in the MACA v2 documentation.

# Models to choose from
model_name = ('bcc-csm1-1',
              'bcc-csm1-1-m',
              'BNU-ESM',
              'CanESM2',
              'CCSM4',
              'CNRM-CM5',
              'CSIRO-Mk3-6-0',
              'GFDL-ESM2G',
              'GFDL-ESM2M',
              'HadGEM2-CC365',
              'HadGEM2-ES365',
              'inmcm4',
              'IPSL-CM5A-MR',
              'IPSL-CM5A-LR',
              'IPSL-CM5B-LR',
              'MIROC5',
              'MIROC-ESM',
              'MIROC-ESM-CHEM',
              'MRI-CGCM3',
              'NorESM1-M')
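Counting positions in a 20-item tuple by hand is error prone. Because model_name is an ordinary Python tuple, its index method returns the position of a name and raises ValueError if the name is not in the tuple. A small sketch (the CCSM4 lookup is only an example):

```python
# The same 20 model names, in the same order as above
model_name = ('bcc-csm1-1', 'bcc-csm1-1-m', 'BNU-ESM', 'CanESM2', 'CCSM4',
              'CNRM-CM5', 'CSIRO-Mk3-6-0', 'GFDL-ESM2G', 'GFDL-ESM2M',
              'HadGEM2-CC365', 'HadGEM2-ES365', 'inmcm4', 'IPSL-CM5A-MR',
              'IPSL-CM5A-LR', 'IPSL-CM5B-LR', 'MIROC5', 'MIROC-ESM',
              'MIROC-ESM-CHEM', 'MRI-CGCM3', 'NorESM1-M')

# Look up the position of a model by its name
model = model_name.index('CCSM4')
print(model)  # 4

# A misspelled name raises ValueError instead of picking the wrong model
try:
    model_name.index('CCSM-4')
except ValueError:
    print("'CCSM-4' is not in model_name")
```

The same approach gives model = 2 for 'BNU-ESM', the model used in this lesson.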

Climate Data Variables

var: This is the variable in the dataset that you want to work with. There are 9 variable options, listed below in both short and long name versions. You can set var to any integer from 0 to 8, where 0 is the first option in the list and 8 is the last. For example, if var = 0, you select tasmax, the maximum air temperature.

# These are the variable options for the met data
variable_name = ('tasmax',
                 'tasmin',
                 'rhsmax',
                 'rhsmin',
                 'pr',
                 'rsds',
                 'uas',
                 'vas',
                 'huss')

# These are var options in long form
var_long_name = ('air_temperature',
                 'air_temperature',
                 'relative_humidity',
                 'relative_humidity',
                 'precipitation',
                 'surface_downwelling_shortwave_flux_in_air',
                 'eastward_wind',
                 'northward_wind',
                 'specific_humidity')
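The two tuples are parallel: position i in variable_name matches position i in var_long_name. The long name matters because it names the data variable inside the netCDF file (for tasmax, the data variable is air_temperature, as the dataset printout later in this lesson shows). A minimal sketch that pairs them in a dictionary so you can look up a long name from its short name without counting positions:

```python
# Short variable names used in MACA v2 file names
variable_name = ('tasmax', 'tasmin', 'rhsmax', 'rhsmin', 'pr',
                 'rsds', 'uas', 'vas', 'huss')

# Matching long names, in the same order
var_long_name = ('air_temperature', 'air_temperature',
                 'relative_humidity', 'relative_humidity',
                 'precipitation',
                 'surface_downwelling_shortwave_flux_in_air',
                 'eastward_wind', 'northward_wind', 'specific_humidity')

# Pair each short name with its long name
long_name_for = dict(zip(variable_name, var_long_name))

print(long_name_for['tasmax'])  # air_temperature
print(long_name_for['pr'])      # precipitation
```

Note that tasmax and tasmin share the long name air_temperature, so the long name alone does not tell you which one you downloaded; the short name is what goes into the file name.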

Climate Data Scenarios

scenario: This selects the climate scenario that you want to use. 0 is the historical scenario, which covers 1950 to 2005; note that these values are still downscaled climate model output for the past, not direct observations. 1 is the rcp45 scenario, which is described as an intermediate climate scenario. 2 is the rcp85 scenario, which is a worst case (highest emissions) climate scenario.

Data Tip: You can learn more about the various variables and scenario options by going to the MACA data download toolbox and clicking on the small yellow question mark next to “variable” or “scenario”. Note that the scenario options are only available when you download future (projected) data.

Select Data Download Options

Below, you first create the tuples and lists containing the options that you can use to download your data.

# This is the base url required to download data from the thredds server.
dir_path = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'

# These are the variable options for the met data
variable_name = ('tasmax',
                 'tasmin',
                 'rhsmax',
                 'rhsmin',
                 'pr',
                 'rsds',
                 'uas',
                 'vas',
                 'huss')

# These are var options in long form
var_long_name = ('air_temperature',
                 'air_temperature',
                 'relative_humidity',
                 'relative_humidity',
                 'precipitation',
                 'surface_downwelling_shortwave_flux_in_air',
                 'eastward_wind',
                 'northward_wind',
                 'specific_humidity')

# Models to choose from
model_name = ('bcc-csm1-1',
              'bcc-csm1-1-m',
              'BNU-ESM',
              'CanESM2',
              'CCSM4',
              'CNRM-CM5',
              'CSIRO-Mk3-6-0',
              'GFDL-ESM2G',
              'GFDL-ESM2M',
              'HadGEM2-CC365',
              'HadGEM2-ES365',
              'inmcm4',
              'IPSL-CM5A-MR',
              'IPSL-CM5A-LR',
              'IPSL-CM5B-LR',
              'MIROC5',
              'MIROC-ESM',
              'MIROC-ESM-CHEM',
              'MRI-CGCM3',
              'NorESM1-M')

# Scenarios
scenario_type = ('historical', 'rcp45', 'rcp85')

# Year start and ends (historical vs projected)
year_start = ('1950', '2006', '2006')
year_end = ('2005', '2099', '2099')
run_num = [1] * 20
run_num[4] = 6  # setting CCSM4 with run 6
domain = 'CONUS'

Next, select the options that you want to use for your data download.

# Model options between 0-19
model = 2
# Options 0-8 will work for var. var maps to the variable names above
var = 0
# Options range from 0-2
scenario = 2

try:
    print("Great! You have selected: \n \u2705 Variable: {} \n \u2705 Model: {}, "
          "\n \u2705 Scenario: {}".format(variable_name[var],
                                          model_name[model],
                                          scenario_type[scenario]))
except IndexError as e:
    # Each option has its own valid range, so list all of them
    raise IndexError("Oops, it looks like you selected a value that is not "
                     "within the allowed range (model: 0-19, var: 0-8, "
                     "scenario: 0-2). Please look closely at your selected "
                     "values.") from e

Great! You have selected: 
 ✅ Variable: tasmax 
 ✅ Model: BNU-ESM, 
 ✅ Scenario: rcp85
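One gap in the try/except approach above: Python treats negative indices as valid (model_name[-1] is the last model), so a value such as scenario = -1 would not raise IndexError and would silently select rcp85. A minimal sketch of an explicit range check, using a hypothetical helper named validate_selection, that you could run before building the file path:

```python
def validate_selection(model, var, scenario,
                       n_models=20, n_vars=9, n_scenarios=3):
    """Raise ValueError if any selection index is outside its valid range.

    Negative indices are rejected explicitly, because Python would
    otherwise accept them (for example, a tuple's [-1] is its last item).
    """
    checks = (('model', model, n_models),
              ('var', var, n_vars),
              ('scenario', scenario, n_scenarios))
    for label, value, count in checks:
        if not isinstance(value, int) or not 0 <= value < count:
            raise ValueError(f"{label} must be an integer from 0 to "
                             f"{count - 1}, got {value!r}")


# The selections used in this lesson pass without an error
validate_selection(model=2, var=0, scenario=2)

# A negative scenario is caught before it silently selects the last item
try:
    validate_selection(model=2, var=0, scenario=-1)
except ValueError as err:
    print(err)  # scenario must be an integer from 0 to 2, got -1
```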

Finally, use the scenario variable to select the time period associated with the options selected above.

try:
    time = year_start[scenario] + '_' + year_end[scenario]
    print("\u2705 Your selected time period is:", time)
except IndexError as e:
    raise IndexError("Oops, it looks like you selected a scenario value that "
                     "is not within the range of values, which is 0-2") from e

✅ Your selected time period is: 2006_2099

Below, you create a path to the correct MACA data using the Python variables created above. A file name that starts with agg_macav2metdata_ and ends with _monthly.nc refers to monthly data. This lesson uses the monthly data rather than the daily data because the file is smaller to download.

Data Access Tip

Monthly vs. Daily Data

The example below creates a path to the non-aggregated monthly CONUS (Continental United States) data. However, you can also access the daily or aggregated data using a similar approach.

A slightly dated but useful example of accessing MACA v2 data using Python is available online. It further shows how to access data for specific locations rather than downloading the entire file.

# This code creates a path to the monthly MACA v2 data
file_name = (f"agg_macav2metdata_{variable_name[var]}_{model_name[model]}"
             f"_r{run_num[model]}i1p1_{scenario_type[scenario]}"
             f"_{time}_{domain}_monthly.nc")

print("\u2705 You are accessing:\n", file_name, "\n data in netcdf format")

✅ You are accessing:
 agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_rcp85_2006_2099_CONUS_monthly.nc 
 data in netcdf format
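The file name packs several fields into one string, separated by underscores. The r1i1p1 part is the CMIP5 ensemble member label (realization, initialization method, and physics version); run_num sets the realization number, which is why CCSM4 appears as r6i1p1. As a sketch, the regular expression below splits a monthly file name back into its fields, which can be handy for checking a name you built by hand. It was written only from the file names in this lesson, so treat the pattern as an assumption rather than an official MACA specification.

```python
import re

# Fields are separated by underscores; model and variable names in this
# lesson contain hyphens but no underscores, so [^_]+ matches each one
MACA_MONTHLY = re.compile(
    r"agg_macav2metdata_(?P<variable>[^_]+)_(?P<model>[^_]+)"
    r"_r(?P<run>\d+)i1p1_(?P<scenario>[^_]+)"
    r"_(?P<start>\d{4})_(?P<end>\d{4})_(?P<domain>[^_]+)_monthly\.nc$"
)

name = "agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_rcp85_2006_2099_CONUS_monthly.nc"
match = MACA_MONTHLY.match(name)
if match is None:
    raise ValueError(f"Unexpected file name: {name}")

parts = match.groupdict()
print(parts['model'], parts['scenario'], parts['start'], parts['end'])
# BNU-ESM rcp85 2006 2099
```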

full_file_path = dir_path + file_name
print("The full path to your data is: \n", full_file_path)

The full path to your data is: 
 http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_rcp85_2006_2099_CONUS_monthly.nc
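Because the path is assembled from a few options, the same steps can be wrapped in a function and looped over, which is the automated workflow mentioned at the start of this lesson. The helper below (build_maca_url, a hypothetical name) assumes the server URL, file-name pattern, and time periods shown in this lesson; it only builds strings and does not contact the server.

```python
from itertools import product

BASE_URL = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'

# Time period string for each scenario, matching year_start / year_end above
PERIODS = {'historical': '1950_2005', 'rcp45': '2006_2099', 'rcp85': '2006_2099'}


def build_maca_url(variable, model, scenario, run=1, domain='CONUS'):
    """Return the OPeNDAP URL of a monthly MACA v2 file (hypothetical helper)."""
    file_name = (f"agg_macav2metdata_{variable}_{model}_r{run}i1p1_"
                 f"{scenario}_{PERIODS[scenario]}_{domain}_monthly.nc")
    return BASE_URL + file_name


# Build URLs for every combination of two models and two scenarios
models = ['BNU-ESM', 'CanESM2']
scenarios = ['rcp45', 'rcp85']
urls = [build_maca_url('tasmax', m, s) for m, s in product(models, scenarios)]

for url in urls:
    print(url)
```

Remember that CCSM4 needs run=6, matching the run_num setting above; for example, build_maca_url('tasmax', 'CCSM4', 'historical', run=6).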

Open Your Data

Below you open your data with xarray. The open data code is wrapped in a try/except block to ensure that it fails gracefully if the data can’t be accessed. Remember that when you are opening data here, you are hitting a server online. Thus you need internet access to run the code below.

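# Open the data from the thredds server
try:
    max_temp_xr = xr.open_dataset(full_file_path)
except OSError as oe:
    # Stop with a clear message; continuing would only fail later with a
    # NameError because max_temp_xr would not exist
    raise OSError("Oops, it looks like the file that you are trying to "
                  "connect to, {}, doesn't exist or can't be reached. Check "
                  "your internet connection and revisit your model options "
                  "to ensure the data exist on the server.".format(full_file_path)) from oe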

# View your temperature data
max_temp_xr

<xarray.Dataset>
Dimensions:          (lat: 585, crs: 1, lon: 1386, time: 1128)
Coordinates:
  * lat              (lat) float64 25.06 25.1 25.15 25.19 ... 49.31 49.35 49.4
  * crs              (crs) int32 1
  * lon              (lon) float64 235.2 235.3 235.3 235.4 ... 292.9 292.9 292.9
  * time             (time) object 2006-01-15 00:00:00 ... 2099-12-15 00:00:00
Data variables:
    air_temperature  (time, lat, lon) float32 ...
Attributes: (12/46)
    description:                     Multivariate Adaptive Constructed Analog...
    id:                              MACAv2-METDATA
    naming_authority:                edu.uidaho.reacch
    Metadata_Conventions:            Unidata Dataset Discovery v1.0
    Metadata_Link:                   
    cdm_data_type:                   FLOAT
    ...                              ...
    contributor_role:                Postdoctoral Fellow
    publisher_name:                  REACCH
    publisher_email:                 reacch@uidaho.edu
    publisher_url:                   http://www.reacchpna.org/
    license:                         Creative Commons CC0 1.0 Universal Dedic...
    coordinate_system:               WGS84,EPSG:4326
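Because open_dataset contacts a remote server, a request can occasionally fail for reasons unrelated to your code, such as a busy server or a brief network drop. One option, sketched below with a hypothetical helper named open_with_retries, is to retry a few times before giving up. The opener argument is any function that takes a path, so with real data you could pass xr.open_dataset; the example uses a fake opener so it runs offline.

```python
from time import sleep  # avoids clashing with the lesson's `time` string


def open_with_retries(opener, path, attempts=3, wait_seconds=2.0):
    """Call opener(path), retrying on OSError (hypothetical helper).

    After the last failed attempt, the original OSError is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return opener(path)
        except OSError:
            if attempt == attempts:
                raise
            sleep(wait_seconds)


# A stand-in opener that fails twice, then succeeds (demonstration only)
calls = []


def flaky_opener(path):
    calls.append(path)
    if len(calls) < 3:
        raise OSError("server busy")
    return f"opened {path}"


result = open_with_retries(flaky_opener, "example.nc", wait_seconds=0)
print(result, len(calls))  # opened example.nc 3
```

With the real server, the call would look like max_temp_xr = open_with_retries(xr.open_dataset, full_file_path).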

Subset Your Data

The dataset is large: it covers the whole CONUS grid (585 latitude by 1,386 longitude values) for 1,128 monthly time steps, which is more than you need here. You can reduce it by subsetting the data in two ways: spatially and temporally.

To spatially subset the data, you will look at data from only one point in the xarray Dataset. Below, you can assign new latitude and longitude values to pick a different point. The data’s latitude values range from about 25 to 50, and its longitude values range from about 235 to 292, so pick new values within those ranges.
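The longitude range of 235 to 292 may look unfamiliar because this dataset stores longitude in degrees east, from 0 to 360, instead of the common -180 to 180 convention where western longitudes are negative. For example, 235 degrees east is 125 degrees west. If you have a location in the -180 to 180 convention, you can convert it with the two small helper functions sketched below (the function names are hypothetical).

```python
def to_degrees_east(lon):
    """Convert a longitude in -180..180 (west is negative) to 0..360."""
    return lon % 360


def to_plus_minus_180(lon):
    """Convert a longitude in 0..360 degrees east to -180..180."""
    return (lon + 180) % 360 - 180


print(to_degrees_east(-125))   # 235
print(to_plus_minus_180(235))  # -125

# The grid point selected later in this lesson, about 241.48 degrees east
print(round(to_plus_minus_180(241.4777374267578), 2))  # -118.52
```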

To temporally subset the data, you can pick a start date and end date to trim the data to. Below, assign new values for where the data should start and end. Make sure the values you assign stay inside the quotes provided, in the format 'yyyy-mm'. The years available depend on the scenario you chose above, so pick dates within that scenario's range, as shown in the table below.

Scenario Number	Date Range
0	1950-2005
1	2006-2099
2	2006-2099
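The dates you choose need to fall inside the period of the scenario you selected; otherwise the subset may be empty or shorter than you expect. A minimal sketch of a check you could run before subsetting (check_date_range is a hypothetical helper, and SCENARIO_YEARS simply restates the table above):

```python
# Year range for each scenario number, from the table above
SCENARIO_YEARS = {0: (1950, 2005), 1: (2006, 2099), 2: (2006, 2099)}


def check_date_range(start_date, end_date, scenario):
    """Raise ValueError if 'yyyy-mm' dates fall outside the scenario's years.

    Assumes zero-padded months ('2008-01', not '2008-1'), so that
    comparing the strings also compares the dates.
    """
    first_year, last_year = SCENARIO_YEARS[scenario]
    for date in (start_date, end_date):
        year, month = map(int, date.split('-'))
        if not (first_year <= year <= last_year and 1 <= month <= 12):
            raise ValueError(f"{date} is outside {first_year}-{last_year} "
                             f"for scenario {scenario}")
    if start_date > end_date:
        raise ValueError("start_date must not be later than end_date")


# The dates used in this lesson fit the rcp85 scenario (number 2)
check_date_range('2008-01', '2012-09', scenario=2)
```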

# Select the latitude, longitude, and timeframe to subset the data to

# Option 1: type your own values. Latitude should be between 25 and 50,
# longitude between 235 and 292 (degrees east)
# latitude = 35
# longitude = 270
start_date = '2008-01'
end_date = '2012-09'

# Option 2 (used here): pick a lat / lon location by its position in the grid
latitude = max_temp_xr.lat.values[300]
longitude = max_temp_xr.lon.values[150]
print("You selected the following x,y location:", longitude, latitude)

You selected the following x,y location: 241.4777374267578 37.5628776550293
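If you type your own latitude and longitude instead of taking them from the grid, they will almost never equal a grid value exactly, and an exact .sel() lookup will raise an error (a KeyError). xarray's .sel() accepts method="nearest" to pick the closest grid cell instead. The pure-Python sketch below shows the idea behind that lookup on a made-up latitude grid with a spacing of 1/24 degree, which is close to the spacing of the lat values printed above (about 0.042 degrees); nearest_index is a hypothetical helper, not part of xarray.

```python
def nearest_index(grid, target):
    """Return the position of the grid value closest to target."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - target))


# Made-up latitude grid: 25.0, 25.0417, ... with 1/24 degree spacing
lat_grid = [25.0 + i / 24 for i in range(600)]

i = nearest_index(lat_grid, 35.0)
print(i, lat_grid[i])  # 240 35.0

# A value between two grid cells snaps to the closer one
j = nearest_index(lat_grid, 35.03)
print(j)  # 241
```

With the real data, the equivalent xarray call is max_temp_xr["air_temperature"].sel(lat=35, lon=270, method="nearest").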

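# Slice one lat/lon data point; method="nearest" picks the closest grid
# cell, so values that aren't exactly on the grid also work
temp_single_point = max_temp_xr["air_temperature"].sel(
    lat=latitude,
    lon=longitude,
    method="nearest")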

temp_single_point

<xarray.DataArray 'air_temperature' (time: 1128)>
array([282.93192, 285.54318, 291.04315, ..., 301.6674 , 290.809  , 288.78992],
      dtype=float32)
Coordinates:
    lat      float64 37.56
    lon      float64 241.5
  * time     (time) object 2006-01-15 00:00:00 ... 2099-12-15 00:00:00
Attributes:
    long_name:      Monthly Average of Daily Maximum Near-Surface Air Tempera...
    units:          K
    grid_mapping:   crs
    standard_name:  air_temperature
    height:         2 m
    cell_methods:   time: maximum(interval: 24 hours);mean over days
    _ChunkSizes:    [ 10  44 107]
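The code above subsets the data spatially, but start_date and end_date are not used yet, so temp_single_point still holds all 1,128 time steps. With xarray you can pass a slice of 'yyyy-mm' strings to .sel() on the time dimension. The values are also in Kelvin (units: K above), so converting to degrees Celsius means subtracting 273.15. The sketch below demonstrates both steps on a small synthetic series so it runs without the server; it assumes numpy, pandas, and xarray are installed, and the temperatures are made up. With the real data, the same pattern would be temp_single_point.sel(time=slice(start_date, end_date)), which relies on xarray's support for partial date strings on the cftime-based time index this dataset uses.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series: 24 mid-month time steps starting January 2006
times = pd.date_range("2006-01-01", periods=24, freq="MS") + pd.Timedelta(days=14)
temps_k = xr.DataArray(
    np.linspace(280.0, 303.0, 24),  # made-up Kelvin values
    coords={"time": times},
    dims="time",
    name="air_temperature",
    attrs={"units": "K"},
)

# Temporal subset using 'yyyy-mm' strings, like start_date and end_date
subset = temps_k.sel(time=slice("2006-03", "2006-06"))

# Convert Kelvin to degrees Celsius and record the new units
subset_c = (subset - 273.15).assign_attrs(units="degC")
print(subset_c.sizes["time"])  # 4
```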

Below you quickly plot the data. You will learn more about working with these data (and creating nicer plots) in the following lessons.

# Quick plot of the data
temp_single_point.plot.line()
plt.show()
