Reproducible Science and Programming Lessons

Learn Methods for Reproducible Science, Automated Workflows and Version Control

Reproducible science refers to sharing the methods and workflows used in a project. One aspect of making your science reproducible is automating your workflow using scientific programming. If your code is automated and well documented, then someone else can run the same analysis on your data and build upon your work. Reproducibility in Earth data science encourages sharing of knowledge and techniques so that scientific efforts can build off each other. In the lessons below, learn how to write clean, reproducible code. Also learn how to share your code and collaborate effectively using version control tools like Git and GitHub.

Loops in Python Exercise

Loops can be used to automate data tasks in Python by iteratively executing the same code on multiple data structures. Practice using loops to automate certain functionality in Python.

last updated: 03 Sep 2020

Introduction to the HDF4 Data Format

MODIS is remote sensing data that is stored in the HDF4 file format. Learn how to view and explore HDF4 files (and their metadata) using the free HDF viewer provided by the HDF group.

last updated: 11 Sep 2020

Work with Landsat Remote Sensing Data in Python

Landsat 8 data are downloaded in tif file format. Learn how to open and manipulate Landsat 8 data in Python. Also learn how to create RGB and color infrared Landsat image composites.

last updated: 11 Sep 2020

Calculate Vegetation Indices in Python

A vegetation index is a value that quantifies vegetation health or structure. Learn how to calculate the NDVI and NBR vegetation indices to study vegetation health and wildfire impacts in Python.

last updated: 11 Sep 2020
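
As a minimal sketch of the idea behind this lesson: NDVI and NBR are both normalized differences, with NDVI computed from near-infrared and red reflectance and NBR from near-infrared and shortwave infrared reflectance. The function name, the single-pixel reflectance values, and the handling of a zero denominator below are illustrative assumptions, not taken from the lesson, which works with full raster arrays.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b) for two reflectance values."""
    total = band_a + band_b
    if total == 0:
        # Assumption: treat a zero denominator (e.g. a no-data pixel) as NaN
        return float("nan")
    return (band_a - band_b) / total


# Hypothetical reflectance values for a single pixel
nir, red, swir = 0.5, 0.1, 0.3

ndvi = normalized_difference(nir, red)   # near-infrared vs. red
nbr = normalized_difference(nir, swir)   # near-infrared vs. shortwave infrared
print(round(ndvi, 3), round(nbr, 3))
```

Both indices range from -1 to 1, so one helper function can serve for either index once the right pair of bands is passed in.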

Customize Dates on Time Series Plots in Python Using Matplotlib

When you plot time series data using the matplotlib package in Python, you often want to customize the date format that is presented on the plot. Learn how to customize the date format on time series plots created using matplotlib.

last updated: 11 Sep 2020

Work With Datetime Format in Python - Time Series Data

Python provides a datetime object for storing and working with dates. Learn how you can convert columns in a pandas dataframe containing dates and times as strings into datetime objects for more efficient analysis and plotting.

last updated: 11 Sep 2020
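
The conversion this lesson describes can be sketched with the standard library alone. The date strings below are hypothetical, and the example parses a plain list rather than a pandas dataframe column to stay dependency-free; in pandas, the analogous call is `pandas.to_datetime`.

```python
from datetime import datetime

# Hypothetical date strings, as they might appear in a text column
date_strings = ["2020-09-01", "2020-09-11", "2020-09-15"]

# strptime parses each string into a datetime object using the given format
dates = [datetime.strptime(text, "%Y-%m-%d") for text in date_strings]

# Once parsed, dates support comparison and arithmetic, which strings do not
elapsed = dates[-1] - dates[0]
print(elapsed.days)  # 14
```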

Write Functions with Multiple Parameters in Python

A function is a reusable block of code that performs a specific task. Learn how to write functions that can take multiple as well as optional parameters in Python to eliminate repetition and improve efficiency in your code.

last updated: 03 Sep 2020
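
A minimal sketch of a function with two required parameters and one optional parameter follows. The function name, parameter names, and values are hypothetical and chosen only for illustration.

```python
def scale_values(values, factor, offset=0):
    """Multiply each value by factor, then add an optional offset.

    Parameters
    ----------
    values : list of numbers
    factor : number
    offset : number, optional (default is 0)
    """
    return [value * factor + offset for value in values]


# Call with only the required parameters; offset falls back to its default
print(scale_values([1, 2, 3], 2))            # [2, 4, 6]

# Override the optional parameter by name
print(scale_values([1, 2, 3], 2, offset=1))  # [3, 5, 7]
```

Giving `offset` a default value means existing calls keep working if the optional behavior is added later.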

Write Functions in Python

A function is a reusable block of code that performs a specific task. Learn how to write functions in Python to eliminate repetition and improve efficiency in your code.

last updated: 03 Sep 2020

Introduction to Writing Functions in Python

A function is a reusable block of code that performs a specific task. Learn how functions can be used to write efficient and DRY (Do Not Repeat Yourself) code in Python.

last updated: 03 Sep 2020

Create Data Workflows with Loops

Loops can be an important part of creating a data workflow in Python. Use loops to go from raw data to a finished project more efficiently.

last updated: 03 Sep 2020

Automate Data Tasks With Loops in Python

Loops can be used to automate data tasks in Python by iteratively executing the same code on multiple data structures. Learn how to automate data tasks in Python using data structures such as lists, numpy arrays, and pandas dataframes.

last updated: 03 Sep 2020
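
As a small illustration of running the same code on each item in a data structure, the sketch below converts a list of values from inches to millimeters. The precipitation values and variable names are hypothetical.

```python
# Hypothetical monthly precipitation totals in inches
precip_in = [0.70, 0.75, 1.85]

# Apply the same conversion to every item instead of repeating the code
precip_mm = []
for value in precip_in:
    precip_mm.append(round(value * 25.4, 2))

print(precip_mm)
```

The same loop works unchanged for a list of any length, which is what makes it useful for automation.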

Conditional Statements with Alternative or Combined Conditions

Conditional statements in Python can be written to check for alternative conditions or combinations of multiple conditions. Learn how to write conditional statements in Python that choose between alternative conditions or check for combinations of conditions before executing code.

last updated: 03 Sep 2020
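
A minimal sketch of both patterns: `elif` and `else` choose between alternative conditions, while `and` and `or` combine conditions. The function name and thresholds are hypothetical and have no scientific meaning.

```python
def describe_day(temp_c, precip_mm):
    """Label a day using alternative and combined conditions."""
    if temp_c <= 0 and precip_mm > 0:
        # Both conditions must be true
        return "snow likely"
    elif temp_c > 30 or precip_mm > 50:
        # Either condition is enough
        return "extreme conditions"
    elif precip_mm > 0:
        return "rain likely"
    else:
        # Runs only when no earlier condition was met
        return "dry"


print(describe_day(-2, 5))   # snow likely
print(describe_day(10, 0))   # dry
```

Conditions are checked from top to bottom, and only the first branch whose condition is true is executed.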

Intro to Conditional Statements in Python

Conditional statements help you to control the flow of code by executing code only when certain conditions are met. Learn about the structure of conditional statements in Python and how they can be used to write Do Not Repeat Yourself, or DRY, code in Python.

last updated: 03 Sep 2020

Guided Activity to Submit Pull Requests

A pull request allows anyone to suggest changes to a repository on GitHub that can be easily reviewed by others. Learn how to submit pull requests on GitHub.com to suggest changes to a GitHub repository.

last updated: 03 Sep 2020

GitHub Open Source Workflow....

GitHub.com can be used to store and access files in the cloud using GitHub repositories. Learn how to submit pull requests on GitHub.com to suggest changes to a GitHub repository.

last updated: 03 Sep 2020

Select Data From Pandas Dataframes

Pandas dataframes are a commonly used scientific data structure in Python that store tabular data using rows and columns with headers. Learn how to use indexing and filtering to select data from pandas dataframes.

last updated: 15 Sep 2020

Run Calculations and Summary Statistics on Pandas Dataframes

Pandas dataframes are a commonly used scientific data structure in Python that store tabular data using rows and columns with headers. Learn how to run calculations and summary statistics (such as mean or maximum) on columns in pandas dataframes.

last updated: 15 Sep 2020

Import CSV Files Into Pandas Dataframes

Pandas dataframes are a commonly used scientific data structure in Python that store tabular data using rows and columns with headers. Learn how to import text data from .csv files into pandas dataframes.

last updated: 15 Sep 2020

Intro to Pandas Dataframes

Pandas dataframes are a commonly used scientific data structure in Python that store tabular data using rows and columns with headers. Learn about the key characteristics of pandas dataframes that make them a useful data structure for storing and working with labeled scientific datasets.

last updated: 15 Sep 2020

Slice (or Select) Data From Numpy Arrays

Numpy arrays are an efficient data structure for working with scientific data in Python. Learn how to use indexing to slice (or select) data from one-dimensional and two-dimensional numpy arrays.

last updated: 15 Sep 2020

Run Calculations and Summary Statistics on Numpy Arrays

Numpy arrays are an efficient data structure for working with scientific data in Python. Learn how to run calculations and summary statistics (such as mean or maximum) on one-dimensional and two-dimensional numpy arrays.

last updated: 15 Sep 2020

Import Text Files Into Numpy Arrays

Numpy arrays are an efficient data structure for working with scientific data in Python. Learn how to import text data from .txt and .csv files into numpy arrays.

last updated: 15 Sep 2020

Intro to Numpy Arrays

Numpy arrays are a commonly used scientific data structure in Python that store data as a grid, or a matrix. Learn about the key characteristics of numpy arrays that make them an efficient data structure for storing and working with large scientific datasets.

last updated: 15 Sep 2020

Use the OS and Glob Python Packages to Manipulate File Paths

The os and glob packages are very useful tools in Python for accessing files and directories and for creating lists of paths to files and directories, respectively. Learn how to manipulate and parse file and directory paths using os and glob.

last updated: 03 Sep 2020
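
The division of labor between the two packages can be sketched as follows: `os.path` builds and parses individual paths, and `glob` returns a list of paths that match a pattern. The file names are hypothetical, and a temporary directory is used so the example does not depend on any files on your computer.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Create a few hypothetical files to search through
    for name in ["site-a.csv", "site-b.csv", "notes.txt"]:
        with open(os.path.join(data_dir, name), "w") as new_file:
            new_file.write("placeholder\n")

    # glob returns matching paths in arbitrary order, so sort them
    csv_paths = sorted(glob.glob(os.path.join(data_dir, "*.csv")))

    # os.path.basename strips the directory and keeps the file name
    csv_names = [os.path.basename(path) for path in csv_paths]

print(csv_names)  # ['site-a.csv', 'site-b.csv']
```

Using `os.path.join` rather than typing slashes by hand keeps the paths valid across operating systems.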

Install Packages in Python

Packages in Python provide pre-built functionality that adds to the functionality available in base Python. Learn how to install packages in Python using conda environments.

last updated: 03 Sep 2020

Python Packages for Earth Data Science

The Python programming language provides many packages and libraries for working with scientific data. Learn about key Python packages for earth data science.

last updated: 16 Sep 2020

Customize Your Plots Using Matplotlib

Matplotlib is the most commonly used plotting library in Python. Learn how to customize the colors, symbols, and labels on your plots using matplotlib.

last updated: 16 Sep 2020

DRY Code and Modularity

DRY (Do Not Repeat Yourself) code supports reproducibility by removing repetition and making code easier to read. Learn about key strategies to write DRY code in Python.

last updated: 16 Sep 2020

Python Fundamentals Exercise

Complete these exercises to practice the skills you learned in the Python fundamentals chapters.

last updated: 10 Sep 2020

Basic Operators in Python

Operators are symbols in Python that carry out a specific computation, or operation, such as arithmetic calculations. Learn how to use basic operators in Python.

last updated: 16 Sep 2020

Lists in Python

A Python list is a data structure that stores a collection of values in a specified order (or sequence) and is mutable (or changeable). Learn how to create and work with lists in Python.

last updated: 16 Sep 2020
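
The two properties named above, ordered and mutable, can be shown in a few lines. The list contents are hypothetical.

```python
months = ["January", "February", "March"]

# Ordered: items are retrieved by position, starting at index 0
print(months[0])  # January

# Mutable: the list can be changed in place after it is created
months.append("April")
months[1] = "Feb"

print(months)       # ['January', 'Feb', 'March', 'April']
print(len(months))  # 4
```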

Variables in Python

Variables store data (i.e. information) that you want to re-use in your code (e.g. single numeric value, path to a directory or file). Learn how to create and work with variables in Python.

last updated: 16 Sep 2020

Format Text In Jupyter Notebook With Markdown

Markdown allows you to format text using simple, plain-text syntax and can be used to document code in a variety of tools, including Jupyter Notebook. Learn how to format text in Jupyter Notebook using Markdown.

last updated: 03 Sep 2020

Text File Formats for Earth Data Science

There are many text file formats that are useful for earth data science workflows including Markdown, text (.txt, .csv) files, and YAML (Yet Another Markup Language). Learn about these common text file formats for earth data science workflows.

last updated: 03 Sep 2020

Useful Jupyter Notebook Shortcuts

The Jupyter ecosystem contains many useful tools for working with Python including Jupyter Notebook, an interactive coding environment. Learn useful shortcuts in Jupyter Notebook that can help you complete your tasks quickly and efficiently.

last updated: 14 Sep 2020

Manage Jupyter Notebook Files

The Jupyter ecosystem contains many useful tools for working with Python including Jupyter Notebook, an interactive coding environment, and the Jupyter Notebook dashboard, which allows you to manage files and directories in your Jupyter environment. Learn how to manage Jupyter Notebook files including saving, renaming, deleting, moving, and downloading notebooks.

last updated: 14 Sep 2020

Manage Directories in Jupyter Notebook Dashboard

The Jupyter ecosystem contains many useful tools for working with Python including the Jupyter Notebook dashboard, which allows you to manage files and directories in your Jupyter environment. Learn how to create, rename, move, and delete directories using the Jupyter Notebook dashboard.

last updated: 03 Sep 2020

Code and Markdown Cells in Jupyter Notebook

The Jupyter ecosystem contains many useful tools for working with Python including Jupyter Notebook, an interactive coding environment. Learn how to work with cells, including Python code and Markdown text cells, in Jupyter Notebook.

last updated: 14 Sep 2020

Get Started With Jupyter Notebook For Python

The Jupyter ecosystem contains many useful tools for working with Python including Jupyter Notebook, an interactive coding environment. Learn how to launch and close Jupyter Notebook sessions and how to navigate the Jupyter Dashboard to create and open Jupyter Notebook files (.ipynb).

last updated: 03 Sep 2020

Introduction to Jupyter For Python

The Jupyter ecosystem contains many useful tools for working with Python including Jupyter Notebook, an interactive coding environment. Learn how the components and functionality of Jupyter Notebook can help you implement open reproducible science workflows.

last updated: 03 Sep 2020

Bash Commands to Manage Directories and Files

Bash or Shell is a command line tool that is used in open science to efficiently manipulate files and directories. Learn how to run useful Bash commands to access and manage directories and files on your computer.

last updated: 03 Sep 2020

Tools For Open Reproducible Science

Key tools for open reproducible science include Shell (Bash), git and GitHub, Jupyter, and Python. Learn how these tools help you implement open reproducible science workflows.

last updated: 14 Sep 2020

What Is Open Reproducible Science

Open reproducible science refers to developing workflows that others can easily understand and use. It enables you to build on others' work rather than starting from scratch. Learn about the importance and benefits of open reproducible science.

last updated: 03 Sep 2020

Customize your Maps in Python using Matplotlib: GIS in Python

When making maps, you often want to create legends, customize colors, adjust zoom levels, or even make interactive maps. Learn how to customize maps created using vector data in Python with matplotlib, geopandas, and folium.

last updated: 21 Jul 2020

How Do You Design and Automate a Data Workflow

Designing and developing data workflows can help you complete your work more efficiently by allowing you to repeat and automate data tasks. Learn how to design and develop efficient workflows to automate data analyses in Python.

last updated: 11 Sep 2020

Learn to Write Pseudocode for Python Programming

Pseudocode can help you design data workflows by listing the individual steps of a workflow in plain language, so the focus is on the overall data process rather than on the specific code needed. Learn best practices for writing pseudocode for data workflows.

last updated: 11 Sep 2020

Introduction to Documenting Python Software

Lack of documentation will limit people's use of your code. In this lesson you will learn about two ways to document Python code: docstrings and online documentation. You will also learn how to improve documentation in other software packages.

last updated: 01 Apr 2020
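
A docstring is a string placed as the first statement of a function, which Python stores on the function and shows through `help()`. The sketch below uses the numpy-style section layout, which is an assumption about convention rather than something the lesson specifies. The function itself is hypothetical.

```python
def mean_tree_height(heights):
    """Calculate the mean of a sequence of tree heights.

    Parameters
    ----------
    heights : list of float
        Tree heights in meters. Must not be empty.

    Returns
    -------
    float
        The arithmetic mean of the input heights.
    """
    return sum(heights) / len(heights)


# The docstring travels with the function and can be read at any time
print(mean_tree_height.__doc__)
print(mean_tree_height([10.0, 20.0]))  # 15.0
```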

Undo Local Changes With Git

A version control system allows you to track and manage changes to your files. Learn how to undo changes in git after they have been added or committed to version control.

last updated: 08 Sep 2020

Copy (Fork) and Download (Clone) GitHub Repositories

GitHub.com can be used to store and access files in the cloud to share with others or simply as a backup of your local files. Learn how to create a copy of files on GitHub (fork) and to download files from GitHub to your computer (clone).

last updated: 16 Sep 2020

What Is Version Control

A version control system allows you to track and manage changes to your files. Learn the benefits of version control for scientific workflows and how git and GitHub.com support version control.

last updated: 16 Sep 2020

Activity on DRY Code

This activity provides an opportunity to practice writing DRY code using loops, conditional statements, and functions.

last updated: 25 Aug 2020

Activity Data Structures

This activity provides an opportunity to practice working with commonly used Python data structures for scientific data: lists, numpy arrays, and pandas dataframes.

last updated: 25 Aug 2020

Subtract Raster Data in Python Using Numpy and Rasterio

Sometimes you need to manipulate multiple rasters to create a new raster output data set in Python. Learn how to create a CHM by subtracting an elevation raster dataset from a surface model dataset in Python.

last updated: 04 Sep 2019
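
The subtraction described above is cell by cell: each canopy height value is the surface model value minus the elevation value at the same grid position. The sketch below uses small nested lists with hypothetical values in meters to stay dependency-free; the lesson itself works with numpy arrays opened through rasterio, where the same operation is a single array subtraction.

```python
# Hypothetical 2 x 3 grids of elevation values in meters
surface_model = [
    [1650.0, 1655.5, 1660.0],
    [1648.0, 1652.0, 1649.5],
]
terrain_model = [
    [1640.0, 1641.5, 1642.0],
    [1648.0, 1645.0, 1644.5],
]

# Subtract the terrain from the surface at each matching cell
canopy_height = [
    [surface - terrain for surface, terrain in zip(surface_row, terrain_row)]
    for surface_row, terrain_row in zip(surface_model, terrain_model)
]

print(canopy_height)  # [[10.0, 14.0, 18.0], [0.0, 7.0, 5.0]]
```

A value of 0.0 marks a cell where the surface and the ground coincide, meaning nothing stands above the terrain there. The two grids must have the same shape and cover the same area for the result to be meaningful.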

Open, Plot and Explore Lidar Data in Raster Format with Python

This lesson introduces the raster geotiff file format - which is often used to store lidar raster data. You will learn the 3 key spatial attributes of a raster dataset including Coordinate reference system, spatial extent and resolution.

last updated: 04 Sep 2019

Learn to Use NAIP Multiband Remote Sensing Images in Python

Learn how to open up a multi-band raster layer or image stored in .tiff format in Python using Rasterio. Learn how to plot histograms of raster values and how to plot 3 band RGB and color infrared or false color images.

last updated: 11 Sep 2020

Introduction to Multispectral Remote Sensing Data in Python

Multispectral remote sensing data can be in different resolutions and formats and often has different bands. Learn about the differences between NAIP, Landsat and MODIS remote sensing data as it is used in Python.

last updated: 11 Sep 2020

Customize Map Extents in Python: GIS in Python

When making maps, sometimes you want to zoom in to a specific area in your map. Learn how to adjust the x and y limits of your matplotlib and geopandas map to change the spatial extent that is displayed.

last updated: 21 Jul 2020

Customize Matplotlib Raster Maps in Python

Sometimes you want to customize the colorbar and range of values plotted in a raster map. Learn how to create breaks to plot rasters in Python.

last updated: 21 Jul 2020

Interactive Maps in Python

Folium is a Python package that can be used to create interactive maps in Jupyter Notebook. Learn how to create interactive maps with raster overlays in Python using Folium.

last updated: 21 Jul 2020

Plot Spatial Raster Data in Python

When plotting rasters, you often want to overlay two rasters, add a legend, or make the raster interactive. Learn how to make a map of raster data that has these attributes using Python.

last updated: 21 Jul 2020

Reproject Raster Data in Python

Sometimes you will work with multiple rasters that are not in the same projections, and thus, need to reproject the rasters, so they are in the same coordinate reference system. Learn how to reproject raster data in Python using Rasterio.

last updated: 11 Sep 2020

Crop Spatial Raster Data With a Shapefile in Python

Sometimes a raster dataset covers a larger spatial extent than is needed for a particular purpose. In these cases, you can crop a raster file to a smaller extent. Learn how to crop raster data using a shapefile and export it as a new raster in open source Python.

last updated: 11 Sep 2020

Classify and Plot Raster Data in Python

Reclassifying raster data allows you to use a set of defined values to organize pixel values into new bins or categories. Learn how to classify a raster dataset and export it as a new raster in Python.

last updated: 11 Sep 2020
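
Sorting pixel values into bins can be sketched with the standard library's `bisect` module. The break values, class numbering, and pixel values are hypothetical, and the choice to place a value equal to a break in the higher class is an assumption. Array libraries offer equivalent binning functions, such as `numpy.digitize`.

```python
from bisect import bisect_right

# Hypothetical class breaks in meters: <2, 2 to <7, 7 to <12, 12 and above
class_breaks = [2.0, 7.0, 12.0]


def classify(value, breaks):
    """Return a 1-based class number for value given sorted breaks."""
    # bisect_right counts how many breaks are less than or equal to value
    return bisect_right(breaks, value) + 1


# Hypothetical pixel values from one row of a raster
pixel_values = [0.5, 2.0, 6.9, 7.0, 30.0]
classes = [classify(value, class_breaks) for value in pixel_values]

print(classes)  # [1, 2, 2, 3, 4]
```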

About the Geotiff (.tif) Raster File Format: Raster Data in Python

Metadata describe the key characteristics of a dataset such as a raster. For spatial data, these characteristics include the coordinate reference system (CRS), resolution and spatial extent. Learn about the use of TIF tags or metadata embedded within a GeoTIFF file to explore the metadata programmatically.

last updated: 11 Sep 2020

Plot Histograms of Raster Values in Python

Histograms of raster data provide the distribution of pixel values in the dataset. Learn how to explore and plot the distribution of values within a raster using histograms.

last updated: 11 Sep 2020

Open, Plot and Explore Raster Data with Python

Raster data are gridded data composed of pixels that store values, such as an image or elevation data file. Learn how to open, plot, and explore raster files in Python.

last updated: 11 Sep 2020

Challenge Yourself

This lesson contains a series of challenges that require using tidyverse functions in R to process data.

last updated: 03 Sep 2019

Automate Workflows Using Loops in R

When you are programming, it can be easy to copy and paste code that works. However this approach is not efficient. Learn how to create for-loops to process multiple files in R.

last updated: 03 Sep 2019

An Introduction to Version Control

Learn what version control is, and how Git and GitHub are used in a typical version control workflow.

last updated: 02 Apr 2020

Create For Loops

Learn how to write a for loop to process a set of .csv format text files in R.

last updated: 03 Sep 2019

Get to Know the Function Environment & Function Arguments in R

This lesson introduces the function environment and documenting functions in R. When you run a function, intermediate variables are not stored in the global environment. This not only saves memory on your computer but also keeps your environment clean, reducing the risk of conflicting variables.

last updated: 03 Sep 2019

Clean Remote Sensing Data in R - Clouds, Shadows & Cloud Masks

In this lesson, you will learn how to deal with clouds when working with spectral remote sensing data. You will learn how to mask clouds from Landsat and MODIS remote sensing data in R using the mask() function. You will also discuss issues associated with cloud cover, particularly as they relate to a research topic.

last updated: 30 Mar 2020

Adjust Plot Extent in R

In this lesson you will review how to adjust the extent of a spatial plot in R using the ext() or extent argument and the extent of another layer.

last updated: 03 Sep 2019

Plot Grid of Spatial Plots in R

In this lesson you learn to use par(), or parameter settings, in R to plot several raster RGB plots in a grid.

last updated: 03 Sep 2019

Work with MODIS Remote Sensing Data in Python

MODIS is a satellite remote sensing instrument that collects data daily across the globe at 250-500 m resolution. Learn how to import, clean up and plot MODIS data in Python.

last updated: 13 Mar 2020

Landsat Remote Sensing tif Files in R

In this lesson you will cover the basics of using Landsat 7 and 8 in R. You will learn how to import Landsat data stored in .tif format - where each .tif file represents a single band rather than a stack of bands. Finally you will plot the data using various 3 band combinations including RGB and color-infrared.

last updated: 08 Jan 2020

Calculate NDVI in R: Remote Sensing Vegetation Index

NDVI is calculated using near infrared and red wavelengths or types of light and is used to measure vegetation greenness or health. Learn how to calculate remote sensing NDVI using multispectral imagery in R.

last updated: 03 Sep 2019

How Multispectral Imagery is Drawn on Computers - Additive Color Models

In this lesson you will learn the basics of using Landsat 7 and 8 in R. You will learn how to import Landsat data stored in .tif format - where each .tif file represents a single band rather than a stack of bands. Finally you will plot the data using various 3 band combinations including RGB and color-infrared.

last updated: 03 Sep 2019

How to Open and Work with NAIP Multispectral Imagery in R

In this lesson you learn how to open up a multi-band raster layer or image stored in .tiff format in R. You are introduced to the stack() function in R, which can be used to import more than one band into a stack object. You also review using plotRGB to plot a multi-band image using RGB, color-infrared, or other band combinations.

last updated: 03 Sep 2019

Extract Raster Values Using Vector Boundaries in R

This lesson reviews how to extract pixels from a raster dataset using a vector boundary. You can use the extracted pixels to calculate mean and max tree height for a study area (in this case, a field site where tree heights were measured on the ground). Finally you will compare tree heights derived from lidar data to tree heights measured by humans on the ground.

last updated: 03 Sep 2019

GIS in R: Plot Spatial Data and Create Custom Legends in R

In this lesson you break down the steps required to create a custom legend for spatial data in R. You learn about creating unique symbols per category, customizing colors and placing your legend outside of the plot using the xpd argument combined with x,y placement and margin settings.

last updated: 30 Mar 2020

GIS With R: Projected vs Geographic Coordinate Reference Systems

Geographic coordinate reference systems are often used to make maps of the world. Projected coordinate reference systems are used to optimize spatial analysis for a region. Learn about WGS84 and UTM Coordinate Reference Systems as used in R.

last updated: 13 Mar 2020

Coordinate Reference System and Spatial Projection

Coordinate reference systems are used to convert locations on the earth, which is round, to a two dimensional (flat) map. Learn about the differences between coordinate reference systems.

last updated: 03 Sep 2019

Clip Raster in R

You can clip a raster to a polygon extent to save processing time and make image sizes smaller. Learn how to crop a raster dataset in R.

last updated: 13 Mar 2020

Classify a Raster in R

This lesson presents how to classify a raster dataset and export it as a new raster in R.

last updated: 13 Mar 2020

Create a Canopy Height Model With Lidar Data

A canopy height model contains height values of trees and can be used to understand landscape change over time. Learn how to use lidar elevation data to calculate canopy height and change in terrain over time.

last updated: 03 Sep 2019

How to Open and Use Files in Geotiff Format

A GeoTIFF is a standard file format with spatial metadata embedded as tags. Use the raster package in R to open geotiff files and spatial metadata programmatically.

last updated: 03 Sep 2019

Plot Histograms of Raster Values in R

This lesson introduces the raster geotiff file format - which is often used to store lidar raster data. You learn the 3 key spatial attributes of a raster dataset including Coordinate reference system, spatial extent and resolution.

last updated: 03 Sep 2019

Introduction to Lidar Raster Data Products

This lesson introduces the raster geotiff file format - which is often used to store lidar raster data. You learn the 3 key spatial attributes of a raster dataset including Coordinate reference system, spatial extent and resolution.

last updated: 13 Mar 2020

How to Address Missing Values in R

Missing data in R can be caused by issues in data collection and / or processing and presents challenges in data analysis. Learn how to address missing data values in R.

last updated: 03 Sep 2019

The Syntax of the R Scientific Programming Language - Data Science for Scientists 101

This lesson introduces the basic syntax associated with the R scientific programming language. You will learn about assignment operators (<-), comments and basic functions that are available to use in R to perform basic tasks including head(), qplot() to quickly plot data and others. This lesson is designed for someone who has not used R before. You will work with precipitation and stream discharge data for Boulder County.

last updated: 30 Mar 2020

Compare Lidar to Measured Tree Height

To explore uncertainty in remote sensing data, it is helpful to compare ground-based measurements and data that are collected via airborne instruments or satellites. Learn how to create scatter plots that compare values across two datasets.

last updated: 11 Sep 2020

Extract Raster Values at Point Locations in Python

For many scientific analyses, it is helpful to be able to select raster pixels based on their relationship to a vector dataset (e.g. locations, boundaries). Learn how to extract data from a raster dataset using a vector dataset.

last updated: 11 Sep 2020

Compare Lidar With Human Measured Tree Heights - Remote Sensing Uncertainty

Uncertainty quantifies the range of values within which a measurement could fall, given a specified level of confidence. Learn about the types of uncertainty that you can expect when working with tree height data derived from both lidar remote sensing and human measurements, and learn about sources of error including systematic vs. random error.

last updated: 11 Sep 2020

R Markdown resources

Find resources that will help you use the R Markdown format.

last updated: 03 Sep 2019

Add Images to an R Markdown Report

This lesson covers how to use markdown to add images to a report. It also discusses good file management practices associated with saving images within your project directory to avoid losing them if you have to go back and work on the report in the future.

last updated: 03 Sep 2019

Convert R Markdown to PDF or HTML

Knitr can be used to convert R Markdown files to different formats, including web friendly formats. Learn how to convert R Markdown to PDF or HTML in RStudio.

last updated: 03 Sep 2019

How to Use R Markdown Code Chunks

Code chunks in an R Markdown document are used to separate code from text in an Rmd file. Learn how to create reports using R Markdown.

last updated: 03 Sep 2019

File Organization 101

Learn key principles for naming and organizing files and folders in a working directory.

last updated: 03 Sep 2019

Get to Know RStudio

Learn how to work with R using the RStudio application.

last updated: 03 Sep 2019