Reproducible Science and Programming Lessons

Learn Methods for Reproducible Science, Automated Workflows and Version Control

Reproducible science refers to sharing the methods and workflows used in a project. One aspect of making your science reproducible is automating your workflow using scientific programming. If your code is automated and well documented, then someone else can run the same analysis on your data and build upon your work. Reproducibility in Earth data science encourages sharing of knowledge and techniques so that scientific efforts can build on each other. In the lessons below, learn how to write clean, reproducible code. Also learn how to share your code and collaborate effectively using version control tools like Git and GitHub.

Data Wrangling With Numpy Arrays

This lesson teaches you how to wrangle data (e.g. run multi-task functions, combine) with numpy arrays.

last updated: 22 Aug 2018

Data Wrangling With Pandas

This lesson teaches you how to wrangle data (e.g. subselect, update, and combine) with pandas dataframes.

last updated: 10 Sep 2018

Write Custom Functions

This lesson teaches you how to write custom functions in Python.

last updated: 10 Sep 2018

Intro to Functions

This lesson describes how functions are used in Python to write DRY and modular code.

last updated: 10 Sep 2018

Intro to Conditional Statements

This lesson describes the structure of conditional statements in Python and demonstrates how they are used for writing DRY code.

last updated: 10 Sep 2018

Intro to Loops

This lesson describes the structure of loops in Python and how they are used to iteratively execute code.

last updated: 13 Aug 2018

Intro to DRY code

This lesson describes the DRY (i.e. Don't Repeat Yourself) principle and lists key strategies for writing DRY code in Python.

last updated: 10 Sep 2018
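
As a minimal sketch of the DRY idea, a repeated calculation can be wrapped in a function and applied with a loop. The function name and the sample values below are hypothetical and are not taken from the lesson.

```python
def mm_to_inches(mm):
    """Convert a value in millimeters to inches."""
    return mm / 25.4


# Hypothetical monthly precipitation values in millimeters.
precip_mm = [25.4, 50.8, 76.2]

# One function and one loop replace three copy-pasted conversions.
precip_in = []
for value in precip_mm:
    precip_in.append(mm_to_inches(value))

print(precip_in)
```

If the conversion factor ever needs to change, it is edited in one place instead of in every copied line.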

Intro to Pandas Dataframes

This lesson describes key characteristics of pandas dataframes, a data structure commonly used for scientific data.

last updated: 10 Sep 2018

Introduction to Documenting Python Software

Lack of documentation will limit people's use of your code. In this lesson you will learn about 2 ways to document Python code: docstrings and online documentation. You will also learn how to improve documentation in other software packages.

last updated: 23 Oct 2018

Activity on DRY Code

This activity provides an opportunity to practice writing DRY code using loops, conditional statements, and functions.

last updated: 10 Sep 2018

Activity Data Structures

This activity provides an opportunity to practice working with commonly used Python data structures for scientific data: lists, numpy arrays, and pandas dataframes.

last updated: 10 Sep 2018

Intro to Numpy Arrays

This lesson describes the key characteristics of a commonly used data structure in Python for scientific data: numpy arrays.

last updated: 10 Sep 2018

What Is Version Control

This lesson reviews the process and benefits of version control and how Git and GitHub support version control.

last updated: 08 Aug 2018

Plot Data in Python with Matplotlib

Matplotlib is one of the most commonly used packages for plotting in Python. This lesson covers how to create a plot and customize plot colors and label axes using matplotlib.

last updated: 10 Sep 2018

Import Python Packages

Python packages are organized directories of code that provide functionality such as plotting data. Learn how to write Python code to import packages.

last updated: 08 Aug 2018

Python Lists

This lesson walks you through creating and editing Python lists.

last updated: 12 Aug 2018

Variables in Python

Variables store data (i.e. information) that you want to re-use in your code (e.g. a single value, list of values, path to a directory, filename). Learn how to write Python code to work with variables.

last updated: 10 Sep 2018

Subtract Raster Data in Python Using Numpy and Rasterio

Sometimes you need to manipulate multiple rasters to create a new raster output dataset in Python. Learn how to create a canopy height model (CHM) by subtracting an elevation raster dataset from a surface model dataset in Python.

last updated: 19 Jul 2018
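
The subtraction step can be sketched with small NumPy arrays. The values below are made up for illustration; in practice the two arrays would be read from raster files (for example with Rasterio) and would need to share the same shape, extent, and resolution.

```python
import numpy as np

# Hypothetical 2 x 2 rasters in meters (not real data).
surface_model = np.array([[1650.0, 1662.5], [1648.0, 1671.0]])    # top of canopy
elevation_model = np.array([[1648.0, 1650.5], [1648.0, 1652.0]])  # bare ground

# Element-wise subtraction gives the height above ground for each pixel.
canopy_height = surface_model - elevation_model
print(canopy_height)
```

A value of 0 in the result marks a pixel where the surface model and the ground elevation agree, i.e. nothing stands above the ground there.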

Open, Plot and Explore Lidar Data in Raster Format with Python

This lesson introduces the raster geotiff file format, which is often used to store lidar raster data. You will learn the 3 key spatial attributes of a raster dataset: coordinate reference system, spatial extent, and resolution.

last updated: 19 Jul 2018

The Jupyter Notebook Interface

Jupyter Notebook is an interactive environment where you can write and run code and also add text that describes your workflow using Markdown. Learn how to use Jupyter Notebook to run Python code and write Markdown text.

last updated: 08 Aug 2018

Get Files From GitHub

GitHub can be used to store and access files. Learn how to create a copy of files on GitHub (forking) and to use the Terminal to download the copy to your computer (cloning). You will also learn how to update your forked repository with changes made in the original GitHub repository.

last updated: 19 Sep 2018

Intro to Shell

This lesson walks you through using Bash/Shell to navigate and manage files and directories on your computer.

last updated: 08 Aug 2018

Work with Landsat Remote Sensing Data in Python

Landsat 8 data are downloaded in tif file format. Learn how to open and manipulate Landsat data in Python. Also learn how to create RGB and color infrared Landsat image composites.

last updated: 16 Oct 2018

Learn to Use NAIP Multiband Remote Sensing Images in Python

Learn how to open up a multi-band raster layer or image stored in .tiff format in Python using Rasterio. Learn how to plot histograms of raster values and how to plot 3 band RGB and color infrared or false color images.

last updated: 30 Oct 2018

Introduction to Multispectral Remote Sensing Data in Python

Multispectral remote sensing data can be in different resolutions and formats and often has different bands. Learn about the differences between NAIP, Landsat and MODIS remote sensing data as it is used in Python.

last updated: 16 Oct 2018

Get Help with Python

This tutorial covers ways to get help when you are stuck in Python.

last updated: 08 Oct 2018

Write Clean Python Code - Expressive programming 101

This lesson covers the basics of clean coding, meaning that the code you write is easy for someone else to understand. It briefly covers style guides, consistent spacing, and best practices for literate object naming.

last updated: 08 Oct 2018

Objects and variables in Python

This tutorial introduces the Python programming language. It is designed for someone who has not used Python before. You will work with precipitation and stream discharge data for Boulder County.

last updated: 08 Oct 2018

Get to Know Python & Jupyter Notebooks

This tutorial introduces the Python scientific programming language. It is designed for someone who has not used Python before. You will work with precipitation and stream discharge data for Boulder County in Python and also learn the basics of working with Python.

last updated: 08 Oct 2018

Work With Datetime Format in Python - Time Series Data

This lesson covers how to deal with dates in Python. It reviews how to convert a field containing dates as strings to a datetime object that Python can understand and plot efficiently. This tutorial also covers how to handle missing data values in Python.

last updated: 08 Oct 2018
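
Converting date strings to datetime objects can be sketched with the standard library alone. The date strings and the format below are hypothetical, and the lesson itself may use a different tool (such as pandas) for this step.

```python
from datetime import datetime

# Hypothetical date strings, as they might appear in a text file column.
date_strings = ["2013-09-09", "2013-09-10", "2013-09-11"]

# Parse each string into a datetime object using an explicit format.
dates = [datetime.strptime(s, "%Y-%m-%d") for s in date_strings]

# datetime objects support arithmetic that plain strings do not.
elapsed = dates[-1] - dates[0]
print(elapsed.days)
```

Once the values are datetime objects, they can be sorted, subtracted, and placed on a plot axis in chronological order rather than being treated as text.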

About the Geotiff (.tif) Raster File Format: Raster Data in Python

This lesson introduces the geotiff file format. It also introduces the concept of metadata, or data about the data. Metadata describe key characteristics of a dataset. For spatial data, these characteristics include CRS, resolution, and spatial extent. Here you learn about tif tags, the metadata embedded within a geotiff file, which can be used to explore data programmatically.

last updated: 25 Sep 2018

Plot Histograms of Raster Values in Python

This lesson introduces the raster geotiff file format, which is often used to store lidar raster data. You cover the 3 key spatial attributes of a raster dataset: coordinate reference system, spatial extent, and resolution.

last updated: 25 Sep 2018

Open, Plot and Explore Lidar Data in Raster Format with Python

This lesson introduces the raster geotiff file format, which is often used to store lidar raster data. You will learn the 3 key spatial attributes of a raster dataset: coordinate reference system, spatial extent, and resolution.

last updated: 30 Oct 2018

Customize Matplotlib Raster Maps in Python

Sometimes you want to customize the colorbar and range of values plotted in a raster map. Learn how to create breaks to plot rasters in Python.

last updated: 25 Sep 2018

Setup Your Earth Analytics Working Directory

This tutorial walks you through how to create your earth-analytics working directory in bash. It also covers how to change the working directory in Jupyter Notebook.

last updated: 14 Sep 2018

Get to Know the Jupyter Notebook Interface

The Jupyter Notebook is an interactive coding environment that allows you to combine code, documentation and outputs. Learn how to use the Jupyter notebook interface.

last updated: 25 Sep 2018

File Organization Tips

This lesson provides a broad overview of file organization principles.

last updated: 25 Sep 2018

Challenge Yourself

This lesson contains a series of challenges that require using tidyverse functions in R to process data.

last updated: 02 Feb 2018

Automate Workflows Using Loops in R

When you are programming, it can be easy to copy and paste code that works. However this approach is not efficient. Learn how to create for-loops to process multiple files in R.

last updated: 02 Feb 2018

An Introduction to Version Control

Learn what version control is, and how Git and GitHub are used in a typical version control workflow.

last updated: 14 Sep 2018

Create For Loops

Learn how to write a for loop to process a set of .csv format text files in R.

last updated: 10 Jan 2018

Get to Know the Function Environment & Function Arguments in R

This lesson introduces the function environment and documenting functions in R. When you run a function, intermediate variables are not stored in the global environment. This not only saves memory on your computer but also keeps your environment clean, reducing the risk of conflicting variables.

last updated: 10 Jan 2018

Clean Remote Sensing Data in R - Clouds, Shadows & Cloud Masks

In this lesson, you will learn how to deal with clouds when working with spectral remote sensing data. You will learn how to mask clouds from Landsat and MODIS remote sensing data in R using the mask() function. You will also discuss issues associated with cloud cover, particularly as they relate to a research topic.

last updated: 10 Jan 2018

Adjust plot extent in R.

In this lesson you will review how to adjust the extent of a spatial plot in R using the ext() or extent argument and the extent of another layer.

last updated: 10 Jan 2018

Plot Grid of Spatial Plots in R.

In this lesson you learn to use par() (parameter settings) in R to plot several raster RGB plots in a grid.

last updated: 10 Jan 2018

Work with MODIS Remote Sensing Data in Python

MODIS is a satellite remote sensing instrument that collects data daily across the globe at 250-500 m resolution. Learn how to import, clean up and plot MODIS data in Python.

last updated: 23 Oct 2018

Clean Remote Sensing Data in Python - Clouds, Shadows & Cloud Masks

In this lesson, you will learn how to deal with clouds when working with spectral remote sensing data. You will learn how to mask clouds from Landsat and MODIS remote sensing data in Python. You will also discuss issues associated with cloud cover, particularly as they relate to a research topic.

last updated: 06 Nov 2018

Landsat Remote Sensing tif Files in R

In this lesson you will cover the basics of using Landsat 7 and 8 in R. You will learn how to import Landsat data stored in .tif format - where each .tif file represents a single band rather than a stack of bands. Finally you will plot the data using various 3 band combinations including RGB and color-infrared.

last updated: 30 Jul 2018

Calculate NDVI in R: Remote Sensing Vegetation Index

NDVI is calculated using near infrared and red wavelengths or types of light and is used to measure vegetation greenness or health. Learn how to calculate remote sensing NDVI using multispectral imagery in R.

last updated: 30 Jul 2018
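
The NDVI calculation is the ratio (NIR - Red) / (NIR + Red), applied to each pixel. The lesson itself uses R; the formula is sketched here in Python with NumPy, using made-up reflectance values rather than real imagery.

```python
import numpy as np

# Hypothetical reflectance values for a 2 x 2 image (not real data).
nir = np.array([[0.50, 0.40], [0.30, 0.60]])
red = np.array([[0.10, 0.20], [0.30, 0.20]])

# NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
ndvi = (nir - red) / (nir + red)
print(ndvi.round(2))
```

NDVI values fall between -1 and 1, with higher values indicating greener vegetation. This sketch assumes that no pixel has NIR + Red equal to zero.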

How Multispectral Imagery is Drawn on Computers - Additive Color Models

In this lesson you will learn the basics of using Landsat 7 and 8 in R. You will learn how to import Landsat data stored in .tif format - where each .tif file represents a single band rather than a stack of bands. Finally you will plot the data using various 3 band combinations including RGB and color-infrared.

last updated: 08 Dec 2017

How to Open and Work with NAIP Multispectral Imagery in R

In this lesson you learn how to open up a multi-band raster layer or image stored in .tiff format in R. You are introduced to the stack() function in R, which can be used to import more than one band into a stack object. You also review using plotRGB to plot a multi-band image using RGB, color-infrared, or other band combinations.

last updated: 08 Dec 2017

Extract Raster Values Using Vector Boundaries in R

This lesson reviews how to extract pixels from a raster dataset using a vector boundary. You can use the extracted pixels to calculate mean and max tree height for a study area (in this case a field site where tree heights were measured on the ground). Finally you will compare tree heights derived from lidar data with tree heights measured by humans on the ground.

last updated: 30 Jul 2018

GIS in R: Plot Spatial Data and Create Custom Legends in R

In this lesson you break down the steps required to create a custom legend for spatial data in R. You learn about creating unique symbols per category, customizing colors and placing your legend outside of the plot using the xpd argument combined with x,y placement and margin settings.

last updated: 10 Jan 2018

GIS With R: Projected vs Geographic Coordinate Reference Systems

Geographic coordinate reference systems are often used to make maps of the world. Projected coordinate reference systems are used to optimize spatial analysis for a region. Learn about WGS84 and UTM Coordinate Reference Systems as used in R.

last updated: 30 Jul 2018

Coordinate Reference System and Spatial Projection

Coordinate reference systems are used to convert locations on the earth, which is round, to a two-dimensional (flat) map. Learn about the differences between coordinate reference systems.

last updated: 30 Jul 2018

Clip Raster in R

You can clip a raster to a polygon extent to save processing time and make image sizes smaller. Learn how to crop a raster dataset in R.

last updated: 10 Jan 2018

Classify a Raster in R.

This lesson presents how to classify a raster dataset and export it as a new raster in R.

last updated: 10 Jan 2018

Create a Canopy Height Model With Lidar Data

A canopy height model contains height values for trees and can be used to understand landscape change over time. Learn how to use lidar elevation data to calculate canopy height and change in terrain over time.

last updated: 10 Jan 2018

How to Open and Use Files in Geotiff Format

A GeoTIFF is a standard file format with spatial metadata embedded as tags. Use the raster package in R to open geotiff files and spatial metadata programmatically.

last updated: 10 Jan 2018

Plot Histograms of Raster Values in R

This lesson introduces the raster geotiff file format, which is often used to store lidar raster data. You learn the 3 key spatial attributes of a raster dataset: coordinate reference system, spatial extent, and resolution.

last updated: 10 Jan 2018

Introduction to Lidar Raster Data Products

This lesson introduces the raster geotiff file format, which is often used to store lidar raster data. You learn the 3 key spatial attributes of a raster dataset: coordinate reference system, spatial extent, and resolution.

last updated: 30 Jul 2018

How to Address Missing Values in R

Missing data in R can be caused by issues in data collection and / or processing and presents challenges in data analysis. Learn how to address missing data values in R.

last updated: 10 Jan 2018

The Syntax of the R Scientific Programming Language - Data Science for Scientists 101

This lesson introduces the basic syntax associated with the R scientific programming language. You will learn about assignment operators (<-), comments and basic functions that are available to use in R to perform basic tasks including head(), qplot() to quickly plot data and others. This lesson is designed for someone who has not used R before. You will work with precipitation and stream discharge data for Boulder County.

last updated: 10 Jan 2018

R Markdown resources

Find resources that will help you use the R Markdown format.

last updated: 07 Dec 2017

Add Images to an R Markdown Report

This lesson covers how to use markdown to add images to a report. It also discusses good file management practices associated with saving images within your project directory to avoid losing them if you have to go back and work on the report in the future.

last updated: 10 Jan 2018

Convert R Markdown to PDF or HTML

Knitr can be used to convert R Markdown files to different formats, including web friendly formats. Learn how to convert R Markdown to PDF or HTML in RStudio.

last updated: 10 Jan 2018

How to Use R Markdown Code Chunks

Code chunks in an R Markdown document are used to separate code from text in an Rmd file. Learn how to create reports using R Markdown.

last updated: 10 Jan 2018

File Organization 101

Learn key principles for naming and organizing files and folders in a working directory.

last updated: 10 Jan 2018

Get to Know RStudio

Learn how to work with R using the RStudio application.

last updated: 10 Jan 2018