Lesson 2. Open Science Lesson Instructor Notes

About

This lesson challenges students to think critically about good file and process management and organization in support of reproducible open science.

Background Materials

Students should review the following presentation before participating in the activity.

Download Lesson Data

Activity Overview

Time (Mins)	Topic
10	Intro to Reproducibility
25	Group Work - Identify issues
20	Discuss Issues
5	Wrap up / Survey

First 10 Minutes

Introduction to Reproducibility
Definition
Story about a situation where reproducibility would have helped

Four Facets

Organization
Documentation
Automation
Dissemination

Why It Makes Science Better

Laziness
Help out your future self
Contribute to building upon research efforts
Error checking

List any other reasons or motivations for reproducibility.

The Scenario

You work in a lab. A colleague has moved on to a new job and left their research behind, and your supervisor has asked you to pick it up and move it forward. Look at the files that were left for you and answer the following questions:

Are the contents of the directory easy to understand?
Do you feel confident that you can easily recreate the workflow associated with the data / code?
Do you have access to the data? What data are available and where / how were they collected?

Have the students work in small groups to:

Create a list of things that would make the working directory easier to work with.
Break that list into general “areas” / categories of reproducibility.

Files for an exercise on file, data, and code documentation and organization

Files in the subdirectory messy-dir-example can be used to help students identify problems that make it difficult to share or reuse analyses. This example directory has many problems with its folder structure, file naming, data organization, and code organization.

Identified Problems

Some of the problems within this directory include:

No metadata or readme
No directory structure
Background info is a picture of text instead of searchable text
Multiple files with similar content and different names; ambiguous naming
Some vector GIS files are missing and it is unclear why
Tabular data is in proprietary format
Not clear which sites different files are from
Not clear in which order the scripts were run or should be run
In the code:
- Multiple copies of similar code pasted near each other but with slight changes
- Very few comments
- Unclear order in which lines should be run
In the foliar chem tabular file:
- Notes at the bottom of the file
- Notes off to the right in unlabeled column
- Gap between columns
- Column name starting with a number
- Duplicate column names
- Spaces in column names
- Misspellings in columns that might be used as categorical variables
- Different values for missing data
- Dealing with dates in Excel (DANGER)
- Units for values?
- Where is metadata?
- Using colors rather than machine-readable column flags
- Multiple tabs

There are more issues with the repo that participants will find.
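Several of the tabular problems above can also be checked with a short script, which ties the Organization facet to Automation. Below is a minimal Python sketch. It assumes the foliar chem sheet has first been exported to CSV, because the original is in a proprietary spreadsheet format. The sample rows, the `MISSING_CODES` set, and the 0.8 similarity cutoff are all hypothetical choices made for illustration. None of them are taken from the lesson files.

```python
import csv
import difflib
import io
from collections import Counter

# Hypothetical placeholders people commonly type for "no data".
# Adjust this set to whatever actually appears in the exported sheet.
MISSING_CODES = {"", "NA", "N/A", "na", "NaN", "-9999", "missing", "?"}


def audit_header(header):
    """Return a list of problems found in a row of column names."""
    problems = []
    counts = Counter(header)
    for i, name in enumerate(header):
        if not name.strip():
            # An empty name usually means a gap column or unlabeled notes.
            problems.append(f"column {i}: empty name")
            continue
        if name[0].isdigit():
            problems.append(f"column {i}: {name!r} starts with a number")
        if " " in name:
            problems.append(f"column {i}: {name!r} contains spaces")
        if counts[name] > 1:
            problems.append(f"column {i}: {name!r} is duplicated")
    return problems


def missing_codes_by_column(rows, n_cols):
    """Map each column index to the set of missing-data codes used in it."""
    found = {}
    for col in range(n_cols):
        codes = {
            row[col].strip()
            for row in rows
            if col < len(row) and row[col].strip() in MISSING_CODES
        }
        if codes:
            found[col] = codes
    return found


def similar_categories(values, cutoff=0.8):
    """Return pairs of distinct labels that look like misspellings of each other."""
    labels = sorted({v.strip() for v in values if v.strip()})
    pairs = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= cutoff:
                pairs.append((a, b))
    return pairs


# Made-up rows that mimic the kinds of problems listed above.
SAMPLE = """site,1st_measure,leaf N,leaf N,,light
A,2.1,1.5,1.6,,sunlit
A,NA,1.4,-9999,,shaded
B,2.3,,1.7,,sunlt
B,2.0,1.2,1.3,,shaded
"""

reader = csv.reader(io.StringIO(SAMPLE))
HEADER = next(reader)
ROWS = list(reader)

header_problems = audit_header(HEADER)
missing = missing_codes_by_column(ROWS, len(HEADER))

for problem in header_problems:
    print(problem)
print("missing-data codes per column:", missing)
print("possible misspellings:", similar_categories(row[5] for row in ROWS))
```

The sample sheet mixes an empty cell, `NA`, and `-9999` as missing-data codes, which is the "different values for missing data" problem. A check like this fixes nothing. It gives students a repeatable list to compare against the issues they spotted by eye.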

About This Lesson

This lesson was originally taught as part of the NEON Data Institute 2016 by Naupaka Zimmerman. The data and files are for the most part derived from various NEON remote sensing data products from the D17 California field sites.

Earth Data Analytics Online Certificate