Lesson 1. Time Series Data: Work with Dates in R

Work with Sensor Network Derived Time Series Data in R - Earth analytics course module

Welcome to the first lesson in the Work with Sensor Network Derived Time Series Data in R module. This module covers how to work with, plot and subset data with date fields in R. It also covers how to plot data using ggplot.

Get Started with Date Formats in R

In this tutorial, you will look at the date time format - which is important for plotting and working with time series data in R.

Learning Objectives

At the end of this activity, you will be able to:

  • Convert a column in a data.frame containing dates and times to a date/time object that can be used in R.
  • Be able to describe how you can use the data class ‘date’ to create easier to read time series plots in R.

What You Need

You need R and RStudio to complete this tutorial. Also we recommend that you have an earth-analytics directory set up on your computer with a /data directory within it.

In this tutorial, you will learn how to convert data that contain dates and times into a date / time format in R.

First let’s revisit the boulder_precip data variable that you’ve been working with in this module.

# load the ggplot2 library for plotting
options(stringsAsFactors = FALSE)
# download data from figshare
# note that you already downloaded the data in the previous exercises so this line
# is commented out. If you want to redownload the data, umcomment the line below.
              method = "libcurl")

# import data
boulder_precip <- read.csv(file = "data/boulder-precip.csv")

# view first few rows of the data
## 1 756 8/21/13    0.1   55
## 2 757 8/26/13    0.1   25
## 3 758 8/27/13    0.1   NA
## 4 759  9/1/13    0.0 -999
## 5 760  9/9/13    0.1   15
## 6 761 9/10/13    1.0   25

Next, plot the data using ggplot().

# plot the data using ggplot
ggplot(data = boulder_precip, aes(x = DATE, y = PRECIP)) +
  geom_point() +
  labs(x = "Date",
    y = "Total Precipitation (Inches)",
    title = "Precipitation Data",
    subtitle = "Boulder, Colorado 2013")

ggplot of precip data

Notice when you plot the data, the x axis is “messy”. It would be easier to read if you only had ticks on the x axis for dates incrementally - every few weeks. Or once a month even.

Let’s look closely at the structure of the data to understand why R is placing so many labels on the x axis.

## 'data.frame':	18 obs. of  4 variables:
##  $ ID    : int  756 757 758 759 760 761 762 763 764 765 ...
##  $ DATE  : chr  "8/21/13" "8/26/13" "8/27/13" "9/1/13" ...
##  $ PRECIP: num  0.1 0.1 0.1 0 0.1 1 2.3 9.8 1.9 1.4 ...
##  $ TEMP  : int  55 25 NA -999 15 25 65 NA 95 -999 ...

Data Types (Classes) in R

The structure results above tell us that the data columns in your data.frame are stored as several different data types or classes as follows:

  • chr - Character: It holds strings that are composed of letters and words. Character class data cannot be interpreted numerically - that is to say you can not perform math on these values even if they contain only numbers.
  • int - Integer: It holds numbers that are whole integers without decimals. Mathematical operations can be performed on integers.
  • num - Numeric: It accepts data that are a wide variety of numeric formats including decimals (floating point values) and integers. Numeric also accept larger numbers than int will.

Data Frame Columns Can Only Contain One Data Class

A data.frame column can only store one type. This means that a column cannot store both numbers and strings. If a column contains a list of numbers and one letter, then the entire column will be stored as a chr (character).

Storing variables using different classes is a strategic decision by R (and other programming languages) that optimizes processing and storage. It allows:

  • data to be processed more quickly & efficiently.
  • the program (R) to minimize the storage size.

Remember, that you also learned about classes during class in these lessons: vectors in R - data classes

Dates Stored as Characters

Note that the Date column in your data.frame is of class character (chr). This means that R is reading it as letters and numbers rather than dates that contain a value that is sequential.

# View data class for each column that you wish to plot
## [1] "character"

## [1] "numeric"

Thus, when you plot, R tries to plot EVERY date value in your data, on the x-axis. This makes it hard to read. But also it makes it hard to work with the data. For instance - what if you wanted to subset out a particular time period from your data? You can’t do that if the data are stored as characters.

The PRECIP data is numeric so that variable plots just fine.

Convert Date to an R Date Class

You need to convert your date column, which is currently stored as a character to a date class that can be displayed as a continuous variable. Lucky for us, R has a date class. You can convert the date field to a date class using the function as.Date().

When you convert, you need to tell R how the date is formatted - where it can find the month, day and year and what format each element is in.

For example: 1/1/10 vs 1-1-2010

Looking at the results above, you see that your data are stored in the format: Year-Month-Day (2003-08-21). Each part of the date is separated in this case with a -. You can use this information to populate your format string using the following designations for the components of the date-time data:

  • %Y - 4 digit year
  • %y - 2 digit year
  • %m - month
  • %d - day

Your format string will look like this: %m/%d/%y. Notice that you are telling R where to find the year (%y), month (%m) and day (%d). Also notice that you include the dashes that separate each component in each date cell of your data.

NOTE: look up ?strptime to see all of the date “elements” that you can use to describe the format of a date string in R.

# convert date column to date class
boulder_precip$DATE <- as.Date(boulder_precip$DATE,
                        format = "%m/%d/%y")

# view R class of data
## [1] "Date"

# view results
## [1] "2013-08-21" "2013-08-26" "2013-08-27" "2013-09-01" "2013-09-09"
## [6] "2013-09-10"

Now that you have adjusted the date, let’s plot again. Notice that it plots much quicker now that R recognizes date as a date class. R can aggregate ticks on the x-axis by year instead of trying to plot every day!

# quickly plot the data and include a title using main = ""
# use '\n' to force the string to wrap onto a new line

ggplot(data = boulder_precip, aes(x = DATE, y = PRECIP)) +
      geom_bar(stat = "identity", fill = "purple") +
      labs(title = "Total daily precipitation in Boulder, Colorado",
           subtitle = "Fall 2013",
           x = "Date", y = "Daily Precipitation (Inches)")

precip bar plot

Now, your plot looks a lot nicer!

Leave a Comment