
2.4.2. Dates and Times in R: lubridate & hms


1. Introduction

This tutorial builds on the Atorus Research Academy e-learning content and covers the lubridate and hms packages for R programming in clinical applications. It extends the original material with additional examples, error handling scenarios, and advanced applications.

Lubridate package workflow diagram for R programming


2. Core Data Types and Concepts

R handles temporal data through three primary data types that serve different purposes in clinical programming:[^2]

Date Objects store dates without time information as the number of days since January 1, 1970. These are ideal for visit dates, enrollment dates, and other calendar-based clinical events.

Date-Time Objects (POSIXct) combine date and time information, stored as seconds since the Unix epoch, and include timezone awareness. These are essential for precise timestamp recording in clinical trials.

Time Objects (hms) represent time-of-day values as durations since midnight, perfect for dosing times, procedure schedules, and time-based measurements.
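To make the storage model concrete, the sketch below converts one value of each type to its underlying number. The variable names are illustrative only, and the example assumes the lubridate and hms packages are installed.

```r
library(lubridate)

# A Date is a count of days since 1970-01-01
visit_date <- as_date("1970-01-10")
date_days <- as.numeric(visit_date)          # 9

# A POSIXct is a count of seconds since 1970-01-01 00:00:00 UTC
dose_dttm <- ymd_hms("1970-01-01 01:00:00", tz = "UTC")
dttm_secs <- as.numeric(dose_dttm)           # 3600

# An hms value is a count of seconds since midnight
dose_time <- hms::as_hms("00:01:30")
time_secs <- as.numeric(dose_time)           # 90
```

Because all three types are numbers underneath, comparisons and sorting work directly on them without any string handling.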

HMS package time format diagram for R programming


3. Creating Dates and Date-Times

The lubridate package provides multiple approaches for creating date-time objects, each optimized for different data scenarios commonly encountered in clinical programming.[^1][^3]

Parsing Functions: The Recommended Approach

The most intuitive method uses parsing functions named after the order of date components:

library(lubridate)

# Year-Month-Day formats
ymd("2022-01-15")
ymd_hms("2022-01-15 10:30:45")

# Month-Day-Year formats  
mdy("01/15/2022")
mdy_hms("01/15/2022 10:30:45")

# Day-Month-Year formats
dmy("15-01-2022")
dmy_hms("15-01-2022 10:30:45")

These functions are separator agnostic, meaning they work with various delimiters (dashes, slashes, spaces) and can even parse flexible text formats like "January 4th 2022".[^1][^3]
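As a quick check of the separator-agnostic behaviour, the sketch below parses the same date written several ways. It also shows two caveats that follow from it: an impossible date comes back as NA, and the analyst still has to choose the parser that matches the component order of the data. The variable names are illustrative.

```r
library(lubridate)

# The same calendar date written with different separators
iso_dash  <- ymd("2022-01-15")
iso_slash <- ymd("2022/01/15")
iso_plain <- ymd("20220115")
all_equal <- iso_dash == iso_slash && iso_slash == iso_plain   # TRUE

# Month names and ordinal suffixes are also recognised
text_date <- mdy("January 4th 2022")                           # "2022-01-04"

# An impossible date yields NA; quiet = TRUE suppresses the parse warning
bad_date <- ymd("2022-02-30", quiet = TRUE)                    # NA

# The parser name decides how an ambiguous string is read
us_style <- mdy("01/02/2022")                                  # "2022-01-02"
eu_style <- dmy("01/02/2022")                                  # "2022-02-01"
```

For clinical data it is usually worth counting the NA values after parsing, since a failed parse is silent once warnings are suppressed.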

Converting from Different Input Types

The as_datetime() and as_date() functions handle conversion from numeric Unix timestamps, existing date objects, and character strings with specific format specifications:

# From Unix timestamp
as_datetime(1641040230)

# From character with format specification
as_datetime("02-17-2022 08:25:30", format = "%m-%d-%Y %H:%M:%S")

Creating from Individual Components

For datasets where date components are stored in separate columns (common in clinical databases), the make_datetime() and make_date() functions provide clean solutions:

make_datetime(year = 2022, month = 12, day = 13, 
              hour = 10, min = 44, sec = 30)
make_date(year = 2022, month = 12, day = 13)
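When the components live in columns, the same function can be applied to the whole column at once. The following is a minimal sketch; the data frame `visits` and its column names are hypothetical and stand in for whatever the source database provides.

```r
library(lubridate)

# Hypothetical extract with date parts held in separate columns
visits <- data.frame(
  subject = c("001", "002", "003"),
  yr      = c(2022, 2022, 2023),
  mo      = c(1, 12, 2),
  dy      = c(15, 31, 28)
)

# make_date() is vectorized, so whole columns can be passed at once
visits$visit_date <- make_date(year = visits$yr, month = visits$mo, day = visits$dy)
```

The result is a regular Date column, so it can be used directly in the arithmetic and extraction functions covered in the later sections.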

4. Working with Times Using HMS

The hms package specializes in handling time-of-day values, which are crucial for clinical applications requiring precise timing:[^2][^4]

# From character (requires colon separators)
hms::as_hms("09:30:45")

# From numeric seconds since midnight
hms::as_hms(34245)  # Results in 09:30:45

# From components
hms::hms(hours = 9, minutes = 30, seconds = 45)

Important Note: Unlike lubridate functions, hms requires proper colon separators and is not separator agnostic.
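Since an hms value is a number of seconds since midnight, gaps between two times of day reduce to simple subtraction. The sketch below illustrates this with two hypothetical dosing times, and also shows `as_hms()` extracting the time of day from a date-time.

```r
library(lubridate)

# Two dosing times on the same day (illustrative values)
first_dose  <- hms::as_hms("09:30:45")
second_dose <- hms::as_hms("14:00:00")

# hms values are seconds since midnight, so numeric subtraction gives the gap
gap_secs  <- as.numeric(second_dose) - as.numeric(first_dose)   # 16155
gap_hours <- gap_secs / 3600                                     # about 4.49

# as_hms() on a date-time keeps only the time of day
collected      <- ymd_hms("2022-01-15 10:30:45", tz = "UTC")
collected_time <- hms::as_hms(collected)                         # 10:30:45
```

This only makes sense when both times fall on the same calendar day; intervals that cross midnight need full date-times instead.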

5. Date and Time Arithmetic

One of lubridate's most powerful features is its sophisticated approach to temporal arithmetic, addressing the complexities of calendar systems and time zones.[^1][^5][^6]

Difftimes: Basic Time Differences

Simple subtraction between date objects creates difftime objects that automatically handle unit conversion:

start_date <- as_date("2019-03-26")
end_date <- as_date("2019-09-24")
elapsed <- end_date - start_date  # Time difference of 182 days (a difftime, not a lubridate duration)

Durations vs. Periods: A Critical Distinction

Durations represent exact amounts of time in seconds and are suitable when physical time matters:

dyears(1)    # 31557600 seconds (365.25 days) in current lubridate releases
dmonths(3)   # 7889400 seconds (one quarter of 365.25 days)

Periods represent calendar time and handle irregularities like leap years and varying month lengths:

years(1)     # Calendar period of 1 year
months(3)    # Calendar period of 3 months

The distinction is crucial for clinical applications:

# Leap year example
leap_date <- as_date("2020-01-01")
leap_date + dyears(1)  # "2020-12-31 06:00:00 UTC" in current lubridate: a fixed 365.25 days later (duration)
leap_date + years(1)   # Results in "2021-01-01" (period)
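Varying month lengths raise a related issue with periods: adding a month to the 31st can point at a date that does not exist. The sketch below shows lubridate's behaviour in that case and the `%m+%` operator, which rolls back to the last valid day of the month. The variable names are illustrative.

```r
library(lubridate)

month_end <- as_date("2022-01-31")

# Period arithmetic returns NA when the target date does not exist (31 Feb)
naive_add <- month_end + months(1)        # NA

# %m+% rolls back to the last valid day of the target month
rolled_add <- month_end %m+% months(1)    # "2022-02-28"

# add_with_rollback() is the function form of the same operation
rolled_fn <- add_with_rollback(month_end, months(1))
```

Which behaviour is appropriate depends on the protocol; an NA at least makes the problem visible, while the rollback form always returns a usable date.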

Comprehensive comparison of lubridate vs base R date-time functions


6. Extracting Date and Time Components

Lubridate provides comprehensive functions for extracting temporal components, essential for creating derived variables in clinical datasets:[^1][^6]

sample_datetime <- ymd_hms("2019-06-26 11:45:57")

# Date components
year(sample_datetime)      # 2019
month(sample_datetime)     # 6
day(sample_datetime)       # 26
wday(sample_datetime)      # 4 (Wednesday)

# Time components  
hour(sample_datetime)      # 11
minute(sample_datetime)    # 45
second(sample_datetime)    # 57

# With labels for reporting
month(sample_datetime, label = TRUE)  # "Jun"
wday(sample_datetime, label = TRUE)   # "Wed"
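Beyond the individual components, lubridate also offers calendar groupings that are often used as derived variables. The sketch below applies a few of them to the same timestamp; the variable names are illustrative.

```r
library(lubridate)

sample_datetime <- ymd_hms("2019-06-26 11:45:57")

# Calendar groupings often used as derived variables
visit_quarter <- quarter(sample_datetime)              # 2
day_of_year   <- yday(sample_datetime)                 # 177
month_start   <- floor_date(sample_datetime, "month")  # 2019-06-01 00:00:00 UTC

# Date part only, dropping the time of day
visit_date <- as_date(sample_datetime)                 # "2019-06-26"
```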

7. Clinical Programming Applications

Clinical data processing flowchart with R date-time handling


Study Day Calculations

A fundamental requirement in clinical programming is calculating study days relative to a baseline date:

baseline_date <- ymd("2022-01-01")
visit_date <- ymd("2022-02-15")
study_day <- as.numeric(visit_date - baseline_date) + 1  # 46

Adverse Event Duration Analysis

Clinical trials require precise tracking of adverse event durations:

ae_start <- ymd("2022-01-20")
ae_end <- ymd("2022-01-25")
ae_duration <- as.numeric(ae_end - ae_start) + 1  # 6 days

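# Classification based on duration (case_when() comes from dplyr).
# Note: this groups events by length only; it is not a clinical severity grade.
ae_duration_cat <- dplyr::case_when(
  ae_duration <= 1 ~ "1 day",
  ae_duration <= 3 ~ "2-3 days",
  TRUE ~ "More than 3 days"
)

Time Window Calculations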

Clinical protocols often specify time windows for procedures and assessments:

# Calculate visit windows
scheduled_date <- ymd("2022-03-15")
window_start <- scheduled_date - days(3)
window_end <- scheduled_date + days(3)

# Check if actual visit falls within window
actual_visit <- ymd("2022-03-17")
in_window <- actual_visit >= window_start & actual_visit <= window_end
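The study-day and visit-window calculations in this section can be wrapped into small vectorized helpers. The sketch below is one possible shape, not a standard implementation: `calc_study_day()` and `in_visit_window()` are hypothetical names, and the study-day helper assumes the common convention that there is no day 0 (the baseline date is day 1 and the day before it is day -1).

```r
library(lubridate)

# Hypothetical helper: study day with no day 0, assuming the baseline
# date is day 1 and the day before baseline is day -1
calc_study_day <- function(event_date, baseline_date) {
  diff_days <- as.numeric(event_date - baseline_date)
  ifelse(diff_days >= 0, diff_days + 1, diff_days)
}

# Hypothetical helper: TRUE when a visit falls within +/- `tolerance` days
in_visit_window <- function(actual, scheduled, tolerance = 3) {
  actual >= scheduled - days(tolerance) & actual <= scheduled + days(tolerance)
}

baseline <- ymd("2022-01-01")
events   <- ymd(c("2021-12-31", "2022-01-01", "2022-02-15"))
study_days <- calc_study_day(events, baseline)            # -1  1 46

visits <- ymd(c("2022-03-17", "2022-03-19"))
window_flags <- in_visit_window(visits, ymd("2022-03-15"))  # TRUE FALSE
```

Adding 1 unconditionally, as in the simple calculation above, is only correct for events on or after the baseline date; the conditional form handles pre-baseline events such as screening visits.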

Clinical trial timeline with lubridate function usage examples


8. Error Handling and Best Practices

Robust Date Parsing

Clinical data often contains inconsistent date formats. A robust parsing function handles multiple scenarios:

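safe_date_parse <- function(date_string) {
  # Scalar helper: apply it with sapply()/lapply() when parsing a vector
  stopifnot(length(date_string) == 1)

  # Strict patterns paired with base R formats. The pattern check matters
  # because as.Date() accepts partial matches (e.g. "15-01-2022" with
  # "%Y-%m-%d" is read as year 15) instead of returning NA.
  base_formats <- c(
    "^[0-9]{4}-[0-9]{2}-[0-9]{2}$" = "%Y-%m-%d",
    "^[0-9]{2}/[0-9]{2}/[0-9]{4}$" = "%m/%d/%Y",
    "^[0-9]{2}-[0-9]{2}-[0-9]{4}$" = "%d-%m-%Y",
    "^[0-9]{8}$"                   = "%Y%m%d"
  )

  for (pattern in names(base_formats)) {
    if (grepl(pattern, date_string)) {
      result <- as.Date(date_string, format = base_formats[[pattern]])
      if (!is.na(result)) return(result)
    }
  }

  # Fall back to lubridate parsers; quiet = TRUE returns NA without a warning.
  # The order of this list decides how ambiguous strings are interpreted.
  lubridate_funcs <- list(ymd, mdy, dmy, ydm, myd, dym)
  for (func in lubridate_funcs) {
    result <- func(date_string, quiet = TRUE)
    if (!is.na(result)) return(result)
  }

  as.Date(NA)
}

Handling Partial Dates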

Clinical data sometimes contains partial dates (e.g., "2022-06-UNK"):

handle_partial_date <- function(date_str) {
  # Impute an unknown day as the 15th (an analysis assumption; follow the study's imputation rules)
  date_str <- gsub("-UNK$", "-15", date_str)
  
  # truncated = 2 also accepts "2022-06" or "2022"; lubridate fills the missing parts with 01
  ymd(date_str, truncated = 2)
}

Timezone Management

Clinical trials often span multiple time zones. Proper timezone handling is essential:

# Create datetime with specific timezone
utc_time <- ymd_hms("2022-12-25 10:30:00", tz = "UTC")

# Convert to a different timezone (same instant, different clock time)
local_time <- with_tz(utc_time, "America/New_York")

# Force a timezone without conversion (same clock time, different instant)
forced_time <- force_tz(utc_time, "America/New_York")
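The difference between the two functions is easiest to see by comparing the results with the original timestamp. The sketch below repeats the calls above and checks both the instant and the clock reading; it assumes the system timezone database includes "America/New_York", and the extra variable names are illustrative.

```r
library(lubridate)

utc_time <- ymd_hms("2022-12-25 10:30:00", tz = "UTC")

# with_tz(): same instant, different clock reading (05:30 in New York)
local_time   <- with_tz(utc_time, "America/New_York")
same_instant <- local_time == utc_time                       # TRUE
local_hour   <- hour(local_time)                             # 5

# force_tz(): same clock reading (10:30), different instant
forced_time <- force_tz(utc_time, "America/New_York")
forced_hour <- hour(forced_time)                             # 10
shift_hours <- as.numeric(difftime(forced_time, utc_time, units = "hours"))  # 5
```

As a rule of thumb, `with_tz()` suits displaying a correctly recorded timestamp at another site, while `force_tz()` suits repairing a timestamp that was recorded with the wrong timezone label.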

9. Advanced Features and Performance Considerations

Vectorized Operations

Lubridate functions are vectorized for efficient processing of large datasets:

# Efficient: Process entire vectors at once
date_vector <- c("2022-01-01", "2022-01-02", "2022-01-03")
parsed_dates <- ymd(date_vector)

# Less efficient: element-by-element processing (sapply() also drops the Date class)
parsed_individual <- sapply(date_vector, ymd)

Memory Efficiency for Large Datasets

For clinical datasets with millions of records, consider using data.table for memory efficiency:

library(data.table)
large_dt <- data.table(date_char = rep("2022-01-01", 1000000))
large_dt[, date_parsed := ymd(date_char)]  # Efficient in-place operation

10. Integration with Tidyverse Workflows

Lubridate seamlessly integrates with dplyr and other tidyverse packages for comprehensive data processing:

clinical_data %>%
  mutate(
    visit_date = ymd(visit_date_char),
    visit_month = lubridate::month(visit_date, label = TRUE),
    study_day = as.numeric(visit_date - baseline_date) + 1,
    study_week = ceiling(study_day / 7)
  ) %>%
  filter(study_day <= 84) %>%
  group_by(study_week) %>%
  summarise(avg_lab_value = mean(lab_value, na.rm = TRUE))

11. Summary and Recommendations

The lubridate and hms packages provide a comprehensive toolkit for handling temporal data in R, with particular strengths in clinical programming applications. Key recommendations include:

  1. Use parsing functions (ymd(), mdy(), etc.) for character data conversion
  2. Implement robust error handling for inconsistent data formats
  3. Understand the distinction between periods and durations
  4. Be explicit about timezone handling in multi-site studies
  5. Leverage vectorization for performance with large datasets
  6. Document temporal assumptions and conventions in your code

Together with the error handling patterns, best practices, and worked examples above, these recommendations provide a foundation for temporal data processing in clinical programming that extends beyond the basics covered in the original e-learning content.


**Resource download links**

2.4.2.-Dates-and-Times.zip