
2.5.3. Functional Programming in R

1. Introduction

Functional programming is a style of coding where you use functions to transform data, often by applying a function repeatedly to elements of a vector, list, or data frame.

  • Makes code easier to read, debug, and maintain.
  • R is fundamentally a functional language, and packages like purrr make this approach even more accessible.

2. Why Use Functional Programming?

  • Encourages code reuse and modularity.
  • Reduces copy-paste and repetitive code.
  • Makes code easier to test and debug.
  • Enables concise, expressive data transformations.
  • Essential for working efficiently with lists, data frames, and complex data.

3. Functional Programming Basics in R

  • Functions are first-class objects: assign to variables, pass as arguments, return from other functions.
  • The apply family (apply, lapply, sapply, etc.) is the classic base R toolset for functional programming.
  • The purrr package (tidyverse) provides a consistent, powerful set of tools for functional programming.
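
The three properties of first-class functions listed above can be sketched in a few lines of base R. The names square, apply_twice, make_adder, and add_five are hypothetical helpers invented for this illustration.

```r
# 1. Assign a function to a variable
square <- function(x) x^2

# 2. Pass a function as an argument (apply_twice is a hypothetical helper)
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)   # 81, because (3^2)^2 = 81

# 3. Return a function from another function (a "function factory")
make_adder <- function(n) function(x) x + n
add_five <- make_adder(5)
add_five(10)             # 15
```

Everything in the apply family and in purrr relies on the second property: each of those tools takes a function as an argument and calls it for you.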

4. The apply Family (Base R)

apply() performs operations across the rows or columns of a matrix or array; the other members of the family (compared below) work on lists and vectors.

R Code:

mat <- matrix(c(101, 102, 103, 104, 105, 106), nrow = 2)
apply(mat, 1, max)   # Max subject ID in each row
apply(mat, 2, min)   # Min subject ID in each column
  • Description:
    • mat is a 2x3 matrix of subject IDs:
      V1 V2 V3
      1 101 103 105
      2 102 104 106
    • apply(mat, 1, max) finds the maximum subject ID in each row.
    • apply(mat, 2, min) finds the minimum subject ID in each column.

Input Table:

V1 V2 V3
1 101 103 105
2 102 104 106

Output Table:

Call Result
apply(mat, 1, max) 105, 106
apply(mat, 2, min) 101, 103, 105
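
One detail worth seeing in code is what apply() returns when the function produces more than one value per row. The sketch below reuses the same matrix with range(); the variable name row_ranges is an assumption made for this example.

```r
mat <- matrix(c(101, 102, 103, 104, 105, 106), nrow = 2)

# range() returns two values (min, max) for each row, so apply() returns
# a matrix in which each COLUMN holds the result for one original row.
row_ranges <- apply(mat, 1, range)
row_ranges
# column 1: 101 105  (min and max of row 1)
# column 2: 102 106  (min and max of row 2)

# Transpose to get one output row per input row
t(row_ranges)
```

Because the results come back column-wise, transposing with t() is a common follow-up step when working by row.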

Comparison of apply Family Functions

The apply family includes several functions, each designed for specific use cases. Here's a comparison:

Function | Input Type | Output Type | Purpose
apply | Matrix/Array | Vector/Array/List | Apply a function across rows (MARGIN = 1) or columns (MARGIN = 2).
lapply | List/Vector | List | Apply a function to each element of a list or vector.
sapply | List/Vector | Vector/Matrix/List | Simplified version of lapply that tries to simplify the output.
vapply | List/Vector | Vector/Matrix | Similar to sapply, but requires specifying the output type explicitly.
tapply | Vector + Factor(s) | Array/List | Apply a function to subsets of a vector, split by one or more factors.
mapply | Multiple Vectors | Vector/List | Multivariate version of sapply; applies a function to multiple inputs.

Key Differences:

  • apply is for matrices/arrays, while lapply, sapply, and vapply are for lists/vectors.
  • sapply simplifies the output, while lapply always returns a list.
  • vapply is safer than sapply because it enforces the output type.
  • tapply is specifically for grouped operations on vectors.
  • mapply works with multiple vectors/lists simultaneously.
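
The differences above are easier to see side by side. The following is a minimal base R sketch; the lab values match those used elsewhere in this section, while the result and arm vectors are hypothetical data made up for the tapply call.

```r
labs <- list(ALT = c(35, 40, 38), AST = c(30, 32, 31))

lapply(labs, mean)               # always returns a list
sapply(labs, mean)               # simplified to a named numeric vector
vapply(labs, mean, numeric(1))   # same values, but errors unless every
                                 # result is exactly one double

# tapply: mean result per group (hypothetical treatment arms)
result <- c(35, 40, 38, 30)
arm    <- c("A", "B", "A", "B")
tapply(result, arm, mean)        # A = 36.5, B = 35

# mapply: walk two vectors together, element by element
mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9
```

In scripts that run unattended, vapply is often preferred over sapply because an unexpected result type fails immediately instead of silently changing the shape of the output.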

5. Functional Programming with purrr

The purrr package provides a family of functions for applying operations to lists, vectors, or data frames.

R Code:

library(purrr)
labs <- list(
  ALT = c(35, 40, 38),
  AST = c(30, 32, 31),
  HGB = c(13.5, 14.2, 13.8)
)
map(labs, mean)
map_int(labs, length)
  • Description:
    • labs is a list of lab test results.
    • map(labs, mean) calculates the mean result for each lab test.
    • map_int(labs, length) returns the number of results for each test.

Input Table:

Lab Test Results
ALT 35, 40, 38
AST 30, 32, 31
HGB 13.5, 14.2, 13.8

Output Table:

map(labs, mean) map_int(labs, length)
ALT: 37.67 ALT: 3
AST: 31.00 AST: 3
HGB: 13.83 HGB: 3

6. Using Anonymous Functions

Anonymous (lambda) functions are functions you define on the fly, without giving them a name.

R Code:

map(labs, function(x) x + 1) # Add 1 to each lab result
map_chr(labs, ~ paste("Count:", length(.))) # Count results per test
  • Description:
    • Adds 1 to each lab result.
    • Returns a string with the count of results for each test.

Input Table:

Lab Test Results
ALT 35, 40, 38
AST 30, 32, 31
HGB 13.5, 14.2, 13.8

Output Table:

map(labs, function(x) x + 1) map_chr(labs, ~ paste("Count:", length(.)))
ALT: 36, 41, 39 ALT: "Count: 3"
AST: 31, 33, 32 AST: "Count: 3"
HGB: 14.5, 15.2, 14.8 HGB: "Count: 3"
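
For comparison, here is a sketch of three ways to write the same anonymous function. It assumes R 4.1 or later for the backslash shorthand, and the variable names (spread_fn, spread_formula, spread_lambda) are invented for this example.

```r
library(purrr)

labs <- list(ALT = c(35, 40, 38), AST = c(30, 32, 31))

# Spread (max minus min) of each test, written three equivalent ways
spread_fn      <- map_dbl(labs, function(x) max(x) - min(x))  # classic form
spread_formula <- map_dbl(labs, ~ max(.x) - min(.x))          # purrr formula shorthand
spread_lambda  <- map_dbl(labs, \(x) max(x) - min(x))         # base R shorthand (R >= 4.1)

spread_fn   # ALT = 5, AST = 2
```

All three calls return the same named numeric vector. The formula form only works inside purrr-style functions, whereas the other two work anywhere in R.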

7. Functional Programming with Data Frames

You can use purrr::map() with data frames to apply functions to columns.

R Code:

df <- data.frame(
  USUBJID = c("01-701-101", "01-701-102"),
  AGE = c(34, 58),
  SEX = c("M", "F")
)
map_chr(df, class)
map_lgl(df, is.character)
  • Description:
    • Returns the class of each SDTM column.
    • Checks if each column is character type.

Input Table:

USUBJID AGE SEX
01-701-101 34 M
01-701-102 58 F

Output Table:

map_chr(df, class) map_lgl(df, is.character)
USUBJID: "character" USUBJID: TRUE
AGE: "numeric" AGE: FALSE
SEX: "character" SEX: TRUE
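
A common follow-up is to summarize only the columns of a given type. The sketch below uses purrr::keep() to select the numeric columns before mapping; the variable name numeric_cols is an assumption made for this illustration.

```r
library(purrr)

df <- data.frame(
  USUBJID = c("01-701-101", "01-701-102"),
  AGE = c(34, 58),
  SEX = c("M", "F")
)

# Keep only the columns for which is.numeric() returns TRUE
numeric_cols <- keep(df, is.numeric)

# Then compute the mean of each remaining column
map_dbl(numeric_cols, mean)   # AGE = 46
```

This works because a data frame is a list of columns, so the same list tools apply to it column by column.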

8. For Loops vs. Functionals: A User-Friendly Example

Suppose you want to find the maximum lab result for each test.

Copy + Paste Approach

labs <- data.frame(ALT = c(35, 40, 38), AST = c(30, 32, 31))
max(labs$ALT)
max(labs$AST)
  • Description:
    • Finds the max for each lab test manually.

For Loop Approach

output <- vector("double", ncol(labs))
for (i in seq_along(labs)) {
  output[[i]] <- max(labs[[i]])
}
output
  • Description:
    • Loops through each lab test and stores the max value.

Function Approach

col_max <- function(df) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[i] <- max(df[[i]])
  }
  output
}
col_max(labs)
  • Description:
    • Wraps the loop in a function for reuse.

purrr Approach

library(purrr)
map_dbl(labs, max)
  • Description:
    • Uses map_dbl to apply max to each lab test column.

Input Table:

ALT AST
35 30
40 32
38 31

Output Table:

max(ALT) max(AST)
40 32
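
One step between the function approach and the purrr approach can help explain why functionals exist: pass the summary function in as an argument. The helper col_summary below is a hypothetical generalization of col_max, written only for this illustration.

```r
labs <- data.frame(ALT = c(35, 40, 38), AST = c(30, 32, 31))

# Same loop as col_max, but the function to apply is now an argument
col_summary <- function(df, fun) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- fun(df[[i]])
  }
  output
}

col_summary(labs, max)   # 40 32
col_summary(labs, min)   # 35 30
```

map_dbl(labs, max) does essentially this job, with the loop hidden and the column names kept on the result.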

9. The map Family of Functions

The map family provides different output types depending on your needs.

R Code:

labs <- list(
  ALT = c(35, 40, 38),
  AST = c(30, 32, 31),
  HGB = c(13.5, 14.2, 13.8)
)
map(labs, min)                        # Minimum result per test
map_lgl(labs, function(y) all(y > 10)) # All results above 10?
map_dbl(labs, sum)                    # Sum of results per test
map_dbl(labs, mean)                   # Mean result per test
map_chr(labs, ~ paste("Sum is", sum(.))) # String summary per test
  • Description:
    • Finds the minimum, checks if all results are above 10, sums, averages, and summarizes each test.

Input Table:

Lab Test Results
ALT 35, 40, 38
AST 30, 32, 31
HGB 13.5, 14.2, 13.8

Output Table:

Test map(labs, min) map_lgl(labs, ...) map_dbl(labs, sum) map_dbl(labs, mean) map_chr(labs, ...)
ALT 35 TRUE 113 37.67 "Sum is 113"
AST 30 TRUE 93 31.00 "Sum is 93"
HGB 13.5 TRUE 41.5 13.83 "Sum is 41.5"
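
The typed variants are also strict about shape: each call to the function must return exactly one value of the requested type. The sketch below shows what happens otherwise; the tryCatch wrapper and the name strict_result are assumptions added so the example runs to completion.

```r
library(purrr)

labs <- list(
  ALT = c(35, 40, 38),
  AST = c(30, 32, 31),
  HGB = c(13.5, 14.2, 13.8)
)

# range() returns two values per test, so map() is fine (it returns a list)...
map(labs, range)

# ...but map_dbl() raises an error, because each result must have length 1
strict_result <- tryCatch(
  map_dbl(labs, range),
  error = function(e) "failed"
)
strict_result   # "failed"
```

This strictness is the main reason to prefer map_dbl, map_chr, and the other typed variants when you know what the output should be: a surprise in the data surfaces as an error rather than as a differently shaped result.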

10. Multiple Vectors: map2 and pmap

You can iterate over two or more vectors together, element by element, using map2 (two inputs) and pmap (any number of inputs).

R Code:

visit <- c("SCREENING", "BASELINE", "WEEK 1")
day <- c(1, 2, 8)
map2_chr(visit, day, ~ paste(.x, "Day", .y))
  • Description:
    • Combines visit name and day for each record.

Input Table:

visit day
SCREENING 1
BASELINE 2
WEEK 1 8

Output Table:

map2_chr(visit, day, ...)
"SCREENING Day 1"
"BASELINE Day 2"
"WEEK 1 Day 8"

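For readers coming from base R, the map2_chr call above has a close counterpart in mapply. This is a minimal sketch; the variable name visit_labels is made up for the example, and USE.NAMES = FALSE is set so the result is an unnamed vector like the purrr output.

```r
visit <- c("SCREENING", "BASELINE", "WEEK 1")
day <- c(1, 2, 8)

# mapply passes the i-th element of each vector to the function together
visit_labels <- mapply(
  function(v, d) paste(v, "Day", d),
  visit, day,
  USE.NAMES = FALSE
)
visit_labels   # "SCREENING Day 1" "BASELINE Day 2" "WEEK 1 Day 8"
```

Unlike map2_chr, mapply does not guarantee a character vector; it simplifies the result when it can, so the output type depends on what the function returns.
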
pmap Example:

df <- data.frame(
  USUBJID = c("01-701-101", "01-701-102"),
  VISIT = c("SCREENING", "BASELINE"),
  DAY = c(1, 2)
)
pmap_chr(df, ~ paste("Subject", ..1, "Visit", ..2, "Day", ..3))
  • Description:
    • Combines subject ID, visit, and day into a summary string.

Input Table:

USUBJID VISIT DAY
01-701-101 SCREENING 1
01-701-102 BASELINE 2

Output Table:

pmap_chr output
"Subject 01-701-101 Visit SCREENING Day 1"
"Subject 01-701-102 Visit BASELINE Day 2"

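When the input is a data frame, pmap can also match columns to function arguments by name, which tends to be easier to read than ..1, ..2, ..3. The following sketch assumes the same df as above; the output format and the name subject_labels are invented for the illustration.

```r
library(purrr)

df <- data.frame(
  USUBJID = c("01-701-101", "01-701-102"),
  VISIT = c("SCREENING", "BASELINE"),
  DAY = c(1, 2)
)

# Argument names match the column names, so order does not matter
subject_labels <- pmap_chr(
  df,
  function(USUBJID, VISIT, DAY) paste0(USUBJID, ": ", VISIT, " (Day ", DAY, ")")
)
subject_labels
# "01-701-101: SCREENING (Day 1)" "01-701-102: BASELINE (Day 2)"
```

With this style the function must accept every column in the data frame (or include ... to absorb the extras), otherwise pmap reports an unused argument error.
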
11. Anonymous Functions in purrr

Anonymous functions are useful for quick, one-off operations inside map calls.

R Code:

usubjid <- c("01-701-101", "01-701-102")
visit <- c("SCREENING", "BASELINE")
map2_chr(usubjid, visit, function(x, y) paste("Subject:", x, "Visit:", y))
map2_chr(usubjid, visit, ~ paste("Subject:", .x, "Visit:", .y))
  • Description:
    • Both approaches combine subject ID and visit into a string.

Input Table:

usubjid visit
01-701-101 SCREENING
01-701-102 BASELINE

Output Table:

map2_chr output
"Subject: 01-701-101 Visit: SCREENING"
"Subject: 01-701-102 Visit: BASELINE"

12. Conclusion

  • Functional programming in R (especially with purrr) is usually more concise, readable, and reusable than copy-paste code or hand-written for loops; the gain is mainly in clarity and not necessarily in speed.
  • Use map functions for common iteration tasks and to write cleaner, more maintainable code.
  • Practice replacing for loops with purrr functionals to simplify your data analysis.
