2.5.3. Functional Programming in R
1. Introduction
Functional programming is a style of coding where you use functions to transform data, often by applying a function repeatedly to elements of a vector, list, or data frame.
- Makes code easier to read, debug, and maintain.
- R is fundamentally a functional language, and packages like purrr make this approach even more accessible.
2. Why Use Functional Programming?
- Encourages code reuse and modularity.
- Reduces copy-paste and repetitive code.
- Makes code easier to test and debug.
- Enables concise, expressive data transformations.
- Essential for working efficiently with lists, data frames, and complex data.
3. Functional Programming Basics in R
- Functions are first-class objects: you can assign them to variables, pass them as arguments, and return them from other functions.
- The apply family (apply, lapply, sapply, etc.) is the classic set of base R tools for functional programming.
- The purrr package (part of the tidyverse) provides a consistent, powerful set of tools for functional programming.
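To make "first-class" concrete, here is a minimal sketch of the three things you can do with a function object in R. The names `summarise_fn`, `apply_to`, `make_adder` and `add_one` are hypothetical and exist only for this illustration.

```r
# 1. Assign a function to a variable (hypothetical name)
summarise_fn <- mean

# 2. Pass a function as an argument to another function (hypothetical helper)
apply_to <- function(x, f) f(x)

# 3. Return a function from another function: a "function factory"
make_adder <- function(n) {
  function(x) x + n   # the returned function remembers n
}
add_one <- make_adder(1)

summarise_fn(c(35, 40, 38))   # 37.66667
apply_to(c(35, 40, 38), max)  # 40
add_one(c(35, 40, 38))        # 36 41 39
```

Point 2 is the idea every apply and map function is built on: the function you want to run is just another argument.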
4. The apply Family (Base R)
The apply family is a set of base R functions for iterating over data structures. The apply() function itself runs a function across the rows or columns of a matrix or array.
R Code:
mat <- matrix(c(101, 102, 103, 104, 105, 106), nrow = 2) # filled column by column
apply(mat, 1, max) # Max subject ID in each row
apply(mat, 2, min) # Min subject ID in each column
- Description:
- mat is a 2x3 matrix of subject IDs; matrix() fills it column by column.
- apply(mat, 1, max) finds the maximum subject ID in each row (MARGIN = 1).
- apply(mat, 2, min) finds the minimum subject ID in each column (MARGIN = 2).
Input Table:
| Row | V1 | V2 | V3 |
|---|---|---|---|
| 1 | 101 | 103 | 105 |
| 2 | 102 | 104 | 106 |
Output Table:
| Call | Result |
|---|---|
| apply(mat, 1, max) | 105, 106 |
| apply(mat, 2, min) | 101, 103, 105 |
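When the applied function returns more than one value per row, apply() stacks each result as a column of the output, which often surprises people. The sketch below reuses the same matrix with range(), which returns two values (min and max) for every row.

```r
mat <- matrix(c(101, 102, 103, 104, 105, 106), nrow = 2)

# range() returns c(min, max) for each row
row_ranges <- apply(mat, 1, range)
row_ranges
#      [,1] [,2]
# [1,]  101  102
# [2,]  105  106

# Each input row became a COLUMN; transpose to get one row per input row
t(row_ranges)
#      [,1] [,2]
# [1,]  101  105
# [2,]  102  106
```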
Comparison of apply Family Functions
The apply family includes several functions, each designed for specific use cases. Here's a comparison:
| Function | Input Type | Output Type | Purpose |
|---|---|---|---|
| apply | Matrix/Array | Vector/Array/List | Apply a function across rows (MARGIN = 1) or columns (MARGIN = 2). |
| lapply | List/Vector | List | Apply a function to each element of a list or vector. |
| sapply | List/Vector | Vector/Matrix/List | Like lapply, but tries to simplify the output to a vector or matrix. |
| vapply | List/Vector | Vector/Matrix | Similar to sapply, but requires specifying the output type explicitly. |
| tapply | Vector + Factor(s) | Array/List | Apply a function to subsets of a vector, split by one or more factors. |
| mapply | Multiple Vectors | Vector/List | Multivariate version of sapply; applies a function to multiple inputs in parallel. |
Key Differences:
- apply is for matrices/arrays, while lapply, sapply, and vapply are for lists/vectors.
- sapply simplifies the output, while lapply always returns a list.
- vapply is safer than sapply because it enforces the output type.
- tapply is specifically for grouped operations on vectors.
- mapply works with multiple vectors/lists simultaneously.
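The differences listed above are easiest to see side by side. This is a small sketch on made-up lab values (the `results`, `value` and `test` objects are illustrative only); the empty-input lines show why vapply is described as safer than sapply.

```r
# Illustrative data
results <- list(ALT = c(35, 40, 38), AST = c(30, 32, 31))

lapply(results, length)            # always a list: $ALT 3, $AST 3
sapply(results, length)            # simplified to a named integer vector
vapply(results, mean, numeric(1))  # each result must be a single double

# With empty input, sapply silently returns a list,
# while vapply still returns the declared type
sapply(list(), length)             # list()
vapply(list(), length, integer(1)) # integer(0)

# tapply: summarise a vector split by a grouping variable
value <- c(35, 40, 30, 32)
test  <- c("ALT", "ALT", "AST", "AST")
tapply(value, test, mean)          # ALT 37.5, AST 31.0

# mapply: walk several vectors in parallel
mapply(function(a, b) a + b, c(1, 2, 3), c(10, 20, 30))  # 11 22 33
```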
5. Functional Programming with purrr
The purrr package provides a family of functions for applying operations to lists, vectors, or data frames.
R Code:
library(purrr)
labs <- list(
ALT = c(35, 40, 38),
AST = c(30, 32, 31),
HGB = c(13.5, 14.2, 13.8)
)
map(labs, mean)
map_int(labs, length)
- Description:
- labs is a list of lab test results.
- map(labs, mean) calculates the mean result for each lab test and returns a list.
- map_int(labs, length) returns the number of results for each test as a named integer vector.
Input Table:
| Lab Test | Results |
|---|---|
| ALT | 35, 40, 38 |
| AST | 30, 32, 31 |
| HGB | 13.5, 14.2, 13.8 |
Output Table:
| map(labs, mean) | map_int(labs, length) |
|---|---|
| ALT: 37.67 | ALT: 3 |
| AST: 31.00 | AST: 3 |
| HGB: 13.83 | HGB: 3 |
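The suffix on a map function is a promise about the output type. A minimal sketch with the same labs list: map() always gives a list, map_dbl() gives a double vector, and map_int() stops with an error rather than silently truncating non-whole numbers. The error text in the sketch is our own placeholder, not purrr's message.

```r
library(purrr)

labs <- list(
  ALT = c(35, 40, 38),
  AST = c(30, 32, 31),
  HGB = c(13.5, 14.2, 13.8)
)

map(labs, mean)        # a list of length 3
map_dbl(labs, mean)    # named double vector: ALT 37.67, AST 31, HGB 13.83 (rounded)
map_int(labs, length)  # named integer vector: 3 3 3

# map_int() refuses results that cannot become integers without loss
res <- tryCatch(
  map_int(labs, mean),
  error = function(e) "error: means are not whole numbers"
)
res
```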
6. Using Anonymous Functions
Anonymous (lambda) functions are functions you define on the fly, without giving them a name.
R Code:
map(labs, function(x) x + 1) # Add 1 to each lab result
map_chr(labs, ~ paste("Count:", length(.))) # Count results per test
- Description:
- Adds 1 to each lab result.
- Returns a string with the count of results for each test.
Input Table:
| Lab Test | Results |
|---|---|
| ALT | 35, 40, 38 |
| AST | 30, 32, 31 |
| HGB | 13.5, 14.2, 13.8 |
Output Table:
| map(labs, function(x) x + 1) | map_chr(labs, ~ paste("Count:", length(.))) |
|---|---|
| ALT: 36, 41, 39 | ALT: "Count: 3" |
| AST: 31, 33, 32 | AST: "Count: 3" |
| HGB: 14.5, 15.2, 14.8 | HGB: "Count: 3" |
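purrr accepts several spellings of the same anonymous function. The sketch below computes the spread (max minus min) of each test three ways; the third form, `\(x)`, is base R shorthand for `function(x)` and assumes R 4.1.0 or later.

```r
library(purrr)

labs <- list(
  ALT = c(35, 40, 38),
  AST = c(30, 32, 31),
  HGB = c(13.5, 14.2, 13.8)
)

map_dbl(labs, function(x) max(x) - min(x))  # classic anonymous function
map_dbl(labs, ~ max(.x) - min(.x))          # purrr formula shorthand (.x or .)
map_dbl(labs, \(x) max(x) - min(x))         # base R shorthand, R >= 4.1.0
# All three give ALT 5, AST 2, HGB about 0.7
```

All three are equivalent here; the `function(x)` and `\(x)` forms also work outside purrr, while the formula form is specific to purrr and similar tidyverse functions.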
7. Functional Programming with Data Frames
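You can use purrr::map() with data frames to apply a function to each column. This works because a data frame is a list of equal-length columns, so map() iterates over its columns.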
R Code:
df <- data.frame(
USUBJID = c("01-701-101", "01-701-102"),
AGE = c(34, 58),
SEX = c("M", "F"),
stringsAsFactors = FALSE # the default since R 4.0.0; keeps text columns as character
)
map_chr(df, class)
map_lgl(df, is.character)
- Description:
- Returns the class of each SDTM column.
- Checks if each column is character type.
Input Table:
| USUBJID | AGE | SEX |
|---|---|---|
| 01-701-101 | 34 | M |
| 01-701-102 | 58 | F |
Output Table:
| map_chr(df, class) | map_lgl(df, is.character) |
|---|---|
| USUBJID: "character" | USUBJID: TRUE |
| AGE: "numeric" | AGE: FALSE |
| SEX: "character" | SEX: TRUE |
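Because a data frame is a list of columns, list tools apply to it directly. This sketch shows that length() counts columns, uses purrr's keep() to select only numeric columns before summarising, and gives a base R vapply() equivalent of map_chr(df, class).

```r
library(purrr)

df <- data.frame(
  USUBJID = c("01-701-101", "01-701-102"),
  AGE = c(34, 58),
  SEX = c("M", "F"),
  stringsAsFactors = FALSE
)

length(df)                           # 3: one element per column

# keep() retains only the columns for which the predicate is TRUE
map_dbl(keep(df, is.numeric), mean)  # AGE 46

# Base R equivalent of map_chr(df, class)
vapply(df, class, character(1))
```

One caveat: some column types (for example date-time columns) have a class of length two, so both map_chr(df, class) and the vapply() call would fail on them.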
8. For Loops vs. Functionals: A User-Friendly Example
Suppose you want to find the maximum lab result for each test.
Copy + Paste Approach
labs <- data.frame(ALT = c(35, 40, 38), AST = c(30, 32, 31))
max(labs$ALT)
max(labs$AST)
- Description:
- Finds the max for each lab test manually.
For Loop Approach
output <- vector("double", ncol(labs))
for (i in seq_along(labs)) {
output[[i]] <- max(labs[[i]])
}
output
- Description:
- Loops through each lab test and stores the max value.
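The loop uses seq_along() instead of 1:ncol() for a reason. On a data frame with no columns, seq_along() produces an empty sequence, so the loop body never runs; 1:ncol() produces c(1, 0), so the loop would run twice and fail. A minimal check:

```r
empty <- data.frame()   # a data frame with zero columns

seq_along(empty)        # integer(0): the loop body never runs
1:ncol(empty)           # 1 0: the loop would run twice

# The loop from above is safe on empty input
output <- vector("double", ncol(empty))
for (i in seq_along(empty)) {
  output[[i]] <- max(empty[[i]])
}
output                  # numeric(0), no error
```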
Function Approach
col_max <- function(df) {
output <- vector("double", length(df))
for (i in seq_along(df)) {
output[i] <- max(df[[i]])
}
output
}
col_max(labs)
- Description:
- Wraps the loop in a function for reuse.
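The next step is to stop hard-coding max(). The sketch below passes the summary function in as an argument, so one helper covers max, min, mean and so on. `col_summary` is a hypothetical name; it is roughly the idea that map_dbl() packages up for you, without purrr's extra type checks.

```r
labs <- data.frame(ALT = c(35, 40, 38), AST = c(30, 32, 31))

# Hypothetical helper: same loop as col_max(), but the function
# to apply to each column is supplied as `fun`
col_summary <- function(df, fun) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[i] <- fun(df[[i]])
  }
  names(output) <- names(df)  # label results by column
  output
}

col_summary(labs, max)   # ALT 40, AST 32
col_summary(labs, min)   # ALT 35, AST 30
col_summary(labs, mean)  # ALT 37.67, AST 31 (rounded)
```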
purrr Approach
library(purrr)
map_dbl(labs, max)
- Description:
- Uses map_dbl to apply max to each lab test column.
Input Table:
| ALT | AST |
|---|---|
| 35 | 30 |
| 40 | 32 |
| 38 | 31 |
Output Table:
| max(ALT) | max(AST) |
|---|---|
| 40 | 32 |
9. The map Family of Functions
The map family provides different output types depending on your needs.
R Code:
labs <- list(
ALT = c(35, 40, 38),
AST = c(30, 32, 31),
HGB = c(13.5, 14.2, 13.8)
)
map(labs, min) # Minimum result per test
map_lgl(labs, function(y) all(y > 10)) # All results above 10?
map_dbl(labs, sum) # Sum of results per test
map_dbl(labs, mean) # Mean result per test
map_chr(labs, ~ paste("Sum is", sum(.))) # String summary per test
- Description:
- Finds the minimum, checks if all results are above 10, sums, averages, and summarizes each test.
Input Table:
| Lab Test | Results |
|---|---|
| ALT | 35, 40, 38 |
| AST | 30, 32, 31 |
| HGB | 13.5, 14.2, 13.8 |
Output Table:
| map(labs, min) | map_lgl(labs, ...) | map_dbl(labs, sum) | map_dbl(labs, mean) | map_chr(labs, ...) |
|---|---|---|---|---|
| 35 | TRUE | 113 | 37.67 | "Sum is 113" |
| 30 | TRUE | 93 | 31.00 | "Sum is 93" |
| 13.5 | TRUE | 41.5 | 13.83 | "Sum is 41.5" |
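To confirm which container each variant returns, you can inspect class() of each call from this section. The sketch also shows that the typed variants give plain atomic vectors, so the results can be indexed by name directly, and that map_dbl() matches unlist() applied to map() when every result is a single number.

```r
library(purrr)

labs <- list(
  ALT = c(35, 40, 38),
  AST = c(30, 32, 31),
  HGB = c(13.5, 14.2, 13.8)
)

class(map(labs, min))                            # "list"
class(map_dbl(labs, sum))                        # "numeric"
class(map_lgl(labs, function(y) all(y > 10)))    # "logical"
class(map_chr(labs, ~ paste("Sum is", sum(.))))  # "character"

map_dbl(labs, sum)[["ALT"]]                      # 113

# For single-number results, map_dbl() matches unlist(map())
identical(unlist(map(labs, min)), map_dbl(labs, min))  # TRUE
```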
10. Multiple Vectors: map2 and pmap
map2() iterates over two vectors in parallel; pmap() iterates over any number of vectors supplied as a list or data frame.
R Code:
visit <- c("SCREENING", "BASELINE", "WEEK 1")
day <- c(1, 2, 8)
map2_chr(visit, day, ~ paste(.x, "Day", .y))
- Description:
- Combines visit name and day for each record.
Input Table:
| visit | day |
|---|---|
| SCREENING | 1 |
| BASELINE | 2 |
| WEEK 1 | 8 |
Output Table:
| map2_chr(visit, day, ...) |
|---|
| "SCREENING Day 1" |
| "BASELINE Day 2" |
| "WEEK 1 Day 8" |
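map2() expects its two inputs to line up. A length-1 input is recycled to match the other one, but any other length mismatch is an error rather than silent recycling. The sketch uses the same visit and day vectors; the error string is our own placeholder, not purrr's message.

```r
library(purrr)

visit <- c("SCREENING", "BASELINE", "WEEK 1")
day   <- c(1, 2, 8)

# A length-1 .y is recycled across every element of .x
labelled <- map2_chr(visit, "Visit", ~ paste(.y, .x))
labelled   # "Visit SCREENING" "Visit BASELINE" "Visit WEEK 1"

# Lengths 3 and 2 cannot be lined up, so map2 stops
res <- tryCatch(
  map2_chr(visit, day[1:2], ~ paste(.x, "Day", .y)),
  error = function(e) "error: lengths differ"
)
res
```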
pmap Example:
df <- data.frame(
USUBJID = c("01-701-101", "01-701-102"),
VISIT = c("SCREENING", "BASELINE"),
DAY = c(1, 2)
)
pmap_chr(df, ~ paste("Subject", ..1, "Visit", ..2, "Day", ..3))
- Description:
- Combines subject ID, visit, and day into a summary string.
Input Table:
| USUBJID | VISIT | DAY |
|---|---|---|
| 01-701-101 | SCREENING | 1 |
| 01-701-102 | BASELINE | 2 |
Output Table:
| pmap_chr output |
|---|
| "Subject 01-701-101 Visit SCREENING Day 1" |
| "Subject 01-701-102 Visit BASELINE Day 2" |
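Positional placeholders like ..1, ..2 and ..3 are easy to mix up. When the input is a named list or data frame, pmap() passes each column as a named argument, so a regular function whose argument names match the column names is often clearer, and the argument order no longer has to follow the column order.

```r
library(purrr)

df <- data.frame(
  USUBJID = c("01-701-101", "01-701-102"),
  VISIT = c("SCREENING", "BASELINE"),
  DAY = c(1, 2),
  stringsAsFactors = FALSE
)

# Arguments are matched by column name, not by position
summary_lines <- pmap_chr(df, function(DAY, USUBJID, VISIT) {
  paste("Subject", USUBJID, "Visit", VISIT, "Day", DAY)
})
summary_lines
# "Subject 01-701-101 Visit SCREENING Day 1"
# "Subject 01-701-102 Visit BASELINE Day 2"
```

If the data frame has extra columns the function does not use, add `...` to the argument list so those columns are absorbed instead of triggering an "unused argument" error.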
11. Anonymous Functions in purrr
Anonymous functions are useful for quick, one-off operations inside map calls.
R Code:
usubjid <- c("01-701-101", "01-701-102")
visit <- c("SCREENING", "BASELINE")
map2_chr(usubjid, visit, function(x, y) paste("Subject:", x, "Visit:", y))
map2_chr(usubjid, visit, ~ paste("Subject:", .x, "Visit:", .y))
- Description:
- Both approaches combine subject ID and visit into a string.
Input Table:
| usubjid | visit |
|---|---|
| 01-701-101 | SCREENING |
| 01-701-102 | BASELINE |
Output Table:
| map2_chr output |
|---|
| "Subject: 01-701-101 Visit: SCREENING" |
| "Subject: 01-701-102 Visit: BASELINE" |
12. Conclusion
- Functional programming in R (especially with purrr) is usually more concise, readable, and less error-prone than for loops or manual copy-paste.
- Use map functions for common iteration tasks and to write cleaner, more maintainable code.
- Practice replacing for loops with purrr functionals to simplify your data analysis.