1.4.16. The apply Family Functions in R vs SAS

1. Introduction

The apply family of functions in R (apply, lapply, sapply, vapply, mapply, tapply, etc.) provides a powerful and concise way to perform operations on data structures without explicit loops. These functions are essential for efficient data manipulation, especially in clinical trial programming, and offer a flexible alternative to SAS's array processing and summary procedures.

2. SAS vs R: Program Comparison

Task	SAS Approach	R Approach (apply family)
Row/column summary	`PROC MEANS`, `DATA` step loops	`apply()`
List-wise operation	Array processing, macros	`lapply()`, `sapply()`
Grouped summary	`PROC MEANS`, `BY` statement	`tapply()`, `aggregate()`, `by()`
Element-wise operation over several vectors	Macro loops	`mapply()`
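As a rough illustration of the "Grouped summary" row, the sketch below uses aggregate() to get a result similar to what PROC MEANS with a BY statement would produce. The dm_demo data frame and its values are made up for this example.

```r
# Hypothetical demographics data (illustration only)
dm_demo <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003", "01-004"),
  SEX     = c("F", "M", "F", "M"),
  AGE     = c(34, 40, 29, 51)
)

# Mean AGE by SEX; the result is a data frame with one row per group
age_by_sex <- aggregate(AGE ~ SEX, data = dm_demo, FUN = mean)
age_by_sex
```

Unlike tapply(), aggregate() returns a data frame directly, which is often closer to the output dataset a SAS programmer expects.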

2A. Table: Common Use Cases for apply Family Functions

Function	Typical Use Cases
apply	- Row/column means, sums, min, max, sd; - Summarize repeated measures (labs, vitals); - Custom row/col ops
lapply	- Apply function to each column (e.g., uppercase, trim); - List element summaries; - Convert types
sapply	- Simplified summaries (length, class); - Quick checks per column/element; - Vectorized transformations
tapply	- Grouped summaries (mean/sum by group); - AE by severity, labs by visit; - Counts by group
mapply	- Row-wise operations across columns; - Concatenate/combine columns; - Custom flags/labels
replicate	- Simulations, bootstrapping; - Generate mock data; - Repeated random sampling
vapply	- Type-safe version of sapply; - Enforce output type for summaries; - Production code safety

3. apply(): Row/Column Operations on Matrices and Data Frames

Description:
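apply() applies a function to the rows or columns of a matrix. A data frame is first converted to a matrix with as.matrix(), so pass only the numeric columns; a single character column would turn every value into text. It is especially useful for summarizing repeated measures (e.g., labs, vitals) for each subject.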

Possible Uses:

Calculate row or column means, sums, min, max, standard deviation, etc.
Summarize repeated measures for each subject (e.g., multiple lab results, vital signs).
Apply custom functions to each row or column.

Input Table (Lab Results - LB):

USUBJID	LBTESTCD	LBORRES1	LBORRES2	LBORRES3
01-001	HGB	13.2	13.5	13.1
01-002	HGB	12.8	13.0	12.9
01-003	HGB	14.0	13.8	14.2

The second argument of apply(), MARGIN, determines whether the function is applied to rows or to columns:
MARGIN = 1 → apply the function to each row (row-wise)
MARGIN = 2 → apply the function to each column (column-wise)

R Row Example:

# Dummy LB data
lb <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  LBTESTCD = c("HGB", "HGB", "HGB"),
  LBORRES1 = c(13.2, 12.8, 14.0),
  LBORRES2 = c(13.5, 13.0, 13.8),
  LBORRES3 = c(13.1, 12.9, 14.2)
)
# Select the lab result columns once and reuse the selection
lab_cols <- c("LBORRES1", "LBORRES2", "LBORRES3")
lb$row_mean <- apply(lb[, lab_cols], 1, mean)
lb$row_sd <- apply(lb[, lab_cols], 1, sd)
lb$row_min <- apply(lb[, lab_cols], 1, min)
lb$row_max <- apply(lb[, lab_cols], 1, max)

Output Table:

USUBJID	LBTESTCD	LBORRES1	LBORRES2	LBORRES3	row_mean	row_sd	row_min	row_max
01-001	HGB	13.2	13.5	13.1	13.27	0.208	13.1	13.5
01-002	HGB	12.8	13.0	12.9	12.90	0.100	12.8	13.0
01-003	HGB	14.0	13.8	14.2	14.00	0.200	13.8	14.2

R Column Example: We’ll now apply a function column-wise to the three lab result columns.

apply(lb[, c("LBORRES1", "LBORRES2", "LBORRES3")], 2, mean)

Explanation

lb[, c("LBORRES1", "LBORRES2", "LBORRES3")]: selects the numeric lab result columns.
2: tells apply() to operate column-wise.
mean: calculates the mean of each column.

Output (rounded to two decimal places)

LBORRES1 LBORRES2 LBORRES3 
   13.33     13.43     13.40

This tells us:

The average of LBORRES1 across all subjects is 13.33
The average of LBORRES2 is 13.43
The average of LBORRES3 is 13.40
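The "custom functions" use case listed earlier can be sketched as follows. This is a minimal example with made-up data (lab_demo is hypothetical) showing how extra arguments such as na.rm are passed through apply() and how an anonymous function can be used.

```r
# Hypothetical lab results with one missing value (illustration only)
lab_demo <- data.frame(
  LBORRES1 = c(13.2, 12.8, 14.0),
  LBORRES2 = c(13.5, NA, 13.8),
  LBORRES3 = c(13.1, 12.9, 14.2)
)

# Arguments after the function name are passed on to that function
row_mean <- apply(lab_demo, 1, mean, na.rm = TRUE)

# An anonymous function computes the range (max - min) of each row
row_range <- apply(lab_demo, 1, function(x) {
  max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
})
```

For plain means and sums, rowMeans(), colMeans(), rowSums() and colSums() give the same results and are typically faster than apply().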

4. lapply(): Element-wise Operations on Lists or Data Frame Columns

Description:
lapply() applies a function to each element of a list or each column of a data frame, always returning a list. It is commonly used for cleaning or transforming all columns of a domain.

Possible Uses:

Apply a function to each column of a data frame (e.g., uppercase, trim whitespace).
Apply a function to each element of a list (e.g., summary, length, custom transformation).
Convert all factors to characters or vice versa.

Input Table (Demographics - DM):

USUBJID	SEX	RACE
01-001	F	Asian
01-002	M	White
01-003	F	Black

R Example:

# Dummy DM data
dm <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  RACE = c("Asian", "White", "Black"),
  stringsAsFactors = FALSE
)
char_cols <- sapply(dm, is.character)
dm[char_cols] <- lapply(dm[char_cols], toupper)

Output Table:

USUBJID	SEX	RACE
01-001	F	ASIAN
01-002	M	WHITE
01-003	F	BLACK
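The factor-to-character conversion mentioned under "Possible Uses" follows the same pattern. The sketch below assumes a hypothetical data frame dm_fct in which SEX and RACE are stored as factors.

```r
# Hypothetical DM-like data with factor columns (illustration only)
dm_fct <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX     = factor(c("F", "M", "F")),
  RACE    = factor(c("Asian", "White", "Black")),
  AGE     = c(34, 40, 29),
  stringsAsFactors = FALSE
)

# Identify factor columns; vapply() guarantees a logical vector
fct_cols <- vapply(dm_fct, is.factor, logical(1))

# Convert only those columns, leaving the others untouched
dm_fct[fct_cols] <- lapply(dm_fct[fct_cols], as.character)

# Check the resulting column classes
sapply(dm_fct, class)
```

Assigning back into dm_fct[fct_cols] keeps the object a data frame, whereas dm_fct <- lapply(dm_fct, ...) would turn it into a plain list.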

5. sapply(): Simplified Element-wise Operations

Description:
sapply() is similar to lapply() but tries to simplify the result to a vector or matrix. It is useful for getting summary statistics or properties for each element/column.

Possible Uses:

Get the length, class, or summary of each column.
Apply a function to each element and return a vector.
Quick checks or summaries for reporting.

Input Table (Adverse Events - AE):

AEDECOD
HEADACHE
NAUSEA
DIZZINESS

R Example:

# Dummy AE data
ae <- data.frame(AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"))
sapply(ae$AEDECOD, nchar)

Output:

AEDECOD	nchar
HEADACHE	8
NAUSEA	6
DIZZINESS	9
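vapply() appears in the use-case table as the type-safe version of sapply(). A minimal sketch of the difference, using a made-up vector aedecod, might look like this:

```r
# Hypothetical AE terms (illustration only)
aedecod <- c("HEADACHE", "NAUSEA", "DIZZINESS")

# vapply() needs a template for each result: here, one integer per element
term_len <- vapply(aedecod, nchar, integer(1))

# With zero-length input, sapply() returns an empty list,
# while vapply() still returns an (empty) integer vector
empty_s <- sapply(character(0), nchar)
empty_v <- vapply(character(0), nchar, integer(1))
class(empty_s)  # "list"
class(empty_v)  # "integer"
```

Because the return type of vapply() does not depend on the input, it is usually the safer choice in production code, for example when a domain can legitimately contain zero records.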

6. tapply(): Grouped Summaries

Description:
tapply() applies a function over subsets of a vector, defined by a factor (grouping variable). It is ideal for grouped summaries.

Possible Uses:

Calculate mean, sum, min, max by group (e.g., mean AGE by SEX, mean lab by visit).
Summarize adverse events by severity or seriousness.
Count occurrences by group.

Input Table:

USUBJID	SEX	AGE
01-001	F	34
01-002	M	40
01-003	F	29

R Example:

dm <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  AGE = c(34, 40, 29)
)
tapply(dm$AGE, dm$SEX, mean)

Output:

SEX	Mean AGE
F	31.5
M	40
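Counting by group works the same way, with length as the function. The sketch below uses made-up adverse event records (ae_demo is hypothetical) to count records and distinct subjects per severity level.

```r
# Hypothetical AE severity data (illustration only)
ae_demo <- data.frame(
  USUBJID = c("01-001", "01-001", "01-002", "01-003", "01-003"),
  AESEV   = c("MILD", "MILD", "SEVERE", "MODERATE", "MILD")
)

# Number of AE records per severity level
n_by_sev <- tapply(ae_demo$USUBJID, ae_demo$AESEV, length)

# Number of distinct subjects per severity level
subj_by_sev <- tapply(ae_demo$USUBJID, ae_demo$AESEV,
                      function(x) length(unique(x)))
```

For simple record counts, table(ae_demo$AESEV) gives the same numbers; tapply() becomes more useful once the per-group function is something other than a count.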

Example: Group by Multiple Columns with tapply()

When INDEX is a list of grouping vectors, tapply() returns a matrix (or a higher-dimensional array) with one dimension per grouping variable. To turn that result into a data frame, convert it to a table with as.table() and then call as.data.frame(). The example below uses SDTM-style VS data.

# Sample SDTM-style data
vs <- data.frame(
  USUBJID = c("SUBJ001", "SUBJ001", "SUBJ002", "SUBJ002", "SUBJ003", "SUBJ003"),
  VSTESTCD = c("SYSBP", "DIABP", "SYSBP", "DIABP", "SYSBP", "DIABP"),
  VSSTRESN = c(120, 80, 130, 85, 125, 82),
  VISIT = c("SCREENING", "SCREENING", "SCREENING", "SCREENING", "SCREENING", "SCREENING")
)

# Group by USUBJID and VSTESTCD, compute mean
result <- tapply(vs$VSSTRESN, list(vs$USUBJID, vs$VSTESTCD), mean)

# Convert to data frame
df_result <- as.data.frame(as.table(result))
colnames(df_result) <- c("USUBJID", "VSTESTCD", "MEAN_VSSTRESN")

df_result

Output:

USUBJID	VSTESTCD	MEAN_VSSTRESN
SUBJ001	DIABP	80
SUBJ002	DIABP	85
SUBJ003	DIABP	82
SUBJ001	SYSBP	120
SUBJ002	SYSBP	130
SUBJ003	SYSBP	125

Explanation

tapply(...): Computes the grouped mean.
as.table(...): Converts the array to a table object.
as.data.frame(...): Converts the table to a tidy data frame.
colnames(...): Renames columns for clarity.

This is a clean and efficient way to get grouped summary statistics in a data frame format using only base R.

7. mapply(): Parallel Operations Across Multiple Vectors

Description:
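mapply() applies a function to the corresponding elements of several vectors or lists: the first elements together, then the second elements, and so on. "Parallel" here means element by element, not multi-core execution. It is useful for row-wise operations that combine multiple columns.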

Possible Uses:

Concatenate or combine values from multiple columns for reporting.
Create custom flags or labels using multiple variables.
Apply a function to corresponding elements of several vectors.

Input Table:

AEDECOD	AESEV
HEADACHE	MILD
NAUSEA	MODERATE
DIZZINESS	SEVERE

R Example:

ae <- data.frame(
  AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"),
  AESEV = c("MILD", "MODERATE", "SEVERE")
)
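# Store the result as a new column; USE.NAMES = FALSE drops the element names
ae$Combined <- mapply(function(a, b) paste(a, b, sep = " - "),
                      ae$AEDECOD, ae$AESEV, USE.NAMES = FALSE)
ae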

Output:

AEDECOD	AESEV	Combined
HEADACHE	MILD	HEADACHE - MILD
NAUSEA	MODERATE	NAUSEA - MODERATE
DIZZINESS	SEVERE	DIZZINESS - SEVERE
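The "custom flags" use case can be sketched in the same way. The AESER column, the flag_event() helper, and the flagging rule below are all hypothetical and only serve to show a function of two arguments being applied pair by pair.

```r
# Hypothetical AE data (illustration only)
ae_flag <- data.frame(
  AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"),
  AESEV   = c("MILD", "MODERATE", "SEVERE"),
  AESER   = c("N", "N", "Y")
)

# Hypothetical rule: flag an event if it is severe or serious
flag_event <- function(sev, ser) {
  if (sev == "SEVERE" || ser == "Y") "Y" else "N"
}

# The function receives one AESEV value and one AESER value per call
ae_flag$FLAG <- mapply(flag_event, ae_flag$AESEV, ae_flag$AESER,
                       USE.NAMES = FALSE)
```

When the rule can be written with vectorized operators, ifelse(ae_flag$AESEV == "SEVERE" | ae_flag$AESER == "Y", "Y", "N") gives the same result without mapply(); mapply() is most helpful when the function only works on single values.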

8. replicate(): Simulations and Repeated Calculations

Description:
replicate() repeats an expression multiple times, useful for simulations or generating mock data.

Possible Uses:

Simulate random subject ages or lab values.
Bootstrap resampling.
Generate mock data for testing.

R Example:

set.seed(123)
replicate(5, sample(20:70, 1))

Output:
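A vector of 5 random ages between 20 and 70, e.g., 38 70 57 21 67. The exact values depend on the R version, because the default sampling algorithm changed in R 3.6.0.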
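Bootstrap resampling, listed above as a possible use, could look roughly like the sketch below. The ages vector and the choice of 1000 resamples are assumptions made for illustration.

```r
# Hypothetical ages for a small sample (illustration only)
ages <- c(34, 40, 29, 51, 46, 38, 62, 27)

set.seed(123)  # make the resampling reproducible

# Each replicate draws a sample with replacement and returns its mean
boot_means <- replicate(1000, mean(sample(ages, replace = TRUE)))

# Percentile-based 95% interval for the mean
quantile(boot_means, probs = c(0.025, 0.975))
```

Because the expression returns a single number each time, replicate() simplifies the results to a numeric vector of length 1000.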

9. Calling a Custom Function

We'll use a simulated VS (Vital Signs) domain and apply a custom function using the apply family.

Step 1: Simulate an SDTM-like VS Dataset

vs <- data.frame(
  USUBJID = rep(c("SUBJ001", "SUBJ002", "SUBJ003"), each = 4),
  VSTESTCD = rep(c("SYSBP", "DIABP"), times = 6),
  # Each subject has both tests at SCREENING, then both tests at WEEK1
  VISIT = rep(c("SCREENING", "WEEK1"), each = 2, times = 3),
  VSSTRESN = c(120, 80, 130, 85, 125, 82, 118, 78, 135, 88, 140, 90)
)

Step 2: Define a Custom Function

# Coefficient of variation: standard deviation divided by the mean
cv <- function(x) {
  if (mean(x) == 0) return(NA)
  return(sd(x) / mean(x))
}

Step 3: Apply the Custom Function Using tapply()

tapply(vs$VSSTRESN, list(vs$VSTESTCD, vs$VISIT), cv)

Output (rounded to four decimal places):

      SCREENING  WEEK1
DIABP    0.0500 0.0715
SYSBP    0.0603 0.0852

Step 4: Convert to Data Frame (Optional)

cv_result <- as.data.frame(as.table(
  tapply(vs$VSSTRESN, list(vs$VSTESTCD, vs$VISIT), cv)
))
colnames(cv_result) <- c("VSTESTCD", "VISIT", "CV")
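Real vital signs data often contain missing results. The sketch below shows one possible variant of the custom function that accepts an na.rm argument, and how that argument is forwarded by tapply() and vapply(). The names cv_na and vs_demo are hypothetical.

```r
# Hypothetical VS-like data with one missing result (illustration only)
vs_demo <- data.frame(
  VSTESTCD = c("SYSBP", "SYSBP", "SYSBP", "DIABP", "DIABP", "DIABP"),
  VSSTRESN = c(120, 130, NA, 80, 90, 85)
)

# Variant of cv() that can drop missing values before computing
cv_na <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  m <- mean(x)
  if (is.na(m) || m == 0) return(NA_real_)
  sd(x) / m
}

# Arguments after the function are forwarded to it by tapply()
cv_by_test <- tapply(vs_demo$VSSTRESN, vs_demo$VSTESTCD, cv_na, na.rm = TRUE)

# split() + vapply() gives the same numbers as a plain named vector
cv_by_test2 <- vapply(split(vs_demo$VSSTRESN, vs_demo$VSTESTCD), cv_na,
                      numeric(1), na.rm = TRUE)
```

Returning NA_real_ rather than NA keeps the result numeric in every case, which is what the numeric(1) template in vapply() requires.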
