1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.16. The apply Family Functions in R vs SAS

1. Introduction

The apply family of functions in R (apply, lapply, sapply, vapply, mapply, tapply, etc.) provides a powerful and concise way to perform operations on data structures without explicit loops. These functions are essential for efficient data manipulation, especially in clinical trial programming, and offer a flexible alternative to SAS's array processing and summary procedures.
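
As a minimal sketch of what "without explicit loops" means in practice, the block below computes the same per-subject means twice: once with a for loop and once with sapply(). The lab values and the object names (labs, means_loop, means_sapply) are illustrative assumptions, not part of any real study data.

```r
# Hypothetical lab values for three subjects (illustrative data only)
labs <- list(
  "01-001" = c(13.2, 13.5, 13.1),
  "01-002" = c(12.8, 13.0, 12.9),
  "01-003" = c(14.0, 13.8, 14.2)
)

# Explicit loop: pre-allocate the result, then fill it element by element
means_loop <- numeric(length(labs))
names(means_loop) <- names(labs)
for (id in names(labs)) {
  means_loop[id] <- mean(labs[[id]])
}

# apply-family equivalent: one call, no index bookkeeping
means_sapply <- sapply(labs, mean)
```

Both objects hold the same named numeric vector; the sapply() version simply moves the looping into the function call.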


2. SAS vs R: Program Comparison

Task | SAS Approach | R Approach (apply family)
Row/column summary | PROC MEANS, DATA step loops | apply()
List-wise operation | Array processing, macros | lapply(), sapply()
Grouped summary | PROC MEANS, BY statement | tapply(), aggregate(), by()
Parallel operation | Macro loops | mapply()

2A. Table: Common Use Cases for apply Family Functions

Function | Typical Use Cases
apply | Row/column means, sums, min, max, sd; summarize repeated measures (labs, vitals); custom row/column operations
lapply | Apply a function to each column (e.g., uppercase, trim); list element summaries; convert types
sapply | Simplified summaries (length, class); quick checks per column/element; vectorized transformations
tapply | Grouped summaries (mean/sum by group); AE by severity, labs by visit; counts by group
mapply | Row-wise operations across columns; concatenate/combine columns; custom flags/labels
replicate | Simulations, bootstrapping; generate mock data; repeated random sampling
vapply | Type-safe version of sapply; enforce output type for summaries; production code safety
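
vapply() has no dedicated section in this guide, so here is a minimal sketch of the "type-safe" behaviour listed in the table. The dm data frame and the names col_classes and bad_call are assumptions made for illustration. The FUN.VALUE argument declares the type and length each call must return, and vapply() raises an error when the result does not match.

```r
# Hypothetical DM-style data (illustrative only)
dm <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  AGE = c(34, 40, 29),
  stringsAsFactors = FALSE
)

# FUN.VALUE says: every call must return exactly one character string
col_classes <- vapply(dm, function(col) class(col)[1], FUN.VALUE = character(1))

# Declaring the wrong type is an error rather than a silent change of result type
bad_call <- tryCatch(
  vapply(dm, function(col) class(col)[1], FUN.VALUE = numeric(1)),
  error = function(e) "type mismatch detected"
)
```

With sapply() the result type depends on what the function happens to return; with vapply() the expectation is written into the call, which is why it is often preferred in validated production code.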

3. apply(): Row/Column Operations on Matrices and Data Frames

Description:
apply() is used to apply a function to the rows or columns of a matrix or numeric data frame. It is especially useful for summarizing repeated measures (e.g., labs, vitals) for each subject.

Possible Uses:

  • Calculate row or column means, sums, min, max, standard deviation, etc.
  • Summarize repeated measures for each subject (e.g., multiple lab results, vital signs).
  • Apply custom functions to each row or column.

Input Table (Lab Results - LB):

USUBJID LBTESTCD LBORRES1 LBORRES2 LBORRES3
01-001 HGB 13.2 13.5 13.1
01-002 HGB 12.8 13.0 12.9
01-003 HGB 14.0 13.8 14.2

The second argument of apply(), MARGIN, determines whether the function is applied to rows or to columns:

  • MARGIN = 1 → apply the function to each row (row-wise)
  • MARGIN = 2 → apply the function to each column (column-wise)

R Row Example:

# Dummy LB data
lb <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  LBTESTCD = c("HGB", "HGB", "HGB"),
  LBORRES1 = c(13.2, 12.8, 14.0),
  LBORRES2 = c(13.5, 13.0, 13.8),
  LBORRES3 = c(13.1, 12.9, 14.2)
)
# Columns holding the repeated lab results
res_cols <- c("LBORRES1", "LBORRES2", "LBORRES3")
# MARGIN = 1: compute each statistic across the three results in every row
lb$row_mean <- apply(lb[, res_cols], 1, mean)
lb$row_sd <- apply(lb[, res_cols], 1, sd)
lb$row_min <- apply(lb[, res_cols], 1, min)
lb$row_max <- apply(lb[, res_cols], 1, max)

Output Table:

USUBJID LBTESTCD LBORRES1 LBORRES2 LBORRES3 row_mean row_sd row_min row_max
01-001 HGB 13.2 13.5 13.1 13.27 0.208 13.1 13.5
01-002 HGB 12.8 13.0 12.9 12.90 0.100 12.8 13.0
01-003 HGB 14.0 13.8 14.2 14.00 0.200 13.8 14.2
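
The "custom functions" use case can be sketched in the same way. The block below is an illustration under assumed names (res_cols, row_range, row_mean_fast, mixed): it passes an anonymous function to apply(), shows rowMeans() as a vectorized shortcut for the row mean, and demonstrates why only numeric columns should be selected, since apply() converts a data frame to a matrix first.

```r
# Same dummy LB data as in the row example
lb <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  LBTESTCD = c("HGB", "HGB", "HGB"),
  LBORRES1 = c(13.2, 12.8, 14.0),
  LBORRES2 = c(13.5, 13.0, 13.8),
  LBORRES3 = c(13.1, 12.9, 14.2)
)
res_cols <- c("LBORRES1", "LBORRES2", "LBORRES3")

# Custom function: spread (max minus min) of the results within each row
lb$row_range <- apply(lb[, res_cols], 1, function(x) max(x) - min(x))

# rowMeans() gives the same values as apply(..., 1, mean)
lb$row_mean_fast <- rowMeans(lb[, res_cols])

# Pitfall: mixing character and numeric columns yields a character matrix,
# so numeric functions such as mean() would no longer work on the rows
mixed <- as.matrix(lb[, c("USUBJID", "LBORRES1")])
```

For simple sums and means, rowSums(), rowMeans(), colSums(), and colMeans() are usually the more direct choice; apply() is most useful when the per-row logic is custom.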

R Column Example: We’ll now apply a function column-wise to the three lab result columns.

apply(lb[, c("LBORRES1", "LBORRES2", "LBORRES3")], 2, mean)

Explanation

  • lb[, c("LBORRES1", "LBORRES2", "LBORRES3")]: selects the numeric lab result columns.
  • 2: tells apply() to operate column-wise.
  • mean: calculates the mean of each column.

Output (rounded to two decimal places)

LBORRES1 LBORRES2 LBORRES3 
   13.33     13.43     13.40 

This tells us:

  • The average of LBORRES1 across all subjects is 13.33
  • The average of LBORRES2 is 13.43
  • The average of LBORRES3 is 13.40

4. lapply(): Element-wise Operations on Lists or Data Frame Columns

Description:
lapply() applies a function to each element of a list or each column of a data frame, always returning a list. It is commonly used for cleaning or transforming all columns of a domain.

Possible Uses:

  • Apply a function to each column of a data frame (e.g., uppercase, trim whitespace).
  • Apply a function to each element of a list (e.g., summary, length, custom transformation).
  • Convert all factors to characters or vice versa.

Input Table (Demographics - DM):

USUBJID SEX RACE
01-001 F Asian
01-002 M White
01-003 F Black

R Example:

# Dummy DM data
dm <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  RACE = c("Asian", "White", "Black"),
  stringsAsFactors = FALSE
)
char_cols <- sapply(dm, is.character)
dm[char_cols] <- lapply(dm[char_cols], toupper)

Output Table:

USUBJID SEX RACE
01-001 F ASIAN
01-002 M WHITE
01-003 F BLACK
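
The factor-to-character conversion listed under "Possible Uses" follows the same select-then-lapply pattern. This is a minimal sketch with an assumed data frame name (dm_f) that is deliberately created with factor columns; vapply() is used for the column check so that the selector is guaranteed to be a logical vector.

```r
# Hypothetical DM data created with factor columns on purpose
dm_f <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  RACE = c("Asian", "White", "Black"),
  stringsAsFactors = TRUE
)

# Identify the factor columns, then convert only those columns
fct_cols <- vapply(dm_f, is.factor, logical(1))
dm_f[fct_cols] <- lapply(dm_f[fct_cols], as.character)
```

Assigning into dm_f[fct_cols] rather than overwriting dm_f keeps the object a data frame, since lapply() on its own returns a plain list.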

5. sapply(): Simplified Element-wise Operations

Description:
sapply() is similar to lapply() but tries to simplify the result to a vector or matrix. It is useful for getting summary statistics or properties for each element/column.

Possible Uses:

  • Get the length, class, or summary of each column.
  • Apply a function to each element and return a vector.
  • Quick checks or summaries for reporting.

Input Table (Adverse Events - AE):

AEDECOD
HEADACHE
NAUSEA
DIZZINESS

R Example:

# Dummy AE data
ae <- data.frame(AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"))
sapply(ae$AEDECOD, nchar)

Output:

AEDECOD nchar
HEADACHE 8
NAUSEA 6
DIZZINESS 9
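
The "simplify to a vector or matrix" behaviour can be illustrated with a small sketch. The lb data frame and the names col_classes and col_ranges are assumptions for this example: when the function returns one value per column the result is a named vector, and when it returns two values per column the result is a matrix.

```r
# Hypothetical numeric lab columns (illustrative only)
lb <- data.frame(
  LBORRES1 = c(13.2, 12.8, 14.0),
  LBORRES2 = c(13.5, 13.0, 13.8),
  LBORRES3 = c(13.1, 12.9, 14.2)
)

# One value per column: simplified to a named character vector
col_classes <- sapply(lb, class)

# Two values per column (min and max): simplified to a 2 x 3 matrix
col_ranges <- sapply(lb, range)
```

Because the shape of the result depends on what the function returns, sapply() is convenient for interactive checks, while lapply() or vapply() give more predictable output in scripts.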

6. tapply(): Grouped Summaries

Description:
tapply() applies a function over subsets of a vector, defined by a factor (grouping variable). It is ideal for grouped summaries.

Possible Uses:

  • Calculate mean, sum, min, max by group (e.g., mean AGE by SEX, mean lab by visit).
  • Summarize adverse events by severity or seriousness.
  • Count occurrences by group.

Input Table:

USUBJID SEX AGE
01-001 F 34
01-002 M 40
01-003 F 29

R Example:

dm <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  AGE = c(34, 40, 29)
)
tapply(dm$AGE, dm$SEX, mean)

Output:

SEX Mean AGE
F 31.5
M 40
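
Counting occurrences by group works the same way, with length as the summary function. The sketch below reuses the DM-style data from this section; the name n_by_sex is an assumption, and table() is shown as the more common shortcut for plain counts.

```r
dm <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  SEX = c("F", "M", "F"),
  AGE = c(34, 40, 29)
)

# Number of subjects per SEX: length() counts the elements in each group
n_by_sex <- tapply(dm$USUBJID, dm$SEX, length)

# table() produces the same counts directly
n_by_sex_tab <- table(dm$SEX)
```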

Example: Group by Multiple Columns with tapply()

When tapply() is given a list of grouping variables, it returns a matrix or array with one dimension per grouping variable. To convert that result into a data frame, pass it through as.table() and then as.data.frame(). The SDTM-style VS domain example below shows both steps.

# Sample SDTM-style data
vs <- data.frame(
  USUBJID = c("SUBJ001", "SUBJ001", "SUBJ002", "SUBJ002", "SUBJ003", "SUBJ003"),
  VSTESTCD = c("SYSBP", "DIABP", "SYSBP", "DIABP", "SYSBP", "DIABP"),
  VSSTRESN = c(120, 80, 130, 85, 125, 82),
  VISIT = c("SCREENING", "SCREENING", "SCREENING", "SCREENING", "SCREENING", "SCREENING")
)

# Group by USUBJID and VSTESTCD, compute mean
result <- tapply(vs$VSSTRESN, list(vs$USUBJID, vs$VSTESTCD), mean)

# Convert to data frame
df_result <- as.data.frame(as.table(result))
colnames(df_result) <- c("USUBJID", "VSTESTCD", "MEAN_VSSTRESN")

df_result

Output:

USUBJID VSTESTCD MEAN_VSSTRESN
SUBJ001 DIABP 80
SUBJ002 DIABP 85
SUBJ003 DIABP 82
SUBJ001 SYSBP 120
SUBJ002 SYSBP 130
SUBJ003 SYSBP 125

Explanation

  • tapply(...): Computes the grouped mean.
  • as.table(...): Converts the array to a table object.
  • as.data.frame(...): Converts the table to a tidy data frame.
  • colnames(...): Renames columns for clarity.

This is a clean and efficient way to get grouped summary statistics in a data frame format using only base R.
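
The comparison table in Section 2 also lists aggregate() for grouped summaries. As a hedged alternative sketch (the object name agg is an assumption), aggregate() with a formula returns a data frame in one step, so no as.table() conversion is needed.

```r
# Same SDTM-style VS data as above
vs <- data.frame(
  USUBJID = c("SUBJ001", "SUBJ001", "SUBJ002", "SUBJ002", "SUBJ003", "SUBJ003"),
  VSTESTCD = c("SYSBP", "DIABP", "SYSBP", "DIABP", "SYSBP", "DIABP"),
  VSSTRESN = c(120, 80, 130, 85, 125, 82),
  VISIT = rep("SCREENING", 6)
)

# Mean of VSSTRESN for each USUBJID / VSTESTCD combination, returned as a data frame
agg <- aggregate(VSSTRESN ~ USUBJID + VSTESTCD, data = vs, FUN = mean)
```

One difference to keep in mind: tapply() keeps a cell (filled with NA) for every combination of group levels, while aggregate() returns rows only for combinations that occur in the data.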


7. mapply(): Parallel Operations Across Multiple Vectors

Description:
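mapply() applies a function element-wise over multiple arguments (vectors/lists): the first call receives the first element of each argument, the second call the second elements, and so on. "Parallel" here refers to stepping through the vectors together, not to multi-core execution. It is useful for row-wise operations combining multiple columns.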

Possible Uses:

  • Concatenate or combine values from multiple columns for reporting.
  • Create custom flags or labels using multiple variables.
  • Apply a function to corresponding elements of several vectors.

Input Table:

AEDECOD AESEV
HEADACHE MILD
NAUSEA MODERATE
DIZZINESS SEVERE

R Example:

ae <- data.frame(
  AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"),
  AESEV = c("MILD", "MODERATE", "SEVERE")
)
mapply(function(a, b) paste(a, b, sep = " - "), ae$AEDECOD, ae$AESEV)

Output:

AEDECOD AESEV Combined
HEADACHE MILD HEADACHE - MILD
NAUSEA MODERATE NAUSEA - MODERATE
DIZZINESS SEVERE DIZZINESS - SEVERE
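
For the "custom flags" use case, mapply() is helpful when the function handles only one value at a time. The sketch below assumes a hypothetical AESER column and a hypothetical review rule (flag_ae); neither comes from the example above. It also shows that for simple concatenation, paste() is already vectorized and gives the combined text without mapply().

```r
ae <- data.frame(
  AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS"),
  AESEV = c("MILD", "MODERATE", "SEVERE"),
  AESER = c("N", "N", "Y"),  # hypothetical seriousness flag
  stringsAsFactors = FALSE
)

# Hypothetical rule: flag an event for review if it is severe or serious.
# if/else works on single values, so mapply() feeds it one row at a time.
flag_ae <- function(sev, ser) {
  if (sev == "SEVERE" || ser == "Y") "REVIEW" else "OK"
}
ae$FLAG <- mapply(flag_ae, ae$AESEV, ae$AESER, USE.NAMES = FALSE)

# Vectorized alternative for plain concatenation
ae$COMBINED <- paste(ae$AEDECOD, ae$AESEV, sep = " - ")
```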

8. replicate(): Simulations and Repeated Calculations

Description:
replicate() repeats an expression multiple times, useful for simulations or generating mock data.

Possible Uses:

  • Simulate random subject ages or lab values.
  • Bootstrap resampling.
  • Generate mock data for testing.

R Example:

set.seed(123)
replicate(5, sample(20:70, 1))

Output:
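A numeric vector of 5 random ages between 20 and 70, e.g., 38 70 57 21 67. The exact values can differ between R versions, because the default sampling algorithm changed in R 3.6.0.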
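
Bootstrap resampling, listed under "Possible Uses", can be sketched with the same function. The ages vector, the number of replicates (1000), and the object names below are illustrative assumptions; the idea is to resample the observed values with replacement many times and look at the spread of the resulting means.

```r
set.seed(123)

# Hypothetical observed ages (illustrative only)
ages <- c(34, 40, 29, 51, 46, 38)

# Each replicate draws a resample of the same size with replacement
# and records its mean
boot_means <- replicate(1000, mean(sample(ages, replace = TRUE)))

# Bootstrap standard error and a simple percentile interval for the mean
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, c(0.025, 0.975))
```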


9. Calling a Custom Function

We'll use a simulated VS (Vital Signs) domain and apply a custom function using the apply family.

Step 1: Simulate an SDTM-like VS Dataset

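vs <- data.frame(
  USUBJID = rep(c("SUBJ001", "SUBJ002", "SUBJ003"), each = 4),
  VSTESTCD = rep(c("SYSBP", "DIABP"), times = 6),
  # Two SCREENING rows, then two WEEK1 rows per subject, so that each test
  # is measured at both visits
  VISIT = rep(rep(c("SCREENING", "WEEK1"), each = 2), times = 3),
  VSSTRESN = c(120, 80, 130, 85, 125, 82, 118, 78, 135, 88, 140, 90)
)

Step 2: Define a Custom Function

# Coefficient of variation: standard deviation divided by the mean
cv <- function(x) {
  if (mean(x) == 0) return(NA)  # avoid division by zero
  sd(x) / mean(x)
}

Step 3: Apply the Custom Function Using tapply()

# One CV per test code and visit, rounded to four decimal places for display
round(tapply(vs$VSSTRESN, list(vs$VSTESTCD, vs$VISIT), cv), 4)

Output:

      SCREENING  WEEK1
DIABP    0.0500 0.0715
SYSBP    0.0603 0.0852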

Step 4: Convert to Data Frame (Optional)

cv_result <- as.data.frame(as.table(
  tapply(vs$VSSTRESN, list(vs$VSTESTCD, vs$VISIT), cv)
))
colnames(cv_result) <- c("VSTESTCD", "VISIT", "CV")
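
The cv() function above guards only against a zero mean. As a hedged variation (the name cv_safe and the input values are assumptions for illustration), the sketch below also drops missing values and returns NA when fewer than two values remain, then applies the function to a named list with vapply().

```r
# Hypothetical variant of cv(): tolerate NA values and very small groups
cv_safe <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2 || mean(x) == 0) return(NA_real_)
  sd(x) / mean(x)
}

# Illustrative inputs: one group with a missing value, one with a single value
by_test <- list(
  SYSBP = c(120, 125, NA, 135),
  DIABP = c(80)
)

# vapply() guarantees one numeric value per list element
cv_by_test <- vapply(by_test, cv_safe, numeric(1))
```

Here SYSBP is computed from the three non-missing values, and DIABP is NA because a standard deviation cannot be computed from a single observation.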
