
1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.15. Recoding Variables in SAS vs R

1. Introduction

Recoding variables is a common data management task in clinical trials, especially for SDTM domains such as AE (Adverse Events), DM (Demographics), and LB (Laboratory). Recoding allows you to:

  • Collapse categories (e.g., recode AESEV from "MILD"/"MODERATE"/"SEVERE" to "NON-SERIOUS"/"SERIOUS")
  • Reverse code or group lab results (e.g., recode LBORRES to "LOW"/"NORMAL"/"HIGH")
  • Assign new values for analysis or reporting

It is best practice to create new variables for recoded values to preserve the original data.


2. Basic Recoding: SAS vs R

Task | SAS | R (base/car/dplyr)
Recode with IF/THEN | IF AESEV="SEVERE" THEN AESER="Y"; ELSE AESER="N"; | ifelse(AESEV == "SEVERE", "Y", "N")
Recode with array/loop | ARRAY aesev{*} AESEV1-AESEV4; ... | lapply() or dplyr::mutate(across())
Recode with value labels | PROC FORMAT | factor() or recode() (car package)
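
The value-label row can be illustrated with base R alone. The following is a minimal sketch, assuming a small made-up severity vector (aesev and aesev_lbl are illustrative names, not part of the guide's data): factor() with levels and labels attaches display values to stored codes, which is similar in spirit to applying a format created with PROC FORMAT.

```r
# Hypothetical severity codes, for illustration only
aesev <- c("MILD", "SEVERE", "MODERATE", "MILD")

# levels = values as stored, labels = values as displayed
aesev_lbl <- factor(aesev,
                    levels = c("MILD", "MODERATE", "SEVERE"),
                    labels = c("Mild", "Moderate", "Severe"))

as.character(aesev_lbl)  # labelled values, in the original row order
table(aesev_lbl)         # counts are reported in level order
```

Because the original vector is left untouched, the labelled version can be dropped or rebuilt at any time.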

3. Input Example (SDTM AE Domain)

Suppose you have the following AE data:

USUBJID AEDECOD AESEV AESER
01-001 HEADACHE MILD N
01-002 NAUSEA MODERATE N
01-003 DIZZINESS SEVERE Y
01-004 FATIGUE SEVERE Y
01-005 NAUSEA MILD N

4. Recoding a Single Variable

SAS Example:

data ae;
  set ae;
  if AESEV = "SEVERE" then AESER = "Y";
  else AESER = "N";
run;
  • Recodes AESER: sets to "Y" if AESEV is "SEVERE", otherwise "N".

R Example (base):

# Dummy AE data
ae <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003", "01-004", "01-005"),
  AEDECOD = c("HEADACHE", "NAUSEA", "DIZZINESS", "FATIGUE", "NAUSEA"),
  AESEV = c("MILD", "MODERATE", "SEVERE", "SEVERE", "MILD"),
  AESER = c("N", "N", "Y", "Y", "N")
)

ae$AESER <- ifelse(ae$AESEV == "SEVERE", "Y", "N")
  • Uses ifelse() for recoding.
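
One difference from the SAS step is worth checking when AESEV can be missing. In the SAS IF/ELSE above, a missing AESEV falls into the ELSE branch and receives "N", whereas ifelse() returns NA for a missing input. The sketch below uses a made-up vector (aesev, aeser_basic and aeser_sas_like are illustrative names) to show both behaviours; which one is appropriate depends on the analysis rules for your study.

```r
# Hypothetical input with one missing severity
aesev <- c("MILD", "SEVERE", NA)

# A comparison with NA yields NA, so ifelse() returns NA for that row
aeser_basic <- ifelse(aesev == "SEVERE", "Y", "N")

# %in% never returns NA, so the missing row falls through to "N",
# which mirrors the SAS IF/ELSE result
aeser_sas_like <- ifelse(aesev %in% "SEVERE", "Y", "N")

aeser_basic
aeser_sas_like
```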

R Example (car package):

library(car)
# Dummy AE data already defined above
ae$AESEV_REC <- recode(ae$AESEV, "'MILD'='NON-SERIOUS'; 'MODERATE'='NON-SERIOUS'; 'SEVERE'='SERIOUS'")
  • Collapses AESEV into "NON-SERIOUS" and "SERIOUS".

Expected Output:

AESEV AESER AESEV_REC
MILD N NON-SERIOUS
MODERATE N NON-SERIOUS
SEVERE Y SERIOUS
SEVERE Y SERIOUS
MILD N NON-SERIOUS

5. Recoding Multiple Variables (e.g., LBORRES in SDTM LB Domain)

Suppose you want to recode multiple lab result columns (e.g., LBORRES1, LBORRES2) to "LOW", "NORMAL", "HIGH" based on reference ranges.

SAS Example:

data lb;
  set lb;
  array lborres{2} LBORRES1 LBORRES2;
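  /* $ 6 declares the new flags as character; without it SAS creates numeric variables */
  array lbnorm{2} $ 6 LBNORM1 LBNORM2;
  do i = 1 to 2;
    /* Note: a missing numeric result compares as less than 70 in SAS and is flagged "LOW" */
    if lborres{i} < 70 then lbnorm{i} = "LOW";
    else if lborres{i} > 110 then lbnorm{i} = "HIGH";
    else lbnorm{i} = "NORMAL";
  end;
  drop i;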
run;

R Example (vectorized):

# Dummy LB data
lb <- data.frame(
  LBORRES1 = c(65, 90, 115),
  LBORRES2 = c(120, 80, 60)
)

lb$LBNORM1 <- ifelse(lb$LBORRES1 < 70, "LOW",
                ifelse(lb$LBORRES1 > 110, "HIGH", "NORMAL"))
lb$LBNORM2 <- ifelse(lb$LBORRES2 < 70, "LOW",
                ifelse(lb$LBORRES2 > 110, "HIGH", "NORMAL"))

Or, for many columns:

lab_cols <- c("LBORRES1", "LBORRES2")
lb[paste0("LBNORM", 1:2)] <- lapply(lb[lab_cols], function(x)
  ifelse(x < 70, "LOW", ifelse(x > 110, "HIGH", "NORMAL")))

Expected Output:

LBORRES1 LBORRES2 LBNORM1 LBNORM2
65 120 LOW HIGH
90 80 NORMAL NORMAL
115 60 HIGH LOW

6. Recoding Continuous to Categorical (e.g., Age Groups in DM Domain)

R Example:

# Dummy DM data
dm <- data.frame(
  AGE = c(12, 34, 70)
)

dm$AGEGRP <- cut(dm$AGE, breaks = c(0, 18, 65, Inf),
                 labels = c("Child", "Adult", "Senior"), right = FALSE)
  • Categorizes AGE into "Child", "Adult", "Senior".

Expected Output:

AGE AGEGRP
12 Child
34 Adult
70 Senior
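
The right = FALSE argument decides which group a boundary age falls into, and the three sample ages above do not exercise it. The sketch below, using made-up boundary ages (age, agegrp_left and agegrp_right are illustrative names), compares the two settings with the same breaks and labels as the example.

```r
# Hypothetical ages chosen to sit on the break points
age <- c(0, 17, 18, 64, 65, NA)

# right = FALSE: intervals are [0,18), [18,65), [65,Inf)
agegrp_left <- cut(age, breaks = c(0, 18, 65, Inf),
                   labels = c("Child", "Adult", "Senior"), right = FALSE)

# right = TRUE (the default): intervals are (0,18], (18,65], (65,Inf)
agegrp_right <- cut(age, breaks = c(0, 18, 65, Inf),
                    labels = c("Child", "Adult", "Senior"), right = TRUE)

data.frame(age, agegrp_left, agegrp_right)
```

With right = FALSE, ages 18 and 65 start the "Adult" and "Senior" groups. With the default, they close the "Child" and "Adult" groups instead, and age 0 falls outside the first interval and becomes NA.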

7. Reverse Coding (e.g., Questionnaire Scores)

Suppose QSVAL is a 1–5 scale, and you want to reverse it.

R Example:

# Dummy QS data
qs <- data.frame(
  QSVAL = 1:5
)

qs$QSVAL_REV <- 6 - qs$QSVAL
  • 1→5, 2→4, 3→3, 4→2, 5→1

Expected Output:

QSVAL QSVAL_REV
1 5
2 4
3 3
4 2
5 1
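
The constant 6 is the sum of the scale's minimum and maximum (1 + 5). For scales with other end points, the same idea can be wrapped in a small helper. This is a sketch only: reverse_code(), min_val and max_val are hypothetical names introduced here, and the 0 to 10 scale is an assumed example.

```r
# Hypothetical helper: reverse a score on any closed numeric scale
reverse_code <- function(x, min_val, max_val) {
  # Stop early if a non-missing score lies outside the stated scale
  stopifnot(all(x >= min_val & x <= max_val, na.rm = TRUE))
  (min_val + max_val) - x
}

rev_1_to_5  <- reverse_code(1:5, min_val = 1, max_val = 5)
rev_0_to_10 <- reverse_code(c(0, 3, 10, NA), min_val = 0, max_val = 10)

rev_1_to_5
rev_0_to_10  # a missing score stays missing
```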

8. Beyond Basics: Advanced Recoding

  • Multiple recodes at once: Use dplyr::mutate(across(...)) for many columns.
  • Custom functions: Write your own recoding logic for complex SDTM domains.
  • Factor recoding: Use forcats::fct_recode() for factor levels (e.g., DM$SEX).

R Example: dplyr/across

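library(dplyr)
# Dummy AE data already defined above
# car::recode() is called with its namespace: once dplyr is attached,
# dplyr::recode() masks it and does not accept car's recode string.
# Note that this overwrites every column whose name starts with "AESEV".
ae <- ae %>%
  mutate(across(starts_with("AESEV"),
                ~ car::recode(.x, "'MILD'='NON-SERIOUS';'MODERATE'='NON-SERIOUS';'SEVERE'='SERIOUS'")))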

R Example: forcats

library(forcats)
# Dummy DM data for SEX
dm <- data.frame(
  SEX = c("F", "M", "F", "M")
)
dm$recoded_SEX <- fct_recode(dm$SEX, Female = "F", Male = "M")

Expected Output:

SEX recoded_SEX
F Female
M Male
F Female
M Male
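
The list at the start of this section also mentions custom functions. A minimal base R sketch follows, reusing the 70 and 110 limits from section 5. The helper flag_range() and its low/high arguments are hypothetical names introduced here, and the LB data is made up. Unlike the SAS comparison in section 5, this version leaves a missing result as NA instead of flagging it "LOW".

```r
# Hypothetical helper: flag numeric results against a reference range
flag_range <- function(x, low, high) {
  out <- rep(NA_character_, length(x))  # missing results stay NA
  out[!is.na(x) & x < low]  <- "LOW"
  out[!is.na(x) & x > high] <- "HIGH"
  out[!is.na(x) & x >= low & x <= high] <- "NORMAL"
  out
}

# Dummy LB data with one missing result and one value on the lower limit
lb <- data.frame(
  LBORRES1 = c(65, 90, 115, NA),
  LBORRES2 = c(120, 80, 60, 70)
)

lab_cols  <- c("LBORRES1", "LBORRES2")
flag_cols <- paste0("LBNORM", seq_along(lab_cols))

# Arguments after the function name are passed on to flag_range()
lb[flag_cols] <- lapply(lb[lab_cols], flag_range, low = 70, high = 110)
lb
```

Keeping the limits as arguments means the same helper can be reused for tests with different reference ranges, and the rule lives in one place for review.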

9. Input and Output Table: Recoding Example (AE Domain)

Input Table:

USUBJID AEDECOD AESEV
01-001 HEADACHE MILD
01-002 NAUSEA SEVERE

R Recoding:

# Dummy AE data for recoding example
ae <- data.frame(
  USUBJID = c("01-001", "01-002"),
  AEDECOD = c("HEADACHE", "NAUSEA"),
  AESEV = c("MILD", "SEVERE")
)

ae$AESER <- ifelse(ae$AESEV == "SEVERE", "Y", "N")

Output Table:

USUBJID AEDECOD AESEV AESER
01-001 HEADACHE MILD N
01-002 NAUSEA SEVERE Y

10. Key Points and Best Practices

  • Always create new variables for recoded values to preserve the original SDTM data.
  • Use vectorized functions such as ifelse() and cut(), and apply them across columns with lapply() or mutate(across()) instead of row-by-row loops.
  • For categorical/factor variables, use factor() or forcats for level recoding.
  • Document your recoding logic for traceability and regulatory compliance.
  • For complex recoding, custom functions or lookup tables can be helpful.
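
The lookup-table approach from the last point can be sketched with a named character vector. Everything below is illustrative: sev_map and the "UNKNOWN" value are assumptions made for the example, and the mapping repeats the AESEV grouping used earlier in this guide.

```r
# Hypothetical lookup table: names are source values, elements are recoded values
sev_map <- c(MILD = "NON-SERIOUS", MODERATE = "NON-SERIOUS", SEVERE = "SERIOUS")

# Dummy AE data including one value that the lookup does not cover
ae <- data.frame(AESEV = c("MILD", "SEVERE", "MODERATE", "UNKNOWN"))

# Indexing by name performs the recode; unmatched values become NA
ae$AESEV_REC <- unname(sev_map[ae$AESEV])

# List source values with no mapping so they can be reviewed and documented
unmapped <- setdiff(unique(ae$AESEV), names(sev_map))

ae
unmapped
```

Keeping the mapping in one object makes the recoding logic easy to document and to review alongside the output.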
