1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.11. IF/ELSE statements in SAS vs R equivalent

1. Basic Conditional Logic: Simple Value Transformation

Capability           SAS (IF/ELSE)                  R (case_when, if_else)
Basic conditionals   if condition then value;       case_when(condition ~ value) or if_else(condition, value_if_true, value_if_false)
Multiple conditions  else if condition then value;  Additional conditions in case_when()
Default value        else value;                    TRUE ~ value in case_when()
Multiple variables   Using DO blocks                Multiple mutate() calls or repeated conditions

SAS Example

data ae_severity;
  set ae;
  length severity_cat $20 urgency_level 8;
  if aesev = 'MILD' then severity_cat = 'Non-significant';
  else if aesev = 'MODERATE' then severity_cat = 'Notable';
  else if aesev = 'SEVERE' then severity_cat = 'Significant';
  else if aesev = 'LIFE THREATENING' then severity_cat = 'Critical';

  if aesev = 'MILD' then urgency_level = 1;
  else if aesev = 'MODERATE' then urgency_level = 2;
  else if aesev = 'SEVERE' then urgency_level = 3;
  else if aesev = 'LIFE THREATENING' then urgency_level = 4;
run;

Explanation (SAS):

  • Uses if/else if statements to categorize adverse events based on severity (AESEV)
  • Creates both character (severity_cat) and numeric (urgency_level) variables
  • Each severity level gets a corresponding category and numeric urgency level
  • Sequential logic ensures each adverse event falls into only one category

R Example

library(dplyr)

# Dummy data for AE example
ae <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004"),
  AEDECOD = c("HEADACHE", "NAUSEA", "DYSPNEA", "ANAPHYLAXIS"),
  AESEV = c("MILD", "MODERATE", "SEVERE", "LIFE THREATENING")
)

ae_severity <- ae %>%
  mutate(
    SEVERITY_CAT = case_when(
      AESEV == "MILD" ~ "Non-significant",
      AESEV == "MODERATE" ~ "Notable",
      AESEV == "SEVERE" ~ "Significant",
      AESEV == "LIFE THREATENING" ~ "Critical"
    ),
    URGENCY_LEVEL = case_when(
      AESEV == "MILD" ~ 1,
      AESEV == "MODERATE" ~ 2,
      AESEV == "SEVERE" ~ 3,
      AESEV == "LIFE THREATENING" ~ 4
    )
  )

Explanation (R):

  • Uses case_when() function from dplyr package
  • Conditions appear on the left side of ~, return values on the right
  • Multiple conditions evaluated in order until one is TRUE
  • Multiple variables can be created in a single mutate() call
  • case_when() handles each column separately (no direct equivalent to DO blocks)
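The capability table also lists if_else() for two-way conditions, which the example above does not use. A minimal sketch, assuming a hypothetical flag named HIGH_SEV_FL (not part of the original example) that marks the two most severe categories:

```r
library(dplyr)

# Same dummy AE severities as in the example above
ae <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004"),
  AESEV = c("MILD", "MODERATE", "SEVERE", "LIFE THREATENING")
)

# HIGH_SEV_FL is a hypothetical name used only for illustration
ae_flag <- ae %>%
  mutate(
    # if_else(condition, value_if_true, value_if_false)
    HIGH_SEV_FL = if_else(AESEV %in% c("SEVERE", "LIFE THREATENING"), "Y", "N")
  )

ae_flag$HIGH_SEV_FL
# "N" "N" "Y" "Y"
```

For a single yes/no split like this, if_else() is usually shorter than a two-branch case_when(); once there are three or more outcomes, case_when() tends to read better.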

Input Table

USUBJID  AEDECOD      AESEV
001-001  HEADACHE     MILD
001-002  NAUSEA       MODERATE
001-003  DYSPNEA      SEVERE
001-004  ANAPHYLAXIS  LIFE THREATENING

Expected Output

USUBJID  AEDECOD      AESEV             SEVERITY_CAT     URGENCY_LEVEL
001-001  HEADACHE     MILD              Non-significant  1
001-002  NAUSEA       MODERATE          Notable          2
001-003  DYSPNEA      SEVERE            Significant      3
001-004  ANAPHYLAXIS  LIFE THREATENING  Critical         4

2. Compound Conditions: Multiple Criteria

Capability        SAS                                     R
AND conditions    if cond1 & cond2 then value;            case_when(cond1 & cond2 ~ value)
OR conditions     if cond1 | cond2 then value;            case_when(cond1 | cond2 ~ value)
Mixed conditions  if (cond1 & cond2) | cond3 then value;  case_when((cond1 & cond2) | cond3 ~ value)

SAS Example

data vs_category;
  set vs;
  if vstestcd = 'SYSBP' & vsstresn > 180 then clinical_flag = 'Critical Hypertension';
  else if vstestcd = 'SYSBP' & vsstresn > 140 then clinical_flag = 'Hypertension';
  else if vstestcd = 'DIABP' & vsstresn > 120 then clinical_flag = 'Critical Diastolic';
  else if vstestcd = 'TEMP' & vsstresn > 38.5 then clinical_flag = 'Fever';
  else clinical_flag = 'Normal Range';
run;

Explanation (SAS):

  • Combines multiple conditions (vital sign test code and result value)
  • Uses logical AND (&) to ensure both conditions must be met
  • Classifies vital signs measurements based on clinical thresholds
  • Creates a flagging variable for abnormal values requiring attention

R Example

library(dplyr)

# Dummy data for VS example
vs <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
  VSTESTCD = c("SYSBP", "SYSBP", "DIABP", "TEMP", "PULSE"),
  VSSTRESN = c(185, 145, 125, 39.2, 88)
)

vs_category <- vs %>%
  mutate(
    CLINICAL_FLAG = case_when(
      VSTESTCD == "SYSBP" & VSSTRESN > 180 ~ "Critical Hypertension",
      VSTESTCD == "SYSBP" & VSSTRESN > 140 ~ "Hypertension",
      VSTESTCD == "DIABP" & VSSTRESN > 120 ~ "Critical Diastolic",
      VSTESTCD == "TEMP" & VSSTRESN > 38.5 ~ "Fever",
      TRUE ~ "Normal Range"
    )
  )

Explanation (R):

  • Combines conditions with & (AND) operator, just like in SAS
  • The final TRUE ~ branch assigns "Normal Range" to rows that match no earlier condition; without it those rows would be NA
  • The case_when() function processes conditions in order
  • Each observation gets assigned the first matching condition's value
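The example above only combines conditions with &. A short sketch of OR and mixed conditions on the same dummy data may help; REVIEW_FL and its labels are hypothetical names chosen for illustration, and the thresholds simply reuse the critical values from the example.

```r
library(dplyr)

vs <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
  VSTESTCD = c("SYSBP", "SYSBP", "DIABP", "TEMP", "PULSE"),
  VSSTRESN = c(185, 145, 125, 39.2, 88)
)

# REVIEW_FL and its labels are hypothetical, for illustration only
vs_flag <- vs %>%
  mutate(
    REVIEW_FL = case_when(
      # Mixed: (A & B) | (C & D); the parentheses make the grouping explicit
      (VSTESTCD == "SYSBP" & VSSTRESN > 180) |
        (VSTESTCD == "DIABP" & VSSTRESN > 120) ~ "Critical BP",
      # Plain OR on one variable
      VSTESTCD == "TEMP" | VSTESTCD == "PULSE" ~ "Non-BP Test",
      TRUE ~ "No Flag"
    )
  )

vs_flag$REVIEW_FL
# "Critical BP" "No Flag" "Critical BP" "Non-BP Test" "Non-BP Test"
```

In R, & binds more tightly than |, so the parentheses in the first condition are not strictly required, but they keep the intent obvious. An OR over several values of one variable can also be written as VSTESTCD %in% c("TEMP", "PULSE").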

Input Table

USUBJID  VSTESTCD  VSSTRESN
001-001  SYSBP     185
001-002  SYSBP     145
001-003  DIABP     125
001-004  TEMP      39.2
001-005  PULSE     88

Expected Output

USUBJID  VSTESTCD  VSSTRESN  CLINICAL_FLAG
001-001  SYSBP     185       Critical Hypertension
001-002  SYSBP     145       Hypertension
001-003  DIABP     125       Critical Diastolic
001-004  TEMP      39.2      Fever
001-005  PULSE     88        Normal Range

3. Default Values: Handling Unmatched Conditions

Capability           SAS                           R
Default value        else value;                   TRUE ~ value in case_when()
Catch-all condition  Final else without condition  TRUE ~ as final condition

SAS Example

data cm_importance;
  set cm;
  if cmclas = 'ACE INHIBITORS' & cmstrf = 'Y' then priority = 'Critical';
  else if cmclas = 'ACE INHIBITORS' then priority = 'High';
  else if cmstrf = 'Y' then priority = 'Medium';
  else if cmstdy < 0 then priority = 'Baseline';
  else priority = 'Standard';
run;

Explanation (SAS):

  • Categorizes concomitant medications based on drug class and study relevance flag
  • Adds a final else statement as a catch-all for medications not meeting special criteria
  • All medications are guaranteed to have a priority assigned
  • The catch-all is only executed if all previous conditions are FALSE

R Example

library(dplyr)

# Dummy data for CM example
cm <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
  CMCLAS = c("ACE INHIBITORS", "ACE INHIBITORS", "ANALGESICS", "ANALGESICS", "ANTIBIOTICS"),
  CMSTRF = c("Y", "N", "Y", "N", "N"),
  CMSTDY = c(10, 5, -3, -5, 15)
)

cm_importance <- cm %>%
  mutate(
    PRIORITY = case_when(
      CMCLAS == "ACE INHIBITORS" & CMSTRF == "Y" ~ "Critical",
      CMCLAS == "ACE INHIBITORS" ~ "High",
      CMSTRF == "Y" ~ "Medium",
      CMSTDY < 0 ~ "Baseline",
      TRUE ~ "Standard"
    )
  )

Explanation (R):

  • Uses TRUE ~ as the final condition to catch all remaining cases
  • This works because TRUE matches every row that no earlier condition has already claimed
  • Equivalent to the else statement in SAS
  • Ensures all observations get a value (no NAs for unmatched conditions)
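To see what the catch-all buys you, the sketch below runs the same condition with and without a TRUE ~ branch on a reduced version of the CM data. The column names NO_DEFAULT and WITH_DEFAULT are hypothetical and exist only for this comparison.

```r
library(dplyr)

cm <- data.frame(
  CMCLAS = c("ACE INHIBITORS", "ANALGESICS", "ANTIBIOTICS")
)

cm_check <- cm %>%
  mutate(
    # No catch-all: rows that match nothing become NA
    NO_DEFAULT = case_when(
      CMCLAS == "ACE INHIBITORS" ~ "High"
    ),
    # Catch-all: unmatched rows get the default, like a final SAS else
    WITH_DEFAULT = case_when(
      CMCLAS == "ACE INHIBITORS" ~ "High",
      TRUE ~ "Standard"
    )
  )

cm_check$NO_DEFAULT
# "High" NA NA
cm_check$WITH_DEFAULT
# "High" "Standard" "Standard"
```

Leaving the default out is the closer match to a SAS if/else-if chain with no final else, where unmatched rows stay missing.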

Input Table

USUBJID  CMCLAS          CMSTRF  CMSTDY
001-001  ACE INHIBITORS  Y       10
001-002  ACE INHIBITORS  N       5
001-003  ANALGESICS      Y       -3
001-004  ANALGESICS      N       -5
001-005  ANTIBIOTICS     N       15

Expected Output

USUBJID  CMCLAS          CMSTRF  CMSTDY  PRIORITY
001-001  ACE INHIBITORS  Y       10      Critical
001-002  ACE INHIBITORS  N       5       High
001-003  ANALGESICS      Y       -3      Medium
001-004  ANALGESICS      N       -5      Baseline
001-005  ANTIBIOTICS     N       15      Standard

4. Multiple Actions: DO Blocks and Equivalent R Approaches

Capability        SAS                                            R
Multiple actions  if condition then do; stmt1; stmt2; end;       Multiple variables in mutate() with same conditions
Block structure   DO/END blocks encapsulate multiple statements  No direct equivalent, use separate column transformations
Action sequence   Actions within DO block executed in order      Columns in mutate() computed in order; later ones can use earlier ones

SAS Example

data ex_derived;
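  set ex;
  /* Without LENGTH, the first assignment ('High', 'Daily') fixes the length and later values are truncated */
  length dose_category $7 monitoring_freq $7;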
  if exdose > 100 & exdosu = 'mg' then do;
    dose_category = 'High';
    monitoring_freq = 'Daily';
    max_duration = 14;
  end;
  else if exdose > 50 & exdosu = 'mg' then do;
    dose_category = 'Medium';
    monitoring_freq = 'Weekly';
    max_duration = 28;
  end;
  else if exdose > 0 then do;
    dose_category = 'Low';
    monitoring_freq = 'Monthly';
    max_duration = 90;
  end;
  else do;
    dose_category = 'Unknown';
    monitoring_freq = 'Unknown';
    max_duration = 0;
  end;
  
  /* Calculate additional derived values */
  total_dose = exdose * exdur;
  avg_daily = total_dose / exdur;
run;

Explanation (SAS):

  • Uses DO blocks to perform multiple related actions for each condition
  • For each exposure record, assigns a dose category, monitoring frequency, and maximum duration
  • The final calculations run unconditionally after the blocks and use only exdose and exdur
  • Clear visual grouping of related operations per condition

R Example

library(dplyr)

# Dummy data for EX example
ex <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004"),
  EXTRT = c("DRUG-A", "DRUG-A", "DRUG-B", "PLACEBO"),
  EXDOSE = c(120, 75, 25, 0),
  EXDOSU = c("mg", "mg", "mg", "mg"),
  EXDUR = c(7, 14, 30, 30)
)

ex_derived <- ex %>%
  mutate(
    DOSE_CATEGORY = case_when(
      EXDOSE > 100 & EXDOSU == "mg" ~ "High",
      EXDOSE > 50 & EXDOSU == "mg" ~ "Medium",
      EXDOSE > 0 ~ "Low",
      TRUE ~ "Unknown"
    ),
    MONITORING_FREQ = case_when(
      EXDOSE > 100 & EXDOSU == "mg" ~ "Daily",
      EXDOSE > 50 & EXDOSU == "mg" ~ "Weekly",
      EXDOSE > 0 ~ "Monthly",
      TRUE ~ "Unknown"
    ),
    MAX_DURATION = case_when(
      EXDOSE > 100 & EXDOSU == "mg" ~ 14,
      EXDOSE > 50 & EXDOSU == "mg" ~ 28,
      EXDOSE > 0 ~ 90,
      TRUE ~ 0
    )
  ) %>%
  mutate(
    TOTAL_DOSE = EXDOSE * EXDUR,
    AVG_DAILY = TOTAL_DOSE / EXDUR
  )

Explanation (R):

  • No direct equivalent to DO blocks in dplyr's approach
  • Instead, repeat the conditions for each variable being modified
  • Each column transformation uses its own case_when() function
  • R's approach is more column-oriented, while SAS is row-oriented
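One way to avoid repeating the conditions is to evaluate them once and pull the remaining values from a small lookup table. This is a sketch of that alternative, not the approach used in the example above; the dose_rules table and the ex_derived2 name are assumptions introduced here.

```r
library(dplyr)

ex <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004"),
  EXDOSE = c(120, 75, 25, 0),
  EXDOSU = c("mg", "mg", "mg", "mg")
)

# Hypothetical lookup: one row per category, holding the values
# that each SAS DO block would assign
dose_rules <- data.frame(
  DOSE_CATEGORY = c("High", "Medium", "Low", "Unknown"),
  MONITORING_FREQ = c("Daily", "Weekly", "Monthly", "Unknown"),
  MAX_DURATION = c(14, 28, 90, 0)
)

ex_derived2 <- ex %>%
  mutate(
    # The conditions are written once, here
    DOSE_CATEGORY = case_when(
      EXDOSE > 100 & EXDOSU == "mg" ~ "High",
      EXDOSE > 50 & EXDOSU == "mg" ~ "Medium",
      EXDOSE > 0 ~ "Low",
      TRUE ~ "Unknown"
    )
  ) %>%
  # The join attaches the other derived columns by category
  left_join(dose_rules, by = "DOSE_CATEGORY")

ex_derived2$MONITORING_FREQ
# "Daily" "Weekly" "Monthly" "Unknown"
```

The trade-off is that the rules now live in two places (the conditions and the table), but a threshold change only has to be made once.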

Input Table

USUBJID  EXTRT    EXDOSE  EXDOSU  EXDUR
001-001  DRUG-A   120     mg      7
001-002  DRUG-A   75      mg      14
001-003  DRUG-B   25      mg      30
001-004  PLACEBO  0       mg      30

Expected Output

USUBJID  EXTRT    EXDOSE  EXDOSU  EXDUR  DOSE_CATEGORY  MONITORING_FREQ  MAX_DURATION  TOTAL_DOSE  AVG_DAILY
001-001  DRUG-A   120     mg      7      High           Daily            14            840         120
001-002  DRUG-A   75      mg      14     Medium         Weekly           28            1050        75
001-003  DRUG-B   25      mg      30     Low            Monthly          90            750         25
001-004  PLACEBO  0       mg      30     Unknown        Unknown          0             0           0

5. Handling Missing Values: NA Conditions

Capability                 SAS                                            R
Missing value check        if var = . or if var = '' or if missing(var)   is.na(var) within conditions
Explicit missing handling  Special missing value operators and functions  Special handling with is.na() or missing parameter
Missing as default         No special handling (missing remains missing)  case_when() returns NA for unmatched conditions

SAS Example

data qs_complete;
  set qs;
  if qsstresn = 5 then response_level = "Complete Response";
  else if qsstresn = 4 then response_level = "Strong Response";
  else if qsstresn = 3 then response_level = "Moderate Response";
  else if qsstresn = 2 then response_level = "Mild Response";
  else if qsstresn = 1 then response_level = "No Response";
  else if qsstresn = . then response_level = "Missing Response";
run;

Explanation (SAS):

  • Handles questionnaire responses scored from 1 to 5
  • Special condition for missing scores (qsstresn = .)
  • Categorizes responses into descriptive result categories
  • Explicitly handles missing values with a specific condition

R Example - Approach 1: Using case_when() with is.na()

library(dplyr)

# Dummy data for QS example
qs <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
  QSTEST = rep("PAIN ASSESSMENT", 5),
  QSSTRESN = c(5, 3, 1, NA, 4)
)

qs_complete1 <- qs %>%
  mutate(
    RESPONSE_LEVEL = case_when(
      QSSTRESN == 5 ~ "Complete Response",
      QSSTRESN == 4 ~ "Strong Response",
      QSSTRESN == 3 ~ "Moderate Response",
      QSSTRESN == 2 ~ "Mild Response",
      QSSTRESN == 1 ~ "No Response",
      is.na(QSSTRESN) ~ "Missing Response"
    )
  )

Explanation (R - Approach 1):

  • Uses is.na() function to explicitly test for NA values
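  • Special handling is required because a comparison with NA returns NA, which case_when() treats as no match
  • The is.na() branch can go anywhere in the list, since none of the equality tests can be TRUE for an NA row
  • Non-missing values outside 1-5 match no branch and become NA, just as response_level stays blank in the SAS version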
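The behaviour behind Approach 1 is easy to check on a two-element vector. This is a minimal sketch; the vector name score and the labels are made up for the demonstration.

```r
library(dplyr)

score <- c(5, NA)

score == 5     # TRUE NA: the comparison with NA is NA, not FALSE
is.na(score)   # FALSE TRUE

# Without an is.na() branch the NA element matches nothing and stays NA
no_na_branch <- case_when(score == 5 ~ "five")

# With an explicit branch the NA element gets its own label
with_na_branch <- case_when(
  score == 5 ~ "five",
  is.na(score) ~ "missing"
)

no_na_branch
# "five" NA
with_na_branch
# "five" "missing"
```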

R Example - Approach 2: Using if_else() chain

library(dplyr)

# Dummy data for QS example (already defined above)
qs_complete2 <- qs %>%
  mutate(
    # missing = on the outermost call labels NA scores; without it they would stay NA
    RESPONSE_LEVEL = if_else(QSSTRESN == 5, "Complete Response",
                     if_else(QSSTRESN == 4, "Strong Response",
                     if_else(QSSTRESN == 3, "Moderate Response",
                     if_else(QSSTRESN == 2, "Mild Response",
                     if_else(QSSTRESN == 1, "No Response",
                             NA_character_)))),
                     missing = "Missing Response")
  )

Explanation (R - Approach 2):

  • if_else() provides a simpler syntax for simple conditions
  • Takes condition, true value, false value, and missing value
  • More concise than case_when() for simple binary conditions
  • The missing parameter specifies what to return for NA inputs
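For a genuinely binary split, the missing argument of if_else() handles NA in one call. A small sketch, assuming a hypothetical RESPONDER flag and an arbitrary cut-off of 4 chosen only for illustration:

```r
library(dplyr)

qs <- data.frame(QSSTRESN = c(5, 3, 1, NA, 4))

# RESPONDER and the cut-off of 4 are hypothetical, for illustration only
qs_resp <- qs %>%
  mutate(
    RESPONDER = if_else(QSSTRESN >= 4, "Responder", "Non-responder",
                        missing = "Not assessed")
  )

qs_resp$RESPONDER
# "Responder" "Non-responder" "Non-responder" "Not assessed" "Responder"
```

Without missing =, the fourth row would be NA, because the condition NA >= 4 is itself NA.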

Input Table

USUBJID  QSTEST           QSSTRESN
001-001  PAIN ASSESSMENT  5
001-002  PAIN ASSESSMENT  3
001-003  PAIN ASSESSMENT  1
001-004  PAIN ASSESSMENT  NA
001-005  PAIN ASSESSMENT  4

Expected Output

USUBJID  QSTEST           QSSTRESN  RESPONSE_LEVEL
001-001  PAIN ASSESSMENT  5         Complete Response
001-002  PAIN ASSESSMENT  3         Moderate Response
001-003  PAIN ASSESSMENT  1         No Response
001-004  PAIN ASSESSMENT  NA        Missing Response
001-005  PAIN ASSESSMENT  4         Strong Response

6. Beyond Basics: Complex Conditional Logic

Capability           SAS                                    R
Nested conditions    if cond1 then if cond2 then value;     Combine with & or nest case_when() calls
Complex expressions  Any valid SAS expression in condition  Any valid R expression in condition
Computed conditions  Variables or functions in condition    Variables or functions in condition

SAS Example

data lb_flagging;
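  set lb;
  /* Without LENGTH, 'Hepatic Alert' fixes priority at 13 characters and truncates 'Hepatic Monitor' */
  length priority $15;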
  
  /* Direct lab abnormality flagging based on reference ranges */
  if lbstresn > (lbstnrhi * 3) & lbstnrhi > 0 then do;
    abnormality = "Critically High";
    flag = "H++";
    action = "Repeat Test Immediately";
    score = 3;
  end;
  else if lbstresn > lbstnrhi & lbstnrhi > 0 then do;
    abnormality = "High";
    flag = "H";
    action = "Monitor";
    score = 1;
  end;
  else if lbstresn < (lbstnrlo * 0.5) & lbstnrlo > 0 then do;
    abnormality = "Critically Low";
    flag = "L++";
    action = "Repeat Test Immediately";
    score = 3;
  end;
  else if lbstresn < lbstnrlo & lbstnrlo > 0 then do;
    abnormality = "Low";
    flag = "L";
    action = "Monitor";
    score = 1;
  end;
  else do;
    abnormality = "Normal";
    flag = "N";
    action = "No Action";
    score = 0;
  end;
  
  /* Additional processing for specific tests */
  if lbtestcd = "ALT" then do;
    if lbstresn > (lbstnrhi * 3) then priority = "Hepatic Alert";
    else if lbstresn > lbstnrhi then priority = "Hepatic Monitor";
    else priority = "Routine";
  end;
  else priority = "Standard";
run;

Explanation (SAS):

  • Directly compares lab results against reference ranges
  • Uses simple multipliers for critical thresholds (3x upper limit, 0.5x lower limit)
  • Assigns a simplified score based on severity level
  • Includes special processing for liver function tests (ALT)
  • Compares values directly, with no intermediate ratio variables

R Example

library(dplyr)

# Dummy data for LB example
lb <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
  LBTESTCD = c("ALT", "AST", "HGB", "PLT", "ALT"),
  LBSTRESN = c(150, 55, 9, 140, 25),
  LBSTNRLO = c(10, 10, 12, 150, 10),
  LBSTNRHI = c(40, 40, 16, 450, 40)
)

lb_flagging <- lb %>%
  mutate(
    # Direct lab abnormality flagging based on reference ranges
    ABNORMALITY = case_when(
      LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ "Critically High",
      LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ "High",
      LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ "Critically Low",
      LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ "Low",
      TRUE ~ "Normal"
    ),
    FLAG = case_when(
      LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ "H++",
      LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ "H",
      LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ "L++",
      LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ "L",
      TRUE ~ "N"
    ),
    # Conditions are listed in the same order as ABNORMALITY and FLAG,
    # matching the SAS else-if chain
    ACTION = case_when(
      LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ "Repeat Test Immediately",
      LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ "Monitor",
      LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ "Repeat Test Immediately",
      LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ "Monitor",
      TRUE ~ "No Action"
    ),
    SCORE = case_when(
      LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ 3,
      LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ 1,
      LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ 3,
      LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ 1,
      TRUE ~ 0
    )
  ) %>%
  # Additional processing for specific tests
  mutate(
    PRIORITY = case_when(
      LBTESTCD == "ALT" & LBSTRESN > (LBSTNRHI * 3) ~ "Hepatic Alert",
      LBTESTCD == "ALT" & LBSTRESN > LBSTNRHI ~ "Hepatic Monitor",
      LBTESTCD == "ALT" ~ "Routine",
      TRUE ~ "Standard"
    )
  )

Explanation (R):

  • Uses direct comparisons with reference ranges rather than calculating intermediate ratios
  • Scores severity simply (3 = critical, 1 = abnormal, 0 = normal)
  • Produces the same outcomes as the SAS example by repeating the conditions for each derived column
  • Flattens the nested ALT check into combined conditions in a single case_when()
  • Still allows for special handling of specific test types (ALT)
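If you would rather keep the nested shape of the SAS code for the ALT check, one option is to nest case_when() inside if_else(). The following is a sketch of that alternative on a reduced version of the LB data, not the approach taken in the example above.

```r
library(dplyr)

lb <- data.frame(
  LBTESTCD = c("ALT", "AST", "ALT"),
  LBSTRESN = c(150, 55, 25),
  LBSTNRHI = c(40, 40, 40)
)

lb_priority <- lb %>%
  mutate(
    PRIORITY = if_else(
      LBTESTCD == "ALT",
      # Inner logic mirrors the nested IF inside the SAS DO block.
      # It is evaluated for every row, but only used where LBTESTCD is "ALT".
      case_when(
        LBSTRESN > LBSTNRHI * 3 ~ "Hepatic Alert",
        LBSTRESN > LBSTNRHI ~ "Hepatic Monitor",
        TRUE ~ "Routine"
      ),
      "Standard"
    )
  )

lb_priority$PRIORITY
# "Hepatic Alert" "Standard" "Routine"
```

Because both branches are computed for all rows, this form is about readability rather than speed; the flattened case_when() gives the same result.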

Input Table

USUBJID  LBTESTCD  LBSTRESN  LBSTNRLO  LBSTNRHI
001-001  ALT       150       10        40
001-002  AST       55        10        40
001-003  HGB       9         12        16
001-004  PLT       140       150       450
001-005  ALT       25        10        40

Expected Output (LBSTRESN, LBSTNRLO, and LBSTNRHI omitted)

USUBJID  LBTESTCD  ABNORMALITY      FLAG  ACTION                   SCORE  PRIORITY
001-001  ALT       Critically High  H++   Repeat Test Immediately  3      Hepatic Alert
001-002  AST       High             H     Monitor                  1      Standard
001-003  HGB       Low              L     Monitor                  1      Standard
001-004  PLT       Low              L     Monitor                  1      Standard
001-005  ALT       Normal           N     No Action                0      Routine

7. Beyond Basics: Vectorized Alternatives in R

Capability             SAS                              R
Vectorized operations  Limited to specific functions    Native vectorized alternatives
Lookup-based approach  Use formats or lookup tables     Use recode() function
Concise replacements   Limited alternatives to IF/THEN  Multiple approaches (ifelse(), recode(), etc.)

SAS Example

/* Standard IF/ELSE approach */
data ds_status1;
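  set ds;
  /* Without LENGTH, 'Completed' fixes status at 9 characters and truncates the longer labels */
  length status $23;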
  if dsdecod = 'COMPLETED' then status = 'Completed';
  else if dsdecod = 'ADVERSE EVENT' then status = 'Discontinued Due to AE';
  else if dsdecod = 'LACK OF EFFICACY' then status = 'Discontinued Due to LOE';
  else if dsdecod = 'WITHDRAWAL BY SUBJECT' then status = 'Withdrawn';
  else if dsdecod = 'LOST TO FOLLOW-UP' then status = 'LTFU';
  else status = 'Other Discontinuation';
run;

/* FORMAT approach as alternative */
proc format;
  value $dsstatfmt 'COMPLETED' = 'Completed'
                   'ADVERSE EVENT' = 'Discontinued Due to AE'
                   'LACK OF EFFICACY' = 'Discontinued Due to LOE'
                   'WITHDRAWAL BY SUBJECT' = 'Withdrawn'
                   'LOST TO FOLLOW-UP' = 'LTFU'
                   other = 'Other Discontinuation';
run;

data ds_status2;
  set ds;
  status = put(dsdecod, $dsstatfmt.);
run;

Explanation (SAS):

  • IF/ELSE is the standard approach for conditional logic
  • FORMAT approach can be an alternative for simple value mapping
  • Less flexible but potentially more concise for straight value mapping
  • Both approaches produce the same result

R Example - Multiple Approaches

library(dplyr)

# Dummy data for DS example
ds <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
  DSTERM = c("COMPLETED STUDY", "DISCONTINUED DUE TO AE", "LACK OF EFFICACY", "PATIENT WITHDREW", "RELOCATION"),
  DSDECOD = c("COMPLETED", "ADVERSE EVENT", "LACK OF EFFICACY", "WITHDRAWAL BY SUBJECT", "OTHER")
)

# Standard case_when approach
ds_status1 <- ds %>%
  mutate(
    STATUS = case_when(
      DSDECOD == "COMPLETED" ~ "Completed",
      DSDECOD == "ADVERSE EVENT" ~ "Discontinued Due to AE",
      DSDECOD == "LACK OF EFFICACY" ~ "Discontinued Due to LOE",
      DSDECOD == "WITHDRAWAL BY SUBJECT" ~ "Withdrawn",
      DSDECOD == "LOST TO FOLLOW-UP" ~ "LTFU",
      TRUE ~ "Other Discontinuation"
    )
  )

# Alternative with recode
ds_status2 <- ds %>%
  mutate(
    STATUS = recode(DSDECOD,
      "COMPLETED" = "Completed",
      "ADVERSE EVENT" = "Discontinued Due to AE",
      "LACK OF EFFICACY" = "Discontinued Due to LOE",
      "WITHDRAWAL BY SUBJECT" = "Withdrawn",
      "LOST TO FOLLOW-UP" = "LTFU",
      .default = "Other Discontinuation"
    )
  )

# Alternative using named vector lookup
status_values <- c(
  "COMPLETED" = "Completed",
  "ADVERSE EVENT" = "Discontinued Due to AE",
  "LACK OF EFFICACY" = "Discontinued Due to LOE",
  "WITHDRAWAL BY SUBJECT" = "Withdrawn",
  "LOST TO FOLLOW-UP" = "LTFU"
)

ds_status3 <- ds %>%
  mutate(
    STATUS = ifelse(DSDECOD %in% names(status_values),
                   status_values[DSDECOD],
                   "Other Discontinuation")
  )

Explanation (R):

  • R offers multiple approaches for conditional transformations
  • case_when(): Most similar to IF/ELSE, best for complex conditions
  • recode(): Concise for simple value mapping (like SAS FORMATs)
  • ifelse(): Can be nested for multiple conditions
  • Each approach has its own use cases and advantages
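The named-vector lookup can be shortened with dplyr's coalesce(), which fills the NA left by an unmatched name with a default. This is a sketch on a reduced version of the DS data; ds_status4 is a name introduced here for illustration.

```r
library(dplyr)

ds <- data.frame(
  DSDECOD = c("COMPLETED", "ADVERSE EVENT", "OTHER")
)

status_values <- c(
  "COMPLETED" = "Completed",
  "ADVERSE EVENT" = "Discontinued Due to AE"
)

ds_status4 <- ds %>%
  mutate(
    # Indexing by name returns NA for codes not in the vector;
    # unname() drops the names, coalesce() supplies the default
    STATUS = coalesce(unname(status_values[DSDECOD]), "Other Discontinuation")
  )

ds_status4$STATUS
# "Completed" "Discontinued Due to AE" "Other Discontinuation"
```

This relies on DSDECOD being a character column. If it were a factor, the indexing would use the integer codes rather than the labels, so convert with as.character() first in that case.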

Input Table

USUBJID  DSTERM                  DSDECOD
001-001  COMPLETED STUDY         COMPLETED
001-002  DISCONTINUED DUE TO AE  ADVERSE EVENT
001-003  LACK OF EFFICACY        LACK OF EFFICACY
001-004  PATIENT WITHDREW        WITHDRAWAL BY SUBJECT
001-005  RELOCATION              OTHER

Expected Output (all approaches)

USUBJID  DSTERM                  DSDECOD                STATUS
001-001  COMPLETED STUDY         COMPLETED              Completed
001-002  DISCONTINUED DUE TO AE  ADVERSE EVENT          Discontinued Due to AE
001-003  LACK OF EFFICACY        LACK OF EFFICACY       Discontinued Due to LOE
001-004  PATIENT WITHDREW        WITHDRAWAL BY SUBJECT  Withdrawn
001-005  RELOCATION              OTHER                  Other Discontinuation

8. Summary: SAS vs R Conditional Logic Capabilities

Capability                      SAS (IF/ELSE)                         R (case_when, if_else, etc.)
Basic conditionals              ✓ IF/THEN/ELSE                        ✓ case_when(), if_else(), ifelse()
Compound conditions             ✓ &, |, etc.                          ✓ &, |, etc.
Missing value handling          ✓ Explicit (=., ='', missing())       ✓ Explicit (is.na())
Multiple actions per condition  ✓ DO blocks                           ⚠ Requires repeating conditions
Default/catch-all               ✓ ELSE statement                      ✓ TRUE ~ value
Nested conditions               ✓ Nested IF statements                ⚠ Combined conditions or sequential mutate()
Row-by-row processing           ✓ Natural approach                    ⚠ Column-oriented, less intuitive
Readable complex logic          ✓ Clear block structure               ⚠ Can become verbose with repeated conditions
Vectorized alternatives         ⚠ Limited (formats, arrays)           ✓ Many options (recode, vectorized functions)
Performance on large datasets   ⚠ Can be slower with many conditions  ✓ Vectorized operations, potentially faster

Key Points:

  • SAS IF/ELSE provides a procedural, row-by-row approach that's intuitive for complex logic
  • R's case_when() offers a vectorized approach that's more column-oriented
  • SAS DO blocks have no direct equivalent in R's tidyverse approach
  • R offers several alternatives for simple value mapping; recode() plays a role similar to SAS formats
  • Both systems require special handling for missing values: SAS treats a numeric missing as smaller than any number in comparisons, while R comparisons involving NA return NA
  • SAS is often more readable for complex nested logic
  • R's pipe operator (%>%) helps chain operations for complex transformations

When to use each approach:

  • SAS IF/ELSE: When you need complex row-based operations with multiple actions per condition
  • R case_when(): For most conditional operations, especially with multiple outcomes
  • R if_else(): For simple binary conditions, especially with missing value handling
  • R recode(): For simple value mapping/recoding with no complex conditions
