1.4. Migrating from SAS to R: A Skill Conversion Guide
1.4.11. IF/ELSE statements in SAS vs R equivalent
1. Basic Conditional Logic: Simple Value Transformation
| Capability | SAS (IF/ELSE) | R (case_when, if_else) |
|---|---|---|
| Basic conditionals | `if condition then value;` | `case_when(condition ~ value)` or `if_else(condition, value_if_true, value_if_false)` |
| Multiple conditions | `else if condition then value;` | Additional conditions in `case_when()` |
| Default value | `else value;` | `TRUE ~ value` in `case_when()` |
| Multiple variables | Using DO blocks | Multiple mutate() calls or repeated conditions |
SAS Example
data ae_severity;
set ae;
length severity_cat $20 urgency_level 8;
if aesev = 'MILD' then severity_cat = 'Non-significant';
else if aesev = 'MODERATE' then severity_cat = 'Notable';
else if aesev = 'SEVERE' then severity_cat = 'Significant';
else if aesev = 'LIFE THREATENING' then severity_cat = 'Critical';
if aesev = 'MILD' then urgency_level = 1;
else if aesev = 'MODERATE' then urgency_level = 2;
else if aesev = 'SEVERE' then urgency_level = 3;
else if aesev = 'LIFE THREATENING' then urgency_level = 4;
run;
Explanation (SAS):
- Uses `if`/`else if` statements to categorize adverse events based on severity (AESEV)
- Creates both character (`severity_cat`) and numeric (`urgency_level`) variables
- Each severity level gets a corresponding category and numeric urgency level
- Sequential logic ensures each adverse event falls into only one category
R Example
library(dplyr)
# Dummy data for AE example
ae <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004"),
AEDECOD = c("HEADACHE", "NAUSEA", "DYSPNEA", "ANAPHYLAXIS"),
AESEV = c("MILD", "MODERATE", "SEVERE", "LIFE THREATENING")
)
ae_severity <- ae %>%
mutate(
SEVERITY_CAT = case_when(
AESEV == "MILD" ~ "Non-significant",
AESEV == "MODERATE" ~ "Notable",
AESEV == "SEVERE" ~ "Significant",
AESEV == "LIFE THREATENING" ~ "Critical"
),
URGENCY_LEVEL = case_when(
AESEV == "MILD" ~ 1,
AESEV == "MODERATE" ~ 2,
AESEV == "SEVERE" ~ 3,
AESEV == "LIFE THREATENING" ~ 4
)
)
Explanation (R):
- Uses the `case_when()` function from the dplyr package
- Conditions appear on the left side of `~`, return values on the right
- Multiple conditions are evaluated in order until one is TRUE
- Multiple variables can be created in a single `mutate()` call
- `case_when()` handles each column separately (no direct equivalent to DO blocks)
Input Table
| USUBJID | AEDECOD | AESEV |
|---|---|---|
| 001-001 | HEADACHE | MILD |
| 001-002 | NAUSEA | MODERATE |
| 001-003 | DYSPNEA | SEVERE |
| 001-004 | ANAPHYLAXIS | LIFE THREATENING |
Expected Output
| USUBJID | AEDECOD | AESEV | SEVERITY_CAT | URGENCY_LEVEL |
|---|---|---|---|---|
| 001-001 | HEADACHE | MILD | Non-significant | 1 |
| 001-002 | NAUSEA | MODERATE | Notable | 2 |
| 001-003 | DYSPNEA | SEVERE | Significant | 3 |
| 001-004 | ANAPHYLAXIS | LIFE THREATENING | Critical | 4 |
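One behavior worth checking when porting this pattern: without a final `TRUE ~` branch, `case_when()` returns `NA` for rows that match no condition, much as the SAS step above leaves `severity_cat` blank when no `else` applies. The following is a minimal sketch, assuming a hypothetical data frame `ae_extra` whose last `AESEV` value is deliberately not covered by any branch.

```r
library(dplyr)

# Hypothetical data: "UNKNOWN" is not covered by any branch below
ae_extra <- data.frame(
  USUBJID = c("001-001", "001-005", "001-006"),
  AESEV   = c("MILD", "SEVERE", "UNKNOWN")
)

ae_extra_severity <- ae_extra %>%
  mutate(
    # All right-hand sides must share one type (here: character)
    SEVERITY_CAT = case_when(
      AESEV == "MILD"   ~ "Non-significant",
      AESEV == "SEVERE" ~ "Significant"
    ),
    # All right-hand sides are numeric, so the new column is numeric
    URGENCY_LEVEL = case_when(
      AESEV == "MILD"   ~ 1,
      AESEV == "SEVERE" ~ 3
    )
  )

# The uncovered row has NA in both derived columns
print(ae_extra_severity)
```

Mixing types on the right-hand side (for example `"Significant"` in one branch and `3` in another) raises an error, so each derived column needs its own `case_when()` call.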
2. Compound Conditions: Multiple Criteria
| Capability | SAS | R |
|---|---|---|
| AND conditions | `if cond1 & cond2 then value;` | `case_when(cond1 & cond2 ~ value)` |
| OR conditions | `if cond1 \| cond2 then value;` | `case_when(cond1 \| cond2 ~ value)` |
| Mixed conditions | `if (cond1 & cond2) \| cond3 then value;` | `case_when((cond1 & cond2) \| cond3 ~ value)` |
SAS Example
data vs_category;
set vs;
if vstestcd = 'SYSBP' & vsstresn > 180 then clinical_flag = 'Critical Hypertension';
else if vstestcd = 'SYSBP' & vsstresn > 140 then clinical_flag = 'Hypertension';
else if vstestcd = 'DIABP' & vsstresn > 120 then clinical_flag = 'Critical Diastolic';
else if vstestcd = 'TEMP' & vsstresn > 38.5 then clinical_flag = 'Fever';
else clinical_flag = 'Normal Range';
run;
Explanation (SAS):
- Combines multiple conditions (vital sign test code and result value)
- Uses logical AND (`&`) to ensure both conditions must be met
- Classifies vital signs measurements based on clinical thresholds
- Creates a flagging variable for abnormal values requiring attention
R Example
library(dplyr)
# Dummy data for VS example
vs <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
VSTESTCD = c("SYSBP", "SYSBP", "DIABP", "TEMP", "PULSE"),
VSSTRESN = c(185, 145, 125, 39.2, 88)
)
vs_category <- vs %>%
mutate(
CLINICAL_FLAG = case_when(
VSTESTCD == "SYSBP" & VSSTRESN > 180 ~ "Critical Hypertension",
VSTESTCD == "SYSBP" & VSSTRESN > 140 ~ "Hypertension",
VSTESTCD == "DIABP" & VSSTRESN > 120 ~ "Critical Diastolic",
VSTESTCD == "TEMP" & VSSTRESN > 38.5 ~ "Fever",
TRUE ~ "Normal Range"
)
)
Explanation (R):
- Combines conditions with the `&` (AND) operator, just like in SAS
- Values that don't match any condition become `NA` by default; the final `TRUE ~ "Normal Range"` branch prevents that here
- The `case_when()` function processes conditions in order
- Each observation gets assigned the first matching condition's value
Input Table
| USUBJID | VSTESTCD | VSSTRESN |
|---|---|---|
| 001-001 | SYSBP | 185 |
| 001-002 | SYSBP | 145 |
| 001-003 | DIABP | 125 |
| 001-004 | TEMP | 39.2 |
| 001-005 | PULSE | 88 |
Expected Output
| USUBJID | VSTESTCD | VSSTRESN | CLINICAL_FLAG |
|---|---|---|---|
| 001-001 | SYSBP | 185 | Critical Hypertension |
| 001-002 | SYSBP | 145 | Hypertension |
| 001-003 | DIABP | 125 | Critical Diastolic |
| 001-004 | TEMP | 39.2 | Fever |
| 001-005 | PULSE | 88 | Normal Range |
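The example above only exercises AND. The OR and mixed forms from the capability table can be sketched as follows; the data frame `vs_demo`, the column `REVIEW_FLAG`, and all thresholds are hypothetical and chosen purely for illustration, not as clinical guidance.

```r
library(dplyr)

# Hypothetical vital signs records (same column names as the example above)
vs_demo <- data.frame(
  VSTESTCD = c("SYSBP", "DIABP", "TEMP", "PULSE", "PULSE"),
  VSSTRESN = c(150, 95, 36.8, 130, 72)
)

vs_flagged <- vs_demo %>%
  mutate(
    REVIEW_FLAG = case_when(
      # Mixed: (cond1 & cond2) | (cond3 & cond4); parentheses make grouping explicit
      (VSTESTCD == "SYSBP" & VSSTRESN > 140) |
        (VSTESTCD == "DIABP" & VSSTRESN > 90) ~ "BP Review",
      # OR nested inside AND: either pulse extreme triggers the flag
      VSTESTCD == "PULSE" & (VSSTRESN > 120 | VSSTRESN < 50) ~ "Pulse Review",
      TRUE ~ "No Review"
    )
  )

print(vs_flagged)
```

As in SAS, `&` binds more tightly than `|` in R, so the parentheses around each AND pair are optional, but they make the intended grouping easier to review.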
3. Default Values: Handling Unmatched Conditions
| Capability | SAS | R |
|---|---|---|
| Default value | `else value;` | `TRUE ~ value` in `case_when()` |
| Catch-all condition | Final `else` without condition | `TRUE ~` as final condition |
SAS Example
data cm_importance;
set cm;
if cmclas = 'ACE INHIBITORS' & cmstrf = 'Y' then priority = 'Critical';
else if cmclas = 'ACE INHIBITORS' then priority = 'High';
else if cmstrf = 'Y' then priority = 'Medium';
else if cmstdy < 0 then priority = 'Baseline';
else priority = 'Standard';
run;
Explanation (SAS):
- Categorizes concomitant medications based on drug class and study relevance flag
- Adds a final `else` statement as a catch-all for medications not meeting special criteria
- All medications are guaranteed to have a priority assigned
- The catch-all is only executed if all previous conditions are FALSE
R Example
library(dplyr)
# Dummy data for CM example
cm <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
CMCLAS = c("ACE INHIBITORS", "ACE INHIBITORS", "ANALGESICS", "ANALGESICS", "ANTIBIOTICS"),
CMSTRF = c("Y", "N", "Y", "N", "N"),
CMSTDY = c(10, 5, -3, -5, 15)
)
cm_importance <- cm %>%
mutate(
PRIORITY = case_when(
CMCLAS == "ACE INHIBITORS" & CMSTRF == "Y" ~ "Critical",
CMCLAS == "ACE INHIBITORS" ~ "High",
CMSTRF == "Y" ~ "Medium",
CMSTDY < 0 ~ "Baseline",
TRUE ~ "Standard"
)
)
Explanation (R):
- Uses `TRUE ~` as the final condition to catch all remaining cases
- This works because `TRUE` is always TRUE for any observation
- Equivalent to the `else` statement in SAS
- Ensures all observations get a value (no NAs for unmatched conditions)
Input Table
| USUBJID | CMCLAS | CMSTRF | CMSTDY |
|---|---|---|---|
| 001-001 | ACE INHIBITORS | Y | 10 |
| 001-002 | ACE INHIBITORS | N | 5 |
| 001-003 | ANALGESICS | Y | -3 |
| 001-004 | ANALGESICS | N | -5 |
| 001-005 | ANTIBIOTICS | N | 15 |
Expected Output
| USUBJID | CMCLAS | CMSTRF | CMSTDY | PRIORITY |
|---|---|---|---|---|
| 001-001 | ACE INHIBITORS | Y | 10 | Critical |
| 001-002 | ACE INHIBITORS | N | 5 | High |
| 001-003 | ANALGESICS | Y | -3 | Medium |
| 001-004 | ANALGESICS | N | -5 | Baseline |
| 001-005 | ANTIBIOTICS | N | 15 | Standard |
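One porting detail the example data does not exercise: SAS treats a missing numeric value as smaller than any number, so `cmstdy < 0` is true for a missing study day, whereas in R `NA < 0` evaluates to `NA` and `case_when()` moves on to the next branch. The sketch below assumes a hypothetical data frame `cm_missing` with one missing `CMSTDY`; the label `"Unknown timing"` is likewise invented for illustration.

```r
library(dplyr)

# Hypothetical records: the second row has a missing study day
cm_missing <- data.frame(
  CMCLAS = c("ANALGESICS", "ANALGESICS"),
  CMSTRF = c("N", "N"),
  CMSTDY = c(-5, NA)
)

cm_checked <- cm_missing %>%
  mutate(
    # Direct port: the NA row is not "Baseline"; it falls through to the catch-all
    PRIORITY_PORTED = case_when(
      CMSTRF == "Y" ~ "Medium",
      CMSTDY < 0    ~ "Baseline",
      TRUE          ~ "Standard"
    ),
    # Explicit branch: decide on purpose what a missing study day should mean
    PRIORITY_EXPLICIT = case_when(
      CMSTRF == "Y" ~ "Medium",
      is.na(CMSTDY) ~ "Unknown timing",
      CMSTDY < 0    ~ "Baseline",
      TRUE          ~ "Standard"
    )
  )

print(cm_checked)
```

In SAS the same missing record would be classified as `'Baseline'`, so a line-by-line translation can silently change results when the comparison variable has missing values.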
4. Multiple Actions: DO Blocks and Equivalent R Approaches
| Capability | SAS | R |
|---|---|---|
| Multiple actions | `if condition then do; stmt1; stmt2; end;` | Multiple variables in `mutate()` with same conditions |
| Block structure | DO/END blocks encapsulate multiple statements | No direct equivalent, use separate column transformations |
| Action sequence | Actions within DO block executed in order | Expressions in `mutate()` are evaluated in order, each over the whole column; later ones can use columns created earlier |
SAS Example
data ex_derived;
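set ex;
/* Declare lengths up front; otherwise the first assignments ('High', 'Daily') fix the lengths and longer values are truncated */
length dose_category $7 monitoring_freq $7;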
if exdose > 100 & exdosu = 'mg' then do;
dose_category = 'High';
monitoring_freq = 'Daily';
max_duration = 14;
end;
else if exdose > 50 & exdosu = 'mg' then do;
dose_category = 'Medium';
monitoring_freq = 'Weekly';
max_duration = 28;
end;
else if exdose > 0 then do;
dose_category = 'Low';
monitoring_freq = 'Monthly';
max_duration = 90;
end;
else do;
dose_category = 'Unknown';
monitoring_freq = 'Unknown';
max_duration = 0;
end;
/* Calculate additional derived values */
total_dose = exdose * exdur;
avg_daily = total_dose / exdur;
run;
Explanation (SAS):
- Uses DO blocks to perform multiple related actions for each condition
- For each exposure record, assigns the dose category, monitoring frequency, and maximum duration
- The final calculations run for every record after the conditional blocks
- Clear visual grouping of related operations per condition
R Example
library(dplyr)
# Dummy data for EX example
ex <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004"),
EXTRT = c("DRUG-A", "DRUG-A", "DRUG-B", "PLACEBO"),
EXDOSE = c(120, 75, 25, 0),
EXDOSU = c("mg", "mg", "mg", "mg"),
EXDUR = c(7, 14, 30, 30)
)
ex_derived <- ex %>%
mutate(
DOSE_CATEGORY = case_when(
EXDOSE > 100 & EXDOSU == "mg" ~ "High",
EXDOSE > 50 & EXDOSU == "mg" ~ "Medium",
EXDOSE > 0 ~ "Low",
TRUE ~ "Unknown"
),
MONITORING_FREQ = case_when(
EXDOSE > 100 & EXDOSU == "mg" ~ "Daily",
EXDOSE > 50 & EXDOSU == "mg" ~ "Weekly",
EXDOSE > 0 ~ "Monthly",
TRUE ~ "Unknown"
),
MAX_DURATION = case_when(
EXDOSE > 100 & EXDOSU == "mg" ~ 14,
EXDOSE > 50 & EXDOSU == "mg" ~ 28,
EXDOSE > 0 ~ 90,
TRUE ~ 0
)
) %>%
mutate(
TOTAL_DOSE = EXDOSE * EXDUR,
AVG_DAILY = TOTAL_DOSE / EXDUR
)
Explanation (R):
- No direct equivalent to DO blocks in dplyr's approach
- Instead, repeat the conditions for each variable being modified
- Each column transformation uses its own `case_when()` function
- R's approach is more column-oriented, while SAS is row-oriented
Input Table
| USUBJID | EXTRT | EXDOSE | EXDOSU | EXDUR |
|---|---|---|---|---|
| 001-001 | DRUG-A | 120 | mg | 7 |
| 001-002 | DRUG-A | 75 | mg | 14 |
| 001-003 | DRUG-B | 25 | mg | 30 |
| 001-004 | PLACEBO | 0 | mg | 30 |
Expected Output
| USUBJID | EXTRT | EXDOSE | EXDOSU | EXDUR | DOSE_CATEGORY | MONITORING_FREQ | MAX_DURATION | TOTAL_DOSE | AVG_DAILY |
|---|---|---|---|---|---|---|---|---|---|
| 001-001 | DRUG-A | 120 | mg | 7 | High | Daily | 14 | 840 | 120 |
| 001-002 | DRUG-A | 75 | mg | 14 | Medium | Weekly | 28 | 1050 | 75 |
| 001-003 | DRUG-B | 25 | mg | 30 | Low | Monthly | 90 | 750 | 25 |
| 001-004 | PLACEBO | 0 | mg | 30 | Unknown | Unknown | 0 | 0 | 0 |
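When every branch assigns a fixed set of values, one way to avoid repeating the conditions is to derive the category once and attach the remaining values from a lookup table. This is a sketch of that alternative, not a replacement for the approach above; the names `ex_demo`, `dose_rules`, and `ex_joined` are hypothetical.

```r
library(dplyr)

# Hypothetical exposure records (subset of the columns used above)
ex_demo <- data.frame(
  USUBJID = c("001-001", "001-002", "001-003", "001-004"),
  EXDOSE  = c(120, 75, 25, 0),
  EXDOSU  = c("mg", "mg", "mg", "mg")
)

# One row per category holds everything a DO block would assign
dose_rules <- data.frame(
  DOSE_CATEGORY   = c("High", "Medium", "Low", "Unknown"),
  MONITORING_FREQ = c("Daily", "Weekly", "Monthly", "Unknown"),
  MAX_DURATION    = c(14, 28, 90, 0)
)

ex_joined <- ex_demo %>%
  mutate(
    # The conditions are written only once
    DOSE_CATEGORY = case_when(
      EXDOSE > 100 & EXDOSU == "mg" ~ "High",
      EXDOSE > 50  & EXDOSU == "mg" ~ "Medium",
      EXDOSE > 0                    ~ "Low",
      TRUE                          ~ "Unknown"
    )
  ) %>%
  # Attach the per-category values; row order of ex_demo is preserved
  left_join(dose_rules, by = "DOSE_CATEGORY")

print(ex_joined)
```

The lookup table keeps related values side by side, which recovers some of the visual grouping that DO blocks provide in SAS.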
5. Handling Missing Values: NA Conditions
| Capability | SAS | R |
|---|---|---|
| Missing value check | `if var = .` or `if var = ''` or `if missing(var)` | `is.na(var)` within conditions |
| Explicit missing handling | Special missing value operators and functions | Special handling with `is.na()` or the `missing` argument of `if_else()` |
| Missing as default | No special handling (missing remains missing) | case_when() returns NA for unmatched conditions |
SAS Example
data qs_complete;
set qs;
if qsstresn = 5 then response_level = "Complete Response";
else if qsstresn = 4 then response_level = "Strong Response";
else if qsstresn = 3 then response_level = "Moderate Response";
else if qsstresn = 2 then response_level = "Mild Response";
else if qsstresn = 1 then response_level = "No Response";
else if qsstresn = . then response_level = "Missing Response";
run;
Explanation (SAS):
- Handles questionnaire responses scored from 1 to 5 (`qsstresn`)
- Special condition for missing scores (`qsstresn = .`)
- Categorizes responses into descriptive result categories
- Explicitly handles missing values with a specific condition
R Example - Approach 1: Using case_when() with is.na()
library(dplyr)
# Dummy data for QS example
qs <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
QSTEST = rep("PAIN ASSESSMENT", 5),
QSSTRESN = c(5, 3, 1, NA, 4)
)
qs_complete1 <- qs %>%
mutate(
RESPONSE_LEVEL = case_when(
QSSTRESN == 5 ~ "Complete Response",
QSSTRESN == 4 ~ "Strong Response",
QSSTRESN == 3 ~ "Moderate Response",
QSSTRESN == 2 ~ "Mild Response",
QSSTRESN == 1 ~ "No Response",
is.na(QSSTRESN) ~ "Missing Response"
)
)
Explanation (R - Approach 1):
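- Uses the `is.na()` function to explicitly test for NA values
- Special handling is required because a comparison such as `QSSTRESN == 5` evaluates to `NA` for a missing score, and `case_when()` treats an `NA` condition as not matched
- The `is.na()` branch can appear anywhere in the list, since no other condition matches a missing score
- Without the `is.na()` branch, missing scores would get `NA` as their `RESPONSE_LEVEL`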
R Example - Approach 2: Using if_else() chain
library(dplyr)
# Dummy data for QS example (already defined above)
qs_complete2 <- qs %>%
mutate(
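RESPONSE_LEVEL = if_else(QSSTRESN == 5, "Complete Response",
if_else(QSSTRESN == 4, "Strong Response",
if_else(QSSTRESN == 3, "Moderate Response",
if_else(QSSTRESN == 2, "Mild Response",
if_else(QSSTRESN == 1, "No Response",
NA_character_)))),
# A missing score makes every comparison NA, so handle it on the outermost call
missing = "Missing Response")
)
Explanation (R - Approach 2):
- `if_else()` provides a simpler syntax for simple conditions
- Takes a condition, a true value, a false value, and an optional `missing` value
- More concise than `case_when()` for simple binary conditions; nested chains like this one become harder to read
- The `missing` argument specifies what to return when the condition is NA; without it, `if_else()` returns `NA` for those rows rather than the false value
- Scores outside 1-5 end in `NA_character_`, matching Approach 1, where unmatched scores also become `NA`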
Input Table
| USUBJID | QSTEST | QSSTRESN |
|---|---|---|
| 001-001 | PAIN ASSESSMENT | 5 |
| 001-002 | PAIN ASSESSMENT | 3 |
| 001-003 | PAIN ASSESSMENT | 1 |
| 001-004 | PAIN ASSESSMENT | NA |
| 001-005 | PAIN ASSESSMENT | 4 |
Expected Output
| USUBJID | QSTEST | QSSTRESN | RESPONSE_LEVEL |
|---|---|---|---|
| 001-001 | PAIN ASSESSMENT | 5 | Complete Response |
| 001-002 | PAIN ASSESSMENT | 3 | Moderate Response |
| 001-003 | PAIN ASSESSMENT | 1 | No Response |
| 001-004 | PAIN ASSESSMENT | NA | Missing Response |
| 001-005 | PAIN ASSESSMENT | 4 | Strong Response |
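For a genuinely binary split, the `missing` argument keeps the logic on one line. The sketch below is illustrative only: the data frame `qs_demo`, the `RESPONDER` column, and the cut-off of 4 are assumptions, not part of the example above.

```r
library(dplyr)

# Hypothetical scores, including one missing value
qs_demo <- data.frame(
  USUBJID  = c("001-001", "001-002", "001-003"),
  QSSTRESN = c(5, 2, NA)
)

qs_binary <- qs_demo %>%
  mutate(
    # condition, value if TRUE, value if FALSE, value if the condition is NA
    RESPONDER = if_else(QSSTRESN >= 4, "Y", "N", missing = "U")
  )

print(qs_binary)
```

Base R's `ifelse()` has no such argument and returns `NA` when the condition is `NA`, so a missing score would need a separate `is.na()` step there.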
6. Beyond Basics: Complex Conditional Logic
| Capability | SAS | R |
|---|---|---|
| Nested conditions | `if cond1 then if cond2 then value;` | Combine with `&` or nest `case_when()` calls |
| Complex expressions | Any valid SAS expression in condition | Any valid R expression in condition |
| Computed conditions | Variables or functions in condition | Variables or functions in condition |
SAS Example
data lb_flagging;
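set lb;
/* Declare the length up front; otherwise 'Hepatic Alert' (13 characters) fixes it and 'Hepatic Monitor' is truncated */
length priority $15;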
/* Direct lab abnormality flagging based on reference ranges */
if lbstresn > (lbstnrhi * 3) & lbstnrhi > 0 then do;
abnormality = "Critically High";
flag = "H++";
action = "Repeat Test Immediately";
score = 3;
end;
else if lbstresn > lbstnrhi & lbstnrhi > 0 then do;
abnormality = "High";
flag = "H";
action = "Monitor";
score = 1;
end;
else if lbstresn < (lbstnrlo * 0.5) & lbstnrlo > 0 then do;
abnormality = "Critically Low";
flag = "L++";
action = "Repeat Test Immediately";
score = 3;
end;
else if lbstresn < lbstnrlo & lbstnrlo > 0 then do;
abnormality = "Low";
flag = "L";
action = "Monitor";
score = 1;
end;
else do;
abnormality = "Normal";
flag = "N";
action = "No Action";
score = 0;
end;
/* Additional processing for specific tests */
if lbtestcd = "ALT" then do;
if lbstresn > (lbstnrhi * 3) then priority = "Hepatic Alert";
else if lbstresn > lbstnrhi then priority = "Hepatic Monitor";
else priority = "Routine";
end;
else priority = "Standard";
run;
Explanation (SAS):
- Directly compares lab results against reference ranges
- Uses simple multipliers for critical thresholds (3x upper limit, 0.5x lower limit)
- Assigns a simplified score based on severity level
- Includes special processing for liver function tests (ALT)
- Avoids unnecessary intermediate calculations
R Example
library(dplyr)
# Dummy data for LB example
lb <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
LBTESTCD = c("ALT", "AST", "HGB", "PLT", "ALT"),
LBSTRESN = c(150, 55, 9, 140, 25),
LBSTNRLO = c(10, 10, 12, 150, 10),
LBSTNRHI = c(40, 40, 16, 450, 40)
)
lb_flagging <- lb %>%
mutate(
# Direct lab abnormality flagging based on reference ranges
ABNORMALITY = case_when(
LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ "Critically High",
LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ "High",
LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ "Critically Low",
LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ "Low",
TRUE ~ "Normal"
),
FLAG = case_when(
LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ "H++",
LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ "H",
LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ "L++",
LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ "L",
TRUE ~ "N"
),
ACTION = case_when(
LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ "Repeat Test Immediately",
LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ "Repeat Test Immediately",
LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ "Monitor",
LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ "Monitor",
TRUE ~ "No Action"
),
SCORE = case_when(
LBSTRESN > (LBSTNRHI * 3) & LBSTNRHI > 0 ~ 3,
LBSTRESN < (LBSTNRLO * 0.5) & LBSTNRLO > 0 ~ 3,
LBSTRESN > LBSTNRHI & LBSTNRHI > 0 ~ 1,
LBSTRESN < LBSTNRLO & LBSTNRLO > 0 ~ 1,
TRUE ~ 0
)
) %>%
# Additional processing for specific tests
mutate(
PRIORITY = case_when(
LBTESTCD == "ALT" & LBSTRESN > (LBSTNRHI * 3) ~ "Hepatic Alert",
LBTESTCD == "ALT" & LBSTRESN > LBSTNRHI ~ "Hepatic Monitor",
LBTESTCD == "ALT" ~ "Routine",
TRUE ~ "Standard"
)
)
Explanation (R):
- Uses direct comparisons with reference ranges rather than calculating intermediate ratios
- Simplified scoring with straightforward severity levels (3 = critical, 1 = abnormal, 0 = normal)
- Maintains the same logical structure and outcomes as the SAS example
- Flattens the ALT-specific nested IF into a single `case_when()` in which each ALT branch repeats `LBTESTCD == "ALT"`
- Still allows for special handling of specific test types (ALT)
Input Table
| USUBJID | LBTESTCD | LBSTRESN | LBSTNRLO | LBSTNRHI |
|---|---|---|---|---|
| 001-001 | ALT | 150 | 10 | 40 |
| 001-002 | AST | 55 | 10 | 40 |
| 001-003 | HGB | 9 | 12 | 16 |
| 001-004 | PLT | 140 | 150 | 450 |
| 001-005 | ALT | 25 | 10 | 40 |
Expected Output
| USUBJID | LBTESTCD | ABNORMALITY | FLAG | ACTION | SCORE | PRIORITY |
|---|---|---|---|---|---|---|
| 001-001 | ALT | Critically High | H++ | Repeat Test Immediately | 3 | Hepatic Alert |
| 001-002 | AST | High | H | Monitor | 1 | Standard |
| 001-003 | HGB | Low | L | Monitor | 1 | Standard |
| 001-004 | PLT | Low | L | Monitor | 1 | Standard |
| 001-005 | ALT | Normal | N | No Action | 0 | Routine |
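The capability table mentions nesting `case_when()` as an alternative to combining conditions with `&`. A minimal sketch of that nested form for the ALT logic, using a hypothetical data frame `lb_demo`, might look like this:

```r
library(dplyr)

# Hypothetical lab records: three ALT results and one AST result
lb_demo <- data.frame(
  LBTESTCD = c("ALT", "ALT", "ALT", "AST"),
  LBSTRESN = c(150, 55, 25, 55),
  LBSTNRHI = c(40, 40, 40, 40)
)

lb_priority <- lb_demo %>%
  mutate(
    PRIORITY = if_else(
      LBTESTCD == "ALT",
      # Inner logic mirrors the nested SAS IF; it is computed for every row,
      # but only used where the outer condition is TRUE
      case_when(
        LBSTRESN > LBSTNRHI * 3 ~ "Hepatic Alert",
        LBSTRESN > LBSTNRHI     ~ "Hepatic Monitor",
        TRUE                    ~ "Routine"
      ),
      "Standard"
    )
  )

print(lb_priority)
```

Because both branches are evaluated as whole vectors, nesting does not skip work the way a SAS nested IF does; the benefit is purely that the test code condition is written once.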
7. Beyond Basics: Vectorized Alternatives in R
| Capability | SAS | R |
|---|---|---|
| Vectorized operations | Limited to specific functions | Native vectorized alternatives |
| Lookup-based approach | Use formats or lookup tables | Use recode() function |
| Concise replacements | Limited alternatives to IF/THEN | Multiple approaches (ifelse(), recode(), etc.) |
SAS Example
/* Standard IF/ELSE approach */
data ds_status1;
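set ds;
/* Declare the length first; otherwise 'Completed' fixes it at 9 characters and longer values are truncated */
length status $23;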
if dsdecod = 'COMPLETED' then status = 'Completed';
else if dsdecod = 'ADVERSE EVENT' then status = 'Discontinued Due to AE';
else if dsdecod = 'LACK OF EFFICACY' then status = 'Discontinued Due to LOE';
else if dsdecod = 'WITHDRAWAL BY SUBJECT' then status = 'Withdrawn';
else if dsdecod = 'LOST TO FOLLOW-UP' then status = 'LTFU';
else status = 'Other Discontinuation';
run;
/* FORMAT approach as alternative */
proc format;
value $dsstatfmt 'COMPLETED' = 'Completed'
'ADVERSE EVENT' = 'Discontinued Due to AE'
'LACK OF EFFICACY' = 'Discontinued Due to LOE'
'WITHDRAWAL BY SUBJECT' = 'Withdrawn'
'LOST TO FOLLOW-UP' = 'LTFU'
other = 'Other Discontinuation';
run;
data ds_status2;
set ds;
status = put(dsdecod, $dsstatfmt.);
run;
Explanation (SAS):
- IF/ELSE is the standard approach for conditional logic
- FORMAT approach can be an alternative for simple value mapping
- Less flexible but potentially more concise for straight value mapping
- Both approaches produce the same result
R Example - Multiple Approaches
library(dplyr)
# Dummy data for DS example
ds <- data.frame(
USUBJID = c("001-001", "001-002", "001-003", "001-004", "001-005"),
DSTERM = c("COMPLETED STUDY", "DISCONTINUED DUE TO AE", "LACK OF EFFICACY", "PATIENT WITHDREW", "RELOCATION"),
DSDECOD = c("COMPLETED", "ADVERSE EVENT", "LACK OF EFFICACY", "WITHDRAWAL BY SUBJECT", "OTHER")
)
# Standard case_when approach
ds_status1 <- ds %>%
mutate(
STATUS = case_when(
DSDECOD == "COMPLETED" ~ "Completed",
DSDECOD == "ADVERSE EVENT" ~ "Discontinued Due to AE",
DSDECOD == "LACK OF EFFICACY" ~ "Discontinued Due to LOE",
DSDECOD == "WITHDRAWAL BY SUBJECT" ~ "Withdrawn",
DSDECOD == "LOST TO FOLLOW-UP" ~ "LTFU",
TRUE ~ "Other Discontinuation"
)
)
# Alternative with recode
ds_status2 <- ds %>%
mutate(
STATUS = recode(DSDECOD,
"COMPLETED" = "Completed",
"ADVERSE EVENT" = "Discontinued Due to AE",
"LACK OF EFFICACY" = "Discontinued Due to LOE",
"WITHDRAWAL BY SUBJECT" = "Withdrawn",
"LOST TO FOLLOW-UP" = "LTFU",
.default = "Other Discontinuation"
)
)
# Alternative using named vector lookup
status_values <- c(
"COMPLETED" = "Completed",
"ADVERSE EVENT" = "Discontinued Due to AE",
"LACK OF EFFICACY" = "Discontinued Due to LOE",
"WITHDRAWAL BY SUBJECT" = "Withdrawn",
"LOST TO FOLLOW-UP" = "LTFU"
)
ds_status3 <- ds %>%
mutate(
STATUS = ifelse(DSDECOD %in% names(status_values),
status_values[DSDECOD],
"Other Discontinuation")
)
Explanation (R):
- R offers multiple approaches for conditional transformations
- `case_when()`: Most similar to IF/ELSE, best for complex conditions
- `recode()`: Concise for simple value mapping (like SAS FORMATs)
- `ifelse()`: Can be nested for multiple conditions
- Each approach has its own use cases and advantages
Input Table
| USUBJID | DSTERM | DSDECOD |
|---|---|---|
| 001-001 | COMPLETED STUDY | COMPLETED |
| 001-002 | DISCONTINUED DUE TO AE | ADVERSE EVENT |
| 001-003 | LACK OF EFFICACY | LACK OF EFFICACY |
| 001-004 | PATIENT WITHDREW | WITHDRAWAL BY SUBJECT |
| 001-005 | RELOCATION | OTHER |
Expected Output (all approaches)
| USUBJID | DSTERM | DSDECOD | STATUS |
|---|---|---|---|
| 001-001 | COMPLETED STUDY | COMPLETED | Completed |
| 001-002 | DISCONTINUED DUE TO AE | ADVERSE EVENT | Discontinued Due to AE |
| 001-003 | LACK OF EFFICACY | LACK OF EFFICACY | Discontinued Due to LOE |
| 001-004 | PATIENT WITHDREW | WITHDRAWAL BY SUBJECT | Withdrawn |
| 001-005 | RELOCATION | OTHER | Other Discontinuation |
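A further option, assuming dplyr 1.1.0 or later is installed: that release marks `recode()` as superseded and introduces `case_match()`, which maps values of a single column using the same `~` syntax as `case_when()`. The sketch below uses a hypothetical data frame `ds_demo` with a shortened mapping; check the installed dplyr version before relying on it.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, where case_match() is available

# Hypothetical disposition codes; "OTHER" is not listed in the mapping
ds_demo <- data.frame(
  DSDECOD = c("COMPLETED", "ADVERSE EVENT", "OTHER")
)

ds_status4 <- ds_demo %>%
  mutate(
    STATUS = case_match(
      DSDECOD,
      "COMPLETED"     ~ "Completed",
      "ADVERSE EVENT" ~ "Discontinued Due to AE",
      # Plays the role of "other" in a SAS format
      .default = "Other Discontinuation"
    )
  )

print(ds_status4)
```

Unlike `recode()`, the old value sits on the left and the new value on the right of `~`, which matches the reading order of `case_when()`.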
8. Summary: SAS vs R Conditional Logic Capabilities
| Capability | SAS (IF/ELSE) | R (case_when, if_else, etc.) |
|---|---|---|
| Basic conditionals | ✓ IF/THEN/ELSE | ✓ case_when(), if_else(), ifelse() |
| Compound conditions | ✓ `&`, `\|`, etc. | ✓ `&`, `\|`, etc. |
| Missing value handling | ✓ Explicit (=., ='', missing()) | ✓ Explicit (is.na()) |
| Multiple actions per condition | ✓ DO blocks | ⚠ Requires repeating conditions |
| Default/catch-all | ✓ ELSE statement | ✓ TRUE ~ value |
| Nested conditions | ✓ Nested IF statements | ⚠ Combined conditions or sequential mutate() |
| Row-by-row processing | ✓ Natural approach | ⚠ Column-oriented, less intuitive |
| Readable complex logic | ✓ Clear block structure | ⚠ Can become verbose with repeated conditions |
| Vectorized alternatives | ⚠ Limited (formats, arrays) | ✓ Many options (recode, vectorized functions) |
| Performance on large datasets | ⚠ Can be slower with many conditions | ✓ Vectorized operations, potentially faster |
Key Points:
- SAS IF/ELSE provides a procedural, row-by-row approach that's intuitive for complex logic
- R's `case_when()` offers a vectorized approach that's more column-oriented
- SAS DO blocks have no direct equivalent in R's tidyverse approach
- R offers more alternatives for simple value mapping (recode, etc.)
- Both systems require special handling for missing values
- SAS is often more readable for complex nested logic
- R's pipe operator (`%>%`) helps chain operations for complex transformations
- For simple value mapping, R's `recode()` is similar to SAS formats
When to use each approach:
- SAS IF/ELSE: When you need complex row-based operations with multiple actions per condition
- R case_when(): For most conditional operations, especially with multiple outcomes
- R if_else(): For simple binary conditions, especially with missing value handling
- R recode(): For simple value mapping/recoding with no complex conditions