1.4. Migrating from SAS to R: A Skill Conversion Guide
1.4.12. DO LOOP Statements in SAS vs R equivalent
1. Basic Iterative DO Loops
| Capability | SAS (DO Loops) | R (Loops and Vectorized Operations) |
|---|---|---|
| Simple iteration | DO i = start TO end; | for (i in start:end) {} |
| Iteration with steps | DO i = start TO end BY step; | for (i in seq(start, end, by=step)) {} |
| Iteration with condition | DO i = start TO end WHILE (condition); | for (i in start:end) { if (!condition) break } |
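The three rows of the table can be sketched in R as follows. The bounds and the stopping condition are made-up values for illustration only. One difference is worth flagging: when start is greater than end, a SAS DO loop with the default step runs zero times, whereas start:end in R counts downward, so seq_len() is the safer idiom when the upper bound may be 0.

```r
# Simple iteration: visits 1, 2, 3
simple <- c()
for (i in 1:3) simple <- c(simple, i)

# Iteration with steps: visits 1, 3, 5, 7, 9
stepped <- c()
for (i in seq(1, 10, by = 2)) stepped <- c(stepped, i)

# Iteration with condition: keep going only while the running total is <= 10
total <- 0
visited <- c()
for (i in 1:10) {
  if (!(total <= 10)) break   # tested at the top, like the SAS WHILE clause
  total <- total + i
  visited <- c(visited, i)
}

# Pitfall: 1:0 counts down (1, 0), while seq_len(0) is empty
n <- 0
runs_colon <- length(1:n)
runs_seq <- length(seq_len(n))
```

Growing a vector with c() inside a loop is fine for a handful of values; for long loops, preallocating the result is usually the better habit.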
SAS Example: Simple Counting
/* Simple counter that makes a dataset with numbers 1-5 */
data counter;
  do i = 1 to 5;
    output;
  end;
run;
Explanation (SAS):
- Creates a dataset with 5 rows
- Each row contains a single number (1, 2, 3, 4, or 5)
- The OUTPUT statement creates a new row for each value of i
R Example: Simple Counting
# Simple counter that makes a data frame with numbers 1-5
counter <- data.frame(i = 1:5)
Explanation (R):
- Creates a data frame with 5 rows
- Each row contains a single number from 1 to 5
- In R, we can directly create the sequence in one step
Expected Output (Both Examples):
| i |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Key Points About Basic Loops:
- SAS uses DO loops to repeat actions a specific number of times
- R has for loops but often doesn't need them thanks to vectorization
- In both languages, loops help you automate repetitive tasks
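To make the vectorization point concrete, here is a minimal sketch that computes the squares of 1 to 5 twice, once with an explicit loop and once with a single vectorized expression. The names squares_loop, squares_vec, and counter_sq are illustrative and do not come from the examples above.

```r
# Loop version: fill a preallocated vector one element at a time
squares_loop <- numeric(5)
for (i in 1:5) {
  squares_loop[i] <- i^2
}

# Vectorized version: the arithmetic is applied to the whole vector at once
squares_vec <- (1:5)^2

# Both versions hold the same values, so either can feed a data frame
counter_sq <- data.frame(i = 1:5, square = squares_vec)
```

The vectorized form has no counter or index to manage, which is why it is usually preferred in R when the operation is element-wise.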
2. DO WHILE and DO UNTIL: Conditional Loop Execution
| Capability | SAS | R |
|---|---|---|
| Evaluate at beginning | DO WHILE (condition); | while (condition) {} |
| Evaluate at end | DO UNTIL (condition); | repeat { if (condition) break } |
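The practical difference between the two rows is whether the body can run zero times. The sketch below uses an arbitrary starting value of 10 and an arbitrary threshold of 5 so that the condition is already decided before the first pass; the counter names are illustrative.

```r
# while: the condition is tested first, so the body never runs here
x <- 10
while_runs <- 0
while (x < 5) {
  while_runs <- while_runs + 1
  x <- x + 1
}

# repeat with break at the end: the body runs once before the test,
# which is how DO UNTIL (x >= 5) behaves in SAS
x <- 10
repeat_runs <- 0
repeat {
  repeat_runs <- repeat_runs + 1
  x <- x + 1
  if (x >= 5) break
}
```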
SAS Example: Counting to a Target
/* Count until we reach 50 */
data counting;
  sum = 0;   /* Starting sum */
  count = 0; /* How many numbers we've added */
  /* Add numbers until sum reaches 50 */
  do until (sum >= 50);
    count = count + 1; /* Add 1 to our counter */
    sum = sum + count; /* Add the counter to our sum */
    output;
  end;
run;
Explanation (SAS):
- Starts with sum = 0
- Adds 1, then 2, then 3... until the sum reaches at least 50
- Records each step to see the progress
- The loop runs until the condition (sum >= 50) becomes true
R Example: Counting to a Target
# Count until we reach 50
counting <- data.frame(count = numeric(), sum = numeric())
sum <- 0
count <- 0
repeat {
  count <- count + 1 # Add 1 to our counter
  sum <- sum + count # Add the counter to our sum
  # Save the current state
  counting <- rbind(counting, data.frame(count = count, sum = sum))
  # Stop when sum reaches 50
  if (sum >= 50) break
}
Explanation (R):
- Does the same thing as the SAS example
- Uses repeat with break to mimic DO UNTIL behavior
- When sum reaches at least 50, the break statement exits the loop
Expected Output:
| count | sum |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 6 |
| ... | ... |
| 10 | 55 |
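The same table can also be produced without a loop. This is a sketch, not part of the original example: it assumes an upper bound of 20 terms, which is comfortably more than needed to pass 50, and the names totals, n_steps, and counting_vec are illustrative.

```r
# Running totals of 1, 2, 3, ...; 20 terms is an assumed upper bound
totals <- cumsum(1:20)

# First position where the running total reaches the target of 50
n_steps <- which(totals >= 50)[1]

# Keep only the steps up to and including that position
counting_vec <- data.frame(count = 1:n_steps, sum = totals[1:n_steps])
```

The loop version remains the better fit when no safe upper bound is known in advance.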
3. Nested DO Loops: Handling Multiple Levels
SAS Example: Simple Multiplication Table
/* Create a small multiplication table */
data multiplication;
  do row = 1 to 3;
    do col = 1 to 3;
      product = row * col;
      output;
    end;
  end;
run;
Explanation (SAS):
- Creates a 3x3 multiplication table
- Outer loop (row) goes from 1 to 3
- For each row, inner loop (col) goes from 1 to 3
- Calculates the product of row * col
R Example: Simple Multiplication Table
# Create a small multiplication table
mult_table <- data.frame(row = numeric(),
                         col = numeric(),
                         product = numeric())
for (row in 1:3) {
  for (col in 1:3) {
    product <- row * col
    mult_table <- rbind(mult_table,
                        data.frame(row = row,
                                   col = col,
                                   product = product))
  }
}
Explanation (R):
- Does the same as the SAS example
- Uses nested for loops - one for rows and one for columns
- For each combination, calculates and stores the product
Expected Output:
| row | col | product |
|---|---|---|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | 3 |
| 2 | 1 | 2 |
| 2 | 2 | 4 |
| 2 | 3 | 6 |
| 3 | 1 | 3 |
| 3 | 2 | 6 |
| 3 | 3 | 9 |
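Nested loops over all combinations of two ranges can often be replaced by expand.grid(), which builds every combination in one call. The sketch below reproduces the table above; the names grid, mult_table_vec, and mult_matrix are illustrative.

```r
# expand.grid() varies its first argument fastest, so col is listed first
# to reproduce the row-by-row order of the nested loops
grid <- expand.grid(col = 1:3, row = 1:3)

mult_table_vec <- data.frame(
  row = grid$row,
  col = grid$col,
  product = grid$row * grid$col
)

# If a 3x3 matrix is enough, outer() computes all products directly
mult_matrix <- outer(1:3, 1:3)
```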
4. DO Loops with Arrays: Working with Multiple Variables
SAS Example: Temperature Conversion
data temperatures;
  /* Store 3 temperatures in Celsius */
  array celsius[3] _temporary_ (0, 25, 100);
  /* Create variables for Fahrenheit values */
  array fahrenheit[3] temp1 temp2 temp3;
  /* Convert each temperature from C to F */
  do i = 1 to 3;
    /* F = C*9/5 + 32 */
    fahrenheit[i] = celsius[i] * 9/5 + 32;
  end;
  drop i;
run;
Explanation (SAS):
- Creates an array with 3 Celsius temperatures
- Creates another array to hold Fahrenheit values
- The DO loop processes each temperature:
- Converts from Celsius to Fahrenheit
- Stores the result in the corresponding position
R Example: Temperature Conversion
# Celsius temperatures
celsius <- c(0, 25, 100)
# Create data frame to store results
temperatures <- data.frame(
  celsius = celsius,
  fahrenheit = numeric(3) # Empty vector for Fahrenheit values
)
# Convert each temperature
for (i in 1:3) {
  temperatures$fahrenheit[i] <- temperatures$celsius[i] * 9/5 + 32
}
Explanation (R):
- Creates a vector with Celsius temperatures
- Makes a data frame with a column for Celsius and a column for Fahrenheit
- The for loop converts each temperature one by one
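Expected Output (R data frame; the SAS data step instead writes a single row with temp1 = 32, temp2 = 77, temp3 = 212):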
| celsius | fahrenheit |
|---|---|
| 0 | 32 |
| 25 | 77 |
| 100 | 212 |
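Because the conversion formula is plain arithmetic, the loop is optional in R. A minimal vectorized sketch of the same conversion follows; temperatures_vec is an illustrative name.

```r
celsius <- c(0, 25, 100)

# The formula is applied to every element of the vector in one step
temperatures_vec <- data.frame(
  celsius = celsius,
  fahrenheit = celsius * 9 / 5 + 32
)
```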
5. DO WHILE and DO UNTIL: Simple Examples
SAS Example: Doubling a Number
/* Double a number until it exceeds 100 */
data doubling;
  value = 2;
  iteration = 0;
  /* Keep doubling until we exceed 100 */
  do while (value <= 100);
    iteration + 1;
    value = value * 2;
    output;
  end;
run;
Explanation (SAS):
- Starts with value = 2
- Doubles the value each time through the loop
- Continues until the value exceeds 100
- DO WHILE checks the condition at the beginning of each iteration
R Example: Doubling a Number
# Double a number until it exceeds 100
doubling <- data.frame(iteration = numeric(),
                       value = numeric())
value <- 2
iteration <- 0
while (value <= 100) {
  iteration <- iteration + 1
  value <- value * 2
  doubling <- rbind(doubling,
                    data.frame(iteration = iteration,
                               value = value))
}
Explanation (R):
- Same logic as the SAS example
- While loop checks the condition before each iteration
- Stops when value exceeds 100
Expected Output:
| iteration | value |
|---|---|
| 1 | 4 |
| 2 | 8 |
| 3 | 16 |
| 4 | 32 |
| 5 | 64 |
| 6 | 128 |
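A possible variation on the R example collects the doubled values in a plain vector and builds the data frame once at the end, which avoids calling rbind() on a data frame in every iteration. The names values and doubling_once are illustrative.

```r
values <- numeric(0)   # doubled values are collected here
value <- 2

while (value <= 100) {
  value <- value * 2
  values <- c(values, value)
}

# Build the data frame once, after the loop has finished
doubling_once <- data.frame(
  iteration = seq_along(values),
  value = values
)
```

The last row holds 128 rather than 64 because the condition is checked before the doubling, so the pass that starts at 64 still runs.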
6. Simple Data Generation: Using Loops to Create Data
SAS Example: Simple Sequence of Dates
/* Create a series of weekly dates */
data weekly_dates;
  format date_value date9.;
  /* Start from January 1, 2023 */
  date_value = '01JAN2023'd;
  /* Create 5 weekly dates */
  do week = 1 to 5;
    /* Add 7 days for each week */
    date_value = date_value + 7;
    output;
  end;
run;
Explanation (SAS):
- Starts with January 1, 2023
- For each of 5 weeks, adds 7 days to get the next date
- Creates a dataset with 5 weekly dates
R Example: Simple Sequence of Dates
# Create a series of weekly dates
library(lubridate)
# Start date: January 1, 2023
start_date <- ymd("2023-01-01")
# Create data frame with 5 weekly dates
weekly_dates <- data.frame(
  week = 1:5,
  date_value = start_date + days(7 * 1:5)
)
Explanation (R):
- Uses the lubridate package for date handling
- Starts with January 1, 2023
- Creates a sequence of 5 weekly dates all at once using vectorization
Expected Output:
| week | date_value |
|---|---|
| 1 | 2023-01-08 |
| 2 | 2023-01-15 |
| 3 | 2023-01-22 |
| 4 | 2023-01-29 |
| 5 | 2023-02-05 |
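The same sequence can be generated with base R alone, which may be convenient where lubridate is not installed. This is a sketch under that assumption; weekly_dates_base and sas_style are illustrative names.

```r
start_date <- as.Date("2023-01-01")

# seq() on a Date accepts by = "week"; the first value is one week after
# the start date, matching the loop that adds 7 days before each output
weekly_dates_base <- data.frame(
  week = 1:5,
  date_value = seq(start_date + 7, by = "week", length.out = 5)
)

# Optional: format the dates in a style similar to the SAS DATE9. format
# (the month abbreviation depends on the session locale)
sas_style <- toupper(format(weekly_dates_base$date_value, "%d%b%Y"))
```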
7. Using across() in R as an Alternative to DO Loops
| Capability | SAS DO Loops | R across() Function |
|---|---|---|
| Process multiple columns | Loop through array elements | across() selects multiple columns |
| Apply same transformation | Same code in loop body | Single function applied to all selected columns |
| Column selection | Must define arrays | Flexible selection with helpers like starts_with() |
| Performance | Row-by-row processing | Optimized vectorized operations |
SAS Example: Convert Multiple Columns to Uppercase
data patients_clean;
  set patients;
  /* Define array of character variables */
  array chars[3] name city state;
  /* Convert each to uppercase */
  do i = 1 to 3;
    chars[i] = upcase(chars[i]);
  end;
  drop i;
run;
Explanation (SAS):
- Creates an array of three character columns: name, city, and state
- Uses a DO loop to iterate through each column
- Applies the UPCASE function to convert each value to uppercase
- Each column is processed one at a time
R Example: Convert Multiple Columns to Uppercase
library(dplyr)
patients_clean <- patients %>%
  mutate(across(c(name, city, state), toupper))
Explanation (R):
- Uses across() to select multiple columns (name, city, and state)
- Applies the toupper() function to all selected columns at once
- No need for a loop: one line replaces the entire DO loop structure
Example with Column Selection Patterns
# Convert all character columns to uppercase
patients_clean <- patients %>%
  mutate(across(where(is.character), toupper))
# Convert only columns that start with "addr_" to uppercase
patients_clean <- patients %>%
  mutate(across(starts_with("addr_"), toupper))
Explanation:
- where(is.character) selects all character columns
- starts_with("addr_") selects columns with names starting with "addr_"
- This flexibility makes it easy to apply transformations to groups of columns
Example with Custom Functions
# Round all numeric columns to 1 decimal place
data_rounded <- data %>%
  mutate(across(where(is.numeric), ~round(., 1)))
Explanation:
- Selects all numeric columns using where(is.numeric)
- Uses a formula (~) to define a custom function that rounds to 1 decimal
- The dot (.) represents each column's values
Key Benefits over DO Loops:
- More concise code - often just one line instead of a multi-line loop
- More readable - clearly shows what's being transformed and how
- Usually faster performance with vectorized operations
- Less error-prone - no need to manage loop counters or array indices
Input Example (patients, for the uppercase conversion):
| name | city | state | age |
|---|---|---|---|
| John Smith | New York | NY | 45 |
| Mary Jones | Dallas | TX | 36 |
Expected Output:
| name | city | state | age |
|---|---|---|---|
| JOHN SMITH | NEW YORK | NY | 45 |
| MARY JONES | DALLAS | TX | 36 |
8. Apply Family Functions in R: Loop Alternatives
| Capability | SAS DO Loops | R Apply Functions |
|---|---|---|
| Apply to all rows | Row-by-row DO loop | apply(data, 1, function) |
| Apply to all columns | Loop through array elements | apply(data, 2, function) |
| Apply to list elements | Loop through array | lapply(list, function) |
| Apply with name preservation | Manual naming in loop | sapply(list, function) or mapply(function, ...) |
| Apply with result combination | Collect results, then process | vapply() for type safety or purrr::map_dfr() to combine into a data frame |
SAS Example: Calculate Row Sums
data row_stats;
  set scores;
  /* Calculate sum for each row using a loop through columns */
  array score_cols[4] q1 q2 q3 q4;
  row_sum = 0;
  do i = 1 to 4;
    row_sum = row_sum + score_cols[i];
  end;
  drop i;
run;
Explanation (SAS):
- Uses a DO loop to iterate through each column in the array
- Adds each score to the running total
- Creates a new variable with the sum of all scores
R Example: Using apply() Functions
# Method 1: Using rowSums() (vectorized)
scores$row_sum <- rowSums(scores[, c("q1", "q2", "q3", "q4")])
# Method 2: Using apply()
scores$row_sum <- apply(scores[, c("q1", "q2", "q3", "q4")], 1, sum)
Explanation (R):
- First method uses the specialized rowSums() function, which is the fastest and most concise
- Second method uses apply() with 1 indicating operation across rows
- Both eliminate the need for explicit loops
Example: Processing List Elements
# Create a list of data frames
student_groups <- list(
  group_a = data.frame(id = 1:3, score = c(85, 92, 78)),
  group_b = data.frame(id = 4:6, score = c(88, 76, 94))
)
# Calculate average score for each group using lapply
group_means <- lapply(student_groups, function(group) mean(group$score))
# Get result as a named vector instead of a list
group_means_vector <- sapply(student_groups, function(group) mean(group$score))
Key Benefits of Apply Functions:
- More concise code than explicit for loops
- Comparable in speed to a well-written for loop; the large gains come from truly vectorized functions such as rowSums()
- Clearer intent - function describes what you're doing, not just how
- Avoid creating temporary variables for storing intermediate results
- Predictable return structure: lapply always returns a list, and vapply enforces the type you declare
When to Use Apply vs Loops:
- Use the apply family when working with existing data structures
- Use loops when the algorithm requires fine-grained control or state tracking
- Use lapply for lists, sapply for simplified output, vapply for type safety
- For data frames, consider across() (from dplyr) first for column operations
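As a short illustration of the type-safety point, vapply() takes a FUN.VALUE template describing what each call must return. The sketch below recomputes the group averages with that check in place; group_means_checked is an illustrative name.

```r
student_groups <- list(
  group_a = data.frame(id = 1:3, score = c(85, 92, 78)),
  group_b = data.frame(id = 4:6, score = c(88, 76, 94))
)

# FUN.VALUE declares that each call must return exactly one number;
# vapply() raises an error if any call returns something else
group_means_checked <- vapply(
  student_groups,
  function(group) mean(group$score),
  FUN.VALUE = numeric(1)
)
```

Unlike sapply(), the shape of the result does not depend on the input, which makes vapply() a reasonable default inside functions and scripts.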
Input Example:
| id | q1 | q2 | q3 | q4 |
|---|---|---|---|---|
| 1 | 85 | 90 | 78 | 88 |
| 2 | 92 | 85 | 90 | 95 |
Expected Output:
| id | q1 | q2 | q3 | q4 | row_sum |
|---|---|---|---|---|---|
| 1 | 85 | 90 | 78 | 88 | 341 |
| 2 | 92 | 85 | 90 | 95 | 362 |
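The input and output tables can be reproduced with the sketch below, which builds the two sample rows and computes the row sums with both methods. The names q_cols, sum_vectorized, sum_apply, and with_na are illustrative. The last lines show one behavior to keep in mind: with a missing score, both R methods return NA unless na.rm = TRUE is passed, much as the SAS running total becomes missing when a missing value is added to it.

```r
# Recreate the two-row input table
scores <- data.frame(
  id = 1:2,
  q1 = c(85, 92), q2 = c(90, 85), q3 = c(78, 90), q4 = c(88, 95)
)
q_cols <- c("q1", "q2", "q3", "q4")

# Method 1: vectorized row sums
sum_vectorized <- rowSums(scores[, q_cols])

# Method 2: apply() over rows (MARGIN = 1)
sum_apply <- apply(scores[, q_cols], 1, sum)

scores$row_sum <- sum_vectorized

# Missing values: NA by default, ignored with na.rm = TRUE
with_na <- scores[, q_cols]
with_na$q1[1] <- NA
sum_default <- rowSums(with_na)
sum_ignoring_na <- rowSums(with_na, na.rm = TRUE)
```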
9. Summary: SAS DO Loops vs R Equivalents
| Capability | SAS DO Loops | R Approaches |
|---|---|---|
| Basic iteration | ✓ DO i = start TO end | ✓ for (i in start:end) {} |
| Step values | ✓ DO i = start TO end BY step | ✓ for (i in seq(start, end, by=step)) {} |
| Conditional loops | ✓ DO WHILE / DO UNTIL | ✓ while / repeat with break |
| Arrays with loops | ✓ Natural integration | ✓ Vector indexing with for loops |
| Nested loops | ✓ Common practice | ✓ Available but often avoidable |
| Data generation | ✓ Intuitive row-by-row | ✓ Vectorized operations usually better |
| across() function | ✗ Not available | ✓ Powerful column selection and transformation |
Key Points:
- SAS uses DO loops for repetitive operations and data creation
- R has similar loops but also offers vectorized alternatives that are often better
- In SAS, loops naturally work with the data step's row-by-row processing
- In R, try to use vectorized operations first, then loops if necessary
- The apply family functions (apply, lapply, sapply, etc.) provide powerful alternatives to explicit loops
- The across() function makes it easy to apply operations to multiple columns at once
- Both languages can accomplish the same tasks, but with different approaches
- For beginners: start with simple loops, then learn more advanced techniques