1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.12. DO LOOP Statements in SAS vs R equivalent

1. Basic Iterative DO Loops

| Capability | SAS (DO Loops) | R (Loops and Vectorized Operations) |
| --- | --- | --- |
| Simple iteration | DO i = start TO end; | for (i in start:end) {} |
| Iteration with steps | DO i = start TO end BY step; | for (i in seq(start, end, by=step)) {} |
| Iteration with condition | DO i = start TO end WHILE (condition); | for (i in start:end) { if (!condition) break } |
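The stepped and conditional rows of the table can be sketched in R as follows. The variable names (steps, total, visited) and the bounds are illustrative assumptions, not part of the original examples.

```r
# Step iteration: the R counterpart of SAS "do i = 1 to 10 by 3;"
steps <- numeric(0)
for (i in seq(1, 10, by = 3)) {
  steps <- c(steps, i)          # visits 1, 4, 7, 10
}

# Conditional iteration: the R counterpart of
# SAS "do i = 1 to 10 while (total < 10);"
total <- 0
visited <- integer(0)
for (i in 1:10) {
  if (!(total < 10)) break      # test the condition before running the body
  total <- total + i
  visited <- c(visited, i)
}

print(steps)
print(total)
print(visited)
```

Placing the break test at the top of the loop body mirrors SAS, which checks the WHILE condition before each iteration.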

SAS Example: Simple Counting

/* Simple counter that makes a dataset with numbers 1-5 */
data counter;
  do i = 1 to 5;
    output;
  end;
run;

Explanation (SAS):

  • Creates a dataset with 5 rows
  • Each row contains a single number (1, 2, 3, 4, or 5)
  • The output statement creates a new row for each value of i

R Example: Simple Counting

# Simple counter that makes a data frame with numbers 1-5
counter <- data.frame(i = 1:5)

Explanation (R):

  • Creates a data frame with 5 rows
  • Each row contains a single number from 1 to 5
  • In R, we can directly create the sequence in one step

Expected Output (Both Examples):

i
1
2
3
4
5

Key Points About Basic Loops:

  • SAS uses DO loops to repeat actions a specific number of times
  • R has for loops but often doesn't need them thanks to vectorization
  • In both languages, loops help you automate repetitive tasks
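One caveat when translating DO i = 1 TO n: in SAS the loop body is skipped when n is 0, but in R the expression 1:n counts down to 0 instead. A minimal sketch of the difference, using hypothetical counters (runs_colon, runs_seq_len):

```r
n <- 0

# 1:n becomes c(1, 0) when n is 0, so this body runs twice
runs_colon <- 0
for (i in 1:n) {
  runs_colon <- runs_colon + 1
}

# seq_len(n) is empty when n is 0, so this body never runs,
# which matches SAS "do i = 1 to 0;"
runs_seq_len <- 0
for (i in seq_len(n)) {
  runs_seq_len <- runs_seq_len + 1
}

print(runs_colon)
print(runs_seq_len)
```

When the upper bound comes from data (for example nrow() of a possibly empty data frame), seq_len() or seq_along() is usually the safer choice.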

2. DO WHILE and DO UNTIL: Conditional Loop Execution

| Capability | SAS | R |
| --- | --- | --- |
| Evaluate at beginning | DO WHILE (condition); | while (condition) {} |
| Evaluate at end | DO UNTIL (condition); | repeat { if (condition) break } |
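The practical difference between the two rows is whether the body can run zero times. The sketch below uses a made-up starting value (x = 100) for which the loop condition is already settled before the first pass.

```r
x <- 100

# Top-tested loop (like DO WHILE): the condition is FALSE at the start,
# so the body never runs
while_runs <- 0
while (x < 10) {
  while_runs <- while_runs + 1
  x <- x + 1
}

# Bottom-tested loop (like DO UNTIL): the body runs once before the
# exit condition is checked
until_runs <- 0
repeat {
  until_runs <- until_runs + 1
  if (x >= 10) break
}

print(while_runs)
print(until_runs)
```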

SAS Example: Counting to a Target

/* Count until we reach 50 */
data counting;
  sum = 0;      /* Starting sum */
  count = 0;    /* How many numbers we've added */
  
  /* Add numbers until sum reaches 50 */
  do until (sum >= 50);
    count = count + 1;    /* Add 1 to our counter */
    sum = sum + count;    /* Add the counter to our sum */
    output;
  end;
run;

Explanation (SAS):

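  • Starts with sum = 0 and count = 0
  • Adds 1, then 2, then 3... until the sum reaches at least 50
  • The output statement writes one row per iteration, so the dataset records each step
  • DO UNTIL evaluates the condition (sum >= 50) at the end of each iteration, so the body always runs at least once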

R Example: Counting to a Target

# Count until we reach 50
counting <- data.frame(count = numeric(), sum = numeric())
sum <- 0
count <- 0

repeat {
  count <- count + 1        # Add 1 to our counter
  sum <- sum + count        # Add the counter to our sum
  
  # Save the current state
  counting <- rbind(counting, data.frame(count = count, sum = sum))
  
  # Stop when sum reaches 50
  if (sum >= 50) break
}

Explanation (R):

  • Does the same thing as the SAS example
  • Uses repeat with break to mimic DO UNTIL behavior
  • When sum reaches 50, the break statement exits the loop
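For this particular task a loop-free version is also possible. The sketch below is an alternative, not the original author's method; it assumes an upper bound of 20 terms, which is more than enough to pass 50, and the name counting_vec is hypothetical.

```r
count <- 1:20                      # assumed upper bound, large enough here
running_sum <- cumsum(count)       # 1, 3, 6, 10, ...

# Index of the first running total that reaches the target
stop_at <- which(running_sum >= 50)[1]

counting_vec <- data.frame(
  count = count[seq_len(stop_at)],
  sum   = running_sum[seq_len(stop_at)]
)

print(counting_vec)
```

This also avoids calling rbind() inside the loop, which copies the whole data frame on every iteration and becomes slow for long loops.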

Expected Output:

count sum
1 1
2 3
3 6
... ...
10 55

3. Nested DO Loops: Handling Multiple Levels

SAS Example: Simple Multiplication Table

/* Create a small multiplication table */
data multiplication;
  do row = 1 to 3;
    do col = 1 to 3;
      product = row * col;
      output;
    end;
  end;
run;

Explanation (SAS):

  • Creates a 3x3 multiplication table
  • Outer loop (row) goes from 1 to 3
  • For each row, inner loop (col) goes from 1 to 3
  • Calculates the product of row * col

R Example: Simple Multiplication Table

# Create a small multiplication table
mult_table <- data.frame(row = numeric(),
                        col = numeric(),
                        product = numeric())

for (row in 1:3) {
  for (col in 1:3) {
    product <- row * col
    mult_table <- rbind(mult_table,
                       data.frame(row = row,
                                 col = col,
                                 product = product))
  }
}

Explanation (R):

  • Does the same as the SAS example
  • Uses nested for loops - one for rows and one for columns
  • For each combination, calculates and stores the product
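As a hedged alternative to the nested for loops, base R's expand.grid() can build every row/col combination in one call. The name mult_table_vec is hypothetical. Note that expand.grid() varies its first argument fastest, so col is listed first to reproduce the SAS row order.

```r
# All combinations; col varies fastest, like the inner SAS loop
mult_table_vec <- expand.grid(col = 1:3, row = 1:3)

# Vectorized product for all nine rows at once
mult_table_vec$product <- mult_table_vec$row * mult_table_vec$col

# Reorder columns to match the looped version
mult_table_vec <- mult_table_vec[, c("row", "col", "product")]

print(mult_table_vec)

# The same numbers as a 3x3 matrix
grid <- outer(1:3, 1:3)
print(grid)
```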

Expected Output:

row col product
1 1 1
1 2 2
1 3 3
2 1 2
2 2 4
2 3 6
3 1 3
3 2 6
3 3 9

4. DO Loops with Arrays: Working with Multiple Variables

SAS Example: Temperature Conversion

data temperatures;
  /* Store 3 temperatures in Celsius */
  array celsius[3] _temporary_ (0, 25, 100);
  /* Create variables for Fahrenheit values */
  array fahrenheit[3] temp1 temp2 temp3;
  
  /* Convert each temperature from C to F */
  do i = 1 to 3;
    /* F = C*9/5 + 32 */
    fahrenheit[i] = celsius[i] * 9/5 + 32;
  end;
  
  drop i;
run;

Explanation (SAS):

  • Creates an array with 3 Celsius temperatures
  • Creates another array to hold Fahrenheit values
  • The DO loop processes each temperature:
    • Converts from Celsius to Fahrenheit
    • Stores the result in the corresponding position

R Example: Temperature Conversion

# Celsius temperatures
celsius <- c(0, 25, 100)

# Create data frame to store results
temperatures <- data.frame(
  celsius = celsius,
  fahrenheit = numeric(3)  # Empty vector for Fahrenheit values
)

# Convert each temperature
for (i in 1:3) {
  temperatures$fahrenheit[i] <- temperatures$celsius[i] * 9/5 + 32
}

Explanation (R):

  • Creates a vector with Celsius temperatures
  • Makes a data frame with a column for Celsius and a column for Fahrenheit
  • The for loop converts each temperature one by one
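Because arithmetic in R is vectorized, the same conversion can be written without a loop. This is a minimal sketch; temperatures_vec is a hypothetical name chosen to avoid overwriting the data frame above.

```r
celsius <- c(0, 25, 100)

# The formula is applied to every element of the vector at once
temperatures_vec <- data.frame(
  celsius    = celsius,
  fahrenheit = celsius * 9 / 5 + 32
)

print(temperatures_vec)
```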

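Expected Output (R data frame; the SAS step instead writes a single row with temp1 = 32, temp2 = 77, temp3 = 212, because celsius is a _temporary_ array and is not saved):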

celsius fahrenheit
0 32
25 77
100 212

5. DO WHILE: A Simple Example

SAS Example: Doubling a Number

/* Double a number until it exceeds 100 */
data doubling;
  value = 2;
  iteration = 0;
  
  /* Keep doubling until we exceed 100 */
  do while (value <= 100);
    iteration + 1;        /* Sum statement: adds 1 to iteration */
    value = value * 2;
    output;
  end;
run;

Explanation (SAS):

  • Starts with value = 2
  • Doubles the value each time through the loop
  • Continues until the value exceeds 100
  • DO WHILE checks the condition at the beginning of each iteration

R Example: Doubling a Number

# Double a number until it exceeds 100
doubling <- data.frame(iteration = numeric(),
                      value = numeric())

value <- 2
iteration <- 0

while (value <= 100) {
  iteration <- iteration + 1
  value <- value * 2
  doubling <- rbind(doubling,
                   data.frame(iteration = iteration,
                             value = value))
}

Explanation (R):

  • Same logic as the SAS example
  • While loop checks the condition before each iteration
  • Stops when value exceeds 100
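When the number of iterations is not known in advance, one common pattern is to collect each row in a list and bind them once at the end, instead of calling rbind() on every pass. The sketch below assumes the same starting values as the example above; rows and doubling_fast are hypothetical names.

```r
value <- 2
iteration <- 0
rows <- list()                     # one small data frame per iteration

while (value <= 100) {
  iteration <- iteration + 1
  value <- value * 2
  rows[[iteration]] <- data.frame(iteration = iteration, value = value)
}

# Bind all collected rows in a single step
doubling_fast <- do.call(rbind, rows)

print(doubling_fast)
```

For six rows the difference is negligible; it starts to matter when the loop runs thousands of times.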

Expected Output:

iteration value
1 4
2 8
3 16
4 32
5 64
6 128

6. Simple Data Generation: Using Loops to Create Data

SAS Example: Simple Sequence of Dates

/* Create a series of weekly dates */
data weekly_dates;
  format date_value date9.;
  
  /* Start from January 1, 2023 */
  date_value = '01JAN2023'd;
  
  /* Create 5 weekly dates */
  do week = 1 to 5;
    /* Add 7 days for each week */
    date_value = date_value + 7;
    output;
  end;
run;

Explanation (SAS):

  • Starts with January 1, 2023
  • For each of 5 weeks, adds 7 days to get the next date
  • Creates a dataset with 5 weekly dates

R Example: Simple Sequence of Dates

# Create a series of weekly dates
library(lubridate)

# Start date: January 1, 2023
start_date <- ymd("2023-01-01")

# Create data frame with 5 weekly dates
weekly_dates <- data.frame(
  week = 1:5,
  date_value = start_date + days(7 * 1:5)
)

Explanation (R):

  • Uses the lubridate package for date handling
  • Starts with January 1, 2023
  • Creates a sequence of 5 weekly dates all at once using vectorization
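If lubridate is not available, base R can produce the same dates with seq() on a Date. This is a sketch under that assumption; weekly_dates_base is a hypothetical name, and the date9 column is only an approximation of the SAS DATE9. display.

```r
start_date <- as.Date("2023-01-01")

weekly_dates_base <- data.frame(
  week = 1:5,
  # First date is one week after the start, then step by one week
  date_value = seq(start_date + 7, by = "week", length.out = 5)
)

# Rough DATE9.-style rendering; month abbreviations depend on the locale
weekly_dates_base$date9 <- toupper(format(weekly_dates_base$date_value, "%d%b%Y"))

print(weekly_dates_base)
```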

Expected Output (R formatting; SAS prints the same dates under the DATE9. format, e.g. 08JAN2023):

week date_value
1 2023-01-08
2 2023-01-15
3 2023-01-22
4 2023-01-29
5 2023-02-05

7. Using across() in R as an Alternative to DO Loops

| Capability | SAS DO Loops | R across() Function |
| --- | --- | --- |
| Process multiple columns | Loop through array elements | across() selects multiple columns |
| Apply same transformation | Same code in loop body | Single function applied to all selected columns |
| Column selection | Must define arrays | Flexible selection with helpers like starts_with() |
| Performance | Row-by-row processing | Optimized vectorized operations |

SAS Example: Convert Multiple Columns to Uppercase

data patients_clean;
  set patients;
  
  /* Define array of character variables */
  array chars[3] name city state;
  
  /* Convert each to uppercase */
  do i = 1 to 3;
    chars[i] = upcase(chars[i]);
  end;
  
  drop i;
run;

Explanation (SAS):

  • Creates an array of three character columns: name, city, and state
  • Uses a DO loop to iterate through each column
  • Applies the UPCASE function to convert each value to uppercase
  • Each column is processed one at a time

R Example: Convert Multiple Columns to Uppercase

library(dplyr)

patients_clean <- patients %>%
  mutate(across(c(name, city, state), toupper))

Explanation (R):

  • Uses across() to select multiple columns (name, city, and state)
  • Applies the toupper() function to all selected columns at once
  • No need for a loop - one line replaces the entire DO loop structure

Example with Column Selection Patterns

# Convert all character columns to uppercase
patients_clean <- patients %>%
  mutate(across(where(is.character), toupper))

# Convert only columns that start with "addr_" to uppercase
patients_clean <- patients %>%
  mutate(across(starts_with("addr_"), toupper))

Explanation:

  • where(is.character) selects all character columns
  • starts_with("addr_") selects columns with names starting with "addr_"
  • This flexibility makes it easy to apply transformations to groups of columns
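For comparison, the "all character columns" selection can also be sketched in base R without dplyr. The patients data frame below is rebuilt from the two-row input example in this section; is_char is a hypothetical helper name.

```r
patients <- data.frame(
  name  = c("John Smith", "Mary Jones"),
  city  = c("New York", "Dallas"),
  state = c("NY", "TX"),
  age   = c(45, 36),
  stringsAsFactors = FALSE
)

# Logical flag per column: TRUE where the column is character
is_char <- vapply(patients, is.character, logical(1))

# Apply toupper() to the flagged columns only; age is left untouched
patients_clean <- patients
patients_clean[is_char] <- lapply(patients_clean[is_char], toupper)

print(patients_clean)
```

The across() version is shorter and composes with other dplyr verbs; the base R form is mainly useful where adding a package dependency is not an option.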

Example with Custom Functions

# Round all numeric columns to 1 decimal place
data_rounded <- data %>%
  mutate(across(where(is.numeric), ~round(., 1)))

Explanation:

  • Selects all numeric columns using where(is.numeric)
  • Uses the formula shorthand (~) to define an anonymous function that rounds to 1 decimal place
  • The dot (.) represents each column's values

Key Benefits over DO Loops:

  • More concise code - often just one line instead of a multi-line loop
  • More readable - clearly shows what's being transformed and how
  • Usually faster performance with vectorized operations
  • Less error-prone - no need to manage loop counters or array indices

Input Example:

| name | city | state | age |
| --- | --- | --- | --- |
| John Smith | New York | NY | 45 |
| Mary Jones | Dallas | TX | 36 |

Expected Output:

| name | city | state | age |
| --- | --- | --- | --- |
| JOHN SMITH | NEW YORK | NY | 45 |
| MARY JONES | DALLAS | TX | 36 |

8. Apply Family Functions in R: Loop Alternatives

| Capability | SAS DO Loops | R Apply Functions |
| --- | --- | --- |
| Apply to all rows | Row-by-row DO loop | apply(data, 1, fun) |
| Apply to all columns | Loop through array elements | apply(data, 2, fun) |
| Apply to list elements | Loop through array | lapply(list, fun) |
| Apply with name preservation | Manual naming in loop | sapply(list, fun) or mapply(fun, ...) |
| Apply with result combination | Collect results, then process | vapply() for type safety, or purrr::map_dfr() (from the purrr package, not base R) to combine into a data frame |

SAS Example: Calculate Row Sums

data row_stats;
  set scores;
  
  /* Calculate sum for each row using a loop through columns */
  array score_cols[4] q1 q2 q3 q4;
  row_sum = 0;
  
  do i = 1 to 4;
    row_sum = row_sum + score_cols[i];
  end;
  
  drop i;
run;

Explanation (SAS):

  • Uses a DO loop to iterate through each column in the array
  • Adds each score to the running total
  • Creates a new variable with the sum of all scores

R Example: Using apply() Functions

# Method 1: Using rowSums() (vectorized)
scores$row_sum <- rowSums(scores[, c("q1", "q2", "q3", "q4")])

# Method 2: Using apply()
scores$row_sum <- apply(scores[, c("q1", "q2", "q3", "q4")], 1, sum)

Explanation (R):

  • First method uses the specialized rowSums() function - fastest and most concise
  • Second method uses apply() with 1 indicating operation across rows
  • Both eliminate the need for explicit loops
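A self-contained sketch of both methods, using the two-row scores table from the input example in this section. The names q_cols, sum_vectorized, and sum_applied are hypothetical.

```r
scores <- data.frame(
  id = 1:2,
  q1 = c(85, 92),
  q2 = c(90, 85),
  q3 = c(78, 90),
  q4 = c(88, 95)
)
q_cols <- c("q1", "q2", "q3", "q4")

# Method 1: specialized, vectorized row sums
sum_vectorized <- rowSums(scores[, q_cols])

# Method 2: apply() with margin 1 (rows); the data frame is converted
# to a matrix internally before sum() is called on each row
sum_applied <- apply(scores[, q_cols], 1, sum)

scores$row_sum <- sum_vectorized
print(scores)
```

Like the SAS loop that adds with +, both methods return a missing result for a row that contains a missing score; rowSums(..., na.rm = TRUE) skips missing values instead.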

Example: Processing List Elements

# Create a list of data frames
student_groups <- list(
  group_a = data.frame(id = 1:3, score = c(85, 92, 78)),
  group_b = data.frame(id = 4:6, score = c(88, 76, 94))
)

# Calculate average score for each group using lapply
group_means <- lapply(student_groups, function(group) mean(group$score))

# Get result as a named vector instead of a list
group_means_vector <- sapply(student_groups, function(group) mean(group$score))
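The same calculation can be sketched with vapply(), which requires you to declare the type and length of each result and raises an error if a result does not match. The name group_means_checked is hypothetical.

```r
student_groups <- list(
  group_a = data.frame(id = 1:3, score = c(85, 92, 78)),
  group_b = data.frame(id = 4:6, score = c(88, 76, 94))
)

# numeric(1) declares that each call must return exactly one number
group_means_checked <- vapply(
  student_groups,
  function(group) mean(group$score),
  numeric(1)
)
print(group_means_checked)

# On empty input, sapply() falls back to an empty list,
# while vapply() still returns the declared type
empty_sapply <- sapply(list(), length)
empty_vapply <- vapply(list(), length, integer(1))
```

That predictable return type is the main reason to prefer vapply() over sapply() inside functions and scripts that run unattended.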

Key Benefits of Apply Functions:

  • More concise code than explicit for loops
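  • Speed is usually comparable to a well-written for loop; truly vectorized functions such as rowSums() are faster still
  • Clearer intent - function describes what you're doing, not just how
  • Avoid creating temporary variables for storing intermediate results
  • Predictable return types: lapply() always returns a list and vapply() enforces the type you declare, while sapply() simplifies when it can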

When to Use Apply vs Loops:

  • Use apply family when working with existing data structures
  • Use loops when algorithm requires fine-grained control or state tracking
  • Use lapply for lists, sapply for simplified output, vapply for type safety
  • For data frames, consider across() (from dplyr) first for column operations

Input Example:

id q1 q2 q3 q4
1 85 90 78 88
2 92 85 90 95

Expected Output:

id q1 q2 q3 q4 row_sum
1 85 90 78 88 341
2 92 85 90 95 362

9. Summary: SAS DO Loops vs R Equivalents

| Capability | SAS DO Loops | R Approaches |
| --- | --- | --- |
| Basic iteration | ✓ DO i = start TO end | ✓ for (i in start:end) {} |
| Step values | ✓ DO i = start TO end BY step | ✓ for (i in seq(start, end, by=step)) {} |
| Conditional loops | ✓ DO WHILE / DO UNTIL | ✓ while / repeat with break |
| Arrays with loops | ✓ Natural integration | ✓ Vector indexing with for loops |
| Nested loops | ✓ Common practice | ✓ Available but often avoidable |
| Data generation | ✓ Intuitive row-by-row | ✓ Vectorized operations usually better |
| across() function | ✗ Not available | ✓ Powerful column selection and transformation |

Key Points:

  • SAS uses DO loops for repetitive operations and data creation
  • R has similar loops but also offers vectorized alternatives that are often better
  • In SAS, loops naturally work with the data step's row-by-row processing
  • In R, try to use vectorized operations first, then loops if necessary
  • The apply family functions (apply, lapply, sapply, etc.) provide powerful alternatives to explicit loops
  • The across() function makes it easy to apply operations to multiple columns at once
  • Both languages can accomplish the same tasks, but with different approaches
  • For beginners: start with simple loops, then learn more advanced techniques
