contact@a2zlearners.com

1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.7 Slicing Data in SAS vs. R

Slicing refers to selecting specific rows from a dataset—by position, condition, rank, group, or time.

  • In SAS, this is usually done using _N_, FIRSTOBS=, OBS=, BY-group logic, or PROC SQL/SORT.
  • In R, slicing is handled by functions like slice(), filter(), slice_head(), slice_max(), and so on—especially with dplyr.

Each use case below is illustrated with:

  • Concept
  • SAS code + explanation
  • R code + explanation
  • Input table
  • Expected output

1. Select Specific Row Numbers

Concept Pick rows 2, 5, and 7.

SAS Code

data sales_data;
  input Region $ Sales Profit;
  datalines;
East 1200 200
West 1300 210
East 1400 220
North 1350 215
South 1250 205
West 1280 202
East 1340 208
;
run;

data want;
  set sales_data;
  if _n_ in (2,5,7);
run;

SAS Code Explanation

  • data want;
    • Starts a new data step to create the dataset want.
  • set sales_data;
    • Reads each row from sales_data sequentially.
  • if _n_ in (2,5,7);
    • _n_ is an automatic variable indicating the current row number.
    • This condition keeps only rows 2, 5, and 7.
  • run;
    • Ends the data step and writes the output.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice(c(2, 5, 7))

R Code Explanation

  • sales_data %>%
    • Passes the sales_data data frame into the next function using the pipe.
  • slice(c(2, 5, 7))
    • Selects rows at positions 2, 5, and 7.
    • Returns a new data frame with only those rows.

Input

Row Region Sales Profit
1 East 1200 200
2 West 1300 210
3 East 1400 220
4 North 1350 215
5 South 1250 205
6 West 1280 202
7 East 1340 208

Output

Region Sales Profit
West 1300 210
South 1250 205
East 1340 208

2. Slice a Range of Rows

Concept Select rows 3 through 6.

SAS Code

proc print data=sales_data(firstobs=3 obs=6);
run;

SAS Code Explanation

  • proc print data=sales_data(firstobs=3 obs=6);
    • firstobs=3 tells SAS to start at row 3.
    • obs=6 tells SAS to stop at row 6.
    • Only rows 3, 4, 5, and 6 are printed.
  • run;
    • Executes the procedure.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice(3:6)

R Code Explanation

  • sales_data %>%
    • Pipes the data frame to the next function.
  • slice(3:6)
    • Selects rows 3, 4, 5, and 6 by position.

Output

Region Sales Profit
East 1400 220
North 1350 215
South 1250 205
West 1280 202

3. Filter Then Slice

Concept Filter on a condition, then pick top rows from the filtered set.

SAS Code

data high_profit;
  set sales_data;
  if Profit > 205 and _n_ <= 2;
run;

SAS Code Explanation

  • data high_profit;
    • Creates a new dataset for high-profit rows.
  • set sales_data;
    • Reads each row from sales_data.
  • if Profit > 205 and _n_ <= 2;
    • Keeps only rows where Profit is greater than 205 and the row number is 1 or 2 (after filtering).
  • run;
    • Ends the data step.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  filter(Profit > 205) %>%
  slice(1:2)

R Code Explanation

  • filter(Profit > 205)
    • Filters rows where Profit is greater than 205.
  • slice(1:2)
    • After filtering, selects the first two rows from the result.

Output

Region Sales Profit
East 1400 220
North 1350 215

4. Top N Rows by Value

Concept Pick top 3 rows by Sales.

SAS Code

proc sort data=sales_data out=sorted;
  by descending Sales;
run;

data top3;
  set sorted(obs=3);
run;

SAS Code Explanation

  • proc sort data=sales_data out=sorted; by descending Sales; run;
    • Sorts sales_data in descending order by Sales and saves as sorted.
  • data top3; set sorted(obs=3); run;
    • Reads only the first 3 rows from the sorted data set into top3.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  arrange(desc(Sales)) %>%
  slice_head(n = 3)

R Code Explanation

  • arrange(desc(Sales))
    • Sorts the data frame in descending order by Sales.
  • slice_head(n = 3)
    • Selects the first 3 rows from the sorted data.

Output

Region Sales Profit
East 1400 220
North 1350 215
East 1340 208

5. First Row per Group

Concept Pick first observation within each Region.

SAS Code

proc sort data=sales_data; by Region; run;

data first_in_group;
  set sales_data;
  by Region;
  if first.Region;
run;

SAS Code Explanation

  • proc sort data=sales_data; by Region; run;
    • Sorts the data by Region so that group boundaries are contiguous.
  • data first_in_group; set sales_data; by Region; if first.Region; run;
    • For each group of Region, keeps only the first row (where first.Region is true).

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  group_by(Region) %>%
  slice(1)

R Code Explanation

  • group_by(Region)
    • Groups the data by Region.
  • slice(1)
    • Selects the first row from each group.

Output

Region Sales Profit
East 1200 200
North 1350 215
South 1250 205
West 1300 210

6. Drop Rows by Position

Concept Drop rows 1 and 7.

SAS Code

data want;
  set sales_data;
  if _n_ not in (1,7);
run;

SAS Code Explanation

  • data want; set sales_data; if _n_ not in (1,7); run;
    • Reads all rows except those at positions 1 and 7.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice(-c(1, 7))

R Code Explanation

  • slice(-c(1, 7))
    • Negative indices in slice() drop the specified rows.

7. Row with Maximum Value

Concept Get the row with the highest Profit.

SAS Code

proc sql outobs=1;
  select * from sales_data
  order by Profit desc;
quit;

SAS Code Explanation

  • proc sql outobs=1;
    • Limits the output to 1 row.
  • select * from sales_data order by Profit desc;
    • Orders by Profit descending, so the first row is the maximum.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice_max(Profit, n = 1)

R Code Explanation

  • slice_max(Profit, n = 1)
    • Selects the row with the maximum value of Profit.

8. Top-N Rows per Group

Concept Get top 2 Sales rows within each Region.

SAS Code

proc sort data=sales_data;
  by Region descending Sales;
run;

data top2;
  set sales_data;
  by Region;
  retain counter 0;
  if first.Region then counter = 1;
  else counter + 1;
  if counter <= 2;
run;

SAS Code Explanation

  • proc sort data=sales_data; by Region descending Sales; run;
    • Sorts by Region and then by Sales descending within each region.
  • retain counter 0;
    • Initializes a counter variable.
  • if first.Region then counter = 1; else counter + 1;
    • Resets counter at the start of each group, increments otherwise.
  • if counter <= 2;
    • Keeps only the top 2 rows per group.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  group_by(Region) %>%
  slice_max(Sales, n = 2)

R Code Explanation

  • group_by(Region)
    • Groups by Region.
  • slice_max(Sales, n = 2)
    • Selects the top 2 rows by Sales within each group.

9. Bottom-N per Group

Concept Pick row with lowest Profit in each Category.

SAS Code

data sales_data;
  input Category $ Profit;
  datalines;
A 100
A 90
B 80
B 70
;
run;

proc sort data=sales_data;
  by Category Profit;
run;

data bottom1;
  set sales_data;
  by Category;
  if first.Category;
run;

SAS Code Explanation

  • data sales_data;
    • Creates a dataset with Category and Profit.
  • proc sort data=sales_data; by Category Profit; run;
    • Sorts by Category and then by Profit ascending.
  • if first.Category;
    • Keeps the first row in each category, which is the lowest Profit.

R Code

library(dplyr)
sales_data <- data.frame(
  Category = c("A", "A", "B", "B"),
  Profit = c(100, 90, 80, 70)
)
sales_data %>%
  group_by(Category) %>%
  slice_min(Profit, n = 1)

R Code Explanation

  • group_by(Category)
    • Groups by Category.
  • slice_min(Profit, n = 1)
    • Selects the row with the minimum Profit in each group.

10. Time-Based Slicing (Last 30 Days)

Concept Keep rows where the Date is within the last 30 days.

SAS Code

data sales_data;
  input Date : yymmdd10. Sales;
  format Date yymmdd10.;
  datalines;
2023-05-01 1000
2023-06-15 1100
2023-07-01 1200
2023-07-28 1500
;
run;

data recent;
  set sales_data;
  if Date >= today() - 30;
run;

SAS Code Explanation

  • data sales_data;
    • Creates a dataset with Date and Sales.
  • input Date : yymmdd10. Sales;
    • Reads Date in YYYY-MM-DD format.
  • format Date yymmdd10.;
    • Sets the display format for Date.
  • data recent; set sales_data; if Date >= today() - 30; run;
    • Keeps only rows where Date is within the last 30 days from today.

R Code

library(dplyr)
sales_data <- data.frame(
  Date = as.Date(c("2023-05-01", "2023-06-15", "2023-07-01", "2023-07-28")),
  Sales = c(1000, 1100, 1200, 1500)
)
sales_data %>%
  filter(Date >= Sys.Date() - 30)

R Code Explanation

  • filter(Date >= Sys.Date() - 30)
    • Keeps rows where Date is within the last 30 days.
    • Make sure Date is of type Date in R.

Input

Date Sales
2023-05-01 1000
2023-06-15 1100
2023-07-01 1200
2023-07-28 1500

Output (assuming today = 2023-07-30)

Date Sales
2023-07-01 1200
2023-07-28 1500

Summary: SAS vs. R for Slicing

Scenario SAS R (dplyr)
Fixed rows if _n_ in (2,5,7) slice(c(2,5,7))
Range firstobs=3 obs=6 slice(3:6)
Filter then slice if Profit>205 and _n_<=2 filter(Profit > 205) %>% slice(1:2)
Top-N sort desc + obs=3 arrange(desc(Sales)) %>% slice_head(n = 3)
First per group if first.Region group_by(Region) %>% slice(1)
Drop rows if _n_ not in (1,7) slice(-c(1,7))
Max value row order by Profit desc, outobs=1 slice_max(Profit, n = 1)
Top-N per group by group + retain counter group_by() %>% slice_max(Sales, n = 2)
Bottom-N per group by group ascending sort + first.group group_by() %>% slice_min(Profit, n = 1)
Time-based window slicing if Date >= today() - 30 filter(Date >= Sys.Date() - 30)

**Resource download links**

1.4.7.-Slicing-Data-in-SAS-vs-R.zip