1.4.7. Slicing Data in SAS vs. R

1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.7 Slicing Data in SAS vs. R

Slicing refers to selecting specific rows from a dataset—by position, condition, rank, group, or time.

In SAS, this is usually done using _N_, FIRSTOBS=, OBS=, BY-group logic, or PROC SQL/SORT.
In R, slicing is handled by functions like slice(), filter(), slice_head(), slice_max(), and so on—especially with dplyr.

Each use case below is illustrated with:

Concept
SAS code + explanation
R code + explanation
Input table
Expected output

1. Select Specific Row Numbers

Concept Pick rows 2, 5, and 7.

SAS Code

data sales_data;
  input Region $ Sales Profit;
  datalines;
East 1200 200
West 1300 210
East 1400 220
North 1350 215
South 1250 205
West 1280 202
East 1340 208
;
run;

data want;
  set sales_data;
  if _n_ in (2,5,7);
run;

SAS Code Explanation

data want;
- Starts a new data step to create the dataset want.
set sales_data;
- Reads each row from sales_data sequentially.
if _n_ in (2,5,7);
- _n_ is an automatic variable indicating the current row number.
- This condition keeps only rows 2, 5, and 7.
run;
- Ends the data step and writes the output.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice(c(2, 5, 7))

R Code Explanation

sales_data %>%
- Passes the sales_data data frame into the next function using the pipe.
slice(c(2, 5, 7))
- Selects rows at positions 2, 5, and 7.
- Returns a new data frame with only those rows.

Input

Row	Region	Sales	Profit
1	East	1200	200
2	West	1300	210
3	East	1400	220
4	North	1350	215
5	South	1250	205
6	West	1280	202
7	East	1340	208

Output

Region	Sales	Profit
West	1300	210
South	1250	205
East	1340	208

2. Slice a Range of Rows

Concept Select rows 3 through 6.

SAS Code

proc print data=sales_data(firstobs=3 obs=6);
run;

SAS Code Explanation

proc print data=sales_data(firstobs=3 obs=6);
- firstobs=3 tells SAS to start at row 3.
- obs=6 tells SAS to stop at row 6.
- Only rows 3, 4, 5, and 6 are printed.
run;
- Executes the procedure.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice(3:6)

R Code Explanation

sales_data %>%
- Pipes the data frame to the next function.
slice(3:6)
- Selects rows 3, 4, 5, and 6 by position.

Output

Region	Sales	Profit
East	1400	220
North	1350	215
South	1250	205
West	1280	202

3. Filter Then Slice

Concept Filter on a condition, then pick top rows from the filtered set.

SAS Code

data high_profit;
  set sales_data;
  if Profit > 205 and _n_ <= 2;
run;

SAS Code Explanation

data high_profit;
- Creates a new dataset for high-profit rows.
set sales_data;
- Reads each row from sales_data.
if Profit > 205 and _n_ <= 2;
- Keeps only rows where Profit is greater than 205 and the row number is 1 or 2 (after filtering).
run;
- Ends the data step.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  filter(Profit > 205) %>%
  slice(1:2)

R Code Explanation

filter(Profit > 205)
- Filters rows where Profit is greater than 205.
slice(1:2)
- After filtering, selects the first two rows from the result.

Output

Region	Sales	Profit
East	1400	220
North	1350	215

4. Top N Rows by Value

Concept Pick top 3 rows by Sales.

SAS Code

proc sort data=sales_data out=sorted;
  by descending Sales;
run;

data top3;
  set sorted(obs=3);
run;

SAS Code Explanation

proc sort data=sales_data out=sorted; by descending Sales; run;
- Sorts sales_data in descending order by Sales and saves as sorted.
data top3; set sorted(obs=3); run;
- Reads only the first 3 rows from the sorted data set into top3.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  arrange(desc(Sales)) %>%
  slice_head(n = 3)

R Code Explanation

arrange(desc(Sales))
- Sorts the data frame in descending order by Sales.
slice_head(n = 3)
- Selects the first 3 rows from the sorted data.

Output

Region	Sales	Profit
East	1400	220
North	1350	215
East	1340	208

5. First Row per Group

Concept Pick first observation within each Region.

SAS Code

proc sort data=sales_data; by Region; run;

data first_in_group;
  set sales_data;
  by Region;
  if first.Region;
run;

SAS Code Explanation

proc sort data=sales_data; by Region; run;
- Sorts the data by Region so that group boundaries are contiguous.
data first_in_group; set sales_data; by Region; if first.Region; run;
- For each group of Region, keeps only the first row (where first.Region is true).

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  group_by(Region) %>%
  slice(1)

R Code Explanation

group_by(Region)
- Groups the data by Region.
slice(1)
- Selects the first row from each group.

Output

Region	Sales	Profit
East	1200	200
North	1350	215
South	1250	205
West	1300	210

6. Drop Rows by Position

Concept Drop rows 1 and 7.

SAS Code

data want;
  set sales_data;
  if _n_ not in (1,7);
run;

SAS Code Explanation

data want; set sales_data; if _n_ not in (1,7); run;
- Reads all rows except those at positions 1 and 7.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice(-c(1, 7))

R Code Explanation

slice(-c(1, 7))
- Negative indices in slice() drop the specified rows.

7. Row with Maximum Value

Concept Get the row with the highest Profit.

SAS Code

proc sql outobs=1;
  select * from sales_data
  order by Profit desc;
quit;

SAS Code Explanation

proc sql outobs=1;
- Limits the output to 1 row.
select * from sales_data order by Profit desc;
- Orders by Profit descending, so the first row is the maximum.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  slice_max(Profit, n = 1)

R Code Explanation

slice_max(Profit, n = 1)
- Selects the row with the maximum value of Profit.

8. Top-N Rows per Group

Concept Get top 2 Sales rows within each Region.

SAS Code

proc sort data=sales_data;
  by Region descending Sales;
run;

data top2;
  set sales_data;
  by Region;
  retain counter 0;
  if first.Region then counter = 1;
  else counter + 1;
  if counter <= 2;
run;

SAS Code Explanation

proc sort data=sales_data; by Region descending Sales; run;
- Sorts by Region and then by Sales descending within each region.
retain counter 0;
- Initializes a counter variable.
if first.Region then counter = 1; else counter + 1;
- Resets counter at the start of each group, increments otherwise.
if counter <= 2;
- Keeps only the top 2 rows per group.

R Code

library(dplyr)
sales_data <- data.frame(
  Region = c("East", "West", "East", "North", "South", "West", "East"),
  Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
  Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
  group_by(Region) %>%
  slice_max(Sales, n = 2)

R Code Explanation

group_by(Region)
- Groups by Region.
slice_max(Sales, n = 2)
- Selects the top 2 rows by Sales within each group.

9. Bottom-N per Group

Concept Pick row with lowest Profit in each Category.

SAS Code

data sales_data;
  input Category $ Profit;
  datalines;
A 100
A 90
B 80
B 70
;
run;

proc sort data=sales_data;
  by Category Profit;
run;

data bottom1;
  set sales_data;
  by Category;
  if first.Category;
run;

SAS Code Explanation

data sales_data;
- Creates a dataset with Category and Profit.
proc sort data=sales_data; by Category Profit; run;
- Sorts by Category and then by Profit ascending.
if first.Category;
- Keeps the first row in each category, which is the lowest Profit.

R Code

library(dplyr)
sales_data <- data.frame(
  Category = c("A", "A", "B", "B"),
  Profit = c(100, 90, 80, 70)
)
sales_data %>%
  group_by(Category) %>%
  slice_min(Profit, n = 1)

R Code Explanation

group_by(Category)
- Groups by Category.
slice_min(Profit, n = 1)
- Selects the row with the minimum Profit in each group.

10. Time-Based Slicing (Last 30 Days)

Concept Keep rows where the Date is within the last 30 days.

SAS Code

data sales_data;
  input Date : yymmdd10. Sales;
  format Date yymmdd10.;
  datalines;
2023-05-01 1000
2023-06-15 1100
2023-07-01 1200
2023-07-28 1500
;
run;

data recent;
  set sales_data;
  if Date >= today() - 30;
run;

SAS Code Explanation

data sales_data;
- Creates a dataset with Date and Sales.
input Date : yymmdd10. Sales;
- Reads Date in YYYY-MM-DD format.
format Date yymmdd10.;
- Sets the display format for Date.
data recent; set sales_data; if Date >= today() - 30; run;
- Keeps only rows where Date is within the last 30 days from today.

R Code

library(dplyr)
sales_data <- data.frame(
  Date = as.Date(c("2023-05-01", "2023-06-15", "2023-07-01", "2023-07-28")),
  Sales = c(1000, 1100, 1200, 1500)
)
sales_data %>%
  filter(Date >= Sys.Date() - 30)

R Code Explanation

filter(Date >= Sys.Date() - 30)
- Keeps rows where Date is within the last 30 days.
- Make sure Date is of type Date in R.

Input

Date	Sales
2023-05-01	1000
2023-06-15	1100
2023-07-01	1200
2023-07-28	1500

Output (assuming today = 2023-07-30)

Date	Sales
2023-07-01	1200
2023-07-28	1500

Summary: SAS vs. R for Slicing

Scenario	SAS	R (dplyr)
Fixed rows	`if _n_ in (2,5,7)`	`slice(c(2,5,7))`
Range	`firstobs=3 obs=6`	`slice(3:6)`
Filter then slice	`if Profit>205 and _n_<=2`	`filter(Profit > 205) %>% slice(1:2)`
Top-N	`sort desc + obs=3`	`arrange(desc(Sales)) %>% slice_head(n = 3)`
First per group	`if first.Region`	`group_by(Region) %>% slice(1)`
Drop rows	`if _n_ not in (1,7)`	`slice(-c(1,7))`
Max value row	`order by Profit desc, outobs=1`	`slice_max(Profit, n = 1)`
Top-N per group	`by group + retain counter`	`group_by() %>% slice_max(Sales, n = 2)`
Bottom-N per group	`by group ascending sort + first.group`	`group_by() %>% slice_min(Profit, n = 1)`
Time-based window slicing	`if Date >= today() - 30`	`filter(Date >= Sys.Date() - 30)`

Resource download links

1.4.7.-Slicing-Data-in-SAS-vs-R.zip

⁂

1.4. Migrating from SAS to R: A Skill Conversion Guide