1.4. Migrating from SAS to R: A Skill Conversion Guide
1.4.7 Slicing Data in SAS vs. R
Slicing refers to selecting specific rows from a dataset—by position, condition, rank, group, or time.
- In SAS, this is usually done using
_N_,FIRSTOBS=,OBS=,BY-group logic, orPROC SQL/SORT. - In R, slicing is handled by functions like
slice(),filter(),slice_head(),slice_max(), and so on—especially withdplyr.
Each use case below is illustrated with:
- Concept
- SAS code + explanation
- R code + explanation
- Input table
- Expected output
1. Select Specific Row Numbers
Concept Pick rows 2, 5, and 7.
SAS Code
data sales_data;
input Region $ Sales Profit;
datalines;
East 1200 200
West 1300 210
East 1400 220
North 1350 215
South 1250 205
West 1280 202
East 1340 208
;
run;
data want;
set sales_data;
if _n_ in (2,5,7);
run;
SAS Code Explanation
data want;- Starts a new data step to create the dataset
want.
- Starts a new data step to create the dataset
set sales_data;- Reads each row from
sales_datasequentially.
- Reads each row from
if _n_ in (2,5,7);_n_is an automatic variable indicating the current row number.- This condition keeps only rows 2, 5, and 7.
run;- Ends the data step and writes the output.
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
slice(c(2, 5, 7))
R Code Explanation
sales_data %>%- Passes the
sales_datadata frame into the next function using the pipe.
- Passes the
slice(c(2, 5, 7))- Selects rows at positions 2, 5, and 7.
- Returns a new data frame with only those rows.
Input
| Row | Region | Sales | Profit |
|---|---|---|---|
| 1 | East | 1200 | 200 |
| 2 | West | 1300 | 210 |
| 3 | East | 1400 | 220 |
| 4 | North | 1350 | 215 |
| 5 | South | 1250 | 205 |
| 6 | West | 1280 | 202 |
| 7 | East | 1340 | 208 |
Output
| Region | Sales | Profit |
|---|---|---|
| West | 1300 | 210 |
| South | 1250 | 205 |
| East | 1340 | 208 |
2. Slice a Range of Rows
Concept Select rows 3 through 6.
SAS Code
proc print data=sales_data(firstobs=3 obs=6);
run;
SAS Code Explanation
proc print data=sales_data(firstobs=3 obs=6);firstobs=3tells SAS to start at row 3.obs=6tells SAS to stop at row 6.- Only rows 3, 4, 5, and 6 are printed.
run;- Executes the procedure.
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
slice(3:6)
R Code Explanation
sales_data %>%- Pipes the data frame to the next function.
slice(3:6)- Selects rows 3, 4, 5, and 6 by position.
Output
| Region | Sales | Profit |
|---|---|---|
| East | 1400 | 220 |
| North | 1350 | 215 |
| South | 1250 | 205 |
| West | 1280 | 202 |
3. Filter Then Slice
Concept Filter on a condition, then pick top rows from the filtered set.
SAS Code
data high_profit;
set sales_data;
if Profit > 205 and _n_ <= 2;
run;
SAS Code Explanation
data high_profit;- Creates a new dataset for high-profit rows.
set sales_data;- Reads each row from
sales_data.
- Reads each row from
if Profit > 205 and _n_ <= 2;- Keeps only rows where
Profitis greater than 205 and the row number is 1 or 2 (after filtering).
- Keeps only rows where
run;- Ends the data step.
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
filter(Profit > 205) %>%
slice(1:2)
R Code Explanation
filter(Profit > 205)- Filters rows where
Profitis greater than 205.
- Filters rows where
slice(1:2)- After filtering, selects the first two rows from the result.
Output
| Region | Sales | Profit |
|---|---|---|
| East | 1400 | 220 |
| North | 1350 | 215 |
4. Top N Rows by Value
Concept
Pick top 3 rows by Sales.
SAS Code
proc sort data=sales_data out=sorted;
by descending Sales;
run;
data top3;
set sorted(obs=3);
run;
SAS Code Explanation
proc sort data=sales_data out=sorted; by descending Sales; run;- Sorts
sales_datain descending order bySalesand saves assorted.
- Sorts
data top3; set sorted(obs=3); run;- Reads only the first 3 rows from the sorted data set into
top3.
- Reads only the first 3 rows from the sorted data set into
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
arrange(desc(Sales)) %>%
slice_head(n = 3)
R Code Explanation
arrange(desc(Sales))- Sorts the data frame in descending order by
Sales.
- Sorts the data frame in descending order by
slice_head(n = 3)- Selects the first 3 rows from the sorted data.
Output
| Region | Sales | Profit |
|---|---|---|
| East | 1400 | 220 |
| North | 1350 | 215 |
| East | 1340 | 208 |
5. First Row per Group
Concept
Pick first observation within each Region.
SAS Code
proc sort data=sales_data; by Region; run;
data first_in_group;
set sales_data;
by Region;
if first.Region;
run;
SAS Code Explanation
proc sort data=sales_data; by Region; run;- Sorts the data by
Regionso that group boundaries are contiguous.
- Sorts the data by
data first_in_group; set sales_data; by Region; if first.Region; run;- For each group of
Region, keeps only the first row (wherefirst.Regionis true).
- For each group of
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
group_by(Region) %>%
slice(1)
R Code Explanation
group_by(Region)- Groups the data by
Region.
- Groups the data by
slice(1)- Selects the first row from each group.
Output
| Region | Sales | Profit |
|---|---|---|
| East | 1200 | 200 |
| North | 1350 | 215 |
| South | 1250 | 205 |
| West | 1300 | 210 |
6. Drop Rows by Position
Concept Drop rows 1 and 7.
SAS Code
data want;
set sales_data;
if _n_ not in (1,7);
run;
SAS Code Explanation
data want; set sales_data; if _n_ not in (1,7); run;- Reads all rows except those at positions 1 and 7.
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
slice(-c(1, 7))
R Code Explanation
slice(-c(1, 7))- Negative indices in
slice()drop the specified rows.
- Negative indices in
7. Row with Maximum Value
Concept
Get the row with the highest Profit.
SAS Code
proc sql outobs=1;
select * from sales_data
order by Profit desc;
quit;
SAS Code Explanation
proc sql outobs=1;- Limits the output to 1 row.
select * from sales_data order by Profit desc;- Orders by
Profitdescending, so the first row is the maximum.
- Orders by
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
slice_max(Profit, n = 1)
R Code Explanation
slice_max(Profit, n = 1)- Selects the row with the maximum value of
Profit.
- Selects the row with the maximum value of
8. Top-N Rows per Group
Concept
Get top 2 Sales rows within each Region.
SAS Code
proc sort data=sales_data;
by Region descending Sales;
run;
data top2;
set sales_data;
by Region;
retain counter 0;
if first.Region then counter = 1;
else counter + 1;
if counter <= 2;
run;
SAS Code Explanation
proc sort data=sales_data; by Region descending Sales; run;- Sorts by
Regionand then bySalesdescending within each region.
- Sorts by
retain counter 0;- Initializes a counter variable.
if first.Region then counter = 1; else counter + 1;- Resets counter at the start of each group, increments otherwise.
if counter <= 2;- Keeps only the top 2 rows per group.
R Code
library(dplyr)
sales_data <- data.frame(
Region = c("East", "West", "East", "North", "South", "West", "East"),
Sales = c(1200, 1300, 1400, 1350, 1250, 1280, 1340),
Profit = c(200, 210, 220, 215, 205, 202, 208)
)
sales_data %>%
group_by(Region) %>%
slice_max(Sales, n = 2)
R Code Explanation
group_by(Region)- Groups by
Region.
- Groups by
slice_max(Sales, n = 2)- Selects the top 2 rows by
Saleswithin each group.
- Selects the top 2 rows by
9. Bottom-N per Group
Concept
Pick row with lowest Profit in each Category.
SAS Code
data sales_data;
input Category $ Profit;
datalines;
A 100
A 90
B 80
B 70
;
run;
proc sort data=sales_data;
by Category Profit;
run;
data bottom1;
set sales_data;
by Category;
if first.Category;
run;
SAS Code Explanation
data sales_data;- Creates a dataset with
CategoryandProfit.
- Creates a dataset with
proc sort data=sales_data; by Category Profit; run;- Sorts by
Categoryand then byProfitascending.
- Sorts by
if first.Category;- Keeps the first row in each category, which is the lowest
Profit.
- Keeps the first row in each category, which is the lowest
R Code
library(dplyr)
sales_data <- data.frame(
Category = c("A", "A", "B", "B"),
Profit = c(100, 90, 80, 70)
)
sales_data %>%
group_by(Category) %>%
slice_min(Profit, n = 1)
R Code Explanation
group_by(Category)- Groups by
Category.
- Groups by
slice_min(Profit, n = 1)- Selects the row with the minimum
Profitin each group.
- Selects the row with the minimum
10. Time-Based Slicing (Last 30 Days)
Concept
Keep rows where the Date is within the last 30 days.
SAS Code
data sales_data;
input Date : yymmdd10. Sales;
format Date yymmdd10.;
datalines;
2023-05-01 1000
2023-06-15 1100
2023-07-01 1200
2023-07-28 1500
;
run;
data recent;
set sales_data;
if Date >= today() - 30;
run;
SAS Code Explanation
data sales_data;- Creates a dataset with
DateandSales.
- Creates a dataset with
input Date : yymmdd10. Sales;- Reads
DateinYYYY-MM-DDformat.
- Reads
format Date yymmdd10.;- Sets the display format for
Date.
- Sets the display format for
data recent; set sales_data; if Date >= today() - 30; run;- Keeps only rows where
Dateis within the last 30 days from today.
- Keeps only rows where
R Code
library(dplyr)
sales_data <- data.frame(
Date = as.Date(c("2023-05-01", "2023-06-15", "2023-07-01", "2023-07-28")),
Sales = c(1000, 1100, 1200, 1500)
)
sales_data %>%
filter(Date >= Sys.Date() - 30)
R Code Explanation
filter(Date >= Sys.Date() - 30)- Keeps rows where
Dateis within the last 30 days. - Make sure
Dateis of typeDatein R.
- Keeps rows where
Input
| Date | Sales |
|---|---|
| 2023-05-01 | 1000 |
| 2023-06-15 | 1100 |
| 2023-07-01 | 1200 |
| 2023-07-28 | 1500 |
Output (assuming today = 2023-07-30)
| Date | Sales |
|---|---|
| 2023-07-01 | 1200 |
| 2023-07-28 | 1500 |
Summary: SAS vs. R for Slicing
| Scenario | SAS | R (dplyr) |
|---|---|---|
| Fixed rows | if _n_ in (2,5,7) |
slice(c(2,5,7)) |
| Range | firstobs=3 obs=6 |
slice(3:6) |
| Filter then slice | if Profit>205 and _n_<=2 |
filter(Profit > 205) %>% slice(1:2) |
| Top-N | sort desc + obs=3 |
arrange(desc(Sales)) %>% slice_head(n = 3) |
| First per group | if first.Region |
group_by(Region) %>% slice(1) |
| Drop rows | if _n_ not in (1,7) |
slice(-c(1,7)) |
| Max value row | order by Profit desc, outobs=1 |
slice_max(Profit, n = 1) |
| Top-N per group | by group + retain counter |
group_by() %>% slice_max(Sales, n = 2) |
| Bottom-N per group | by group ascending sort + first.group |
group_by() %>% slice_min(Profit, n = 1) |
| Time-based window slicing | if Date >= today() - 30 |
filter(Date >= Sys.Date() - 30) |
**Resource download links**
1.4.7.-Slicing-Data-in-SAS-vs-R.zip