1.5. SAS PROCEDURES -> Equivalent in R
1.5.4. PROC SORT in SAS® and arrange() in R
1. Basic Sorting by a Single Column
Goal: Sort patients by AGE (ascending).
Input
| PATID | AGE | SITE |
|---|---|---|
| P01 | 45 | 101 |
| P02 | 68 | 102 |
| P03 | 28 | 103 |
SAS®
proc sort data=patients;
by age;
run;
- Explanation: The `BY AGE;` statement tells SAS to sort the data from the smallest to the largest value of the `AGE` variable. The original `patients` dataset is overwritten unless `OUT=` is specified.
R
# Dummy data for execution
library(dplyr)
patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 28),
  SITE = c(101, 102, 103)
)
patients <- patients %>%
  arrange(AGE)
- Explanation: `arrange(AGE)` sorts rows in ascending order of `AGE`. The `%>%` pipe operator passes the `patients` dataset to `arrange()` as its first argument, and the assignment overwrites `patients` with the sorted result.
Output
| PATID | AGE | SITE |
|---|---|---|
| P03 | 28 | 103 |
| P01 | 45 | 101 |
| P02 | 68 | 102 |
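The SAS explanation above mentions `OUT=` as the way to avoid overwriting the input. A minimal sketch of the same idea in R, assuming the sorted result is simply assigned to a new object (the name `patients_sorted` is hypothetical and not part of the original example):

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE   = c(45, 68, 28),
  SITE  = c(101, 102, 103)
)

# Assign the sorted rows to a new object (hypothetical name),
# similar in spirit to OUT= in PROC SORT.
# arrange() returns a new tibble; `patients` itself is not modified.
patients_sorted <- patients %>%
  arrange(AGE)

patients$PATID         # original row order
patients_sorted$PATID  # rows ordered by ascending AGE
```

Because `arrange()` never modifies its input, the original is only replaced when the result is assigned back to the same name.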
2. Sort by Multiple Columns (Primary + Secondary)
Goal: Sort by SITE, then within each SITE by descending AGE.
Input
| PATID | AGE | SITE |
|---|---|---|
| P01 | 45 | 101 |
| P04 | 55 | 101 |
| P03 | 28 | 102 |
| P02 | 68 | 102 |
SAS®
proc sort data=patients;
by site descending age;
run;
- Explanation: This tells SAS to first sort by `SITE` in ascending order (the default) and then, within each site, to order patients by descending `AGE`. The `DESCENDING` keyword applies only to the variable that immediately follows it.
R
library(dplyr)
patients <- tibble::tibble(
  PATID = c("P01", "P04", "P03", "P02"),
  AGE = c(45, 55, 28, 68),
  SITE = c(101, 101, 102, 102)
)
patients <- patients %>%
  arrange(SITE, desc(AGE))
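- Explanation: `arrange(SITE, desc(AGE))` sorts rows by `SITE` in ascending order and, within each site, from highest to lowest `AGE`. No grouping is created; the data is only reordered. As in SAS, the left-to-right order of the sort keys defines their priority.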
Output
| PATID | AGE | SITE |
|---|---|---|
| P04 | 55 | 101 |
| P01 | 45 | 101 |
| P02 | 68 | 102 |
| P03 | 28 | 102 |
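To illustrate that the left-to-right order of the sort keys sets their priority, the sketch below sorts the same data twice with the keys swapped. The object names `site_first` and `age_first` are hypothetical and used only for this comparison.

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P04", "P03", "P02"),
  AGE   = c(45, 55, 28, 68),
  SITE  = c(101, 101, 102, 102)
)

# SITE is the primary key; descending AGE only orders rows within a site.
site_first <- patients %>%
  arrange(SITE, desc(AGE))

# Descending AGE is now the primary key; SITE would only break ties in AGE.
age_first <- patients %>%
  arrange(desc(AGE), SITE)

site_first$PATID
age_first$PATID
```

With the keys swapped, patients from different sites are interleaved because age alone decides the order when no two ages are equal.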
3. Custom Sort Order Using Age Groups
Goal: Sort by custom-defined age_group, then by actual age.
Input
| PATID | AGE | SITE |
|---|---|---|
| P01 | 45 | 101 |
| P02 | 68 | 102 |
| P03 | 15 | 103 |
SAS®
data patients;
  set patients;
  if age >= 65 then group = "Senior";
  else if age >= 30 then group = "Adult";
  else group = "Youth";
run;
proc sort data=patients;
  by group age;
run;
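- Explanation: A `DATA` step first creates a new variable `group` based on `AGE`. Then `PROC SORT` orders by this group and secondarily by age. Because `group` is a character variable, SAS sorts it alphabetically (Adult, Senior, Youth); reproducing the Senior, Adult, Youth order of the R example would require sorting by a numeric ordering variable instead.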
R
library(dplyr)
patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 15),
  SITE = c(101, 102, 103)
)
patients <- patients %>%
  mutate(
    group = case_when(
      AGE >= 65 ~ "Senior",
      AGE >= 30 ~ "Adult",
      TRUE ~ "Youth"
    ),
    # Factor levels define the custom sort order
    group = factor(group, levels = c("Senior", "Adult", "Youth"))
  ) %>%
  arrange(group, AGE)
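- Explanation: `mutate()` creates a new `group` variable using `case_when()`, which evaluates its conditions in order and uses the first match. Converting `group` to a factor defines the custom sort order through its levels. `arrange()` then sorts by factor level (not alphabetically by label) and, within each group, by age.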
Output
| PATID | AGE | SITE | group |
|---|---|---|---|
| P02 | 68 | 102 | Senior |
| P01 | 45 | 101 | Adult |
| P03 | 15 | 103 | Youth |
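The effect of the factor conversion is easiest to see side by side. The following sketch, which reuses the same data and introduces the hypothetical names `by_character` and `by_factor`, sorts once on the plain character column and once on the factor:

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE   = c(45, 68, 15)
) %>%
  mutate(group = case_when(
    AGE >= 65 ~ "Senior",
    AGE >= 30 ~ "Adult",
    TRUE      ~ "Youth"
  ))

# Character column: groups are sorted alphabetically.
by_character <- patients %>%
  arrange(group, AGE)

# Factor column: groups are sorted in the order of the declared levels.
by_factor <- patients %>%
  mutate(group = factor(group, levels = c("Senior", "Adult", "Youth"))) %>%
  arrange(group, AGE)

by_character$group
as.character(by_factor$group)
```

The character version yields Adult, Senior, Youth, while the factor version yields the custom order Senior, Adult, Youth.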
4. How Missing Values Are Handled
Goal: Sort by SITE when one SITE value is missing, and compare where SAS and R place that row.
Input
| PATID | AGE | SITE |
|---|---|---|
| P01 | 45 | NA |
| P02 | 68 | 101 |
| P03 | 28 | 103 |
SAS®
proc sort data=patients;
by site;
run;
- Explanation: SAS treats missing numeric values (represented as a dot, `.`) as smaller than any other value, so they appear first by default in ascending order.
R
library(dplyr)
patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 28),
  SITE = c(NA, 101, 103)
)
patients <- patients %>%
  arrange(SITE)
- Explanation: With `arrange()` from dplyr, `NA` values are always placed last, whether the column is numeric or character, and even when it is wrapped in `desc()`. Placing them first requires custom handling.
Output – SAS®
| PATID | SITE |
|---|---|
| P01 | . |
| P02 | 101 |
| P03 | 103 |
Output – R
| PATID | SITE |
|---|---|
| P02 | 101 |
| P03 | 103 |
| P01 | NA |
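One possible form of the custom handling mentioned above is to sort on a missingness indicator before the column itself. This is a sketch of one common approach rather than the only option; the names `na_first` and `na_desc` are hypothetical.

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE   = c(45, 68, 28),
  SITE  = c(NA, 101, 103)
)

# !is.na(SITE) is FALSE for missing rows, and FALSE sorts before TRUE,
# so missing values come first, as in SAS; SITE then orders the rest.
na_first <- patients %>%
  arrange(!is.na(SITE), SITE)

# desc() reverses the non-missing values, but NA still ends up last.
na_desc <- patients %>%
  arrange(desc(SITE))

na_first$PATID
na_desc$PATID
```

The first result matches the SAS output order (P01, P02, P03), while the second shows that `desc()` alone does not move `NA` to the top.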