1.5. SAS PROCEDURES -> Equivalent in R

1.5.4. PROC SORT in SAS® and arrange() in R

1. Basic Sorting by a Single Column

Goal: Sort patients by AGE (ascending).

Input

PATID AGE SITE
P01 45 101
P02 68 102
P03 28 103

SAS®

proc sort data=patients;
  by age;
run;
  • Explanation: The BY AGE; statement sorts the observations in ascending order of AGE, which is the default direction. Without an OUT= option, PROC SORT replaces the original patients dataset with the sorted one.

R

# Dummy data for execution
library(dplyr)
patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 28),
  SITE = c(101, 102, 103)
)
patients <- patients %>%
  arrange(AGE)
  • Explanation: arrange(AGE) sorts rows in ascending order of AGE. The %>% pipe operator passes patients as the first argument to arrange(), and assigning the result back to patients overwrites the original, just as PROC SORT does without OUT=.

Output

PATID AGE SITE
P03 28 103
P01 45 101
P02 68 102
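
As a rough R counterpart to SAS's OUT= option, the sketch below stores the sorted result under a new name so that the original tibble is left untouched. The name patients_sorted is a hypothetical choice for illustration only.

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 28),
  SITE = c(101, 102, 103)
)

# arrange() returns a new tibble; `patients` itself is not modified
# (patients_sorted is a hypothetical name, analogous to OUT= in PROC SORT)
patients_sorted <- patients %>%
  arrange(AGE)

patients$PATID         # original row order
patients_sorted$PATID  # rows ordered by AGE
```

Only an explicit assignment back to patients replaces the original data.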

2. Sort by Multiple Columns (Primary + Secondary)

Goal: Sort by SITE, then within each SITE by descending AGE.

Input

PATID AGE SITE
P01 45 101
P04 55 101
P03 28 102
P02 68 102

SAS®

proc sort data=patients;
  by site descending age;
run;
  • Explanation: This tells SAS to first sort by SITE in ascending order (default), and then within each site, order patients by descending AGE.

R

patients <- tibble::tibble(
  PATID = c("P01", "P04", "P03", "P02"),
  AGE = c(45, 55, 28, 68),
  SITE = c(101, 101, 102, 102)
)
patients <- patients %>%
  arrange(SITE, desc(AGE))
  • Explanation: arrange(SITE, desc(AGE)) sorts rows by SITE in ascending order and, within each SITE value, from highest to lowest AGE. No grouping is created; SITE is simply the first sort key. As in SAS, the keys are applied left to right in order of priority.

Output

PATID AGE SITE
P04 55 101
P01 45 101
P02 68 102
P03 28 102
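
To illustrate the difference between sorting and grouping, the following sketch compares a plain two-key arrange() with arrange() on data grouped by group_by(). It assumes a dplyr version that supports the .by_group argument. The object names sorted_plain, sorted_ignoring_groups and sorted_by_group are hypothetical.

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P04", "P03", "P02"),
  AGE = c(45, 55, 28, 68),
  SITE = c(101, 101, 102, 102)
)

# Two sort keys: SITE first, then AGE descending within SITE
sorted_plain <- patients %>%
  arrange(SITE, desc(AGE))

# On grouped data, arrange() ignores the grouping by default
sorted_ignoring_groups <- patients %>%
  group_by(SITE) %>%
  arrange(desc(AGE)) %>%
  ungroup()

# .by_group = TRUE sorts by the grouping variable first
sorted_by_group <- patients %>%
  group_by(SITE) %>%
  arrange(desc(AGE), .by_group = TRUE) %>%
  ungroup()

sorted_plain$PATID
sorted_ignoring_groups$PATID
sorted_by_group$PATID
```

Listing SITE as the first key is usually the simpler choice when the grouping is not needed for anything else.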

3. Custom Sort Order Using Age Groups

Goal: Sort by a custom-defined age group (Senior, then Adult, then Youth), then by actual age.

Input

PATID AGE SITE
P01 45 101
P02 68 102
P03 15 103

SAS®

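data patients;
  set patients;
  length group $6;
  /* group_order is a helper key that encodes the custom order */
  if age >= 65 then do; group = "Senior"; group_order = 1; end;
  else if age >= 30 then do; group = "Adult"; group_order = 2; end;
  else do; group = "Youth"; group_order = 3; end;
run;

proc sort data=patients;
  by group_order age;
run;
  • Explanation: A DATA step first creates group from AGE, together with a numeric helper variable group_order (1 = Senior, 2 = Adult, 3 = Youth). PROC SORT orders character values alphabetically, so BY GROUP alone would give Adult, Senior, Youth. Sorting by group_order and then age produces the custom order. The helper variable stays in the dataset and can be dropped afterwards if it is not needed.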

R

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 15),
  SITE = c(101, 102, 103)
)
patients <- patients %>%
  mutate(group = case_when(
    AGE >= 65 ~ "Senior",
    AGE >= 30 ~ "Adult",
    TRUE ~ "Youth"
  ),
  group = factor(group, levels = c("Senior", "Adult", "Youth"))) %>%
  arrange(group, AGE)
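  • Explanation: mutate() creates a new group variable using case_when(), then converts it to a factor whose levels define the custom sort order. arrange() sorts a factor by its level order (Senior, Adult, Youth), not alphabetically by label, and then by AGE within each group.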

Output

PATID AGE SITE group
P02 68 102 Senior
P01 45 101 Adult
P03 15 103 Youth
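
As a minimal sketch of why the factor() step matters, the example below sorts the same data twice: once with group left as a character column and once with group converted to a factor. The names alpha_order and custom_order are hypothetical.

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 15),
  SITE = c(101, 102, 103)
) %>%
  mutate(group = case_when(
    AGE >= 65 ~ "Senior",
    AGE >= 30 ~ "Adult",
    TRUE ~ "Youth"
  ))

# Character column: sorted alphabetically (Adult, Senior, Youth)
alpha_order <- patients %>%
  arrange(group, AGE)

# Factor column: sorted by level order (Senior, Adult, Youth)
custom_order <- patients %>%
  mutate(group = factor(group, levels = c("Senior", "Adult", "Youth"))) %>%
  arrange(group, AGE)

alpha_order$PATID
custom_order$PATID
```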

4. How Missing Values Are Handled

Input

PATID AGE SITE
P01 45 NA
P02 68 101
P03 28 103

SAS®

proc sort data=patients;
  by site;
run;
  • Explanation: SAS treats a numeric missing value (displayed as a dot, .) as smaller than any number, so missing values appear first in an ascending sort.

R

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 28),
  SITE = c(NA, 101, 103)
)
patients <- patients %>%
  arrange(SITE)
  • Explanation: With dplyr's arrange(), NA values are always placed last, whether the column is numeric or character, and this holds even inside desc(). Placing them first, as SAS does, requires custom handling.

Output – SAS®

PATID AGE SITE
P01 45 .
P02 68 101
P03 28 103

Output – R

PATID AGE SITE
P02 68 101
P03 28 103
P01 45 NA
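
One possible form of the custom handling for NA is sketched below. It first shows that NA stays last under desc(), then puts the missing SITE first by sorting on !is.na(SITE) before SITE itself, which mimics the SAS ordering. The names na_last_desc and na_first are hypothetical.

```r
library(dplyr)

patients <- tibble::tibble(
  PATID = c("P01", "P02", "P03"),
  AGE = c(45, 68, 28),
  SITE = c(NA, 101, 103)
)

# desc() reverses the non-missing values, but NA still sorts last
na_last_desc <- patients %>%
  arrange(desc(SITE))

# !is.na(SITE) is FALSE for missing rows, and FALSE sorts before TRUE,
# so rows with a missing SITE come first (as in SAS)
na_first <- patients %>%
  arrange(!is.na(SITE), SITE)

na_last_desc$PATID
na_first$PATID
```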
