
1.5. SAS PROCEDURES -> Equivalent in R

1.5.2. PROC FREQ - frequency analysis in SAS and R

Produce summaries of grade distribution:

  1. Overall frequency
  2. By department
  3. Cross-tab by department & gender
  4. Percentages within group
  5. Cumulative insights, pivoting, and formatting

Input Table: students

We’ll assume the following dataset for both SAS and R:

DEPT GENDER GRADE
Math M A
Math F A
Math M B
Math M B
Math F C
Physics F A
Physics M B
Physics F C
Physics F A
Biology M A
Biology F A
Biology M B
Biology F B
Biology F A
Biology F C

1. Overall Frequency of GRADE

Goal: Count how many students received an A, B, or C across all departments.

SAS Code

proc freq data=students;
  tables grade;
run;
  • proc freq: Launches the frequency procedure.
  • tables grade;: Requests frequency counts for the GRADE column.
  • The default output lists each grade's frequency, percent, cumulative frequency, and cumulative percent.

R Code

library(dplyr)  # provides %>% and count()

students <- tibble::tibble(
  DEPT = c("Math", "Math", "Math", "Math", "Math", "Physics", "Physics", "Physics", "Physics", "Biology", "Biology", "Biology", "Biology", "Biology", "Biology"),
  GENDER = c("M", "F", "M", "M", "F", "F", "M", "F", "F", "M", "F", "M", "F", "F", "F"),
  GRADE = c("A", "A", "B", "B", "C", "A", "B", "C", "A", "A", "A", "B", "B", "A", "C")
)
students %>% count(GRADE)
  • count(GRADE): Counts how many times each GRADE value appears.

Output

GRADE n
A 7
B 5
C 3
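
count() returns only the frequencies, whereas the default PROC FREQ table also reports percentages and cumulative statistics. The following is a minimal sketch of one way to add them with dplyr. The column names Frequency, Percent, CumFrequency, and CumPercent are illustrative choices for this example, not names produced by SAS or dplyr.

```r
library(dplyr)

# Same 15 students as the input table, built compactly:
# one letter per student, in the order of the table rows.
students <- tibble::tibble(
  DEPT   = rep(c("Math", "Physics", "Biology"), times = c(5, 4, 6)),
  GENDER = strsplit("MFMMFFMFFMFMFFF", "")[[1]],
  GRADE  = strsplit("AABBCABCAAABBAC", "")[[1]]
)

grade_freq <- students %>%
  count(GRADE, name = "Frequency") %>%
  mutate(
    Percent      = round(Frequency / sum(Frequency) * 100, 2),
    CumFrequency = cumsum(Frequency),                          # running total of counts
    CumPercent   = round(CumFrequency / sum(Frequency) * 100, 2)
  )

print(grade_freq)
```

The cumulative columns follow the row order of the table, so sort the rows first if a different accumulation order is wanted.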

2. Grade Frequency by Department

Goal: Get the grade distribution within each department.

SAS Code

proc sort data=students;
  by dept;
run;

proc freq data=students noprint;
  by dept;
  tables grade / out=grade_by_dept(drop=percent);
run;
  • proc sort: Required before using BY dept; in PROC FREQ.
  • noprint: Suppresses the printed tables; the counts are still written to the dataset named in out=.
  • out=...: Saves the counts to the dataset grade_by_dept; drop=percent removes the PERCENT column.

R Code

students %>%
  group_by(DEPT) %>%
  count(GRADE, name = "Count")
  • group_by(DEPT): Sets department as the grouping key.
  • count(GRADE): Within each department, count how often each grade appears.
  • name = "Count": Renames the default n column for readability.

Output

DEPT GRADE Count
Biology A 3
Biology B 2
Biology C 1
Math A 2
Math B 2
Math C 1
Physics A 2
Physics B 1
Physics C 1

3. Cross-tabulation: GRADE × GENDER within DEPT

Goal: See how many males and females got each grade in each department.

SAS Code

proc freq data=students noprint;
  by dept;
  tables grade*gender / out=cross_tab(drop=percent);
run;
  • grade*gender: Requests cross-tab between GRADE and GENDER.
  • Grouping still applies to each DEPT due to BY dept.

R Code

students %>%
  group_by(DEPT) %>%
  count(GRADE, GENDER, name = "Count")
  • count(GRADE, GENDER): Counts each GRADE and GENDER combination within each department, returning one row per combination that occurs in the data.

Output

DEPT GRADE GENDER Count
Biology A F 2
Biology A M 1
Biology B F 1
Biology B M 1
Biology C F 1
Math A F 1
Math A M 1
Math B M 2
Math C F 1
Physics A F 2
Physics B M 1
Physics C F 1
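
The long output above lists only combinations that occur, so a cell such as Math, grade B, female never appears. If a true two-dimensional GRADE by GENDER table per department is wanted, with zero cells shown, base R's xtabs() is one option. This is a sketch using base R rather than dplyr; the object name cross_tab is an illustrative choice.

```r
# Same 15 students as the input table (one letter per student, in table order).
students <- data.frame(
  DEPT   = rep(c("Math", "Physics", "Biology"), times = c(5, 4, 6)),
  GENDER = strsplit("MFMMFFMFFMFMFFF", "")[[1]],
  GRADE  = strsplit("AABBCABCAAABBAC", "")[[1]]
)

# Three-way table: GRADE in rows, GENDER in columns, one layer per DEPT.
cross_tab <- xtabs(~ GRADE + GENDER + DEPT, data = students)

# GRADE x GENDER table for a single department, zero cells included.
print(cross_tab[, , "Math"])

# All departments flattened into one compact display.
print(ftable(cross_tab, row.vars = c("DEPT", "GRADE")))
```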

4. Add Percentages within Group

Goal: Show what percent of each department received each grade.

R Code

students %>%
  group_by(DEPT) %>%
  count(GRADE, name = "Count") %>%
  mutate(Percent = round(Count / sum(Count) * 100, 1))
  • mutate(...): Adds a new column Percent.
  • Count / sum(Count): Proportion of each grade within its department.
  • round(..., 1): Keep results clean to 1 decimal place.

Output

DEPT GRADE Count Percent
Biology A 3 50.0
Biology B 2 33.3
Biology C 1 16.7
Math A 2 40.0
Math B 2 40.0
Math C 1 20.0
Physics A 2 50.0
Physics B 1 25.0
Physics C 1 25.0
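
The percentages above are within-department shares only because the result of count() is still grouped by DEPT, so sum(Count) is evaluated per department. The sketch below contrasts that with the ungrouped form, where the same expression yields a share of all students. The object names within_dept and overall are illustrative.

```r
library(dplyr)

# Same 15 students as the input table (one letter per student, in table order).
students <- tibble::tibble(
  DEPT   = rep(c("Math", "Physics", "Biology"), times = c(5, 4, 6)),
  GENDER = strsplit("MFMMFFMFFMFMFFF", "")[[1]],
  GRADE  = strsplit("AABBCABCAAABBAC", "")[[1]]
)

# Grouped: the DEPT grouping survives count(), so sum(Count) is the department total.
within_dept <- students %>%
  group_by(DEPT) %>%
  count(GRADE, name = "Count") %>%
  mutate(Percent = round(Count / sum(Count) * 100, 1)) %>%
  ungroup()

# Ungrouped: sum(Count) is the grand total, so Percent is a share of all 15 students.
overall <- students %>%
  count(DEPT, GRADE, name = "Count") %>%
  mutate(Percent = round(Count / sum(Count) * 100, 1))

print(within_dept)
print(overall)
```

For example, grade A in Biology is 50% of the department (3 of 6) but 20% of all students (3 of 15).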

5. Cumulative Metrics (Optional)

students %>%
  group_by(DEPT) %>%
  count(GRADE, name = "Count") %>%
  arrange(DEPT, desc(Count)) %>%
  mutate(
    Percent = round(Count / sum(Count) * 100, 1),
    CumCount = cumsum(Count),
    CumPercent = round(cumsum(Count) / sum(Count) * 100, 1)
  )
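  • arrange(DEPT, desc(Count)): Sorts grades from most to least common within each department.
  • cumsum(Count): Builds cumulative totals; because the data is still grouped by DEPT, the total restarts for each department.
  • CumPercent: Useful for Pareto analysis or targeting thresholds.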
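
As an illustration of the threshold use mentioned above, the sketch below keeps, for each department, the most common grades needed to cover at least 80% of its students. The 80% cutoff and the object names cumulative and top_grades are assumptions made for this example.

```r
library(dplyr)

# Same 15 students as the input table (one letter per student, in table order).
students <- tibble::tibble(
  DEPT   = rep(c("Math", "Physics", "Biology"), times = c(5, 4, 6)),
  GENDER = strsplit("MFMMFFMFFMFMFFF", "")[[1]],
  GRADE  = strsplit("AABBCABCAAABBAC", "")[[1]]
)

cumulative <- students %>%
  group_by(DEPT) %>%
  count(GRADE, name = "Count") %>%
  arrange(DEPT, desc(Count)) %>%
  mutate(CumPercent = round(cumsum(Count) / sum(Count) * 100, 1))

# Keep a grade while the cumulative share *before* it is still below 80%,
# i.e. the grades needed to reach the threshold in each department.
top_grades <- cumulative %>%
  filter(lag(CumPercent, default = 0) < 80) %>%
  ungroup()

print(top_grades)
```

With this data, two grades are enough in Math and Biology, while Physics needs all three.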

6. Wide Format for Reporting

library(tidyr)

students %>%
  count(DEPT, GRADE) %>%
  pivot_wider(names_from = GRADE, values_from = n, values_fill = 0)
  • pivot_wider(): Converts long → wide format for tabular reports.
  • values_fill = 0: Fills department and grade combinations that never occur with 0 instead of NA.

Output

DEPT A B C
Biology 3 2 1
Math 2 2 1
Physics 2 1 1
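
Reports in this layout often carry row and column totals as well. One way to get them, sketched here with base R's table() and addmargins() rather than pivot_wider(), is shown below; the object names wide and wide_totals are illustrative.

```r
# Same 15 students as the input table (one letter per student, in table order).
students <- data.frame(
  DEPT   = rep(c("Math", "Physics", "Biology"), times = c(5, 4, 6)),
  GENDER = strsplit("MFMMFFMFFMFMFFF", "")[[1]],
  GRADE  = strsplit("AABBCABCAAABBAC", "")[[1]]
)

# DEPT x GRADE contingency table; same counts as the wide table above.
wide <- table(DEPT = students$DEPT, GRADE = students$GRADE)

# Append a "Sum" row and a "Sum" column holding the totals.
wide_totals <- addmargins(wide)

print(wide_totals)
```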

7. Visualization: Stacked Bar Chart

library(ggplot2)

students %>%
  count(DEPT, GRADE) %>%
  ggplot(aes(x = DEPT, y = n, fill = GRADE)) +
  geom_col(position = "stack") +
  labs(title = "Grade Distribution by Department", y = "Student Count")
  • geom_col() uses pre-counted data.
  • fill = GRADE stacks by grade per department.
  • Helpful for presentations and spotting trends visually.

Output: stacked bar chart of student counts per department, colored by grade (figure: 1.5.2.-frequency-analysis-R-bar-chart)
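
Because the departments differ in size, a chart of shares can be easier to compare than a chart of counts. A minimal variation, assuming the same data, swaps position = "stack" for position = "fill" so every bar is scaled to a height of 1. The title, axis label, and the name share_plot are illustrative.

```r
library(dplyr)
library(ggplot2)

# Same 15 students as the input table (one letter per student, in table order).
students <- tibble::tibble(
  DEPT   = rep(c("Math", "Physics", "Biology"), times = c(5, 4, 6)),
  GENDER = strsplit("MFMMFFMFFMFMFFF", "")[[1]],
  GRADE  = strsplit("AABBCABCAAABBAC", "")[[1]]
)

share_plot <- students %>%
  count(DEPT, GRADE) %>%
  ggplot(aes(x = DEPT, y = n, fill = GRADE)) +
  geom_col(position = "fill") +   # rescale each department's bar to total 1
  labs(title = "Grade Share by Department", y = "Share of Students")

print(share_plot)
```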


**Summary Table: SAS vs R**

| Feature | SAS (PROC FREQ) | R (dplyr::count() + friends) |
|---|---|---|
| 1-way frequency | tables var; | count(var) |
| 2-way grouped count | BY var1; tables var2; | group_by(var1) %>% count(var2) |
| Cross-tab | tables var1*var2; | count(var1, var2) |
| Percentages | Built-in | mutate(percent = n / sum(n) * 100) |
| Cumulative stats | Built-in for one-way tables | cumsum() |
| Pivot for reporting | Manual (PROC TABULATE/TABLE) | pivot_wider() |
| Visuals | Separate procedure (e.g., PROC SGPLOT) | ggplot2 package |

**Resource download links**

1.5.2.-PROC-FREQ-frequency-analysis-in-SAS-and-R.zip