1.5.3. PROC MEANS - Compute Descriptive Statistics in SAS and R

Example Input Table: `scores`

STUDENT	SUBJECT	SCORE
A001	Math	78
A002	Math	85
A003	Math	91
A004	Math	62
A005	Math	74
A006	Science	83
A007	Science	77
A008	Science	65
A009	Science	90
A010	Science	NA
A011	History	69
A012	History	74
A013	History	80
A014	History	85
A015	History	77

Goal

Compute descriptive statistics for SCORE overall and within each SUBJECT:

Count of non-missing values (n)
Mean, standard deviation
25th and 75th percentiles (q1, q3)
Number of missing values

1. Overall Summary

SAS Code

proc means data=scores noprint;
  var score;
  output n=n mean=mean std=sd q1=q1 q3=q3 nmiss=nmissing out=overall_stats;
run;

Explanation:

proc means: Starts the summary procedure.
noprint: Suppresses printing the output to the Results window.
var score;: Specifies the variable to summarize.
output ... out=...: Stores summary results (n, mean, std, etc.) into a new dataset named overall_stats.

R Code

# Dummy data for execution
library(dplyr)
scores <- tibble::tibble(
  STUDENT = c("A001","A002","A003","A004","A005","A006","A007","A008","A009","A010","A011","A012","A013","A014","A015"),
  SUBJECT = c("Math","Math","Math","Math","Math","Science","Science","Science","Science","Science","History","History","History","History","History"),
  SCORE = c(78,85,91,62,74,83,77,65,90,NA,69,74,80,85,77)
)
overall_stats <- scores %>%
  dplyr::summarize(
    n = sum(!is.na(SCORE)),  # non-missing count, like SAS N
    mean = mean(SCORE, na.rm = TRUE),
    sd = sd(SCORE, na.rm = TRUE),
    q1 = quantile(SCORE, 0.25, na.rm = TRUE),
    q3 = quantile(SCORE, 0.75, na.rm = TRUE),
    nmissing = sum(is.na(SCORE))
  )

Explanation:

summarize(): Reduces the entire data frame to a single-row summary.
sum(!is.na()): Counts non-missing values, matching the SAS N statistic. n() would count every row, including the one with a missing SCORE.
mean(), sd(): Mean and sample standard deviation; na.rm = TRUE excludes NAs.
quantile(..., 0.25 / 0.75): Computes the 25th and 75th percentiles using R's default interpolation (type 7).
sum(is.na()): Counts how many values are missing in the column.
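
SAS and R do not use the same percentile definition by default, so q1 and q3 can differ slightly between the two programs. The sketch below compares R's default (type 7) with type = 2, which is assumed here to correspond to the SAS default (QNTLDEF=5). The variable names are illustrative only.

```r
# SCORE values from the example table (NA is the missing Science score)
score <- c(78, 85, 91, 62, 74, 83, 77, 65, 90, NA, 69, 74, 80, 85, 77)

# R default (type 7): linear interpolation between order statistics
q3_r <- quantile(score, 0.75, na.rm = TRUE, names = FALSE)

# type = 2: averaging at discontinuities, assumed to match SAS QNTLDEF=5
q3_sas_like <- quantile(score, 0.75, na.rm = TRUE, type = 2, names = FALSE)

c(r_default = q3_r, sas_like = q3_sas_like)
# r_default = 84.5, sas_like = 85
```

If the R results must match SAS output exactly, passing type = 2 to each quantile() call is one option to try.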

Output Table

n	mean	sd	q1	q3	nmissing
14	77.86	8.68	74.0	84.5	1

2. Summary by SUBJECT

SAS Code

proc sort data=scores;
  by subject;
run;

proc means data=scores noprint;
  by subject;
  var score;
  output n=n mean=mean std=sd q1=q1 q3=q3 nmiss=nmissing out=by_subject_stats;
run;

Explanation:

proc sort ... by subject;: Sorts the data by SUBJECT, which BY-group processing requires.
BY SUBJECT: Tells SAS to compute statistics separately for each group.
output ... out=by_subject_stats;: Saves the results for each SUBJECT to a new dataset.

R Code

by_subject_stats <- scores %>%
  group_by(SUBJECT) %>%
  dplyr::summarize(
    n = sum(!is.na(SCORE)),  # non-missing count per group, like SAS N
    mean = mean(SCORE, na.rm = TRUE),
    sd = sd(SCORE, na.rm = TRUE),
    q1 = quantile(SCORE, 0.25, na.rm = TRUE),
    q3 = quantile(SCORE, 0.75, na.rm = TRUE),
    nmissing = sum(is.na(SCORE))
  )

Explanation:

group_by(SUBJECT): Groups the data frame by subject category.
summarize(...): Applies each summary function to the grouped values.

Output Table

SUBJECT	n	mean	sd	q1	q3	nmissing
History	5	77.0	6.04	74.0	80.0	0
Math	5	78.0	11.07	74.0	85.0	0
Science	4	78.75	10.59	74.0	84.75	1

3. Alternatives to `dplyr::summarize()`

Package	Function	Description
`psych`	`describe()`	One-liner for all basic stats
`skimr`	`skim()`	Pretty summaries grouped by variable
`data.table`	`DT[, .(…), by=]`	Fast grouped summaries on large datasets
`Hmisc`	`describe()`	Summary plus metadata/label info
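
As one illustration of these alternatives, here is a minimal data.table sketch of the by-subject summary. It assumes the data.table package is installed; the object names DT and by_subject_dt are made up for this example, and the data are rebuilt so the block runs on its own.

```r
library(data.table)

# Same SUBJECT and SCORE columns as the example table
DT <- data.table(
  SUBJECT = rep(c("Math", "Science", "History"), each = 5),
  SCORE   = c(78, 85, 91, 62, 74, 83, 77, 65, 90, NA, 69, 74, 80, 85, 77)
)

# keyby = groups by SUBJECT and sorts the result by it
by_subject_dt <- DT[, .(
  n        = sum(!is.na(SCORE)),
  mean     = mean(SCORE, na.rm = TRUE),
  sd       = sd(SCORE, na.rm = TRUE),
  q1       = quantile(SCORE, 0.25, na.rm = TRUE, names = FALSE),
  q3       = quantile(SCORE, 0.75, na.rm = TRUE, names = FALSE),
  nmissing = sum(is.na(SCORE))
), keyby = SUBJECT]

print(by_subject_dt)
```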

What More Can You Explore

1. Add More Summary Metrics

min(SCORE), max(SCORE), IQR(), median(), var()
Helps validate outliers or spread
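
A minimal sketch of these extra metrics, assuming the same SUBJECT and SCORE columns as the example table (rebuilt here so the block runs on its own). The name extra_stats is illustrative.

```r
library(dplyr)

scores <- tibble::tibble(
  SUBJECT = rep(c("Math", "Science", "History"), each = 5),
  SCORE   = c(78, 85, 91, 62, 74, 83, 77, 65, 90, NA, 69, 74, 80, 85, 77)
)

# Range, center, and spread per subject; na.rm = TRUE skips the missing score
extra_stats <- scores %>%
  group_by(SUBJECT) %>%
  summarize(
    min    = min(SCORE, na.rm = TRUE),
    median = median(SCORE, na.rm = TRUE),
    max    = max(SCORE, na.rm = TRUE),
    iqr    = IQR(SCORE, na.rm = TRUE),
    var    = var(SCORE, na.rm = TRUE),
    .groups = "drop"
  )

print(extra_stats)
```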

2. Multi-Grouping

Add group_by(SUBJECT, GENDER) to replicate a BY statement with multiple variables (GENDER is a hypothetical column, not in the example table).
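
A sketch of grouping by two variables. The GENDER values below are invented purely for illustration, since the example table has no such column.

```r
library(dplyr)

scores <- tibble::tibble(
  SUBJECT = rep(c("Math", "Science", "History"), each = 5),
  # GENDER is made up for this example; it is not in the example table
  GENDER  = rep(c("F", "M", "F", "M", "F"), times = 3),
  SCORE   = c(78, 85, 91, 62, 74, 83, 77, 65, 90, NA, 69, 74, 80, 85, 77)
)

# One output row per SUBJECT x GENDER combination
by_subject_gender <- scores %>%
  group_by(SUBJECT, GENDER) %>%
  summarize(
    n    = sum(!is.na(SCORE)),
    mean = mean(SCORE, na.rm = TRUE),
    .groups = "drop"
  )

print(by_subject_gender)
```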

3. Visualize Results

Boxplots, histograms, or violin plots for score distributions
Plot mean ± sd error bars by subject
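
One possible way to draw the mean ± sd plot, assuming ggplot2 as the plotting package (the source does not name one). The object names plot_data and p are illustrative.

```r
library(dplyr)
library(ggplot2)

scores <- tibble::tibble(
  SUBJECT = rep(c("Math", "Science", "History"), each = 5),
  SCORE   = c(78, 85, 91, 62, 74, 83, 77, 65, 90, NA, 69, 74, 80, 85, 77)
)

# Summarize first, then plot the summary rather than the raw rows
plot_data <- scores %>%
  group_by(SUBJECT) %>%
  summarize(
    mean = mean(SCORE, na.rm = TRUE),
    sd   = sd(SCORE, na.rm = TRUE),
    .groups = "drop"
  )

# Points mark the mean; bars extend one standard deviation either side
p <- ggplot(plot_data, aes(x = SUBJECT, y = mean)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  labs(x = "Subject", y = "Mean score (± 1 SD)")

print(p)
```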

4. Export to Reports

Use gt::gt(), flextable, or kable() for presentation-ready summary tables
Save as Excel or HTML
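
A minimal sketch of the HTML route using knitr::kable(), assuming knitr is installed. The small stats data frame and the output file name are made up for the example, and the file is written to a temporary directory.

```r
library(knitr)

# A small summary table to export (counts and means from the example data)
stats <- data.frame(
  SUBJECT = c("History", "Math", "Science"),
  n       = c(5, 5, 4),
  mean    = c(77, 78, 78.75)
)

# Render the table as HTML markup
html_table <- kable(stats, format = "html", digits = 2,
                    caption = "SCORE summary by SUBJECT")

# Write the markup to a file; the path here is only an example
out_file <- file.path(tempdir(), "by_subject_stats.html")
writeLines(as.character(html_table), out_file)
```

Writing to Excel needs an additional package, which is not shown here.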

5. Build a Custom summary_table() Function

Reusable for any dataset:

summary_table <- function(data, var, group = NULL) {
  if (!is.null(group)) data <- data %>% group_by(across(all_of(group)))
  data %>%
    dplyr::summarize(
      n = sum(!is.na(.data[[var]])),
      mean = mean(.data[[var]], na.rm = TRUE),
      sd = sd(.data[[var]], na.rm = TRUE),
      q1 = quantile(.data[[var]], 0.25, na.rm = TRUE),
      q3 = quantile(.data[[var]], 0.75, na.rm = TRUE),
      nmiss = sum(is.na(.data[[var]])),
      .groups = "drop"
    )
}

overall_stats <- summary_table(scores, "SCORE")
by_subject_stats <- summary_table(scores, "SCORE", group = "SUBJECT")
