2.3.7. Performing Updates Across Columns
1. Introduction
Often, you need to apply the same operation to multiple columns in a data frame—such as calculating summary statistics, transforming values, or filtering out missing data. The across() function from dplyr makes these tasks concise and powerful, especially when used with mutate(), summarize(), or filter().
2. Why Use across()?
- Apply the same function to multiple columns at once.
- Avoid repetitive code and make scripts easier to maintain.
- Select columns flexibly using tidyselect helpers (e.g.,
starts_with(),ends_with(),:). - Combine with
mutate(),summarize(), andfilter()for versatile data wrangling.
3. Summarizing Across Columns
Suppose you want to compute the mean for several columns at once.
Input Table:
| USUBJID | LB1 | LB2 | LB3 |
|---|---|---|---|
| 01-701-101 | 10 | 20 | 30 |
| 01-701-102 | 15 | 25 | NA |
| 01-701-103 | NA | 22 | 33 |
R Code:
library(dplyr)
dm %>%
summarize(across(LB1:LB3, mean, na.rm = TRUE))
across(LB1:LB3, mean, na.rm = TRUE)computes the mean of each column from LB1 to LB3, ignoring NAs.
Output Table:
| LB1 | LB2 | LB3 |
|---|---|---|
| 12.5 | 22.3 | 31.5 |
4. Filtering Rows Based on Multiple Columns
You can filter rows to keep only those without missing values in a set of columns.
R Code:
dm %>%
filter(across(LB1:LB2, ~ !is.na(.)))
- Keeps rows where both LB1 and LB2 are not NA.
Output Table:
| USUBJID | LB1 | LB2 | LB3 |
|---|---|---|---|
| 01-701-101 | 10 | 20 | 30 |
| 01-701-102 | 15 | 25 | NA |
5. Filtering Rows with Many Columns
If you want to filter out rows with missing values in several columns:
R Code:
dm %>%
filter(across(LB1:LB3, ~ !is.na(.)))
- Keeps only rows where all columns LB1 to LB3 are not NA.
Output Table:
| USUBJID | LB1 | LB2 | LB3 |
|---|---|---|---|
| 01-701-101 | 10 | 20 | 30 |
6. Updating Multiple Columns with mutate() and across()
Suppose you want to replace all NAs in several columns with zero.
R Code:
library(tidyr)
dm %>%
mutate(across(LB1:LB3, ~ replace_na(., 0)))
mutate(across(LB1:LB3, ~ replace_na(., 0)))replaces NAs with 0 in columns LB1 to LB3.
Output Table:
| USUBJID | LB1 | LB2 | LB3 |
|---|---|---|---|
| 01-701-101 | 10 | 20 | 30 |
| 01-701-102 | 15 | 25 | 0 |
| 01-701-103 | 0 | 22 | 33 |
7. Beyond the Basics
Below are advanced ways to use across() for powerful data wrangling, with input/output tables and detailed explanations.
Apply multiple functions at once
You can apply several summary functions to each column using
across()with a list of functions.Input Table:
USUBJID LB1 LB2 LB3 01-701-101 10 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 R Code:
dm %>% summarize(across(LB1:LB3, list(mean = ~mean(., na.rm=TRUE), sd = ~sd(., na.rm=TRUE))))across(LB1:LB3, list(mean = ~mean(., na.rm=TRUE), sd = ~sd(., na.rm=TRUE)))applies both mean and standard deviation to each column.- The result is a single-row summary with columns for mean and sd for each LB column.
Output Table:
LB1_mean LB1_sd LB2_mean LB2_sd LB3_mean LB3_sd 12.5 3.54 22.3 2.52 31.5 2.12 - Explanation:
- Each column is summarized with both mean and standard deviation, ignoring NAs.
Use tidyselect helpers
You can select columns dynamically using helpers like
starts_with().Input Table:
USUBJID LB1 LB2 LB3 01-701-101 10 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 R Code:
dm %>% mutate(across(starts_with("LB"), ~ . * 2))across(starts_with("LB"), ~ . * 2)multiplies all columns starting with "LB" by 2.
Output Table:
USUBJID LB1 LB2 LB3 01-701-101 20 40 60 01-701-102 30 50 NA 01-701-103 NA 44 66 - Explanation:
- All LB columns are doubled, NAs remain unchanged.
Conditional updates
You can use
ifelse()or similar logic to update values conditionally across columns.Input Table:
USUBJID LB1 LB2 LB3 01-701-101 10 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 R Code:
dm %>% mutate(across(LB1:LB3, ~ ifelse(. < 15, NA, .)))- For each value in LB1 to LB3, if the value is less than 15, it is set to NA; otherwise, it is kept.
Output Table:
USUBJID LB1 LB2 LB3 01-701-101 NA 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 - Explanation:
- All values less than 15 are replaced with NA.
Row-wise operations with c_across()
You can perform calculations across columns for each row using
rowwise()andc_across().Input Table:
USUBJID LB1 LB2 LB3 01-701-101 10 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 R Code:
dm %>% rowwise() %>% mutate(row_sum = sum(c_across(LB1:LB3), na.rm = TRUE))rowwise()makes operations work row-by-row.c_across(LB1:LB3)selects LB1 to LB3 for each row.sum(..., na.rm = TRUE)sums the values, ignoring NAs.
Output Table:
USUBJID LB1 LB2 LB3 row_sum 01-701-101 10 20 30 60 01-701-102 15 25 NA 40 01-701-103 NA 22 33 55 - Explanation:
- Each row gets a new column with the sum of LB1, LB2, and LB3, ignoring missing values.
Call an internal function (simple multiplication example)
You can use a custom function defined within your script to apply to multiple columns.
Input Table:
USUBJID LB1 LB2 LB3 01-701-101 10 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 R Code:
multiply_by_3 <- function(x) x * 3 dm %>% mutate(across(LB1:LB3, multiply_by_3))- Defines a simple function
multiply_by_3that multiplies input by 3. mutate(across(LB1:LB3, multiply_by_3))applies this function to each column LB1 to LB3.
Output Table:
USUBJID LB1 LB2 LB3 01-701-101 30 60 90 01-701-102 45 75 NA 01-701-103 NA 66 99 - Explanation:
- Each value in LB1 to LB3 is multiplied by 3 using the custom function.
- Defines a simple function
Call an external function (from a package)
You can also use functions from external packages directly in
across().Input Table:
USUBJID LB1 LB2 LB3 01-701-101 10 20 30 01-701-102 15 25 NA 01-701-103 NA 22 33 R Code:
library(scales) dm %>% mutate(across(LB1:LB3, scales::rescale))scales::rescalerescales each column to the 0-1 range.mutate(across(LB1:LB3, scales::rescale))applies this transformation to each column.
Output Table:
USUBJID LB1 LB2 LB3 01-701-101 0.00 0.00 0.00 01-701-102 1.00 1.00 NA 01-701-103 NA 0.2857 1.00 - Explanation:
- Each column is rescaled independently to the 0-1 range using the external
scalespackage.
- Each column is rescaled independently to the 0-1 range using the external
Summary
- These advanced uses of
across()allow you to:- Apply multiple functions to many columns at once.
- Select columns flexibly using tidyselect helpers.
- Apply conditional logic or transformations across columns.
- Perform row-wise calculations using
rowwise()andc_across().
- Input and output tables help you see exactly what each code block does and how it transforms your data.
- These advanced uses of
8. Conclusion
- The
across()function is essential for efficient, readable, and powerful data wrangling. - Use it with
mutate(),summarize(), andfilter()to apply operations across many columns. - Combine with tidyselect helpers and custom functions for advanced workflows.
- Greatly reduces code repetition and increases the clarity of your data manipulation tasks.
**Resource download links**
2.3.7.-Performing-Updates-Across-Columns.zip