2.3.7. Performing Updates Across Columns

1. Introduction

Often, you need to apply the same operation to multiple columns in a data frame—such as calculating summary statistics, transforming values, or filtering out missing data. The across() function from dplyr makes these tasks concise and powerful, especially when used with mutate(), summarize(), or filter().

2. Why Use across()?

Apply the same function to multiple columns at once.
Avoid repetitive code and make scripts easier to maintain.
Select columns flexibly using tidyselect helpers (e.g., starts_with(), ends_with(), :).
Combine with mutate(), summarize(), and filter() for versatile data wrangling.

3. Summarizing Across Columns

Suppose you want to compute the mean for several columns at once.

Input Table:

USUBJID	LB1	LB2	LB3
01-701-101	10	20	30
01-701-102	15	25	NA
01-701-103	NA	22	33

R Code:

library(dplyr)
dm %>%
  summarize(across(LB1:LB3, mean, na.rm = TRUE))

across(LB1:LB3, mean, na.rm = TRUE) computes the mean of each column from LB1 to LB3, ignoring NAs.

Output Table:

LB1	LB2	LB3
12.5	22.3	31.5

4. Filtering Rows Based on Multiple Columns

You can filter rows to keep only those without missing values in a set of columns.

R Code:

dm %>%
  filter(across(LB1:LB2, ~ !is.na(.)))

Keeps rows where both LB1 and LB2 are not NA.

Output Table:

USUBJID	LB1	LB2	LB3
01-701-101	10	20	30
01-701-102	15	25	NA

5. Filtering Rows with Many Columns

If you want to filter out rows with missing values in several columns:

R Code:

dm %>%
  filter(across(LB1:LB3, ~ !is.na(.)))

Keeps only rows where all columns LB1 to LB3 are not NA.

Output Table:

USUBJID	LB1	LB2	LB3
01-701-101	10	20	30

6. Updating Multiple Columns with mutate() and across()

Suppose you want to replace all NAs in several columns with zero.

R Code:

library(tidyr)
dm %>%
  mutate(across(LB1:LB3, ~ replace_na(., 0)))

mutate(across(LB1:LB3, ~ replace_na(., 0))) replaces NAs with 0 in columns LB1 to LB3.

Output Table:

USUBJID	LB1	LB2	LB3
01-701-101	10	20	30
01-701-102	15	25	0
01-701-103	0	22	33

7. Beyond the Basics

Below are advanced ways to use across() for powerful data wrangling, with input/output tables and detailed explanations.

Apply multiple functions at once

You can apply several summary functions to each column using across() with a list of functions.

Input Table:

USUBJID LB1 LB2 LB3

01-701-101 10 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33

R Code:
```
dm %>%
  summarize(across(LB1:LB3, list(mean = ~mean(., na.rm=TRUE), sd = ~sd(., na.rm=TRUE))))
```
- across(LB1:LB3, list(mean = ~mean(., na.rm=TRUE), sd = ~sd(., na.rm=TRUE))) applies both mean and standard deviation to each column.
- The result is a single-row summary with columns for mean and sd for each LB column.
Output Table:

LB1_mean LB1_sd LB2_mean LB2_sd LB3_mean LB3_sd

12.5 3.54 22.3 2.52 31.5 2.12
- Explanation:
  - Each column is summarized with both mean and standard deviation, ignoring NAs.

LB1_mean	LB1_sd	LB2_mean	LB2_sd	LB3_mean	LB3_sd
12.5	3.54	22.3	2.52	31.5	2.12

Use tidyselect helpers

You can select columns dynamically using helpers like starts_with().

Input Table:

USUBJID LB1 LB2 LB3

01-701-101 10 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33

R Code:
```
dm %>%
  mutate(across(starts_with("LB"), ~ . * 2))
```
- across(starts_with("LB"), ~ . * 2) multiplies all columns starting with "LB" by 2.
Output Table:

USUBJID LB1 LB2 LB3

01-701-101 20 40 60

01-701-102 30 50 NA

01-701-103 NA 44 66
- Explanation:
  - All LB columns are doubled, NAs remain unchanged.

USUBJID	LB1	LB2	LB3
01-701-101	20	40	60
01-701-102	30	50	NA
01-701-103	NA	44	66

Conditional updates

You can use ifelse() or similar logic to update values conditionally across columns.

Input Table:

USUBJID LB1 LB2 LB3

01-701-101 10 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33

R Code:
```
dm %>%
  mutate(across(LB1:LB3, ~ ifelse(. < 15, NA, .)))
```
- For each value in LB1 to LB3, if the value is less than 15, it is set to NA; otherwise, it is kept.
Output Table:

USUBJID LB1 LB2 LB3

01-701-101 NA 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33
- Explanation:
  - All values less than 15 are replaced with NA.

USUBJID	LB1	LB2	LB3
01-701-101	NA	20	30
01-701-102	15	25	NA
01-701-103	NA	22	33

Row-wise operations with c_across()

You can perform calculations across columns for each row using rowwise() and c_across().

Input Table:

USUBJID LB1 LB2 LB3

01-701-101 10 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33

R Code:
```
dm %>%
  rowwise() %>%
  mutate(row_sum = sum(c_across(LB1:LB3), na.rm = TRUE))
```
- rowwise() makes operations work row-by-row.
- c_across(LB1:LB3) selects LB1 to LB3 for each row.
- sum(..., na.rm = TRUE) sums the values, ignoring NAs.
Output Table:

USUBJID LB1 LB2 LB3 row_sum

01-701-101 10 20 30 60

01-701-102 15 25 NA 40

01-701-103 NA 22 33 55
- Explanation:
  - Each row gets a new column with the sum of LB1, LB2, and LB3, ignoring missing values.

USUBJID	LB1	LB2	LB3	row_sum
01-701-101	10	20	30	60
01-701-102	15	25	NA	40
01-701-103	NA	22	33	55

Call an internal function (simple multiplication example)

You can use a custom function defined within your script to apply to multiple columns.

Input Table:

USUBJID LB1 LB2 LB3

01-701-101 10 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33

R Code:
```
multiply_by_3 <- function(x) x * 3

dm %>%
  mutate(across(LB1:LB3, multiply_by_3))
```
- Defines a simple function multiply_by_3 that multiplies input by 3.
- mutate(across(LB1:LB3, multiply_by_3)) applies this function to each column LB1 to LB3.
Output Table:

USUBJID LB1 LB2 LB3

01-701-101 30 60 90

01-701-102 45 75 NA

01-701-103 NA 66 99
- Explanation:
  - Each value in LB1 to LB3 is multiplied by 3 using the custom function.

USUBJID	LB1	LB2	LB3
01-701-101	30	60	90
01-701-102	45	75	NA
01-701-103	NA	66	99

Call an external function (from a package)

You can also use functions from external packages directly in across().

Input Table:

USUBJID LB1 LB2 LB3

01-701-101 10 20 30

01-701-102 15 25 NA

01-701-103 NA 22 33

R Code:
```
library(scales)
dm %>%
  mutate(across(LB1:LB3, scales::rescale))
```
- scales::rescale rescales each column to the 0-1 range.
- mutate(across(LB1:LB3, scales::rescale)) applies this transformation to each column.
Output Table:

USUBJID LB1 LB2 LB3

01-701-101 0.00 0.00 0.00

01-701-102 1.00 1.00 NA

01-701-103 NA 0.2857 1.00
- Explanation:
  - Each column is rescaled independently to the 0-1 range using the external scales package.

USUBJID	LB1	LB2	LB3
01-701-101	0.00	0.00	0.00
01-701-102	1.00	1.00	NA
01-701-103	NA	0.2857	1.00

Summary
- These advanced uses of across() allow you to:
  - Apply multiple functions to many columns at once.
  - Select columns flexibly using tidyselect helpers.
  - Apply conditional logic or transformations across columns.
  - Perform row-wise calculations using rowwise() and c_across().
- Input and output tables help you see exactly what each code block does and how it transforms your data.

8. Conclusion

The across() function is essential for efficient, readable, and powerful data wrangling.
Use it with mutate(), summarize(), and filter() to apply operations across many columns.
Combine with tidyselect helpers and custom functions for advanced workflows.
Greatly reduces code repetition and increases the clarity of your data manipulation tasks.

Resource download links

2.3.7.-Performing-Updates-Across-Columns.zip

⁂