contact@a2zlearners.com

2.3.7. Performing Updates Across Columns

1. Introduction

Often, you need to apply the same operation to multiple columns in a data frame—such as calculating summary statistics, transforming values, or filtering out missing data. The across() function from dplyr makes these tasks concise and powerful, especially when used with mutate(), summarize(), or filter().


2. Why Use across()?

  • Apply the same function to multiple columns at once.
  • Avoid repetitive code and make scripts easier to maintain.
  • Select columns flexibly using tidyselect helpers (e.g., starts_with(), ends_with(), :).
  • Combine with mutate(), summarize(), and filter() for versatile data wrangling.

3. Summarizing Across Columns

Suppose you want to compute the mean for several columns at once.

Input Table:

USUBJID LB1 LB2 LB3
01-701-101 10 20 30
01-701-102 15 25 NA
01-701-103 NA 22 33

R Code:

library(dplyr)
dm %>%
  summarize(across(LB1:LB3, mean, na.rm = TRUE))
  • across(LB1:LB3, mean, na.rm = TRUE) computes the mean of each column from LB1 to LB3, ignoring NAs.

Output Table:

LB1 LB2 LB3
12.5 22.3 31.5

4. Filtering Rows Based on Multiple Columns

You can filter rows to keep only those without missing values in a set of columns.

R Code:

dm %>%
  filter(across(LB1:LB2, ~ !is.na(.)))
  • Keeps rows where both LB1 and LB2 are not NA.

Output Table:

USUBJID LB1 LB2 LB3
01-701-101 10 20 30
01-701-102 15 25 NA

5. Filtering Rows with Many Columns

If you want to filter out rows with missing values in several columns:

R Code:

dm %>%
  filter(across(LB1:LB3, ~ !is.na(.)))
  • Keeps only rows where all columns LB1 to LB3 are not NA.

Output Table:

USUBJID LB1 LB2 LB3
01-701-101 10 20 30

6. Updating Multiple Columns with mutate() and across()

Suppose you want to replace all NAs in several columns with zero.

R Code:

library(tidyr)
dm %>%
  mutate(across(LB1:LB3, ~ replace_na(., 0)))
  • mutate(across(LB1:LB3, ~ replace_na(., 0))) replaces NAs with 0 in columns LB1 to LB3.

Output Table:

USUBJID LB1 LB2 LB3
01-701-101 10 20 30
01-701-102 15 25 0
01-701-103 0 22 33

7. Beyond the Basics

Below are advanced ways to use across() for powerful data wrangling, with input/output tables and detailed explanations.


  • Apply multiple functions at once

    You can apply several summary functions to each column using across() with a list of functions.

    Input Table:

    USUBJID LB1 LB2 LB3
    01-701-101 10 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33

    R Code:

    dm %>%
      summarize(across(LB1:LB3, list(mean = ~mean(., na.rm=TRUE), sd = ~sd(., na.rm=TRUE))))
    
    • across(LB1:LB3, list(mean = ~mean(., na.rm=TRUE), sd = ~sd(., na.rm=TRUE))) applies both mean and standard deviation to each column.
    • The result is a single-row summary with columns for mean and sd for each LB column.

    Output Table:

    LB1_mean LB1_sd LB2_mean LB2_sd LB3_mean LB3_sd
    12.5 3.54 22.3 2.52 31.5 2.12
    • Explanation:
      • Each column is summarized with both mean and standard deviation, ignoring NAs.

  • Use tidyselect helpers

    You can select columns dynamically using helpers like starts_with().

    Input Table:

    USUBJID LB1 LB2 LB3
    01-701-101 10 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33

    R Code:

    dm %>%
      mutate(across(starts_with("LB"), ~ . * 2))
    
    • across(starts_with("LB"), ~ . * 2) multiplies all columns starting with "LB" by 2.

    Output Table:

    USUBJID LB1 LB2 LB3
    01-701-101 20 40 60
    01-701-102 30 50 NA
    01-701-103 NA 44 66
    • Explanation:
      • All LB columns are doubled, NAs remain unchanged.

  • Conditional updates

    You can use ifelse() or similar logic to update values conditionally across columns.

    Input Table:

    USUBJID LB1 LB2 LB3
    01-701-101 10 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33

    R Code:

    dm %>%
      mutate(across(LB1:LB3, ~ ifelse(. < 15, NA, .)))
    
    • For each value in LB1 to LB3, if the value is less than 15, it is set to NA; otherwise, it is kept.

    Output Table:

    USUBJID LB1 LB2 LB3
    01-701-101 NA 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33
    • Explanation:
      • All values less than 15 are replaced with NA.

  • Row-wise operations with c_across()

    You can perform calculations across columns for each row using rowwise() and c_across().

    Input Table:

    USUBJID LB1 LB2 LB3
    01-701-101 10 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33

    R Code:

    dm %>%
      rowwise() %>%
      mutate(row_sum = sum(c_across(LB1:LB3), na.rm = TRUE))
    
    • rowwise() makes operations work row-by-row.
    • c_across(LB1:LB3) selects LB1 to LB3 for each row.
    • sum(..., na.rm = TRUE) sums the values, ignoring NAs.

    Output Table:

    USUBJID LB1 LB2 LB3 row_sum
    01-701-101 10 20 30 60
    01-701-102 15 25 NA 40
    01-701-103 NA 22 33 55
    • Explanation:
      • Each row gets a new column with the sum of LB1, LB2, and LB3, ignoring missing values.

  • Call an internal function (simple multiplication example)

    You can use a custom function defined within your script to apply to multiple columns.

    Input Table:

    USUBJID LB1 LB2 LB3
    01-701-101 10 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33

    R Code:

    multiply_by_3 <- function(x) x * 3
    
    dm %>%
      mutate(across(LB1:LB3, multiply_by_3))
    
    • Defines a simple function multiply_by_3 that multiplies input by 3.
    • mutate(across(LB1:LB3, multiply_by_3)) applies this function to each column LB1 to LB3.

    Output Table:

    USUBJID LB1 LB2 LB3
    01-701-101 30 60 90
    01-701-102 45 75 NA
    01-701-103 NA 66 99
    • Explanation:
      • Each value in LB1 to LB3 is multiplied by 3 using the custom function.

  • Call an external function (from a package)

    You can also use functions from external packages directly in across().

    Input Table:

    USUBJID LB1 LB2 LB3
    01-701-101 10 20 30
    01-701-102 15 25 NA
    01-701-103 NA 22 33

    R Code:

    library(scales)
    dm %>%
      mutate(across(LB1:LB3, scales::rescale))
    
    • scales::rescale rescales each column to the 0-1 range.
    • mutate(across(LB1:LB3, scales::rescale)) applies this transformation to each column.

    Output Table:

    USUBJID LB1 LB2 LB3
    01-701-101 0.00 0.00 0.00
    01-701-102 1.00 1.00 NA
    01-701-103 NA 0.2857 1.00
    • Explanation:
      • Each column is rescaled independently to the 0-1 range using the external scales package.

  • Summary

    • These advanced uses of across() allow you to:
      • Apply multiple functions to many columns at once.
      • Select columns flexibly using tidyselect helpers.
      • Apply conditional logic or transformations across columns.
      • Perform row-wise calculations using rowwise() and c_across().
    • Input and output tables help you see exactly what each code block does and how it transforms your data.

8. Conclusion

  • The across() function is essential for efficient, readable, and powerful data wrangling.
  • Use it with mutate(), summarize(), and filter() to apply operations across many columns.
  • Combine with tidyselect helpers and custom functions for advanced workflows.
  • Greatly reduces code repetition and increases the clarity of your data manipulation tasks.

**Resource download links**

2.3.7.-Performing-Updates-Across-Columns.zip