1.3. Writing Code
1.3.2. Pipe Operators in R
Pipe operators allow you to write cleaner, more readable code by chaining operations together. Instead of nesting functions or storing intermediate results, you can express a sequence of actions as a straightforward flow. Pipes are especially useful for data transformation and analysis workflows.
1. %>% — magrittr Pipe (from {magrittr} / {dplyr})
- Passes the result on the left as the first argument to the function on the right.
- Enables readable, step-by-step pipelines.
- Supports the use of the dot (.) as a placeholder for the left-hand side value, useful for functions where the first argument is not the main input.
- Widely used in the tidyverse ecosystem.
Example:
library(dplyr)
labs <- tibble::tibble(
patient_id = 1:5,
status = c("active", "inactive", "active", "active", "inactive"),
weight = c(60, 72, 85, 90, 78)
)
labs2 <- labs %>%
filter(status == "active") %>%
mutate(weight_kg = weight / 1000) %>%
arrange(desc(weight_kg))
Input Table (labs):
| patient_id | status | weight |
|---|---|---|
| 1 | active | 60 |
| 2 | inactive | 72 |
| 3 | active | 85 |
| 4 | active | 90 |
| 5 | inactive | 78 |
Output Table (labs2):
| patient_id | status | weight | weight_kg |
|---|---|---|---|
| 4 | active | 90 | 0.09 |
| 3 | active | 85 | 0.085 |
| 1 | active | 60 | 0.06 |
- Here, labs is filtered for active status, a new column is created, and the result is sorted, all in one readable chain.
Using the dot placeholder:
labs <- tibble::tibble(
patient_id = 1:5,
weight = c(60, 72, NA, 85, 90)
)
labs %>%
  pull(weight) %>%  # pull() extracts the column as a plain numeric vector
  sum(., na.rm = TRUE)
# Output:
# [1] 307
# That is the sum of 60 + 72 + 85 + 90; the missing value (NA) is excluded because na.rm = TRUE.
Input Table (labs):
| patient_id | weight |
|---|---|
| 1 | 60 |
| 2 | 72 |
| 3 | NA |
| 4 | 85 |
| 5 | 90 |
Output:
307
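sum(., na.rm = TRUE): Sums the weight values from the previous step, using the dot (.) to pass them explicitly as the first argument to sum(). Because the piped value becomes the first argument by default, the dot is optional here; it is required only when the value must go into a different argument position.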
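The placeholder matters most when the piped value is not the first argument. The following is a minimal sketch, assuming only that magrittr is installed; the strings and variable names are illustrative and not part of the original example.

```r
library(magrittr)

# Default behaviour: the left-hand side becomes the first argument.
first_arg <- "clinical" %>% paste("programming")

# With the dot as a direct argument, the left-hand side is placed where the
# dot appears and is not also inserted as the first argument.
second_arg <- "clinical" %>% paste("Programming with", .)

print(first_arg)   # "clinical programming"
print(second_arg)  # "Programming with clinical"
```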
Without %>%:
labs2 <- arrange(
mutate(
filter(labs, status == "active"),
weight_kg = weight / 1000
),
desc(weight_kg)
)
Input Table (labs):
| patient_id | status | weight |
|---|---|---|
| 1 | active | 60 |
| 2 | inactive | 72 |
| 3 | active | 85 |
| 4 | active | 90 |
| 5 | inactive | 78 |
Output Table (labs2):
| patient_id | status | weight | weight_kg |
|---|---|---|---|
| 4 | active | 90 | 0.09 |
| 3 | active | 85 | 0.085 |
| 1 | active | 60 | 0.06 |
With intermediate steps:
step1 <- filter(labs, status == "active")
step2 <- mutate(step1, weight_kg = weight / 1000)
labs2 <- arrange(step2, desc(weight_kg))
Input Table (labs):
| patient_id | status | weight |
|---|---|---|
| 1 | active | 60 |
| 2 | inactive | 72 |
| 3 | active | 85 |
| 4 | active | 90 |
| 5 | inactive | 78 |
Intermediate Table (step1):
| patient_id | status | weight |
|---|---|---|
| 1 | active | 60 |
| 3 | active | 85 |
| 4 | active | 90 |
Intermediate Table (step2):
| patient_id | status | weight | weight_kg |
|---|---|---|---|
| 1 | active | 60 | 0.06 |
| 3 | active | 85 | 0.085 |
| 4 | active | 90 | 0.09 |
Output Table (labs2):
| patient_id | status | weight | weight_kg |
|---|---|---|---|
| 4 | active | 90 | 0.09 |
| 3 | active | 85 | 0.085 |
| 1 | active | 60 | 0.06 |
2. |> — Base R Pipe (from R 4.1.0 onward)
- Passes the result on the left to the first argument of the function on the right.
- No need for external packages.
- Does not support the dot placeholder (.), but can use anonymous functions for more complex cases. From R 4.2.0, an underscore placeholder (_) can be used with a named argument.
- Slightly faster than %>% and integrates natively with base R.
Basic usage:
labs <- data.frame(
patient_id = 1:5,
status = c("active", "inactive", "active", "active", "inactive"),
weight = c(60, 72, 85, 90, 78)
)
labs2 <- labs |>
subset(status == "active") |>
head(3)
Input Table (labs):
| patient_id | status | weight |
|---|---|---|
| 1 | active | 60 |
| 2 | inactive | 72 |
| 3 | active | 85 |
| 4 | active | 90 |
| 5 | inactive | 78 |
Output Table (labs2):
| patient_id | status | weight |
|---|---|---|
| 1 | active | 60 |
| 3 | active | 85 |
| 4 | active | 90 |
With anonymous function for flexible argument placement:
to_upper_phrase <- function(x) {
paste("Programming with", toupper(x))
}
"clinical" |> (function(x) to_upper_phrase(x))()
# Or, because the input is already the first argument, call the function directly:
"clinical" |> to_upper_phrase()
Input:
"clinical"
Output:
"Programming with CLINICAL"
3. %T>% — Tee Pipe (from magrittr)
- Executes a side effect (e.g., plotting, printing) and returns the original input to the next step in the pipeline.
- Useful for debugging, logging, or visualization within a pipeline.
Example:
library(magrittr)
# %T>% goes before the side-effect step, so iris (not plot()'s NULL) is passed on
iris %T>%
  { plot(.$Sepal.Length, .$Sepal.Width) } %>%
  summary()
Input Table (iris):
(First 3 rows shown)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
Output:
- A scatter plot of Sepal.Length vs Sepal.Width is displayed (side effect).
- The summary of the iris data frame is returned.
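To see that the tee pipe passes its left-hand side through unchanged, a small sketch with a numeric vector may help. It assumes magrittr is installed; the vector and the message text are illustrative.

```r
library(magrittr)

# cat() is the side effect; its return value (NULL) is discarded by %T>%,
# and the original vector continues to sum().
total <- c(1, 4, 9) %T>%
  { cat("Values before summing:", ., "\n") } %>%
  sum()

print(total)  # 14
```

Placing %T>% immediately before the side-effect step is what keeps the original data flowing to the next step.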
4. %<>% — Compound Assignment Pipe (from magrittr)
- Pipes and reassigns the result back into the original object.
- Equivalent to x <- x %>% ...
- Useful for updating objects in place.
Example:
library(magrittr)
x <- 1:5
x %<>% sqrt()
print(x)
# [1] 1.000000 1.414214 1.732051 2.000000 2.236068
Input:
x = 1:5
Output:
x = 1.000000, 1.414214, 1.732051, 2.000000, 2.236068
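When %<>% starts a longer chain, the object receives the result of the whole chain, not just the first step. A short sketch, assuming magrittr is installed and using an illustrative vector:

```r
library(magrittr)

x <- c(16, 4, 9)

# %<>% must be the first operator in the chain; x is overwritten with the
# final result of sqrt() followed by sort().
x %<>% sqrt() %>% sort()

print(x)  # 2 3 4
```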
5. %$% — Exposition Pipe (from magrittr)
- Exposes the variables of a data frame to the right-hand side expression, so you can refer to columns directly by name.
- Useful for concise code in modeling or plotting.
Example:
library(magrittr)
iris %$%
cor(Sepal.Length, Sepal.Width)
# [1] -0.1175698
Input Table (iris):
(First 3 rows shown)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
Output:
[1] -0.1175698
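The exposition pipe behaves much like base R's with(). The sketch below, which assumes magrittr is installed, compares the two forms on the same columns; the variable names are illustrative.

```r
library(magrittr)

# Both expressions evaluate cor() with the columns of iris in scope.
via_pipe <- iris %$% cor(Sepal.Length, Sepal.Width)
via_with <- with(iris, cor(Sepal.Length, Sepal.Width))

print(all.equal(via_pipe, via_with))  # TRUE
```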
6. Error Handling Scenarios and Warnings with Pipes
- Null or Missing Data:
If a pipe step returns NULL or an unexpected value, subsequent steps may fail or produce misleading results. Always check for missing or unexpected data before piping.
library(dplyr)
df <- data.frame(a = c(1, 2, NA), b = c(4, NA, 6))
df %>%
filter(a > 1) %>%
summarise(mean_b = mean(b, na.rm = TRUE))
# filter() drops rows where the condition is NA, so only the row with a = 2 remains.
# Its b value is NA, so mean(b, na.rm = TRUE) has nothing to average and returns NaN.
Input Table (df):
| a | b |
|---|---|
| 1 | 4 |
| 2 | NA |
| NA | 6 |
Output Table after filter(a > 1):
| a | b |
|---|---|
| 2 | NA |
Output Table after summarise(mean_b = mean(b, na.rm = TRUE)):
| mean_b |
|---|
| NaN |
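One way to act on the advice to check data before summarising is an explicit guard on the number of usable rows. This is only a sketch under the assumption that dplyr is installed; the names usable and mean_b and the fallback value are illustrative choices.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, NA), b = c(4, NA, 6))

# Keep only rows with a > 1 and a non-missing b.
usable <- df %>% filter(a > 1, !is.na(b))

# Guard: summarising zero rows would quietly produce NaN.
if (nrow(usable) == 0) {
  message("No usable rows after filtering; skipping the summary.")
  mean_b <- NA_real_
} else {
  mean_b <- usable %>% summarise(mean_b = mean(b)) %>% pull(mean_b)
}

print(mean_b)  # NA, because no row has both a > 1 and a non-missing b
```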
- Non-Standard Evaluation Pitfalls:
Some functions (especially in base R) do not work well with pipes due to non-standard evaluation or argument positions. Use anonymous functions or the dot placeholder (.) where the piped value must go somewhere other than the first argument.
# This works, because the piped vector becomes the first argument of sum():
c(1, 2, 3) %>% sum(na.rm = TRUE)
# Equivalent, with the dot placeholder written out explicitly:
c(1, 2, 3) %>% sum(., na.rm = TRUE)
# Equivalent with the base pipe and an anonymous function:
c(1, 2, 3) |> (\(x) sum(x, na.rm = TRUE))()
Input:
c(1, 2, 3)
Output:
6
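A case where argument position really matters is lm(), whose data argument comes second. The sketch below assumes magrittr is installed and R 4.1.0 or later; the built-in mtcars data and the model formula are illustrative choices, not part of the original example.

```r
library(magrittr)

# lm(formula, data, ...): the data frame is the second argument,
# so it has to be placed explicitly.
fit_dot <- mtcars %>% lm(mpg ~ wt, data = .)

# Base-pipe equivalent using an anonymous function.
fit_fun <- mtcars |> (\(d) lm(mpg ~ wt, data = d))()

# Both approaches fit the same model.
print(all.equal(coef(fit_dot), coef(fit_fun)))  # TRUE
```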
- Side Effects in Pipes:
Avoid relying on side effects (like printing or plotting) inside pipes unless using %T>%. Side effects can make debugging harder.
library(magrittr)
iris %T>%
  { plot(.$Sepal.Length, .$Sepal.Width) } %>%
  summary()
# Plot is created as a side effect; %T>% passes iris on, so summary() still receives the data.
- Error Propagation:
If an error occurs in any step of the pipe, the entire pipeline fails. Use tryCatch() or purrr::possibly() for safer error handling in pipelines.
library(dplyr)
library(purrr)
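safe_log <- possibly(log, otherwise = NA_real_)
list(1, 10, "a") %>%
  map_dbl(safe_log)
# Returns NA for the input that makes log() throw an error ("a") instead of stopping the pipeline.
# Note: log(-1) only warns and returns NaN, so possibly() would not change that result.
Input:
list(1, 10, "a")
Output:
0, 2.302585, NA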
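The tryCatch() route mentioned above can be sketched by wrapping a whole chain in a small function. This assumes magrittr is installed; sqrt_total is a hypothetical helper name, and returning NA on failure is just one possible fallback.

```r
library(magrittr)

# Hypothetical helper: run a short pipeline and fall back to NA on error.
sqrt_total <- function(x) {
  tryCatch(
    x %>% sqrt() %>% sum(),
    error = function(e) {
      message("Pipeline failed: ", conditionMessage(e))
      NA_real_
    }
  )
}

ok  <- sqrt_total(c(1, 4, 9))  # 6
bad <- sqrt_total("a")         # NA, because sqrt("a") throws an error

print(ok)
print(bad)
```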
- Overly Long Chains:
Very long or complex pipes can be hard to debug. Break them into smaller steps or assign intermediate results for clarity.
# Hard to debug:
result <- df %>% filter(a > 1) %>% mutate(c = a + b) %>% group_by(c) %>% summarise(mean_b = mean(b))
# Easier to debug:
step1 <- filter(df, a > 1)
step2 <- mutate(step1, c = a + b)
result <- step2 %>% group_by(c) %>% summarise(mean_b = mean(b))
- Assignment Confusion:
Remember that %>% does not reassign by default; use %<>% or explicit assignment if you want to update the original object.
library(magrittr)
x <- 1:5
x %>% sqrt() # x is unchanged
x %<>% sqrt() # x is updated in place
Input:
x = 1:5
Output after x %>% sqrt():
[1] 1.000000 1.414214 1.732051 2.000000 2.236068 (but x is unchanged)
Output after x %<>% sqrt():
x = 1.000000 1.414214 1.732051 2.000000 2.236068
- Base Pipe (|>) Limitations:
The base R pipe does not support the dot placeholder (.). For complex argument placement, use anonymous functions or, from R 4.2.0, the underscore placeholder (_) with a named argument.
# This will not work:
c(1, 2, 3) |> sum(., na.rm = TRUE) # Error
# Correct:
c(1, 2, 3) |> (\(x) sum(x, na.rm = TRUE))()
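From R 4.2.0 onward, the base pipe also accepts an underscore placeholder, as long as it is passed to a named argument. The sketch below assumes R 4.2.0 or later; the strings are illustrative.

```r
# Requires R >= 4.2.0.
# sub(pattern, replacement, x): the text to modify is the third argument,
# so the piped value is placed with the named argument x = _.
fixed <- "clinical" |> sub("clin", "med", x = _)

print(fixed)  # "medical"
```

The underscore can appear only once on the right-hand side and must be named, which is stricter than magrittr's dot.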
- Data Masking:
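Data masking comes from dplyr verbs such as mutate() and filter() rather than from the pipe itself: inside these verbs, a column name takes precedence over a global variable with the same name, which can lead to unexpected results when names overlap. Be explicit with variable references when needed, for example with the .data and .env pronouns.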
x <- 10
df <- data.frame(x = 1:3)
df %>% mutate(y = x + 1) # Uses df$x, not global x
Input Table (df):
| x |
|---|
| 1 |
| 2 |
| 3 |
Output Table:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
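To make the choice between the column and the global variable explicit, dplyr's data mask provides the .data and .env pronouns. A minimal sketch, assuming dplyr is installed; the new column names are illustrative.

```r
library(dplyr)

x <- 10
df <- data.frame(x = 1:3)

result <- df %>%
  mutate(
    from_column = .data$x + 1,  # uses the column x: 2, 3, 4
    from_global = .env$x + 1    # uses the global x: 11 in every row
  )

print(result)
```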
- Debugging:
Debugging inside pipes can be challenging. Use intermediate assignments or insert print()/str() calls with %T>% to inspect data at each stage.
library(magrittr)
df %>%
filter(a > 1) %T>%
{ print(.) } %>%
summarise(mean_b = mean(b, na.rm = TRUE))
# Prints intermediate result after filtering.
Input Table (df):
| a | b |
|---|---|
| 1 | 4 |
| 2 | NA |
| NA | 6 |
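Output Table after filter(a > 1) (the row with a = NA is dropped, because the condition evaluates to NA):
| a | b |
|---|---|
| 2 | NA |
Output Table after summarise(mean_b = mean(b, na.rm = TRUE)) (no non-missing b values remain, so the mean is NaN):
| mean_b |
|---|
| NaN |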
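Without magrittr, a similar inspection step can be written as a small pass-through function for the base pipe. This is a sketch assuming R 4.1.0 or later; peek is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: report the row count, then return the data unchanged.
peek <- function(data, label) {
  message(label, ": ", nrow(data), " row(s)")
  data
}

df <- data.frame(a = c(1, 2, NA), b = c(4, NA, 6))

n_kept <- df |>
  peek("input") |>
  subset(a > 1) |>   # subset() also drops rows where the condition is NA
  peek("after subset") |>
  nrow()

print(n_kept)  # 1
```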
Comparison: Pipe Operator Features
| Operator | Package | Supports Placeholder | Assignment | Side Effects | Exposes Columns | Base R | Example |
|---|---|---|---|---|---|---|---|
| %>% | magrittr | Yes (.) | No | No | No | No | df %>% mutate(...) |
| \|> | base R | No dot; _ from R 4.2.0 | No | No | No | Yes | df \|> head() |
| %T>% | magrittr | Yes | No | Yes | No | No | df %T>% plot() |
| %<>% | magrittr | Yes | Yes | No | No | No | x %<>% sqrt() |
| %$% | magrittr | N/A | No | No | Yes | No | df %$% cor(x, y) |
| %>>% | pipeR | Yes | No | No | No | No | x %>>% f |