2.2.1. Introduction to Tibbles
1. What is a Tibble?
A tibble is a modern, user-friendly version of R's data.frame. Tibbles are the default table format in the tidyverse and make data analysis easier and less error-prone.
Key Features:
- No automatic conversion of character columns to factors.
- Keeps column names as you enter them (including spaces/special characters).
- No row names.
- Prints only the first 10 rows (for tables with more than 20 rows) and only the columns that fit your screen.
- Clearer error messages and safer subsetting.
2. Tibble vs. Data Frame
- Tibbles keep character columns as strings; data.frame() converted them to factors by default before R 4.0.0 (stringsAsFactors = TRUE).
- Tibbles keep your column names unchanged; data.frames may alter them.
- Tibbles print a preview; data.frames may print the whole dataset.
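The differences above can be checked directly. The following is a minimal sketch, assuming the tibble package is installed; the column names `Subject ID` and `Dose (mg)` are made up for illustration.

```r
library(tibble)

# data.frame() rewrites non-syntactic names by default (check.names = TRUE).
df <- data.frame(`Subject ID` = c("01", "02"), `Dose (mg)` = c(50, 100))
names(df)   # the first name becomes "Subject.ID"

# tibble() keeps the names exactly as typed.
tb <- tibble(`Subject ID` = c("01", "02"), `Dose (mg)` = c(50, 100))
names(tb)   # "Subject ID" "Dose (mg)"

# Selecting one column with [ , ]: the data.frame drops to a plain vector,
# while the tibble stays a tibble.
is.data.frame(df[, 1])   # FALSE
is_tibble(tb[, 1])       # TRUE
```

The last two lines are the reason tibble subsetting is often described as more predictable: the result type does not depend on how many columns were selected.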
3. Creating a Tibble
library(tibble)
ae <- tibble(
USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
AESEQ = c(1, 2, 1),
AETERM = c("Headache", "Nausea", "Dizziness"),
AESEV = c("MILD", "MODERATE", "SEVERE"),
AESER = c("N", "N", "Y")
)
ae
Output:
# A tibble: 3 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 1 Headache MILD N
2 ABC123-01 2 Nausea MODERATE N
3 ABC123-02 1 Dizziness SEVERE Y
4. Converting Data Frames to Tibbles
Many R packages and datasets use the traditional data.frame. To convert a data.frame to a tibble, use as_tibble():
ae_df <- data.frame(
USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
AESEQ = c(1, 2, 1),
AETERM = c("Headache", "Nausea", "Dizziness"),
AESEV = c("MILD", "MODERATE", "SEVERE"),
AESER = c("N", "N", "Y")
)
as_tibble(ae_df)
class(ae_df) # Check class
class(as_tibble(ae_df))
Output:
> as_tibble(ae_df)
# A tibble: 3 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 1 Headache MILD N
2 ABC123-01 2 Nausea MODERATE N
3 ABC123-02 1 Dizziness SEVERE Y
>
> class(ae_df) # Check class
[1] "data.frame"
>
> class(as_tibble(ae_df))
[1] "tbl_df" "tbl" "data.frame"
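Because tibbles do not carry row names, a data.frame that stores information in its row names needs a little extra care when converted. A minimal sketch using the built-in mtcars dataset (not part of the examples above); the column name "model" is an arbitrary choice:

```r
library(tibble)

# mtcars is a built-in data.frame that stores car models as row names.
has_rownames(mtcars)            # TRUE

# Keep the row names by moving them into a regular column called "model".
cars_tbl <- as_tibble(mtcars, rownames = "model")

is_tibble(cars_tbl)             # TRUE
names(cars_tbl)[1]              # "model"
has_rownames(cars_tbl)          # FALSE
```

Without the rownames argument, as_tibble() would typically drop the row names, so the car models would be lost.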
5. Creating a Tibble from Base R Data
The tibble package is part of the tidyverse. Load it with:
library(tidyverse)
For example, suppose you have an SDTM DM (Demographics) domain as a base R data.frame. To convert it to a tibble:
dm_df <- data.frame(
STUDYID = rep("XYZ1001", 15),
USUBJID = paste0("XYZ1001-", sprintf("%03d", 1:15)),
AGE = c(34, 58, 47, 29, 53, 41, 62, 38, 50, 45, 36, 55, 49, 60, 42),
SEX = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M"),
RACE = c("WHITE", "ASIAN", "BLACK", "WHITE", "BLACK", "ASIAN", "WHITE", "BLACK", "ASIAN", "WHITE", "BLACK", "ASIAN", "WHITE", "BLACK", "ASIAN")
)
as_tibble(dm_df)
Output:
# A tibble: 15 × 5
STUDYID USUBJID AGE SEX RACE
<chr> <chr> <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-001 34 M WHITE
2 XYZ1001 XYZ1001-002 58 F ASIAN
3 XYZ1001 XYZ1001-003 47 M BLACK
4 XYZ1001 XYZ1001-004 29 F WHITE
5 XYZ1001 XYZ1001-005 53 M BLACK
6 XYZ1001 XYZ1001-006 41 F ASIAN
7 XYZ1001 XYZ1001-007 62 M WHITE
8 XYZ1001 XYZ1001-008 38 F BLACK
9 XYZ1001 XYZ1001-009 50 M ASIAN
10 XYZ1001 XYZ1001-010 45 F WHITE
11 XYZ1001 XYZ1001-011 36 M BLACK
12 XYZ1001 XYZ1001-012 55 F ASIAN
13 XYZ1001 XYZ1001-013 49 M WHITE
14 XYZ1001 XYZ1001-014 60 F BLACK
15 XYZ1001 XYZ1001-015 42 M ASIAN
- Tibbles with more than 20 rows print only the first 10 by default, making large datasets easier to view; this 15-row tibble is short enough to print in full.
- Column types are shown under the column names.
To control how many rows are printed, set n; to show all columns regardless of screen width, set width = Inf:
as_tibble(dm_df) %>% print(n = 5, width = Inf)
Output:
# A tibble: 15 × 5
STUDYID USUBJID AGE SEX RACE
<chr> <chr> <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-001 34 M WHITE
2 XYZ1001 XYZ1001-002 58 F ASIAN
3 XYZ1001 XYZ1001-003 47 M BLACK
4 XYZ1001 XYZ1001-004 29 F WHITE
5 XYZ1001 XYZ1001-005 53 M BLACK
# ℹ 10 more rows
# ℹ Use `print(n = ...)` to see more rows
You can also use View(dm_df) in RStudio for a spreadsheet-like view.
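The default truncation only applies to tables with more than 20 rows. A small sketch with a hypothetical 25-row tibble (the names big, id and value are invented for this illustration):

```r
library(tibble)

# A 25-row tibble; later columns may refer to earlier ones.
big <- tibble(id = 1:25, value = id * 2)

print(big)            # more than 20 rows: the first 10 plus a "more rows" footer
print(big, n = 15)    # exactly 15 rows
print(big, n = Inf)   # every row
```

Using n = Inf is convenient for a one-off look at a medium-sized table without changing any global options.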
6. Sampling and Slicing Rows
Use dplyr functions to view parts of your tibble. Convert dm_df to a tibble first so that the results print as tibbles:
dm <- as_tibble(dm_df)
Random sample:
slice_sample(dm, n = 2)
Output (will vary):
# A tibble: 2 × 5
STUDYID USUBJID AGE SEX RACE
<chr> <chr> <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-007 62 M WHITE
2 XYZ1001 XYZ1001-011 36 M BLACK
First rows:
slice_head(dm, n = 2)
Output:
# A tibble: 2 × 5
STUDYID USUBJID AGE SEX RACE
<chr> <chr> <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-001 34 M WHITE
2 XYZ1001 XYZ1001-002 58 F ASIAN
Last rows:
slice_tail(dm, n = 2)
Output:
# A tibble: 2 × 5
STUDYID USUBJID AGE SEX RACE
<chr> <chr> <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-014 60 F BLACK
2 XYZ1001 XYZ1001-015 42 M ASIAN
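Since a random sample changes on every run, it can help to fix the random seed when results need to be reproducible. The sketch below rebuilds a two-column version of the DM data so that it runs on its own; the seed value 123 is arbitrary, and slice_max() is shown as one related dplyr helper.

```r
library(tibble)
library(dplyr)

# Reduced DM data: subject IDs and ages from the example above.
dm <- tibble(
  USUBJID = paste0("XYZ1001-", sprintf("%03d", 1:15)),
  AGE = c(34, 58, 47, 29, 53, 41, 62, 38, 50, 45, 36, 55, 49, 60, 42)
)

# The same seed gives the same sample.
set.seed(123)
s1 <- slice_sample(dm, n = 3)
set.seed(123)
s2 <- slice_sample(dm, n = 3)
identical(s1, s2)   # TRUE

# The two oldest subjects (ages 62 and 60).
oldest <- slice_max(dm, AGE, n = 2)
oldest
```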
7. Creating a Tibble from Scratch
You can create a tibble directly. For example, to create a simple SDTM EX (Exposure) domain:
ex <- tibble(
USUBJID = c("XYZ1001-001", "XYZ1001-002", "XYZ1001-003"),
EXTRT = c("DrugA", "DrugA", "DrugB"),
EXDOSE = c(50, 50, 100),
EXDOSU = "mg"
)
ex
- If you provide a single value (like EXDOSU = "mg"), it is repeated for all rows.
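Only single values are repeated this way. The sketch below contrasts that with base R, which also recycles longer vectors; the x and y columns are placeholders for illustration.

```r
library(tibble)

# A length-1 value is recycled to match the other columns.
ex <- tibble(
  USUBJID = c("XYZ1001-001", "XYZ1001-002", "XYZ1001-003"),
  EXDOSU = "mg"
)
ex$EXDOSU    # "mg" "mg" "mg"

# Base R silently recycles a length-2 vector across 4 rows.
df <- data.frame(x = 1:4, y = 1:2)
df$y         # 1 2 1 2

# tibble() refuses to do this and raises an error instead.
msg <- tryCatch(
  {
    tibble(x = 1:4, y = 1:2)
    "no error"
  },
  error = function(e) "error"
)
msg          # "error"
```

Raising an error here tends to catch vectors of mismatched length early, before they produce a misaligned table.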
8. Using Non-Standard Column Names
Tibbles allow column names that data.frame() would alter by default, such as names containing spaces or special characters. For example, you might want to use SDTM variable labels as column names:
tibble(
`Subject ID` = c("XYZ1001-001", "XYZ1001-002"),
`Visit 1 (mg)` = c(50, 100),
`Dose/Day` = c("QD", "BID")
)
Output:
# A tibble: 2 × 3
`Subject ID` `Visit 1 (mg)` `Dose/Day`
<chr> <dbl> <chr>
1 XYZ1001-001 50 QD
2 XYZ1001-002 100 BID
- Use backticks for names that contain spaces or symbols, or that start with a number.
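The same backticks are needed whenever such a column is referred to later. A minimal sketch, assuming dplyr is loaded; the replacement names USUBJID and DOSE_MG are illustrative choices, not taken from the example above.

```r
library(tibble)
library(dplyr)

dose <- tibble(
  `Subject ID` = c("XYZ1001-001", "XYZ1001-002"),
  `Visit 1 (mg)` = c(50, 100)
)

# Backticks with $; plain quoted strings with [[ ]].
dose$`Visit 1 (mg)`        # 50 100
dose[["Subject ID"]]       # "XYZ1001-001" "XYZ1001-002"

# Backticks inside dplyr verbs.
doubled <- mutate(dose, DOUBLE_MG = `Visit 1 (mg)` * 2)

# Renaming to syntactic names removes the need for backticks afterwards.
clean <- rename(dose, USUBJID = `Subject ID`, DOSE_MG = `Visit 1 (mg)`)
names(clean)               # "USUBJID" "DOSE_MG"
```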
9. Basic Tibble Operations
# Assuming 'ae' is defined as:
ae <- tibble(
USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
AESEQ = c(1, 2, 1),
AETERM = c("Headache", "Nausea", "Dizziness"),
AESEV = c("MILD", "MODERATE", "SEVERE"),
AESER = c("N", "N", "Y")
)
# Input:
ae
# Output:
# A tibble: 3 × 5
# USUBJID AESEQ AETERM AESEV AESER
# <chr> <dbl> <chr> <chr> <chr>
# 1 ABC123-01 1 Headache MILD N
# 2 ABC123-01 2 Nausea MODERATE N
# 3 ABC123-02 1 Dizziness SEVERE Y
- Preview data:
print(ae, n = 20)
#OR
glimpse(ae)
Output:
> print(ae, n = 20)
# A tibble: 3 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 1 Headache MILD N
2 ABC123-01 2 Nausea MODERATE N
3 ABC123-02 1 Dizziness SEVERE Y
> glimpse(ae)
Rows: 3
Columns: 5
$ USUBJID <chr> "ABC123-01", "ABC123-01", "ABC123-02"
$ AESEQ <dbl> 1, 2, 1
$ AETERM <chr> "Headache", "Nausea", "Dizziness"
$ AESEV <chr> "MILD", "MODERATE", "SEVERE"
$ AESER <chr> "N", "N", "Y"
- Access columns:
ae$AETERM
#OR
ae[["AETERM"]]
Output:
[1] "Headache" "Nausea" "Dizziness"
- Rename columns:
rename(ae, AE_TERM = AETERM)
Output:
# A tibble: 3 × 5
USUBJID AESEQ AE_TERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 1 Headache MILD N
2 ABC123-01 2 Nausea MODERATE N
3 ABC123-02 1 Dizziness SEVERE Y
- Add/modify columns:
mutate(ae, AESEV_NUM = as.integer(factor(AESEV, levels = c("MILD", "MODERATE", "SEVERE"))))
Output:
# A tibble: 3 × 6
USUBJID AESEQ AETERM AESEV AESER AESEV_NUM
<chr> <dbl> <chr> <chr> <chr> <int>
1 ABC123-01 1 Headache MILD N 1
2 ABC123-01 2 Nausea MODERATE N 2
3 ABC123-02 1 Dizziness SEVERE Y 3
- Filter rows:
filter(ae, AESEV == "SEVERE")
Output:
# A tibble: 1 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-02 1 Dizziness SEVERE Y
- Select columns:
select(ae, USUBJID, AETERM)
Output:
# A tibble: 3 × 2
USUBJID AETERM
<chr> <chr>
1 ABC123-01 Headache
2 ABC123-01 Nausea
3 ABC123-02 Dizziness
- Arrange rows:
arrange(ae, desc(AESEQ))
Output:
# A tibble: 3 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 2 Nausea MODERATE N
2 ABC123-01 1 Headache MILD N
3 ABC123-02 1 Dizziness SEVERE Y
- Summarize:
ae %>% group_by(AESEV) %>% summarise(n = n())
Output:
# A tibble: 3 × 2
AESEV n
<chr> <int>
1 MILD 1
2 MODERATE 1
3 SEVERE 1
- Convert back to data.frame:
af <- as.data.frame(ae)
af
class(af)
Output:
USUBJID AESEQ AETERM AESEV AESER
1 ABC123-01 1 Headache MILD N
2 ABC123-01 2 Nausea MODERATE N
3 ABC123-02 1 Dizziness SEVERE Y
[1] "data.frame"
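The individual operations above are usually combined into one pipeline. The following sketch chains several of them on the ae tibble; the SERIOUS column is a hypothetical flag added only for illustration.

```r
library(tibble)
library(dplyr)

ae <- tibble(
  USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
  AESEQ = c(1, 2, 1),
  AETERM = c("Headache", "Nausea", "Dizziness"),
  AESEV = c("MILD", "MODERATE", "SEVERE"),
  AESER = c("N", "N", "Y")
)

# Drop mild events, add a logical flag, keep a few columns, sort by severity.
ae_summary <- ae %>%
  filter(AESEV != "MILD") %>%
  mutate(SERIOUS = AESER == "Y") %>%
  select(USUBJID, AETERM, AESEV, SERIOUS) %>%
  arrange(desc(AESEV))
ae_summary

# count() is a shortcut for group_by() followed by summarise(n = n()).
per_subject <- count(ae, USUBJID)
per_subject
```

Each step returns a tibble, so the result of one verb can be passed straight to the next.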
10. Subsetting Tibbles
Tibbles support several ways to subset data. Here are detailed examples:
By column name using $:
Returns a vector.
ae$AETERM
Output:
[1] "Headache" "Nausea" "Dizziness"
By column name or position using [[:
Returns a vector. (Single brackets, as in ae["AETERM"], return a one-column tibble instead.)
ae[["AETERM"]]
Output:
[1] "Headache" "Nausea" "Dizziness"
ae[[3]]
Output:
[1] "Headache" "Nausea" "Dizziness"
By column(s) using select():
Returns a tibble.
select(ae, AETERM, AESEV)
Output:
# A tibble: 3 × 2
AETERM AESEV
<chr> <chr>
1 Headache MILD
2 Nausea MODERATE
3 Dizziness SEVERE
select(ae, 3:4)
Output:
# A tibble: 3 × 2
AETERM AESEV
<chr> <chr>
1 Headache MILD
2 Nausea MODERATE
3 Dizziness SEVERE
By row(s) using slice():
Returns a tibble.
slice(ae, 1)
Output:
# A tibble: 1 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 1 Headache MILD N
slice(ae, 2:3)
Output:
# A tibble: 2 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-01 2 Nausea MODERATE N
2 ABC123-02 1 Dizziness SEVERE Y
By row and column using [rows, cols]:
Returns a tibble.
ae[1, 3]
Output:
# A tibble: 1 × 1
AETERM
<chr>
1 Headache
ae[1:2, c("AETERM", "AESEV")]
Output:
# A tibble: 2 × 2
AETERM AESEV
<chr> <chr>
1 Headache MILD
2 Nausea MODERATE
With logical conditions using filter():
Returns a tibble.
filter(ae, AESEV == "SEVERE")
Output:
# A tibble: 1 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-02 1 Dizziness SEVERE Y
filter(ae, AESER == "Y" & AESEV == "SEVERE")
Output:
# A tibble: 1 × 5
USUBJID AESEQ AETERM AESEV AESER
<chr> <dbl> <chr> <chr> <chr>
1 ABC123-02 1 Dizziness SEVERE Y
With logical vectors for rows and columns:
ae[c(TRUE, FALSE, TRUE), c("USUBJID", "AETERM")]
Output:
# A tibble: 2 × 2
USUBJID AETERM
<chr> <chr>
1 ABC123-01 Headache
2 ABC123-02 Dizziness
Using pull() to extract a column as a vector:
pull(ae, AETERM)
Output:
[1] "Headache" "Nausea" "Dizziness"
Note: Subsetting a tibble with [ always returns a tibble; use [[, $, or pull() to get a vector.
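One practical consequence of the stricter subsetting rules is that $ does not guess at abbreviated column names. A minimal sketch, assuming the tibble package is loaded; the abbreviated name AETE is deliberately incomplete.

```r
library(tibble)

ae_df <- data.frame(
  AETERM = c("Headache", "Nausea"),
  AESEV = c("MILD", "MODERATE")
)
ae_tb <- as_tibble(ae_df)

# data.frame: $ partially matches "AETE" to the AETERM column.
ae_df$AETE

# tibble: no partial matching; the result is NULL and a warning is raised
# (suppressed here so the sketch runs quietly).
partial <- suppressWarnings(ae_tb$AETE)
is.null(partial)     # TRUE

# The full name works as usual.
ae_tb$AETERM
```

A typo in a column name therefore surfaces as a warning rather than silently returning a different column.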
11. Tips for Beginners
- Use read_csv() to import data as a tibble.
- Use View(ae) in RStudio to see your data in a spreadsheet view.
- Use as_tibble() to convert any data.frame to a tibble.
- Tibbles work well with the %>% pipe for chaining commands.
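As a rough illustration of the first tip, the sketch below writes a small CSV file to a temporary location and reads it back with readr. The file contents and the column type string "cdc" (character, double, character) are assumptions made for this example.

```r
library(readr)
library(tibble)

# Write a tiny CSV file to a temporary path.
path <- tempfile(fileext = ".csv")
writeLines(
  c("USUBJID,AESEQ,AETERM",
    "ABC123-01,1,Headache",
    "ABC123-02,1,Dizziness"),
  path
)

# read_csv() returns a tibble; col_types states the expected column types.
ae_csv <- read_csv(path, col_types = "cdc")

is_tibble(ae_csv)    # TRUE
ae_csv
```

Base R's read.csv() would return a plain data.frame here, which could then be converted with as_tibble().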
Tibbles are the recommended way to work with tabular data in R, especially for clinical and tidyverse workflows. Start using tibbles for easier, safer, and more readable data analysis.