2.2.1. Introduction to Tibbles

1. What is a Tibble?

A tibble is a modern, user-friendly version of R's data.frame. Tibbles are the default table format in the tidyverse and make data analysis easier and less error-prone.

Key Features:

No automatic conversion of character columns to factors.
Keeps column names as you enter them (including spaces/special characters).
No row names.
For large tables (more than 20 rows by default), prints only the first 10 rows and the columns that fit on your screen.
Clearer error messages and safer subsetting.

2. Tibble vs. Data Frame

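Tibbles keep character columns as strings; data.frame() converted them to factors by default before R 4.0.0 and still does when stringsAsFactors = TRUE.
Tibbles keep your column names unchanged; data.frame() by default rewrites non-syntactic names (for example, "Subject ID" becomes "Subject.ID").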
Tibbles print a preview; data.frames may print the whole dataset.
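
The name handling above, along with the safer subsetting listed under Key Features, can be seen side by side in a short sketch. The object names df and tb are made up for illustration, and the partial matching shown for $ is base R's default behavior for data.frames.

```r
library(tibble)

# Same input passed to data.frame() and tibble(); df and tb are illustrative names
df <- data.frame(`Subject ID` = c("001", "002"), AETERM = c("Headache", "Nausea"))
tb <- tibble(`Subject ID` = c("001", "002"), AETERM = c("Headache", "Nausea"))

# data.frame() rewrites the non-syntactic name by default; tibble() keeps it
names(df)  # "Subject.ID" "AETERM"
names(tb)  # "Subject ID" "AETERM"

# $ on a data.frame partially matches column names; a tibble does not
df$AET                    # returns the AETERM column
suppressWarnings(tb$AET)  # NULL (tibble also warns about the unknown column)
```

Partial matching is convenient at the console but can hide typos in scripts, which is one reason the stricter tibble behavior is often preferred.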

3. Creating a Tibble

library(tibble)
ae <- tibble(
  USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
  AESEQ = c(1, 2, 1),
  AETERM = c("Headache", "Nausea", "Dizziness"),
  AESEV = c("MILD", "MODERATE", "SEVERE"),
  AESER = c("N", "N", "Y")
)
ae

Output:

# A tibble: 3 × 5
  USUBJID   AESEQ AETERM    AESEV    AESER
  <chr>     <dbl> <chr>     <chr>    <chr>
1 ABC123-01     1 Headache  MILD     N    
2 ABC123-01     2 Nausea    MODERATE N    
3 ABC123-02     1 Dizziness SEVERE   Y

4. Converting Data Frames to Tibbles

Many R packages and datasets use the traditional data.frame. To convert a data.frame to a tibble, use as_tibble():

ae_df <- data.frame(
  USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
  AESEQ = c(1, 2, 1),
  AETERM = c("Headache", "Nausea", "Dizziness"),
  AESEV = c("MILD", "MODERATE", "SEVERE"),
  AESER = c("N", "N", "Y")
)
as_tibble(ae_df)

class(ae_df)  # Check class

class(as_tibble(ae_df))

Output:

> as_tibble(ae_df)
# A tibble: 3 × 5
  USUBJID   AESEQ AETERM    AESEV    AESER
  <chr>     <dbl> <chr>     <chr>    <chr>
1 ABC123-01     1 Headache  MILD     N    
2 ABC123-01     2 Nausea    MODERATE N    
3 ABC123-02     1 Dizziness SEVERE   Y    
> 
> class(ae_df)  # Check class
[1] "data.frame"
> 
> class(as_tibble(ae_df))
[1] "tbl_df"     "tbl"        "data.frame"
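
Because tibbles do not carry row names, a data.frame that stores information in its row names loses it on conversion unless you ask to keep it. A minimal sketch, using the built-in mtcars dataset (not part of the example above) and a column name, model, chosen here for illustration: the rownames argument of as_tibble() moves the row names into a regular column.

```r
library(tibble)

# mtcars stores the car model in its row names
head(rownames(mtcars), 3)

# Move the row names into a regular column called "model"
cars_tbl <- as_tibble(mtcars, rownames = "model")
cars_tbl[1:3, c("model", "mpg", "cyl")]
```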

5. Creating a Tibble from Base R Data

The tibble package is part of the tidyverse. Load it with:

library(tidyverse)

For example, suppose you have an SDTM DM (Demographics) domain as a base R data.frame. To convert it to a tibble:

dm_df <- data.frame(
  STUDYID = rep("XYZ1001", 15),
  USUBJID = paste0("XYZ1001-", sprintf("%03d", 1:15)),
  AGE = c(34, 58, 47, 29, 53, 41, 62, 38, 50, 45, 36, 55, 49, 60, 42),
  SEX = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M"),
  RACE = c("WHITE", "ASIAN", "BLACK", "WHITE", "BLACK", "ASIAN", "WHITE", "BLACK", "ASIAN", "WHITE", "BLACK", "ASIAN", "WHITE", "BLACK", "ASIAN")
)
as_tibble(dm_df)

Output:

# A tibble: 15 × 5
   STUDYID USUBJID       AGE SEX   RACE 
   <chr>   <chr>       <dbl> <chr> <chr>
 1 XYZ1001 XYZ1001-001    34 M     WHITE
 2 XYZ1001 XYZ1001-002    58 F     ASIAN
 3 XYZ1001 XYZ1001-003    47 M     BLACK
 4 XYZ1001 XYZ1001-004    29 F     WHITE
 5 XYZ1001 XYZ1001-005    53 M     BLACK
 6 XYZ1001 XYZ1001-006    41 F     ASIAN
 7 XYZ1001 XYZ1001-007    62 M     WHITE
 8 XYZ1001 XYZ1001-008    38 F     BLACK
 9 XYZ1001 XYZ1001-009    50 M     ASIAN
10 XYZ1001 XYZ1001-010    45 F     WHITE
11 XYZ1001 XYZ1001-011    36 M     BLACK
12 XYZ1001 XYZ1001-012    55 F     ASIAN
13 XYZ1001 XYZ1001-013    49 M     WHITE
14 XYZ1001 XYZ1001-014    60 F     BLACK
15 XYZ1001 XYZ1001-015    42 M     ASIAN

Tibbles with more than 20 rows print only the first 10 rows by default, making large datasets easier to view; this one has 15 rows, so it is printed in full.
Column types are shown under the column names.

To control how many rows are printed and to show all columns, use print() with n and width:

as_tibble(dm_df) %>% print(n = 5, width = Inf)

Output:

# A tibble: 15 × 5
  STUDYID USUBJID       AGE SEX   RACE 
  <chr>   <chr>       <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-001    34 M     WHITE
2 XYZ1001 XYZ1001-002    58 F     ASIAN
3 XYZ1001 XYZ1001-003    47 M     BLACK
4 XYZ1001 XYZ1001-004    29 F     WHITE
5 XYZ1001 XYZ1001-005    53 M     BLACK
# ℹ 10 more rows
# ℹ Use `print(n = ...)` to see more rows

You can also use View(dm_df) in RStudio for a spreadsheet-like view.
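
The row limit applies only to larger tables: by default a tibble with 20 rows or fewer is printed in full, which is why all 15 rows of the DM example appear in the first output above. A small sketch with a made-up 25-row tibble (big is an illustrative name) shows the truncation and how n overrides it:

```r
library(tibble)

# A made-up tibble with more than 20 rows, so default printing truncates it
big <- tibble(id = 1:25, value = id * 2)

print(big)           # first 10 rows, then a footer noting the remaining rows
print(big, n = 25)   # all 25 rows
print(big, n = Inf)  # all rows, whatever the size
```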

6. Sampling and Slicing Rows

Use dplyr functions to view parts of your tibble:

Random sample:

slice_sample(as_tibble(dm_df), n = 2)

Output (will vary):

# A tibble: 2 × 5
  STUDYID USUBJID       AGE SEX   RACE 
  <chr>   <chr>       <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-007    62 M     WHITE
2 XYZ1001 XYZ1001-011    36 M     BLACK

First rows:

slice_head(as_tibble(dm_df), n = 2)

Output:

# A tibble: 2 × 5
  STUDYID USUBJID       AGE SEX   RACE 
  <chr>   <chr>       <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-001    34 M     WHITE
2 XYZ1001 XYZ1001-002    58 F     ASIAN

Last rows:

slice_tail(as_tibble(dm_df), n = 2)

Output:

# A tibble: 2 × 5
  STUDYID USUBJID       AGE SEX   RACE 
  <chr>   <chr>       <dbl> <chr> <chr>
1 XYZ1001 XYZ1001-014    60 F     BLACK
2 XYZ1001 XYZ1001-015    42 M     ASIAN

7. Creating a Tibble from Scratch

You can create a tibble directly. For example, to create a simple SDTM EX (Exposure) domain:

ex <- tibble(
  USUBJID = c("XYZ1001-001", "XYZ1001-002", "XYZ1001-003"),
  EXTRT = c("DrugA", "DrugA", "DrugB"),
  EXDOSE = c(50, 50, 100),
  EXDOSU = "mg"
)
ex

If you provide a single value (like EXDOSU = "mg"), it is repeated for all rows.
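
Only length-one values are recycled in this way. As a sketch of the rule (the names doses and bad are illustrative), a column whose length is neither 1 nor the number of rows makes tibble() stop with an error, whereas data.frame() silently repeats values when the lengths divide evenly:

```r
library(tibble)

# A single value is repeated for every row
doses <- tibble(EXDOSE = c(50, 50, 100, 100), EXDOSU = "mg")
doses$EXDOSU

# A length-2 column next to a length-4 column is an error in tibble() ...
bad <- tryCatch(
  tibble(EXDOSE = c(50, 50, 100, 100), EXDOSU = c("mg", "g")),
  error = function(e) "error: incompatible column sizes"
)
bad

# ... whereas data.frame() recycles it silently
data.frame(EXDOSE = c(50, 50, 100, 100), EXDOSU = c("mg", "g"))$EXDOSU
```

The stricter rule means a mistyped or truncated vector is caught when the table is built rather than later in the analysis.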

8. Using Non-Standard Column Names

Tibbles keep column names that data.frame() would alter by default (data.frame() preserves them only with check.names = FALSE). For example, you might want to use SDTM variable labels or special characters:

tibble(
  `Subject ID` = c("XYZ1001-001", "XYZ1001-002"),
  `Visit 1 (mg)` = c(50, 100),
  `Dose/Day` = c("QD", "BID")
)

Output:

# A tibble: 2 × 3
  `Subject ID` `Visit 1 (mg)` `Dose/Day`
  <chr>                 <dbl> <chr>     
1 XYZ1001-001              50 QD        
2 XYZ1001-002             100 BID

Use backticks for names that contain spaces or symbols, or that start with a number.
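
Backticks are also needed whenever such a column is referred to by name in code. A short sketch, reusing the table above under the illustrative name dosing:

```r
library(tibble)
library(dplyr)

dosing <- tibble(
  `Subject ID` = c("XYZ1001-001", "XYZ1001-002"),
  `Visit 1 (mg)` = c(50, 100),
  `Dose/Day` = c("QD", "BID")
)

dosing$`Subject ID`                     # backticks with $
dosing[["Visit 1 (mg)"]]                # a quoted string works with [[
filter(dosing, `Visit 1 (mg)` > 50)     # backticks inside dplyr verbs
rename(dosing, USUBJID = `Subject ID`)  # switch to a syntactic name
```

Renaming to syntactic names early on, as in the last line, avoids the extra typing in later steps.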

9. Basic Tibble Operations

# Assuming 'ae' is defined as:
ae <- tibble(
  USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
  AESEQ = c(1, 2, 1),
  AETERM = c("Headache", "Nausea", "Dizziness"),
  AESEV = c("MILD", "MODERATE", "SEVERE"),
  AESER = c("N", "N", "Y")
)
# Input:
ae
# Output:
# A tibble: 3 × 5
#  USUBJID   AESEQ AETERM    AESEV    AESER
#  <chr>     <dbl> <chr>     <chr>    <chr>
# 1 ABC123-01     1 Headache  MILD     N    
# 2 ABC123-01     2 Nausea    MODERATE N    
# 3 ABC123-02     1 Dizziness SEVERE   Y

Preview data:

print(ae, n = 20)

#OR

glimpse(ae)

Output:

> print(ae, n = 20)
# A tibble: 3 × 5
  USUBJID   AESEQ AETERM    AESEV    AESER
  <chr>     <dbl> <chr>     <chr>    <chr>
1 ABC123-01     1 Headache  MILD     N    
2 ABC123-01     2 Nausea    MODERATE N    
3 ABC123-02     1 Dizziness SEVERE   Y    
> glimpse(ae)
Rows: 3
Columns: 5
$ USUBJID <chr> "ABC123-01", "ABC123-01", "ABC123-02"
$ AESEQ   <dbl> 1, 2, 1
$ AETERM  <chr> "Headache", "Nausea", "Dizziness"
$ AESEV   <chr> "MILD", "MODERATE", "SEVERE"
$ AESER   <chr> "N", "N", "Y"

Access columns:

ae$AETERM

#OR

ae[["AETERM"]]

Output:

[1] "Headache"  "Nausea"    "Dizziness"

Rename columns:

rename(ae, AE_TERM = AETERM)

Output:

# A tibble: 3 × 5
  USUBJID   AESEQ AE_TERM   AESEV    AESER
  <chr>     <dbl> <chr>     <chr>    <chr>
1 ABC123-01     1 Headache  MILD     N    
2 ABC123-01     2 Nausea    MODERATE N    
3 ABC123-02     1 Dizziness SEVERE   Y

Add/modify columns:

mutate(ae, AESEV_NUM = as.integer(factor(AESEV, levels = c("MILD", "MODERATE", "SEVERE"))))

Output:

# A tibble: 3 × 6
  USUBJID   AESEQ AETERM    AESEV    AESER AESEV_NUM
  <chr>     <dbl> <chr>     <chr>    <chr>     <int>
1 ABC123-01     1 Headache  MILD     N             1
2 ABC123-01     2 Nausea    MODERATE N             2
3 ABC123-02     1 Dizziness SEVERE   Y             3

Filter rows:

filter(ae, AESEV == "SEVERE")

Output:

# A tibble: 1 × 5
  USUBJID   AESEQ AETERM    AESEV  AESER
  <chr>     <dbl> <chr>     <chr>  <chr>
1 ABC123-02     1 Dizziness SEVERE Y

Select columns:

select(ae, USUBJID, AETERM)

Output:

# A tibble: 3 × 2
  USUBJID   AETERM   
  <chr>     <chr>    
1 ABC123-01 Headache 
2 ABC123-01 Nausea   
3 ABC123-02 Dizziness

Arrange rows:

arrange(ae, desc(AESEQ))

Output:

# A tibble: 3 × 5
  USUBJID   AESEQ AETERM    AESEV    AESER
  <chr>     <dbl> <chr>     <chr>    <chr>
1 ABC123-01     2 Nausea    MODERATE N    
2 ABC123-01     1 Headache  MILD     N    
3 ABC123-02     1 Dizziness SEVERE   Y

Summarize:

ae %>% group_by(AESEV) %>% summarise(n = n())

Output:

# A tibble: 3 × 2
  AESEV        n
  <chr>    <int>
1 MILD         1
2 MODERATE     1
3 SEVERE       1

Convert back to data.frame:

af <- as.data.frame(ae)
af
class(af)

Output:

    USUBJID AESEQ    AETERM    AESEV AESER
1 ABC123-01     1  Headache     MILD     N
2 ABC123-01     2    Nausea MODERATE     N
3 ABC123-02     1 Dizziness   SEVERE     Y
[1] "data.frame"

10. Subsetting Tibbles

Tibbles support several ways to subset data. Here are detailed examples:

By column name using $:
Returns a vector.

ae$AETERM

Output:

[1] "Headache"  "Nausea"    "Dizziness"

By column name or position using [[:
Returns a vector.

ae[["AETERM"]]

Output:

[1] "Headache"  "Nausea"    "Dizziness"

ae[[3]]

Output:

[1] "Headache"  "Nausea"    "Dizziness"

By column(s) using select():
Returns a tibble.

select(ae, AETERM, AESEV)

Output:

# A tibble: 3 × 2
  AETERM    AESEV   
  <chr>     <chr>   
1 Headache  MILD    
2 Nausea    MODERATE
3 Dizziness SEVERE

select(ae, 3:4)

Output:

# A tibble: 3 × 2
  AETERM    AESEV   
  <chr>     <chr>   
1 Headache  MILD    
2 Nausea    MODERATE
3 Dizziness SEVERE

By row(s) using slice():
Returns a tibble.

slice(ae, 1)

Output:

# A tibble: 1 × 5
  USUBJID   AESEQ AETERM   AESEV AESER
  <chr>     <dbl> <chr>    <chr> <chr>
1 ABC123-01     1 Headache MILD  N

slice(ae, 2:3)

Output:

# A tibble: 2 × 5
  USUBJID   AESEQ AETERM    AESEV    AESER
  <chr>     <dbl> <chr>     <chr>    <chr>
1 ABC123-01     2 Nausea    MODERATE N    
2 ABC123-02     1 Dizziness SEVERE   Y

By row and column using [rows, cols]:
Returns a tibble.

ae[1, 3]

Output:

# A tibble: 1 × 1
  AETERM  
  <chr>   
1 Headache

ae[1:2, c("AETERM", "AESEV")]

Output:

# A tibble: 2 × 2
  AETERM   AESEV   
  <chr>    <chr>   
1 Headache MILD    
2 Nausea   MODERATE

With logical conditions using filter():
Returns a tibble.

filter(ae, AESEV == "SEVERE")

Output:

# A tibble: 1 × 5
  USUBJID   AESEQ AETERM    AESEV  AESER
  <chr>     <dbl> <chr>     <chr>  <chr>
1 ABC123-02     1 Dizziness SEVERE Y

filter(ae, AESER == "Y" & AESEV == "SEVERE")

Output:

# A tibble: 1 × 5
  USUBJID   AESEQ AETERM    AESEV  AESER
  <chr>     <dbl> <chr>     <chr>  <chr>
1 ABC123-02     1 Dizziness SEVERE Y

With a logical vector for rows and names for columns:

ae[c(TRUE, FALSE, TRUE), c("USUBJID", "AETERM")]

Output:

# A tibble: 2 × 2
  USUBJID   AETERM   
  <chr>     <chr>    
1 ABC123-01 Headache 
2 ABC123-02 Dizziness

Using pull() to extract a column as a vector:

pull(ae, AETERM)

Output:

[1] "Headache"  "Nausea"    "Dizziness"

Note:
Subsetting a tibble with [ always returns a tibble, even when a single column is selected. Use [[, $, or pull() when you need a vector.
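
A quick way to check what each form returns is to test the type of the result. This sketch rebuilds a two-column version of ae and compares it with an equivalent data.frame (ae_small and ae_small_df are illustrative names):

```r
library(tibble)

ae_small <- tibble(
  USUBJID = c("ABC123-01", "ABC123-01", "ABC123-02"),
  AETERM = c("Headache", "Nausea", "Dizziness")
)
ae_small_df <- as.data.frame(ae_small)

is_tibble(ae_small[, "AETERM"])     # TRUE: [ keeps the tibble structure
is.character(ae_small[["AETERM"]])  # TRUE: [[ extracts the vector
is.character(ae_small$AETERM)       # TRUE: so does $

# A data.frame drops to a vector when a single column is selected with [
is.character(ae_small_df[, "AETERM"])  # TRUE
```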

11. Tips for Beginners

Use read_csv() from the readr package to import data as a tibble.
Use View(ae) in RStudio to see your data in a spreadsheet view.
Use as_tibble() to convert any data.frame to a tibble.
Tibbles work well with the %>% pipe for chaining commands.
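
To tie these tips together, here is a minimal sketch that reads CSV text into a tibble and chains a few dplyr verbs with %>%. The inline CSV text and the names ae_csv, ae_in, and ae_out are invented for illustration; in practice read_csv() would usually be given a file path. Wrapping literal text in I() assumes readr 2.0 or later.

```r
library(readr)
library(dplyr)

# Inline CSV text standing in for a file on disk (illustrative data)
ae_csv <- "USUBJID,AETERM,AESEV
ABC123-01,Headache,MILD
ABC123-01,Nausea,MODERATE
ABC123-02,Dizziness,SEVERE"

# read_csv() returns a tibble; show_col_types = FALSE silences the column report
ae_in <- read_csv(I(ae_csv), show_col_types = FALSE)

ae_out <- ae_in %>%
  filter(AESEV != "MILD") %>%   # keep moderate and severe events
  select(USUBJID, AETERM) %>%   # keep two columns
  arrange(AETERM)               # sort alphabetically by term
ae_out
```

Each step takes a tibble and returns a tibble, which is what makes the chain read top to bottom.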

Tibbles are the recommended way to work with tabular data in R, especially for clinical and tidyverse workflows. Start using tibbles for easier, safer, and more readable data analysis.
