
2.6.5. Scatterplots in ggplot2

1. Introduction

Scatterplots are a fundamental plot type for visualizing the relationship between two numeric variables. In clinical data analysis (e.g., ADaM or SDTM datasets), scatterplots are often used to explore relationships such as age vs. cholesterol, lab values vs. baseline, or other continuous endpoints. In ggplot2, scatterplots are created using geom_point(). This section demonstrates how to build, customize, and interpret scatterplots using clinical data.


2. Creating a Basic Scatterplot

  • Use geom_point() to plot two numeric variables from a clinical dataset.
  • Inside aes(), x sets the variable on the x-axis and y sets the variable on the y-axis.

R Code:

library(ggplot2)
# Fix the random seed so the simulated data are reproducible
set.seed(123)
# Example ADaM-like dataset with a few simple variables
adam <- data.frame(
  USUBJID = paste0("SUBJ", 1:30),
  AGE = sample(30:80, 30, replace = TRUE),
  HEIGHT = round(rnorm(30, mean = 165, sd = 10), 1), # in cm
  WEIGHT = round(rnorm(30, mean = 70, sd = 12), 1)   # in kg
)
# Scatterplot of weight (y) against age (x)
ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT))

Expected Outcome:

2.6.5.Scatterplot1.png

A scatterplot showing the relationship between patient age and weight.


3. Customizing Scatterplots with Aesthetics

  • Aesthetics control how points look (color, size, shape).
  • Map clinical variables to aesthetics inside aes() to visualize additional dimensions.

Point Color by Sex

set.seed(123)
adam$SEX <- sample(c("M", "F"), nrow(adam), replace = TRUE)
ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT, color = SEX))

2.6.5.Scatterplot2.png

  • Colors points by sex, adding a legend automatically.

Point Size by Height

ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT, size = HEIGHT))

2.6.5.Scatterplot3.png

  • Maps height to point size (useful for highlighting taller subjects).

Manual Point Color

ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT), color = "blue")

2.6.5.Scatterplot4.png

  • All points are colored blue and no legend is drawn, because color is set outside aes().
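
The difference between setting a color outside aes() and mapping it inside aes() is easy to get wrong, so here is a minimal sketch of both. The data frame demo and its values are made up for illustration; layer_data() is a ggplot2 helper used here only to inspect the colors that would be drawn.

```r
library(ggplot2)

# Small illustrative data frame (hypothetical values, not from a real study)
demo <- data.frame(
  AGE = c(35, 42, 58, 63, 71),
  WEIGHT = c(62.5, 80.1, 74.3, 68.0, 90.2)
)

# Constant set outside aes(): every point is drawn in blue, no legend
p_set <- ggplot(demo) +
  geom_point(aes(x = AGE, y = WEIGHT), color = "blue")

# Constant placed inside aes(): "blue" is treated as a one-level data value,
# so ggplot2 assigns a default palette color and adds a legend
p_mapped <- ggplot(demo) +
  geom_point(aes(x = AGE, y = WEIGHT, color = "blue"))

# Inspect the color each layer would actually use
set_colours <- unique(layer_data(p_set)$colour)
mapped_colours <- unique(layer_data(p_mapped)$colour)
```

If a plot shows an unexpected color together with a legend entry named after the color, the constant has most likely been placed inside aes() by mistake.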

Manual Point Size

ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT), size = 3)

2.6.5.Scatterplot5.png

  • All points are drawn at size 3, larger than the geom_point() default of 1.5, because size is set outside aes().

Point Shape by Sex

ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT, shape = SEX))

2.6.5.Scatterplot6.png

  • Maps sex to point shape, so each sex gets its own symbol and a legend is added automatically.

Manual Point Shape

ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT), shape = 17)

2.6.5.Scatterplot7.png

  • All points are drawn as filled triangles (shape 17).
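
When shape is mapped to a variable, the default symbols can be replaced with chosen shape codes. The sketch below uses scale_shape_manual(), a ggplot2 function that the section above does not cover; the data frame demo and its values are hypothetical.

```r
library(ggplot2)

# Small illustrative data frame (hypothetical values)
demo <- data.frame(
  AGE = c(34, 45, 52, 61, 70, 38, 49, 57, 66, 74),
  WEIGHT = c(58.2, 63.5, 61.0, 66.8, 64.1, 78.4, 85.0, 80.3, 88.9, 83.6),
  SEX = rep(c("F", "M"), each = 5)
)

# Map SEX to shape, then choose the shape code for each level:
# 16 = filled circle, 17 = filled triangle
p_shape <- ggplot(demo, aes(x = AGE, y = WEIGHT, shape = SEX)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(F = 16, M = 17))

# Shape codes that would be drawn, one per point
shapes_used <- layer_data(p_shape)$shape
```

Naming the values by level (F, M) keeps the assignment stable even if the order of the levels changes.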

4. Faceting: Breaking Down by Category

  • Faceting creates subplots for each level of a clinical variable (e.g., sex).
  • Use facet_wrap() to split the scatterplot by a variable.

R Code:

ggplot(data = adam) +
  geom_point(mapping = aes(x = AGE, y = WEIGHT)) +
  facet_wrap(~SEX)

Expected Outcome:

2.6.5.Scatterplot8.png

Two scatterplots, one for each sex, showing age vs. weight.
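
facet_wrap() also accepts layout arguments. As a sketch under assumed data (the demo data frame is hypothetical), ncol = 1 stacks the panels vertically and scales = "free_y" lets each panel use its own y-axis range, which may help when the groups differ a lot in weight.

```r
library(ggplot2)

# Small illustrative data frame (hypothetical values)
demo <- data.frame(
  AGE = c(34, 45, 52, 61, 70, 38, 49, 57, 66, 74),
  WEIGHT = c(58.2, 63.5, 61.0, 66.8, 64.1, 78.4, 85.0, 80.3, 88.9, 83.6),
  SEX = rep(c("F", "M"), each = 5)
)

# One panel per sex, stacked in a single column, each with its own y-axis range
p_facet <- ggplot(demo) +
  geom_point(aes(x = AGE, y = WEIGHT)) +
  facet_wrap(~ SEX, ncol = 1, scales = "free_y")

# Panel layout computed by ggplot2: one row per panel
panel_layout <- ggplot_build(p_facet)$layout$layout
```

Free scales make each panel easier to read but harder to compare across panels, so the default shared scales are usually the safer choice when the comparison between groups is the point of the plot.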


5. Input and Output Table for Scatterplot Variations

R Code Example     | Input Data | Output (Plot/Description)
-------------------|------------|--------------------------
geom_point()       | adam       | Basic scatterplot
color = SEX        | adam       | Colored by sex
size = HEIGHT      | adam       | Size by height
facet_wrap(~SEX)   | adam       | Subplots by sex

6. Exploring Beyond Basic Scatterplots

  • Add trend lines with geom_smooth().
  • Use transparency (alpha) to reduce overplotting.
  • Combine with other geoms (e.g., geom_jitter()).
  • Annotate points with geom_text() or geom_label().
  • Save your plot with ggsave().
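
Several of the options above can be combined in one plot. The following is a rough sketch, assuming a small simulated data frame (demo, hypothetical) and a temporary output path: it jitters points slightly, uses alpha for transparency, labels only the heaviest subject, and saves the result as a PNG.

```r
library(ggplot2)

# Simulated illustrative data (hypothetical, not from a real study)
set.seed(123)
demo <- data.frame(
  USUBJID = paste0("SUBJ", 1:10),
  AGE = sample(30:80, 10, replace = TRUE),
  WEIGHT = round(rnorm(10, mean = 70, sd = 12), 1)
)

# Label only the heaviest subject to keep the plot readable
heaviest <- demo[which.max(demo$WEIGHT), ]

p <- ggplot(demo, aes(x = AGE, y = WEIGHT)) +
  # Small horizontal jitter plus transparency to reduce overplotting
  geom_jitter(width = 0.3, height = 0, alpha = 0.6) +
  # Annotate one point with its subject ID, placed just above the point
  geom_text(data = heaviest, aes(label = USUBJID), vjust = -1)

# Save to a temporary file; replace out_file with your own path
out_file <- file.path(tempdir(), "scatter_demo.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```

Passing plot = p explicitly avoids relying on the last plot displayed, which matters in scripts that build several plots.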

R Code Example: Add a Trend Line

ggplot(adam, aes(x = AGE, y = WEIGHT)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red")

2.6.5.Scatterplot9.png

  • Adds a linear regression line to the scatterplot; by default geom_smooth() also draws a shaded 95% confidence band around it.
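
A trend line can also be fitted separately for each group. In this sketch (demo is a hypothetical data frame), mapping color = SEX in the global aes() makes geom_smooth() fit one line per sex, and se = FALSE hides the confidence band. The formula is written out explicitly only to suppress the message ggplot2 prints when it is omitted.

```r
library(ggplot2)

# Small illustrative data frame (hypothetical values)
demo <- data.frame(
  AGE = c(34, 45, 52, 61, 70, 38, 49, 57, 66, 74),
  WEIGHT = c(58.2, 63.5, 61.0, 66.8, 64.1, 78.4, 85.0, 80.3, 88.9, 83.6),
  SEX = rep(c("F", "M"), each = 5)
)

# color = SEX in the global aes() applies to both layers,
# so points and regression lines are grouped and colored by sex
p_trend <- ggplot(demo, aes(x = AGE, y = WEIGHT, color = SEX)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Fitted values for the smooth layer (layer 2), one group per sex
smooth_data <- layer_data(p_trend, 2)
```

With only a handful of subjects per group, the fitted lines are descriptive at best and should not be read as evidence of a relationship.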

7. Practice Problems

  1. Create a scatterplot of weight vs. age, colored by sex.
  2. Make a scatterplot with point size mapped to height.
  3. Facet a scatterplot by sex.
  4. Add a trend line to a scatterplot.
  5. Save your scatterplot as a PNG file.

8. Further Reading and Resources


**Resource download links**

2.6.5.-Scatterplots-in-ggplot2.zip