2.2.2. Working with JSON Data in R
1. Introduction to JSON
- JSON (JavaScript Object Notation) is a lightweight, text-based format for storing and exchanging data.
- Unlike tabular formats (CSV, Excel), JSON supports nested and hierarchical data structures.
- JSON is widely used for data interchange between web browsers and servers, APIs, and configuration files.
- JSON data is organized as key-value pairs and can contain objects, arrays, and nested structures.
Example of JSON Structure (SDTM Subject):
{
"USUBJID": "1015",
"SEX": "M",
"AGE": 56,
"ARM": "Placebo"
}
- Keys are always strings; values can be strings, numbers, booleans, arrays, or other objects.
- JSON can represent complex, nested data (e.g., lists within lists, objects within objects).
2. JSON vs Tabular Data
- Tabular data (CSV, Excel, tibbles) is organized in rows and columns.
- JSON can represent both tabular and non-tabular (nested) data.
- JSON is ideal for hierarchical data, such as SDTM datasets with supplemental qualifiers, relationships, or when representing subject-level and visit-level data together.
3. Importing JSON Data in R
3.1. Minimal Example: JSON as a String
JSON string representing SDTM subjects
json <- '
[
{"USUBJID": "1015", "SEX": "M", "AGE": 56, "ARM": "Placebo"},
{"USUBJID": "1016", "SEX": "F", "AGE": 62, "ARM": "Active"},
{"USUBJID": "1017", "SEX": "M", "AGE": 48, "ARM": "Placebo"}
]'
json
Output:
> json
[1] "\n[\n {\"USUBJID\": \"1015\", \"SEX\": \"M\", \"AGE\": 56, \"ARM\": \"Placebo\"},\n {\"USUBJID\": \"1016\", \"SEX\": \"F\", \"AGE\": 62, \"ARM\": \"Active\"},\n {\"USUBJID\": \"1017\", \"SEX\": \"M\", \"AGE\": 48, \"ARM\": \"Placebo\"}\n]"
3.2. Converting JSON to Data Frame
- Use the
jsonlitepackage for easy conversion.
#install.packages("jsonlite")
library(jsonlite)
sdtm_df <- fromJSON(json)
sdtm_df
Output (Data Frame):
USUBJID SEX AGE ARM
1 1015 M 56 Placebo
2 1016 F 62 Active
3 1017 M 48 Placebo
fromJSON()automatically parses JSON arrays into data frames when possible.
3.3. Importing JSON from a File
- Use
read_json()from thejsonlitepackage to read JSON files.
# Read JSON file as list (retains nested structure)
sdtm_list <- read_json("sdtm_subjects.json")
# Read JSON file and simplify to vectors/data frames where possible
sdtm_df <- read_json("sdtm_subjects.json", simplifyVector = TRUE)
Input Example (sdtm_subjects.json):
[
{"USUBJID": "1018", "SEX": "F", "AGE": 51, "ARM": "Active"},
{"USUBJID": "1019", "SEX": "M", "AGE": 60, "ARM": "Placebo"}
]
Output as list
> sdtm_list
[[1]]
[[1]]$USUBJID
[1] "1018"
[[1]]$SEX
[1] "F"
[[1]]$AGE
[1] 51
[[1]]$ARM
[1] "Active"
[[2]]
[[2]]$USUBJID
[1] "1019"
[[2]]$SEX
[1] "M"
[[2]]$AGE
[1] 60
[[2]]$ARM
[1] "Placebo"
Output (with simplifyVector = TRUE):
> sdtm_df
USUBJID SEX AGE ARM
1 1018 F 51 Active
2 1019 M 60 Placebo
4. Exporting Data to JSON in R
- Use
toJSON()from thejsonlitepackage to convert R objects (data frames, lists) to JSON.
library(jsonlite)
json_out <- toJSON(sdtm_df)
json_out
Output (JSON string):
[{"USUBJID":"1018","SEX":"F","AGE":51,"ARM":"Active"},
{"USUBJID":"1019","SEX":"M","AGE":60,"ARM":"Placebo"}]
- You can write this JSON string to a file using
write()orwriteLines():
write(json_out, file = "output_sdtm.json")
5. Input and Output Table Summary
| Operation | R Function | Input Example | Output Example |
|---|---|---|---|
| Import JSON string | fromJSON() |
SDTM JSON string | Data frame / list |
| Import JSON file | read_json() |
.json file | List or data frame |
| Export to JSON string | toJSON() |
Data frame / list | JSON string |
| Export to JSON file | write() |
JSON string | .json file |
6. Beyond the Basics: Exploring JSON in R
Handling deeply nested data:
fromJSON()andread_json()can parse nested lists and objects.- Use
flatten = TRUEinfromJSON()to flatten nested data frames.
Example (SDTM with Nested Visits):
nested_json <- ' [ { "USUBJID": "1020", "VISITS": [ {"VISITNUM": 1, "VISIT": "Screening", "AGE": 55}, {"VISITNUM": 2, "VISIT": "Baseline", "AGE": 55} ] }, { "USUBJID": "1021", "VISITS": [ {"VISITNUM": 1, "VISIT": "Screening", "AGE": 60}, {"VISITNUM": 2, "VISIT": "Baseline", "AGE": 60} ] } ]' library(jsonlite) df_flat <- fromJSON(nested_json, flatten = TRUE) df_flatOutput:
> df_flat USUBJID VISITS 1 1020 1, 2, Screening, Baseline, 55, 55 2 1021 1, 2, Screening, Baseline, 60, 60Pretty printing:
- Use
toJSON(..., pretty = TRUE)for human-readable JSON output.
Example:
pretty_json <- toJSON(sdtm_df, pretty = TRUE) cat(pretty_json)Output:
[ { "USUBJID": "1018", "SEX": "F", "AGE": 51, "ARM": "Active" }, { "USUBJID": "1019", "SEX": "M", "AGE": 60, "ARM": "Placebo" } ]- Use
Customizing output:
- Control how missing values, factors, and dates are encoded with
toJSON()arguments.
Example:
df2 <- data.frame(USUBJID = c("1022", "1023"), SEX = c("F", NA), AGE = c(NA, 45), ARM = c("Active", "Placebo"), stringsAsFactors = TRUE) toJSON(df2, na = "null", factor = "string")Output:
[ {"USUBJID":"1022","SEX":"F","AGE":null,"ARM":"Active"}, {"USUBJID":"1023","SEX":null,"AGE":45,"ARM":"Placebo"} ]- Control how missing values, factors, and dates are encoded with
Working with APIs:
- Many clinical data repositories or web APIs return SDTM data in JSON; use
httr::GET()andcontent(..., as = "text")to retrieve and parse JSON.
Example:
# install.packages("httr") library(httr) library(jsonlite) resp <- GET("https://clinicaltrials.gov/api/v2/studies?query.term=diabetes&fields=NCTId|Condition|Phase&pageSize=2&format=json") sdtm_json <- content(resp, as = "text", encoding = "UTF-8") sdtm_data <- fromJSON(sdtm_json, flatten = TRUE) head(sdtm_data)Output (partial):
$studies protocolSection.identificationModule.nctId protocolSection.conditionsModule.conditions protocolSection.designModule.phases 1 NCT00929539 Type II Diabetes Mellitus PHASE2 2 NCT00308139 Type 2 Diabetes Mellitus PHASE3 $nextPageToken [1] "NF0g5JKAkg"- Many clinical data repositories or web APIs return SDTM data in JSON; use
Validating JSON:
- Use
validate()fromjsonliteto check if a string is valid JSON.
Example:
invalid_json <- '{"USUBJID": "9999", "AGE": 50,}' # Note the trailing comma validate(invalid_json)Output:
[1] FALSE- Use
7. Summary and Best Practices
- JSON is ideal for hierarchical and nested data, and is widely used for data exchange.
- Use
jsonlitefor robust JSON import/export in R. - Always inspect the structure of imported JSON, especially for nested data.
- Use
simplifyVector = TRUEorflatten = TRUEto convert nested JSON to data frames when possible. - For reproducibility, document the structure of your JSON data and any transformations applied.
**Resource download links**
2.2.2.-Working-with-JSON-Data-in-R.zip
⁂