contact@a2zlearners.com

2.2.2. Working with JSON Data in R

1. Introduction to JSON

  • JSON (JavaScript Object Notation) is a lightweight, text-based format for storing and exchanging data.
  • Unlike tabular formats (CSV, Excel), JSON supports nested and hierarchical data structures.
  • JSON is widely used for data interchange between web browsers and servers, APIs, and configuration files.
  • JSON data is organized as key-value pairs and can contain objects, arrays, and nested structures.

Example of JSON Structure (SDTM Subject):

{
  "USUBJID": "1015",
  "SEX": "M",
  "AGE": 56,
  "ARM": "Placebo"
}
  • Keys are always strings; values can be strings, numbers, booleans, arrays, or other objects.
  • JSON can represent complex, nested data (e.g., lists within lists, objects within objects).

2. JSON vs Tabular Data

  • Tabular data (CSV, Excel, tibbles) is organized in rows and columns.
  • JSON can represent both tabular and non-tabular (nested) data.
  • JSON is ideal for hierarchical data, such as SDTM datasets with supplemental qualifiers, relationships, or when representing subject-level and visit-level data together.

3. Importing JSON Data in R

3.1. Minimal Example: JSON as a String

JSON string representing SDTM subjects

json <- '
[
  {"USUBJID": "1015", "SEX": "M", "AGE": 56, "ARM": "Placebo"},
  {"USUBJID": "1016", "SEX": "F", "AGE": 62, "ARM": "Active"},
  {"USUBJID": "1017", "SEX": "M", "AGE": 48, "ARM": "Placebo"}
]'
json

Output:

> json
[1] "\n[\n  {\"USUBJID\": \"1015\", \"SEX\": \"M\", \"AGE\": 56, \"ARM\": \"Placebo\"},\n  {\"USUBJID\": \"1016\", \"SEX\": \"F\", \"AGE\": 62, \"ARM\": \"Active\"},\n  {\"USUBJID\": \"1017\", \"SEX\": \"M\", \"AGE\": 48, \"ARM\": \"Placebo\"}\n]"
3.2. Converting JSON to Data Frame
  • Use the jsonlite package for easy conversion.
#install.packages("jsonlite")
library(jsonlite)
sdtm_df <- fromJSON(json)
sdtm_df

Output (Data Frame):

  USUBJID SEX AGE     ARM
1    1015   M  56 Placebo
2    1016   F  62  Active
3    1017   M  48 Placebo
  • fromJSON() automatically parses JSON arrays into data frames when possible.

3.3. Importing JSON from a File
  • Use read_json() from the jsonlite package to read JSON files.
# Read JSON file as list (retains nested structure)
sdtm_list <- read_json("sdtm_subjects.json")

# Read JSON file and simplify to vectors/data frames where possible
sdtm_df <- read_json("sdtm_subjects.json", simplifyVector = TRUE)

Input Example (sdtm_subjects.json):

[
  {"USUBJID": "1018", "SEX": "F", "AGE": 51, "ARM": "Active"},
  {"USUBJID": "1019", "SEX": "M", "AGE": 60, "ARM": "Placebo"}
]

Output as list

> sdtm_list
[[1]]
[[1]]$USUBJID
[1] "1018"

[[1]]$SEX
[1] "F"

[[1]]$AGE
[1] 51

[[1]]$ARM
[1] "Active"


[[2]]
[[2]]$USUBJID
[1] "1019"

[[2]]$SEX
[1] "M"

[[2]]$AGE
[1] 60

[[2]]$ARM
[1] "Placebo"

Output (with simplifyVector = TRUE):

> sdtm_df
  USUBJID SEX AGE     ARM
1    1018   F  51  Active
2    1019   M  60 Placebo

4. Exporting Data to JSON in R

  • Use toJSON() from the jsonlite package to convert R objects (data frames, lists) to JSON.
library(jsonlite)
json_out <- toJSON(sdtm_df)
json_out

Output (JSON string):

[{"USUBJID":"1018","SEX":"F","AGE":51,"ARM":"Active"},
 {"USUBJID":"1019","SEX":"M","AGE":60,"ARM":"Placebo"}]
  • You can write this JSON string to a file using write() or writeLines():
write(json_out, file = "output_sdtm.json")

5. Input and Output Table Summary

Operation R Function Input Example Output Example
Import JSON string fromJSON() SDTM JSON string Data frame / list
Import JSON file read_json() .json file List or data frame
Export to JSON string toJSON() Data frame / list JSON string
Export to JSON file write() JSON string .json file

6. Beyond the Basics: Exploring JSON in R

  • Handling deeply nested data:

    • fromJSON() and read_json() can parse nested lists and objects.
    • Use flatten = TRUE in fromJSON() to flatten nested data frames.

    Example (SDTM with Nested Visits):

    nested_json <- '
    [
      {
        "USUBJID": "1020",
        "VISITS": [
          {"VISITNUM": 1, "VISIT": "Screening", "AGE": 55},
          {"VISITNUM": 2, "VISIT": "Baseline", "AGE": 55}
        ]
      },
      {
        "USUBJID": "1021",
        "VISITS": [
          {"VISITNUM": 1, "VISIT": "Screening", "AGE": 60},
          {"VISITNUM": 2, "VISIT": "Baseline", "AGE": 60}
        ]
      }
    ]'
    library(jsonlite)
    df_flat <- fromJSON(nested_json, flatten = TRUE)
    df_flat
    

    Output:

    > df_flat
      USUBJID                            VISITS
    1    1020 1, 2, Screening, Baseline, 55, 55
    2    1021 1, 2, Screening, Baseline, 60, 60
    
  • Pretty printing:

    • Use toJSON(..., pretty = TRUE) for human-readable JSON output.

    Example:

    pretty_json <- toJSON(sdtm_df, pretty = TRUE)
    cat(pretty_json)
    

    Output:

    [
      {
        "USUBJID": "1018",
        "SEX": "F",
        "AGE": 51,
        "ARM": "Active"
      },
      {
        "USUBJID": "1019",
        "SEX": "M",
        "AGE": 60,
        "ARM": "Placebo"
      }
    ]
    
  • Customizing output:

    • Control how missing values, factors, and dates are encoded with toJSON() arguments.

    Example:

    df2 <- data.frame(USUBJID = c("1022", "1023"),
                      SEX = c("F", NA),
                      AGE = c(NA, 45),
                      ARM = c("Active", "Placebo"),
                      stringsAsFactors = TRUE)
    toJSON(df2, na = "null", factor = "string")
    

    Output:

    [
      {"USUBJID":"1022","SEX":"F","AGE":null,"ARM":"Active"},
      {"USUBJID":"1023","SEX":null,"AGE":45,"ARM":"Placebo"}
    ]
    
  • Working with APIs:

    • Many clinical data repositories or web APIs return SDTM data in JSON; use httr::GET() and content(..., as = "text") to retrieve and parse JSON.

    Example:

    # install.packages("httr")
    library(httr)
    library(jsonlite)
    
    resp <- GET("https://clinicaltrials.gov/api/v2/studies?query.term=diabetes&fields=NCTId|Condition|Phase&pageSize=2&format=json")
    sdtm_json <- content(resp, as = "text", encoding = "UTF-8")
    sdtm_data <- fromJSON(sdtm_json, flatten = TRUE)
    head(sdtm_data)
    

    Output (partial):

    $studies
    protocolSection.identificationModule.nctId protocolSection.conditionsModule.conditions protocolSection.designModule.phases
    1                                NCT00929539                   Type II Diabetes Mellitus                              PHASE2
    2                                NCT00308139                    Type 2 Diabetes Mellitus                              PHASE3
    
    $nextPageToken
    [1] "NF0g5JKAkg"
    
  • Validating JSON:

    • Use validate() from jsonlite to check if a string is valid JSON.

    Example:

    invalid_json <- '{"USUBJID": "9999", "AGE": 50,}' # Note the trailing comma
    validate(invalid_json)
    

    Output:

    [1] FALSE
    

7. Summary and Best Practices

  • JSON is ideal for hierarchical and nested data, and is widely used for data exchange.
  • Use jsonlite for robust JSON import/export in R.
  • Always inspect the structure of imported JSON, especially for nested data.
  • Use simplifyVector = TRUE or flatten = TRUE to convert nested JSON to data frames when possible.
  • For reproducibility, document the structure of your JSON data and any transformations applied.

**Resource download links**

2.2.2.-Working-with-JSON-Data-in-R.zip