contact@a2zlearners.com

1.5. SAS PROCEDURES -> Equivalent in R

1.5.9. PROC IMPORT Excel in SAS vs R equivalent

1. Basic Excel Import: Sheet-Level Read

Capability SAS (PROC IMPORT) R (readxl)
Read first sheet Defaults to first sheet unless specified read_excel("file.xlsx")
Read named sheet sheet="AESI1" option in PROC IMPORT read_excel("file.xlsx", sheet = "AESI1")
With header getnames = yes col_names = TRUE (default)
Without header getnames = no col_names = FALSE
Output SAS dataset Tibble (data.frame)

Dummy xlsx file creation for the below examples

library(writexl)

# Create dummy data for AESI1 and AESI2 sheets
AESI1 <- data.frame(
  ID = 1:5,
  Name = c("Alice", "Bob", "Carol", "David", "Eve"),
  Value = c(10, 20, 30, 40, 50)
)

AESI2 <- data.frame(
  ID = 6:10,
  Name = c("Frank", "Grace", "Heidi", "Ivan", "Judy"),
  Value = c(60, 70, 80, 90, 100)
)

# Write to aesidata.xlsx with two sheets
write_xlsx(list(AESI1 = AESI1, AESI2 = AESI2), "aesidata.xlsx")

# Install openxlsx if not already installed
install.packages("openxlsx")

# Load the package
library(openxlsx)

# Assuming df is your dataframe from before
df <- data.frame(
  Name = c("José", "Anaïs", "François", "Björk", "München"),
  Age = c(28, 35, 24, 40, 30),
  City = c("São Paulo", "Lyon", "Montréal", "Reykjavík", "München"),
  stringsAsFactors = FALSE
)

# Export to Excel file
write.xlsx(df, "non_utf8_file.xlsx")

SAS Example

proc import datafile="aesidata.xlsx"
    out=work.aesi1
    dbms=xlsx replace;
    sheet="AESI1";
    getnames=yes;
run;

Explanation (SAS):

  • Uses PROC IMPORT to read the Excel file "aesidata.xlsx".
  • The sheet="AESI1" option specifies which sheet to import.
  • The imported data is saved as the SAS dataset work.aesi1.
  • getnames=yes tells SAS to use the first row as column names.
  • dbms=xlsx specifies the Excel file format.

R Example

library(readxl)
aesi1 <- read_excel("aesidata.xlsx", sheet = "AESI1")

Output:

> aesi1
# A tibble: 5 × 3
     ID Name  Value
  <dbl> <chr> <dbl>
1     1 Alice    10
2     2 Bob      20
3     3 Carol    30
4     4 David    40
5     5 Eve      50

Explanation (R):

  • Loads the readxl package for Excel import.
  • Reads the sheet named "AESI1" from the file "aesidata.xlsx".
  • Stores the result as a tibble (data frame) in the variable aesi1.
  • By default, the first row is used as column names (col_names = TRUE).

2. Read Specific Cell Ranges

Capability SAS R
Cell range import range="AESI1$B2:D10" range = "B2:D10"
Flexible range syntax Limited to string format Can also use cellranger::cell_limits()
Named ranges range="NamedRange" Supported via openxlsx::read.xlsx()

SAS Example

proc import datafile="aesidata.xlsx"
    out=work.aesi_rng
    dbms=xlsx replace;
    range="AESI1$B2:D10";
    getnames=no;
run;

Explanation (SAS):

  • Uses PROC IMPORT to read only the specified cell range from the "AESI1" sheet.
  • The range="AESI1$B2:D10" option tells SAS to import only cells B2 to D10.
  • The imported data is saved as the SAS dataset work.aesi_rng.
  • getnames=no means the first row in the range is not treated as column names.
  • Useful for importing a subset of data from a larger Excel sheet.

R Example

aesi_rng <- read_excel("aesidata.xlsx", range = "B2:D10")

Output:

> aesi_rng
# A tibble: 8 × 3
  Alice  `10` ...3 
  <chr> <dbl> <lgl>
1 Bob      20 NA   
2 Carol    30 NA   
3 David    40 NA   
4 Eve      50 NA   
5 NA       NA NA   
6 NA       NA NA   
7 NA       NA NA   
8 NA       NA NA  

Explanation (R):

  • Reads only the cell range B2:D10 from the Excel file "aesidata.xlsx".
  • Imports just the specified block, skipping any data outside this range.
  • Useful for extracting a specific table or section from a larger sheet.
  • The result is stored in the variable aesi_rng as a tibble or data frame.

3. Skipping Rows / Multi-Row Headers

Capability SAS R
Skip rows datarow = 3 (starts reading from row 3) skip = 2
Promote custom row No direct equivalent janitor::row_to_names()

SAS Example

proc import datafile="aesidata.xlsx"
    out=work.aesi_hdr
    dbms=xlsx replace;
    datarow=3;
    getnames=yes;
run;

Explanation (SAS):

  • Uses PROC IMPORT to read the Excel file "aesidata.xlsx".
  • The datarow=3 option tells SAS to start reading data from the third row (skipping the first two rows).
  • getnames=yes means SAS will use the first row it reads (row 3 in the sheet) as column names.
  • The imported data is saved as the SAS dataset work.aesi_hdr.
  • Useful for Excel files with multi-row headers or notes above the actual data.

R Example

library(janitor)
aesi_hdr <- read_excel("aesidata.xlsx", skip = 2) %>%
  row_to_names(row_number = 1)

Output:

> aesi_hdr
# A tibble: 2 × 3
    `3` Carol  `30`
  <dbl> <chr> <dbl>
1     4 David    40
2     5 Eve      50

Explanation (R):

  • Loads the janitor package for data cleaning utilities.
  • Reads the Excel file "aesidata.xlsx", skipping the first 2 rows (skip = 2).
  • Uses row_to_names(row_number = 1) to promote the first row of the imported data as column names.
  • Useful when the actual headers are not in the first row of the sheet, but further down.

4. Reading Multiple Sheets Automatically

Capability SAS R
List sheets No native support; use LIBNAME XLSX to inspect manually excel_sheets("file.xlsx")
Read all sheets Requires macro with hardcoded or known sheet names lapply() or purrr::map() + read_excel()
Combine sheets Manual PROC APPEND or stacking DATA steps map_dfr() to stack + mutate(sheet = .x)

SAS Example

libname aelib xlsx "aesidata.xlsx";

data aesi1; set aelib.AESI1; run;
data aesi2; set aelib.AESI2; run;

libname aelib clear;

Explanation (SAS):

  • Assigns the Excel file "aesidata.xlsx" to the SAS library reference aelib using the XLSX engine.
  • Reads each sheet (e.g., AESI1, AESI2) as a SAS dataset using a data ...; set ...; run; step.
  • Each sheet must be referenced explicitly; SAS does not natively loop through all sheets.
  • After importing, clears the library reference with libname aelib clear;.
  • Useful for importing multiple known sheets from an Excel file into SAS datasets.

R Example

library(readxl)
sheets <- excel_sheets("aesidata.xlsx")
aesi_list <- setNames(lapply(sheets, function(s) read_excel("aesidata.xlsx", sheet = s)), sheets)

Output:

> aesi_list
$AESI1
# A tibble: 5 × 3
     ID Name  Value
  <dbl> <chr> <dbl>
1     1 Alice    10
2     2 Bob      20
3     3 Carol    30
4     4 David    40
5     5 Eve      50

$AESI2
# A tibble: 5 × 3
     ID Name  Value
  <dbl> <chr> <dbl>
1     6 Frank    60
2     7 Grace    70
3     8 Heidi    80
4     9 Ivan     90
5    10 Judy    100

Explanation (R):

  • Loads the readxl package for Excel reading functions.
  • Uses excel_sheets("aesidata.xlsx") to get a vector of all sheet names in the Excel file.
  • Applies read_excel() to each sheet name using lapply(), reading each sheet into a list element.
  • setNames(..., sheets) names each list element after its corresponding sheet.
  • The result, aesi_list, is a named list where each element is a data frame for one sheet.
  • Useful for batch-importing all sheets from an Excel file into R for further processing.

5. Type Control & Data Validation

Capability SAS R
Specify column types Not supported in PROC IMPORT; inferred only col_types = c("text", "numeric")
Data validation Use PROC CONTENTS, PROC COMPARE assertive, validate, or pointblank

SAS Example Post-import validation:

proc contents data=work.aesi1; run;
proc compare base=expected compare=work.aesi1; run;

Explanation (SAS):

  • proc contents displays metadata and variable types for the imported dataset work.aesi1.
  • proc compare compares the imported dataset to an expected dataset, checking for differences in data and structure.
  • These steps are used for validating the import and ensuring data integrity after reading from Excel.

R Example

read_excel("aesidata.xlsx", col_types = c("text", "text", "numeric"))

Output:

> read_excel("aesidata.xlsx", col_types = c("text", "text", "numeric"))
# A tibble: 5 × 3
  ID    Name  Value
  <chr> <chr> <dbl>
1 1     Alice    10
2 2     Bob      20
3 3     Carol    30
4 4     David    40
5 5     Eve      50

Explanation (R):

  • Calls read_excel() to import data from "aesidata.xlsx".
  • The col_types argument specifies the type for each column:
    • The first two columns are read as text (character).
    • The third column is read as numeric.
  • Ensures that columns are imported with the correct data types, preventing unwanted type guessing.
  • Useful for data validation and avoiding issues with mixed or ambiguous column types.

6. Metadata-Driven Import Automation

Control Sheet Example (in Excel)

sheet range col_types
AESI1 A1:C10 text,text,numeric
AESI2 A1:C10 text,text,numeric

SAS Macro Equivalent

%macro import_excel(sheet=, range=);
  proc import datafile="aesidata.xlsx"
      out=work.&sheet
      dbms=xlsx replace;
      sheet="&sheet";
      range="&range";
      getnames=yes;
  run;
%mend;

%import_excel(sheet=AESI1, range=AESI1$A1:C10);
%import_excel(sheet=AESI2, range=AESI2$A1:C10);

Explanation (SAS):

  • Defines a macro %import_excel to automate importing different sheets and ranges from the Excel file.
  • The macro takes sheet and range as parameters and runs PROC IMPORT with those values.
  • getnames=yes uses the first row in the range as column names.
  • Calls the macro for each sheet/range combination to import multiple datasets in a consistent, automated way.
  • Useful for metadata-driven or batch import scenarios in SAS.

R Example (using pmap from purrr)

library(readxl)
library(purrr)

# Dummy control/config sheet
config <- data.frame(
  sheet = c("AESI1", "AESI2"),
  range = c("A1:C6", "A1:C6"),
  col_types = c("text,text,numeric", "text,text,numeric")
)
write_xlsx(config, "config.xlsx")

control <- read_excel("config.xlsx")
data <- pmap(control, function(sheet, range, col_types) {
  read_excel("aesidata.xlsx", sheet = sheet, range = range,
             col_types = strsplit(col_types, ",")[[1]])
})

Output:

> data
[[1]]
# A tibble: 5 × 3
  ID    Name  Value
  <chr> <chr> <dbl>
1 1     Alice    10
2 2     Bob      20
3 3     Carol    30
4 4     David    40
5 5     Eve      50

[[2]]
# A tibble: 5 × 3
  ID    Name  Value
  <chr> <chr> <dbl>
1 6     Frank    60
2 7     Grace    70
3 8     Heidi    80
4 9     Ivan     90
5 10    Judy    100

Explanation (R):

  • Loads the readxl and purrr packages for Excel import and functional programming.
  • Reads a control/configuration sheet (config.xlsx) that specifies which sheets, ranges, and column types to import.
  • Uses pmap() to iterate over each row in the control sheet:
    • For each row, extracts the sheet, range, and col_types values.
    • Calls read_excel() with the specified sheet, range, and column types (splitting the comma-separated string into a vector).
  • The result, data, is a list of data frames, each imported according to the control sheet's instructions.
  • Enables flexible, metadata-driven, and automated Excel import workflows.

7.Reading Excel Files with Non-Standard Encodings

  • Excel files may contain special or non-English characters (e.g., accented letters, Cyrillic, Asian scripts) that require correct encoding to display properly.
  • If encoding is not handled, you may see garbled or incorrect text after import.

In R:

  • The readxl package does not have an explicit encoding argument and assumes UTF-8 or system default.
  • If you encounter encoding issues, try:
    • Using openxlsx::read.xlsx() which allows specifying the encoding argument.
    • Converting character columns after import with iconv().
  • Example:
    # 1. Set locale
    Sys.setlocale("LC_ALL", "en_US.UTF-8") # or your relevant UTF-8 locale
    
    # 2. Use readxl to read .xlsx
    library(readxl)
    df <- read_excel("non_utf8_file.xlsx")
    print(df)
    

Output:

> print(df)
# A tibble: 5 × 3
  Name       Age City     
  <chr>    <dbl> <chr>    
1 José        28 São Paulo
2 Anaïs       35 Lyon     
3 François    24 Montréal 
4 Björk       40 Reykjavík
5 München     30 München  

In SAS:

  • Use the ENCODING= option in PROC IMPORT to specify the file's character encoding.
  • This ensures SAS reads and displays non-standard characters correctly.
  • Example:
    proc import datafile="non_utf8_file.xlsx"
        out=work.data
        dbms=xlsx replace;
        encoding="latin1";
    run;
    
  • If you omit encoding=, SAS uses the session default, which may not match the file's encoding.

**Resource download links**

1.5.9.-PROC-IMPORT-Excel-in-SAS-vs-R-equivalent.zip