1.5. SAS PROCEDURES -> Equivalent in R
1.5.9. PROC IMPORT Excel in SAS vs R equivalent
1. Basic Excel Import: Sheet-Level Read
| Capability | SAS (PROC IMPORT) | R (readxl) |
|---|---|---|
| Read first sheet | Defaults to first sheet unless specified | read_excel("file.xlsx") |
| Read named sheet | sheet="AESI1" option in PROC IMPORT |
read_excel("file.xlsx", sheet = "AESI1") |
| With header | getnames = yes |
col_names = TRUE (default) |
| Without header | getnames = no |
col_names = FALSE |
| Output | SAS dataset | Tibble (data.frame) |
Dummy xlsx file creation for the below examples
library(writexl)
# Create dummy data for AESI1 and AESI2 sheets
AESI1 <- data.frame(
ID = 1:5,
Name = c("Alice", "Bob", "Carol", "David", "Eve"),
Value = c(10, 20, 30, 40, 50)
)
AESI2 <- data.frame(
ID = 6:10,
Name = c("Frank", "Grace", "Heidi", "Ivan", "Judy"),
Value = c(60, 70, 80, 90, 100)
)
# Write to aesidata.xlsx with two sheets
write_xlsx(list(AESI1 = AESI1, AESI2 = AESI2), "aesidata.xlsx")
# Install openxlsx if not already installed
install.packages("openxlsx")
# Load the package
library(openxlsx)
# Assuming df is your dataframe from before
df <- data.frame(
Name = c("José", "Anaïs", "François", "Björk", "München"),
Age = c(28, 35, 24, 40, 30),
City = c("São Paulo", "Lyon", "Montréal", "Reykjavík", "München"),
stringsAsFactors = FALSE
)
# Export to Excel file
write.xlsx(df, "non_utf8_file.xlsx")
SAS Example
proc import datafile="aesidata.xlsx"
out=work.aesi1
dbms=xlsx replace;
sheet="AESI1";
getnames=yes;
run;
Explanation (SAS):
- Uses
PROC IMPORTto read the Excel file"aesidata.xlsx". - The
sheet="AESI1"option specifies which sheet to import. - The imported data is saved as the SAS dataset
work.aesi1. getnames=yestells SAS to use the first row as column names.dbms=xlsxspecifies the Excel file format.
R Example
library(readxl)
aesi1 <- read_excel("aesidata.xlsx", sheet = "AESI1")
Output:
> aesi1
# A tibble: 5 × 3
ID Name Value
<dbl> <chr> <dbl>
1 1 Alice 10
2 2 Bob 20
3 3 Carol 30
4 4 David 40
5 5 Eve 50
Explanation (R):
- Loads the
readxlpackage for Excel import. - Reads the sheet named
"AESI1"from the file"aesidata.xlsx". - Stores the result as a tibble (data frame) in the variable
aesi1. - By default, the first row is used as column names (
col_names = TRUE).
2. Read Specific Cell Ranges
| Capability | SAS | R |
|---|---|---|
| Cell range import | range="AESI1$B2:D10" |
range = "B2:D10" |
| Flexible range syntax | Limited to string format | Can also use cellranger::cell_limits() |
| Named ranges | range="NamedRange" |
Supported via openxlsx::read.xlsx() |
SAS Example
proc import datafile="aesidata.xlsx"
out=work.aesi_rng
dbms=xlsx replace;
range="AESI1$B2:D10";
getnames=no;
run;
Explanation (SAS):
- Uses
PROC IMPORTto read only the specified cell range from the "AESI1" sheet. - The
range="AESI1$B2:D10"option tells SAS to import only cells B2 to D10. - The imported data is saved as the SAS dataset
work.aesi_rng. getnames=nomeans the first row in the range is not treated as column names.- Useful for importing a subset of data from a larger Excel sheet.
R Example
aesi_rng <- read_excel("aesidata.xlsx", range = "B2:D10")
Output:
> aesi_rng
# A tibble: 8 × 3
Alice `10` ...3
<chr> <dbl> <lgl>
1 Bob 20 NA
2 Carol 30 NA
3 David 40 NA
4 Eve 50 NA
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
Explanation (R):
- Reads only the cell range B2:D10 from the Excel file "aesidata.xlsx".
- Imports just the specified block, skipping any data outside this range.
- Useful for extracting a specific table or section from a larger sheet.
- The result is stored in the variable
aesi_rngas a tibble or data frame.
3. Skipping Rows / Multi-Row Headers
| Capability | SAS | R |
|---|---|---|
| Skip rows | datarow = 3 (starts reading from row 3) |
skip = 2 |
| Promote custom row | No direct equivalent | janitor::row_to_names() |
SAS Example
proc import datafile="aesidata.xlsx"
out=work.aesi_hdr
dbms=xlsx replace;
datarow=3;
getnames=yes;
run;
Explanation (SAS):
- Uses
PROC IMPORTto read the Excel file "aesidata.xlsx". - The
datarow=3option tells SAS to start reading data from the third row (skipping the first two rows). getnames=yesmeans SAS will use the first row it reads (row 3 in the sheet) as column names.- The imported data is saved as the SAS dataset
work.aesi_hdr. - Useful for Excel files with multi-row headers or notes above the actual data.
R Example
library(janitor)
aesi_hdr <- read_excel("aesidata.xlsx", skip = 2) %>%
row_to_names(row_number = 1)
Output:
> aesi_hdr
# A tibble: 2 × 3
`3` Carol `30`
<dbl> <chr> <dbl>
1 4 David 40
2 5 Eve 50
Explanation (R):
- Loads the
janitorpackage for data cleaning utilities. - Reads the Excel file "aesidata.xlsx", skipping the first 2 rows (
skip = 2). - Uses
row_to_names(row_number = 1)to promote the first row of the imported data as column names. - Useful when the actual headers are not in the first row of the sheet, but further down.
4. Reading Multiple Sheets Automatically
| Capability | SAS | R |
|---|---|---|
| List sheets | No native support; use LIBNAME XLSX to inspect manually | excel_sheets("file.xlsx") |
| Read all sheets | Requires macro with hardcoded or known sheet names | lapply() or purrr::map() + read_excel() |
| Combine sheets | Manual PROC APPEND or stacking DATA steps |
map_dfr() to stack + mutate(sheet = .x) |
SAS Example
libname aelib xlsx "aesidata.xlsx";
data aesi1; set aelib.AESI1; run;
data aesi2; set aelib.AESI2; run;
libname aelib clear;
Explanation (SAS):
- Assigns the Excel file "aesidata.xlsx" to the SAS library reference
aelibusing the XLSX engine. - Reads each sheet (e.g., AESI1, AESI2) as a SAS dataset using a
data ...; set ...; run;step. - Each sheet must be referenced explicitly; SAS does not natively loop through all sheets.
- After importing, clears the library reference with
libname aelib clear;. - Useful for importing multiple known sheets from an Excel file into SAS datasets.
R Example
library(readxl)
sheets <- excel_sheets("aesidata.xlsx")
aesi_list <- setNames(lapply(sheets, function(s) read_excel("aesidata.xlsx", sheet = s)), sheets)
Output:
> aesi_list
$AESI1
# A tibble: 5 × 3
ID Name Value
<dbl> <chr> <dbl>
1 1 Alice 10
2 2 Bob 20
3 3 Carol 30
4 4 David 40
5 5 Eve 50
$AESI2
# A tibble: 5 × 3
ID Name Value
<dbl> <chr> <dbl>
1 6 Frank 60
2 7 Grace 70
3 8 Heidi 80
4 9 Ivan 90
5 10 Judy 100
Explanation (R):
- Loads the
readxlpackage for Excel reading functions. - Uses
excel_sheets("aesidata.xlsx")to get a vector of all sheet names in the Excel file. - Applies
read_excel()to each sheet name usinglapply(), reading each sheet into a list element. setNames(..., sheets)names each list element after its corresponding sheet.- The result,
aesi_list, is a named list where each element is a data frame for one sheet. - Useful for batch-importing all sheets from an Excel file into R for further processing.
5. Type Control & Data Validation
| Capability | SAS | R |
|---|---|---|
| Specify column types | Not supported in PROC IMPORT; inferred only |
col_types = c("text", "numeric") |
| Data validation | Use PROC CONTENTS, PROC COMPARE |
assertive, validate, or pointblank |
SAS Example Post-import validation:
proc contents data=work.aesi1; run;
proc compare base=expected compare=work.aesi1; run;
Explanation (SAS):
proc contentsdisplays metadata and variable types for the imported datasetwork.aesi1.proc comparecompares the imported dataset to an expected dataset, checking for differences in data and structure.- These steps are used for validating the import and ensuring data integrity after reading from Excel.
R Example
read_excel("aesidata.xlsx", col_types = c("text", "text", "numeric"))
Output:
> read_excel("aesidata.xlsx", col_types = c("text", "text", "numeric"))
# A tibble: 5 × 3
ID Name Value
<chr> <chr> <dbl>
1 1 Alice 10
2 2 Bob 20
3 3 Carol 30
4 4 David 40
5 5 Eve 50
Explanation (R):
- Calls
read_excel()to import data from "aesidata.xlsx". - The
col_typesargument specifies the type for each column:- The first two columns are read as text (character).
- The third column is read as numeric.
- Ensures that columns are imported with the correct data types, preventing unwanted type guessing.
- Useful for data validation and avoiding issues with mixed or ambiguous column types.
6. Metadata-Driven Import Automation
Control Sheet Example (in Excel)
| sheet | range | col_types |
|---|---|---|
| AESI1 | A1:C10 | text,text,numeric |
| AESI2 | A1:C10 | text,text,numeric |
SAS Macro Equivalent
%macro import_excel(sheet=, range=);
proc import datafile="aesidata.xlsx"
out=work.&sheet
dbms=xlsx replace;
sheet="&sheet";
range="&range";
getnames=yes;
run;
%mend;
%import_excel(sheet=AESI1, range=AESI1$A1:C10);
%import_excel(sheet=AESI2, range=AESI2$A1:C10);
Explanation (SAS):
- Defines a macro
%import_excelto automate importing different sheets and ranges from the Excel file. - The macro takes
sheetandrangeas parameters and runsPROC IMPORTwith those values. getnames=yesuses the first row in the range as column names.- Calls the macro for each sheet/range combination to import multiple datasets in a consistent, automated way.
- Useful for metadata-driven or batch import scenarios in SAS.
R Example (using pmap from purrr)
library(readxl)
library(purrr)
# Dummy control/config sheet
config <- data.frame(
sheet = c("AESI1", "AESI2"),
range = c("A1:C6", "A1:C6"),
col_types = c("text,text,numeric", "text,text,numeric")
)
write_xlsx(config, "config.xlsx")
control <- read_excel("config.xlsx")
data <- pmap(control, function(sheet, range, col_types) {
read_excel("aesidata.xlsx", sheet = sheet, range = range,
col_types = strsplit(col_types, ",")[[1]])
})
Output:
> data
[[1]]
# A tibble: 5 × 3
ID Name Value
<chr> <chr> <dbl>
1 1 Alice 10
2 2 Bob 20
3 3 Carol 30
4 4 David 40
5 5 Eve 50
[[2]]
# A tibble: 5 × 3
ID Name Value
<chr> <chr> <dbl>
1 6 Frank 60
2 7 Grace 70
3 8 Heidi 80
4 9 Ivan 90
5 10 Judy 100
Explanation (R):
- Loads the
readxlandpurrrpackages for Excel import and functional programming. - Reads a control/configuration sheet (
config.xlsx) that specifies which sheets, ranges, and column types to import. - Uses
pmap()to iterate over each row in the control sheet:- For each row, extracts the
sheet,range, andcol_typesvalues. - Calls
read_excel()with the specified sheet, range, and column types (splitting the comma-separated string into a vector).
- For each row, extracts the
- The result,
data, is a list of data frames, each imported according to the control sheet's instructions. - Enables flexible, metadata-driven, and automated Excel import workflows.
7.Reading Excel Files with Non-Standard Encodings
- Excel files may contain special or non-English characters (e.g., accented letters, Cyrillic, Asian scripts) that require correct encoding to display properly.
- If encoding is not handled, you may see garbled or incorrect text after import.
In R:
- The
readxlpackage does not have an explicit encoding argument and assumes UTF-8 or system default. - If you encounter encoding issues, try:
- Using
openxlsx::read.xlsx()which allows specifying theencodingargument. - Converting character columns after import with
iconv().
- Using
- Example:
# 1. Set locale Sys.setlocale("LC_ALL", "en_US.UTF-8") # or your relevant UTF-8 locale # 2. Use readxl to read .xlsx library(readxl) df <- read_excel("non_utf8_file.xlsx") print(df)
Output:
> print(df)
# A tibble: 5 × 3
Name Age City
<chr> <dbl> <chr>
1 José 28 São Paulo
2 Anaïs 35 Lyon
3 François 24 Montréal
4 Björk 40 Reykjavík
5 München 30 München
In SAS:
- Use the
ENCODING=option inPROC IMPORTto specify the file's character encoding. - This ensures SAS reads and displays non-standard characters correctly.
- Example:
proc import datafile="non_utf8_file.xlsx" out=work.data dbms=xlsx replace; encoding="latin1"; run; - If you omit
encoding=, SAS uses the session default, which may not match the file's encoding.
**Resource download links**
1.5.9.-PROC-IMPORT-Excel-in-SAS-vs-R-equivalent.zip
⁂