1.4. Migrating from SAS to R: A Skill Conversion Guide
1.4.8. R Environment vs SAS Work Library
This guide compares SAS and R approaches for managing objects in memory, covering essential techniques from basic object deletion to advanced environment management strategies.
1. Basic Memory Management
| Capability | SAS | R |
|---|---|---|
| List objects | PROC CONTENTS data=_all_ nods; | ls() |
| Remove all objects | PROC DATASETS lib=work kill; | rm(list=ls()) |
| Check memory usage | OPTIONS FULLSTIMER; (per-step memory in the log) | gc() or pryr::mem_used() |
| Check dataset size | PROC CONTENTS data=mydataset; | object.size(mydf) |
| Free unused memory | Automatic | Automatic; gc() forces a collection |
SAS Example
/* List all datasets in the WORK library */
proc contents data=_all_ nods;
run;
/* Remove all datasets from WORK library */
proc datasets lib=work kill nolist;
quit;
Explanation (SAS):
- `PROC CONTENTS data=_all_` provides information about all datasets in the default WORK library
- `nods` suppresses detailed dataset information, showing just a list
- `PROC DATASETS lib=work kill` removes all datasets from the WORK library
- `nolist` suppresses listing of deleted datasets
- SAS automatically handles memory management and cleanup
- The WORK library is cleared automatically when the SAS session ends
R Example
# List all objects in the global environment
ls()
# Check memory usage
gc()
library(pryr)
mem_used()
# Remove all objects from the environment
rm(list = ls())
# Force garbage collection
gc()
Explanation (R):
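- `ls()` returns a character vector of all object names in the current environment (names starting with a dot are omitted unless `all.names = TRUE`)
- `gc()` triggers garbage collection and reports memory usage statistics
- `mem_used()` from the pryr package shows current memory consumption
- `rm(list = ls())` removes all objects from the current environment
- R objects stay in memory until they are removed or the session ends, so long sessions benefit from explicit cleanup
- Removing objects doesn't immediately free memory; R's garbage collector reclaims it automatically later, and `gc()` forces a collection right away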
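The remove-then-collect behaviour described above can be tried safely in a scratch environment instead of the global one. The following is a minimal sketch using only base R; `scratch` and `big_vector` are hypothetical names chosen for illustration.

```r
# Hypothetical scratch environment so the example does not touch your own session
scratch <- new.env()

# Create a numeric vector of one million doubles (roughly 8 MB)
assign("big_vector", numeric(1e6), envir = scratch)

# Size of the object in bytes, as estimated by object.size()
size_before <- as.numeric(object.size(get("big_vector", envir = scratch)))

# Remove every object in the scratch environment, including dot-prefixed names
rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)

# The environment is now empty
remaining <- ls(envir = scratch)

# Ask R to collect garbage now; it would also happen automatically later
invisible(gc())
```

Working in a separate environment keeps the "remove everything" step from deleting objects you still need.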
Input Environment:
| Object Name | Type | Size |
|---|---|---|
| patient_data | data.frame | 2.3 MB |
| lab_results | data.frame | 5.1 MB |
| demographic_model | lm (linear model) | 0.7 MB |
| plot_function | function | 0.1 MB |
Expected Output After Clearing:
> ls()
character(0)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 267456 14.3 665050 35.6 460950 24.7
Vcells 3626190 27.7 8388608 64.0 8313713 63.5
2. Selective Object Management
| Capability | SAS | R |
|---|---|---|
| Remove specific objects | PROC DATASETS; delete dataset1 dataset2; | rm(object1, object2) |
| Remove objects by pattern | Name-prefix list (delete temp_: ;); other patterns require a macro | rm(list=ls(pattern="temp")) |
| Keep specific objects | SAVE statement in PROC DATASETS, or a macro | rm(list=setdiff(ls(), c("obj1","obj2"))) |
| Save before removing | Use PROC COPY | Use save() then rm() |
SAS Example
/* Delete specific datasets */
proc datasets library=work nolist;
delete patient_data lab_results;
quit;
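/* Using a macro to keep only specific datasets */
%macro keep_only(datasets);
    /* Initialize the list so it resolves even when nothing is selected */
    %local all_ds;
    %let all_ds = ;
    proc sql noprint;
        select memname into :all_ds separated by ' '
        from dictionary.tables
        where libname = 'WORK' and memtype = 'DATA'
              and memname not in (%unquote(&datasets));
    quit;
    %if %length(&all_ds) > 0 %then %do;
        proc datasets library=work nolist;
            delete &all_ds;
        quit;
    %end;
%mend keep_only;
/* %str() protects the comma so the list is passed as a single parameter */
%keep_only(%str('DEMOGRAPHIC_DATA','VITAL_SIGNS'));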
Explanation (SAS):
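- `PROC DATASETS` with the `delete` statement removes specified datasets
- SAS has no regular-expression matching for deletion, but a name-prefix list such as `delete temp_: ;` removes every dataset whose name starts with that prefix
- The `keep_only` macro uses dictionary tables to identify all datasets except those specified
- More elaborate selection rules in SAS usually require custom macro programming; for a simple keep-list, the `save` statement of `PROC DATASETS` deletes all members except those named
- Datasets can be copied to another library before deletion using `PROC COPY`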
R Example
# Remove specific objects
rm(patient_data, lab_results)
# Remove objects matching a pattern
rm(list = ls(pattern = "temp_"))
# Keep only specific objects
keep_objects <- c("demographic_data", "vital_signs")
rm(list = setdiff(ls(), keep_objects))
# Save objects before removing them
save(demographic_data, file = "demographic_data.RData")
rm(demographic_data)
Explanation (R):
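- `rm()` directly removes specified objects
- `ls(pattern = "temp_")` finds all objects with names containing "temp_"; the pattern is a regular expression, so `"^temp_"` restricts the match to names that start with it
- `setdiff(ls(), keep_objects)` creates a vector of all object names except those to keep
- `save()` preserves objects to disk before removal, and `load()` restores them later
- R provides flexible options for selective object management
- Regular-expression matching allows for more dynamic object selection than SAS name-prefix lists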
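The difference between an unanchored and an anchored pattern is easy to miss, so here is a small sketch of it. It assumes a hypothetical scratch environment and made-up object names; only base R functions are used.

```r
# Hypothetical scratch environment with a mix of temporary and permanent objects
scratch <- new.env()
for (name in c("temp_data1", "temp_data2", "old_temp_notes", "vital_signs")) {
  assign(name, 1:3, envir = scratch)
}

# Unanchored pattern: matches "temp_" anywhere in the name
matched_anywhere <- ls(envir = scratch, pattern = "temp_")

# Anchored pattern: matches only names that start with "temp_"
matched_prefix <- ls(envir = scratch, pattern = "^temp_")

# Remove only the prefixed objects, then list what is left
rm(list = matched_prefix, envir = scratch)
left_over <- ls(envir = scratch)
```

Here the unanchored pattern would also have removed `old_temp_notes`, which is why anchoring with `^` is usually the safer choice for prefix-based cleanup.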
Input Environment:
| Object Name | Type |
|---|---|
| patient_data | data.frame |
| lab_results | data.frame |
| temp_data1 | data.frame |
| temp_data2 | data.frame |
| demographic_data | data.frame |
| vital_signs | data.frame |
Expected Output After Keeping Only Specific Objects:
> ls()
[1] "demographic_data" "vital_signs"
3. Object Inspection Before Removal
| Capability | SAS | R |
|---|---|---|
| List object details | PROC CONTENTS data=_all_; | lapply(mget(ls()), class) |
| Find large objects | PROC SQL with dictionary tables | sort(sapply(ls(), function(x) object.size(get(x)))) |
| Identify memory hogs | Manual tracking | lobstr::obj_size() for nested objects |
| Check object type | PROC CONTENTS | class() or typeof() |
SAS Example
/* List all datasets with details */
proc contents data=_all_;
run;
/* Find large datasets in WORK library */
proc sql;
select memname, nobs, filesize/1048576 as size_mb
from dictionary.tables
where libname = 'WORK'
order by filesize desc;
quit;
Explanation (SAS):
- `PROC CONTENTS data=_all_` displays detailed information about all datasets
- `dictionary.tables` provides metadata about datasets including size and number of observations
- Size information helps identify memory-intensive datasets
- SAS focuses on dataset-level information rather than all object types
- No direct way to see nested object sizes or complex structures
R Example
# List all objects with their classes
object_classes <- sapply(ls(), function(x) class(get(x)))
object_classes
# Find objects by size (ascending)
object_sizes <- sort(sapply(ls(), function(x) object.size(get(x))))
# sapply() drops the object_size class, so format each byte count explicitly
sapply(object_sizes, function(b) format(structure(b, class = "object_size"), units = "auto"))
# Detailed size analysis of nested objects
library(lobstr)
library(dplyr)  # provides %>%, arrange() and desc()
obj_sizes <- sapply(ls(), function(x) as.numeric(obj_size(get(x))))
data.frame(
  object = names(obj_sizes),
  size_mb = obj_sizes / 1024^2,
  row.names = NULL
) %>% arrange(desc(size_mb))
Explanation (R):
- `sapply(ls(), function(x) class(get(x)))` returns the class of each object
- `object.size()` reports an estimate of the memory usage of objects
- `sapply()` returns plain byte counts, so each value is converted back to an `object_size` before `format()` is applied with `units = "auto"`
- `lobstr::obj_size()` provides more accurate size information for complex objects
- Size information helps identify memory-intensive objects before removal
- R provides detailed memory information for all object types, not just data frames
- Understanding object relationships helps manage memory more effectively
Input Environment:
| Object Name | Type | Size |
|---|---|---|
| small_vector | numeric vector | 80 bytes |
| medium_df | data.frame | 1.2 MB |
| large_df | data.frame | 45.3 MB |
| huge_model | randomForest | 120 MB |
Expected Output of Size Analysis:
> sapply(object_sizes, function(b) format(structure(b, class = "object_size"), units = "auto"))
small_vector    medium_df     large_df   huge_model
  "80 bytes"     "1.2 Mb"    "45.3 Mb"     "120 Mb"
> data.frame(object = names(obj_sizes), size_mb = obj_sizes / 1024^2) %>% arrange(desc(size_mb))
object size_mb
1 huge_model 120.00000
2 large_df 45.30000
3 medium_df 1.20000
4 small_vector 0.00008
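The inspection steps above can also be wrapped in a single helper that needs no extra packages. This is a sketch under stated assumptions: `summarise_objects` is a hypothetical function name, the sizes come from `object.size()` (an estimate), and only the first class of each object is reported.

```r
# Hypothetical helper: summarise class and size of every object in an environment
summarise_objects <- function(env = globalenv()) {
  object_names <- ls(envir = env)
  if (length(object_names) == 0) {
    return(data.frame(object = character(0), class = character(0),
                      size_bytes = numeric(0)))
  }
  # vapply() guarantees exactly one value per object, unlike sapply()
  classes <- vapply(object_names,
                    function(x) class(get(x, envir = env))[1],
                    character(1))
  sizes <- vapply(object_names,
                  function(x) as.numeric(object.size(get(x, envir = env))),
                  numeric(1))
  summary_table <- data.frame(object = object_names, class = classes,
                              size_bytes = sizes, row.names = NULL)
  # Largest objects first
  summary_table[order(-summary_table$size_bytes), ]
}

# Usage with a hypothetical scratch environment
scratch <- new.env()
assign("small_vector", numeric(4), envir = scratch)
assign("larger_vector", numeric(1e5), envir = scratch)
size_report <- summarise_objects(scratch)
print(size_report)
```

Taking the environment as an argument means the same helper works for the global environment and for any environment created with `new.env()`.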
4. Working with Multiple Environments
| Capability | SAS | R |
|---|---|---|
| Multiple libraries | LIBNAME statements | new.env() |
| Copy between environments | PROC COPY | assign(), get() |
| List objects in specific env | PROC CONTENTS data=lib._all_; | ls(envir=my_env) |
| Remove from specific env | PROC DATASETS lib=templib kill; | rm(list=ls(envir=my_env), envir=my_env) |
| Global vs local objects | Library references | Global, local, and package environments |
SAS Example
/* Create a temporary library in a subdirectory of the WORK location */
options dlcreatedir;
libname templib "%sysfunc(pathname(work))/templib";
/* Copy datasets to temporary library */
proc copy in=work out=templib;
select patient_data lab_results;
run;
/* List datasets in temporary library */
proc contents data=templib._all_ nods;
run;
/* Remove datasets from temporary library */
proc datasets lib=templib kill nolist;
quit;
/* Remove temporary library reference */
libname templib clear;
Explanation (SAS):
- The `LIBNAME` statement creates the `templib` library reference for temporary storage
- `PROC COPY` copies datasets between libraries; the originals stay in place unless the `move` option is used
- `PROC CONTENTS data=templib._all_` lists only datasets in the specific library
- `PROC DATASETS lib=templib kill` removes only datasets in that library
- `LIBNAME templib clear` removes the library reference
- SAS manages data through library references rather than environments
R Example
# Create a new environment
temp_env <- new.env()
# Copy objects to the new environment
temp_env$patient_data <- patient_data
temp_env$lab_results <- lab_results
# Alternative method to copy objects
assign("patient_data", patient_data, envir = temp_env)
# List objects in the specific environment
ls(envir = temp_env)
# Access an object from the environment
temp_patient <- get("patient_data", envir = temp_env)
# Remove objects from the specific environment
rm(list = ls(envir = temp_env), envir = temp_env)
# Check if environment is empty
ls(envir = temp_env)
Explanation (R):
- `new.env()` creates a separate environment for storing objects
- Objects can be assigned directly using `env$name` notation
- The `assign()` function provides more control over object assignment
- `ls(envir = temp_env)` lists only objects in the specified environment
- `get()` retrieves objects from a specific environment
- `rm(list = ls(envir = temp_env), envir = temp_env)` removes all objects from the environment
- Environments provide more flexible isolation than SAS libraries
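When several objects need to move at once, copying them one by one becomes tedious. The sketch below shows one possible alternative using the base R functions `mget()` and `list2env()`; the two data frames are hypothetical stand-ins for the datasets used in this section.

```r
# Hypothetical stand-ins for the datasets used in this section
patient_data <- data.frame(id = 1:3, age = c(34, 51, 47))
lab_results  <- data.frame(id = 1:3, glucose = c(5.1, 6.3, 5.8))

# Collect the named objects into a list, then load that list
# into a fresh environment in one step
temp_env <- list2env(
  mget(c("patient_data", "lab_results"), envir = environment()),
  envir = new.env()
)

# Check for an object without retrieving it
has_labs <- exists("lab_results", envir = temp_env, inherits = FALSE)

# Modifying the copy leaves the original untouched
temp_env$patient_data$age <- temp_env$patient_data$age + 1
original_ages <- patient_data$age
copied_ages   <- temp_env$patient_data$age

# Pull everything back out as a named list
contents <- as.list(temp_env)
```

This mirrors `PROC COPY` with a `select` statement: the originals remain where they were, and the copies can be changed or removed independently.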
Input Global Environment:
| Object Name | Type |
|---|---|
| patient_data | data.frame |
| lab_results | data.frame |
| model_fit | lm |
Expected Output After Environment Operations:
# Global environment remains unchanged
> ls()
[1] "lab_results" "model_fit" "patient_data" "temp_env"
# New environment contains copied objects
> ls(envir = temp_env)
[1] "lab_results" "patient_data"
# After clearing the environment
> ls(envir = temp_env)
character(0)
5. Beyond Basics: Environment Management Automation
| Capability | SAS | R |
|---|---|---|
| Environment snapshots | Custom macro | sessionInfo(), custom functions |
| Object tracking | No built-in support | Custom tracking functions |
| Memory monitoring | OPTIONS FULLSTIMER; | gc(), pryr::mem_used() |
| Conditional cleanup | Macro with conditions | Functions with conditional logic |
| Session management | PROC DATASETS, macros | save.image(), savehistory(), loadhistory() |
SAS Example
/* Advanced macro for conditional dataset management */
%macro manage_datasets(pattern=, action=DELETE, condition=);
    /* Initialize the list so it resolves even when nothing matches */
    %local ds_list;
    %let ds_list = ;
    proc sql noprint;
        select memname into :ds_list separated by ' '
        from dictionary.tables
        where libname = 'WORK'
        %if %length(&pattern) > 0 %then %do;
            and memname like "&pattern"
        %end;
        %if %length(&condition) > 0 %then %do;
            and &condition
        %end;
        ;
    quit;
    %if %upcase(&action) = DELETE %then %do;
        %if %length(&ds_list) > 0 %then %do;
            proc datasets lib=work nolist;
                delete &ds_list;
            quit;
            %put Deleted datasets: &ds_list;
        %end;
        %else %do;
            %put No datasets matched the criteria for deletion.;
        %end;
    %end;
    %else %if %upcase(&action) = LIST %then %do;
        %put Matching datasets: &ds_list;
    %end;
%mend manage_datasets;
/* Example usage */
/* In LIKE patterns, % matches any string and _ matches any single character */
%manage_datasets(pattern=TEMP_%);
%manage_datasets(condition=nobs > 1000);
Explanation (SAS):
- Creates a flexible macro for dataset management based on patterns or conditions
- Uses
dictionary.tablesfor metadata-based filtering - The macro supports multiple actions (DELETE, LIST)
- Conditional logic allows for targeted operations
- Provides feedback through the SAS log
- SAS macros enable automation of routine cleanup tasks
R Example
# Advanced environment management functions
library(pryr)
# Create an environment tracker
env_tracker <- function() {
  # Initial snapshot of objects
  initial <- list(
    objects = ls(envir = .GlobalEnv),
    timestamp = Sys.time(),
    mem_used = mem_used()
  )
  # Function to check what's changed
  check_changes <- function() {
    current <- ls(envir = .GlobalEnv)
    new_objects <- setdiff(current, initial$objects)
    removed_objects <- setdiff(initial$objects, current)
    mem_diff <- mem_used() - initial$mem_used
    list(
      new_objects = new_objects,
      removed_objects = removed_objects,
      time_elapsed = difftime(Sys.time(), initial$timestamp, units = "mins"),
      memory_change = mem_diff
    )
  }
  # Function for conditional cleanup
  cleanup <- function(pattern = NULL, type = NULL, min_size = NULL) {
    objects <- ls(envir = .GlobalEnv)
    # Filter by pattern
    if (!is.null(pattern)) {
      objects <- grep(pattern, objects, value = TRUE)
    }
    # Filter by type (vapply() stays safe when no objects remain)
    if (!is.null(type)) {
      objects <- objects[vapply(objects, function(x)
        inherits(get(x, envir = .GlobalEnv), type), logical(1))]
    }
    # Filter by size in bytes
    if (!is.null(min_size)) {
      objects <- objects[vapply(objects, function(x)
        as.numeric(object.size(get(x, envir = .GlobalEnv))) > min_size, logical(1))]
    }
    if (length(objects) > 0) {
      rm(list = objects, envir = .GlobalEnv)
      cat("Removed", length(objects), "objects:",
          paste(objects, collapse = ", "), "\n")
    } else {
      cat("No objects matched the criteria.\n")
    }
  }
  # Return functions
  list(
    check_changes = check_changes,
    cleanup = cleanup,
    reset = function() {
      # <<- updates the snapshot held by env_tracker();
      # a plain <- would only change a local copy
      initial$objects <<- ls(envir = .GlobalEnv)
      initial$timestamp <<- Sys.time()
      initial$mem_used <<- mem_used()
      cat("Tracker reset.\n")
    }
  )
}
# Create a tracker
tracker <- env_tracker()
# Example usage
# Create some objects
temp_data1 <- matrix(rnorm(10000), 100, 100)
temp_data2 <- data.frame(x = 1:1000, y = rnorm(1000))
model_result <- lm(y ~ x, data = temp_data2)
# Check what's changed
changes <- tracker$check_changes()
print(changes)
# Cleanup temporary objects
tracker$cleanup(pattern = "^temp_")
# Cleanup by object type
tracker$cleanup(type = "lm")
Explanation (R):
- Creates a sophisticated environment tracking system using closure functions
- Maintains an initial snapshot of the environment
- Tracks new objects, removed objects, time elapsed, and memory changes
- Provides conditional cleanup based on name patterns, object types, or sizes
- Environment monitoring helps identify memory leaks and track object creation
- More flexible than SAS's dataset-centric approach
- Enables data scientists to manage complex R sessions with many object types
Input Environment:
| Object Name | Type | Size |
|---|---|---|
| original_data | data.frame | 1.5 MB |
| temp_data1 | matrix | 0.8 MB |
| temp_data2 | data.frame | 0.2 MB |
| model_result | lm | 0.1 MB |
Expected Output of Tracking and Cleanup:
> changes <- tracker$check_changes()
> print(changes)
$new_objects
[1] "model_result" "temp_data1" "temp_data2" "tracker"
$removed_objects
character(0)
$time_elapsed
Time difference of 0.25 mins
$memory_change
1.1 MB
> tracker$cleanup(pattern = "^temp_")
Removed 2 objects: temp_data1, temp_data2
> tracker$cleanup(type = "lm")
Removed 1 objects: model_result
> ls()
[1] "changes" "original_data" "tracker"
6. Best Practices for R Environment Management
Plan Your Memory Usage
- Estimate memory requirements before loading large datasets
- Use `object.size()` and `gc()` to monitor memory consumption
- Consider using packages like `data.table` for memory-efficient operations
Clean Up Regularly
- Remove temporary objects when no longer needed
- Use naming conventions for temporary objects (e.g., prefix with "temp_")
- Create cleanup functions for routine operations
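One way to avoid cleanup altogether is to keep temporary objects out of the working environment in the first place. The sketch below uses base R's `local()` for this; the object names and values are hypothetical.

```r
# Intermediate objects created inside local() vanish when the block finishes,
# so only the final result is kept in the calling environment
mean_score <- local({
  temp_scores <- c(72, 85, 91, 64)   # hypothetical temporary data
  temp_scaled <- temp_scores / 100   # hypothetical intermediate step
  mean(temp_scaled)
})

# The temporaries were never created in the calling environment
temps_present <- exists("temp_scores", inherits = FALSE) ||
  exists("temp_scaled", inherits = FALSE)
```

Wrapping the same steps in a function has the same effect and is usually preferable once the logic is reused.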
Structure Your Environment
- Use separate environments for different analysis components
- Consider package development for complex projects with many functions
- Use R projects to isolate different analyses
Optimize Large Data Handling
- Process large datasets in chunks when possible
- Use connection objects for streaming data
- Consider database connections instead of loading everything into memory
- Leverage packages designed for big data like `disk.frame` or `arrow`
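Chunked processing through a connection can be sketched in base R as follows. The file, its columns, and the chunk size are all hypothetical and exist only to keep the example self-contained; a real file would be far larger than the ten rows written here.

```r
# Write a small hypothetical CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = seq(10, 100, by = 10)),
          csv_path, row.names = FALSE)

con <- file(csv_path, open = "r")
header <- readLines(con, n = 1)   # consume the header line
chunk_size <- 4                   # hypothetical chunk size
running_total <- 0
rows_seen <- 0

repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break   # end of file
  # Parse only this chunk; earlier chunks are no longer referenced
  chunk <- read.csv(text = c(header, lines))
  running_total <- running_total + sum(chunk$value)
  rows_seen <- rows_seen + nrow(chunk)
}
close(con)
unlink(csv_path)
```

Only one chunk is held in memory at a time, so peak memory depends on `chunk_size` and not on the size of the file.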
Save and Restore Sessions Strategically
- Save only essential objects, not entire environments
- Document dependencies between objects
- Use `.Rprofile` for environment setup, not data loading
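Saving only essential objects might look like the sketch below, which uses base R's `saveRDS()` and `readRDS()` as one possible approach. The object names and the temporary file path are hypothetical.

```r
# Hypothetical objects: one worth keeping, one throwaway
final_model_inputs <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
scratch_notes <- "temporary"

# Save only the essential object; saveRDS() stores a single object without its name
rds_path <- tempfile(fileext = ".rds")
saveRDS(final_model_inputs, rds_path)

# Later, or in a new session: restore under any name you choose
restored_inputs <- readRDS(rds_path)
same_after_roundtrip <- isTRUE(all.equal(final_model_inputs, restored_inputs))

unlink(rds_path)
```

Unlike `save()` and `load()`, this pair never overwrites an existing object by name, because the caller decides what the restored object is called.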
Automate Environment Management
- Create helper functions for common tasks
- Implement environment snapshots at critical points
- Track memory usage over time during long analyses
Document Your Environment
- Use `sessionInfo()` to record package versions
- Save package dependencies with `renv` or `packrat`
- Include environment setup in analysis documentation