
1.4. Migrating from SAS to R: A Skill Conversion Guide

1.4.8. R Environment vs SAS Work Library

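This guide compares how SAS and R manage working data. SAS keeps temporary datasets on disk in the WORK library, while R holds objects in memory inside environments. The sections below cover essential techniques, from basic object deletion to advanced environment management strategies.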

1. Basic Memory Management

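| Capability | SAS | R |
| --- | --- | --- |
| List objects | PROC CONTENTS data=_all_ nods; | ls() |
| Remove all objects | PROC DATASETS lib=work kill; | rm(list=ls()) |
| Check memory usage | OPTIONS FULLSTIMER; (per-step memory in the log) | gc() or pryr::mem_used() |
| Check dataset size | PROC CONTENTS data=mydataset; | object.size(mydf) |
| Free unused memory | Automatic | Automatic; gc() forces a collection |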

SAS Example

/* List all datasets in the WORK library */
proc contents data=_all_ nods;
run;

/* Remove all datasets from WORK library */
proc datasets lib=work kill nolist;
quit;

Explanation (SAS):

  • PROC CONTENTS data=_all_ provides information about all datasets in the default WORK library
  • nods suppresses detailed dataset information, showing just a list
  • PROC DATASETS lib=work kill removes all datasets from the WORK library
  • nolist suppresses listing of deleted datasets
  • WORK datasets are stored on disk, so deleting them frees disk space; SAS manages the memory used by each step on its own
  • The WORK library is cleared automatically when the SAS session ends

R Example

# List all objects in the global environment
ls()

# Check memory usage
gc()
library(pryr)
mem_used()

# Remove all objects from the environment
rm(list = ls())

# Force garbage collection
gc()

Explanation (R):

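  • ls() returns a character vector of object names in the current environment; names beginning with a dot are omitted unless all.names = TRUE
  • gc() triggers garbage collection and reports memory usage statistics
  • mem_used() from the pryr package shows current memory consumption
  • rm(list = ls()) removes all visible objects from the current environment, leaving loaded packages and options untouched
  • Unlike the disk-based SAS WORK library, R objects occupy RAM until they are removed or the session ends
  • R runs garbage collection automatically once objects are no longer referenced; an explicit gc() mainly reports usage and may prompt R to return freed memory to the operating system sooner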
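
One detail worth checking by hand: ls() skips names that begin with a dot, so rm(list = ls()) leaves such objects in place. The sketch below illustrates this in a throwaway environment; the name scratch and the objects in it are hypothetical and exist only for this example, so the global workspace is not touched.

```r
# Hypothetical scratch environment; nothing here touches .GlobalEnv
scratch <- new.env()
assign("visible_df", data.frame(id = 1:3), envir = scratch)
assign(".hidden_cfg", list(debug = TRUE), envir = scratch)

# ls() omits dot-prefixed names unless all.names = TRUE
visible_only <- ls(envir = scratch)
everything   <- ls(envir = scratch, all.names = TRUE)

# Removing only the visible names leaves the hidden object behind
rm(list = visible_only, envir = scratch)
left_over <- ls(envir = scratch, all.names = TRUE)

# A full clear has to request all names explicitly
rm(list = ls(envir = scratch, all.names = TRUE), envir = scratch)
remaining <- ls(envir = scratch, all.names = TRUE)
```

After the first rm() call, left_over still contains ".hidden_cfg"; only the second call, which uses all.names = TRUE, empties the environment.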

Input Environment:

| Object Name | Type | Size |
| --- | --- | --- |
| patient_data | data.frame | 2.3 MB |
| lab_results | data.frame | 5.1 MB |
| demographic_model | lm (linear model) | 0.7 MB |
| plot_function | function | 0.1 MB |

Expected Output After Clearing:

> ls()
character(0)

> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   267456 14.3     665050  35.6   460950  24.7
Vcells  3626190 27.7    8388608  64.0  8313713  63.5

2. Selective Object Management

| Capability | SAS | R |
| --- | --- | --- |
| Remove specific objects | PROC DATASETS; delete dataset1 dataset2; | rm(object1, object2) |
| Remove objects by pattern | PROC DATASETS; delete temp: ; (name prefix only) | rm(list=ls(pattern="temp")) |
| Keep specific objects | PROC DATASETS; save ds1 ds2; or a macro | rm(list=setdiff(ls(), c("obj1","obj2"))) |
| Save before removing | Use PROC COPY | Use save() then rm() |

SAS Example

/* Delete specific datasets */
proc datasets library=work nolist;
   delete patient_data lab_results;
quit;

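/* Using a macro to keep only specific datasets */
%macro keep_only(datasets);
   %local all_ds;
   %let all_ds = ;   /* stays empty when the query returns no rows */
   proc sql noprint;
      select memname into :all_ds separated by ' '
      from dictionary.tables
      where libname = 'WORK' and memtype = 'DATA'
            and memname not in (&datasets);
   quit;
   
   %if %length(&all_ds) > 0 %then %do;
      proc datasets library=work nolist;
         delete &all_ds;
      quit;
   %end;
%mend;

/* %str() masks the comma so the list is passed as a single parameter */
%keep_only(%str('DEMOGRAPHIC_DATA','VITAL_SIGNS'));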

Explanation (SAS):

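  • PROC DATASETS with the delete statement removes specified datasets
  • The delete statement accepts name prefix lists (for example, delete temp: ;) but not general patterns such as regular expressions
  • The keep_only macro uses dictionary tables to identify all datasets except those specified
  • For a fixed list of datasets, the save statement of PROC DATASETS keeps the listed datasets and deletes the rest without a macro; macros remain useful when the selection depends on metadata
  • Datasets can be copied to another library before deletion using PROC COPY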

R Example

# Remove specific objects
rm(patient_data, lab_results)

# Remove objects matching a pattern
rm(list = ls(pattern = "temp_"))

# Keep only specific objects
keep_objects <- c("demographic_data", "vital_signs")
rm(list = setdiff(ls(), keep_objects))

# Save objects before removing them
save(demographic_data, file = "demographic_data.RData")
rm(demographic_data)

Explanation (R):

  • rm() directly removes specified objects
  • ls(pattern = "temp_") finds all objects with names containing "temp_"
  • setdiff(ls(), keep_objects) creates a list of all objects except those to keep
  • save() preserves objects to disk before removal
  • R provides flexible options for selective object management
  • ls(pattern = ) accepts any regular expression, which allows more dynamic selection than SAS name prefix lists
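
Because the pattern argument of ls() is a regular expression, an unanchored pattern matches anywhere in the name. The following sketch, which uses a hypothetical scratch environment and made-up object names, shows how "temp_" and "^temp_" can select different objects.

```r
# Hypothetical scratch environment with illustrative object names
scratch <- new.env()
for (nm in c("temp_data1", "temp_data2", "old_temp_backup", "vital_signs")) {
  assign(nm, 1:3, envir = scratch)
}

# Unanchored: matches "temp_" anywhere in the name
loose <- ls(envir = scratch, pattern = "temp_")

# Anchored: matches only names that start with "temp_"
strict <- ls(envir = scratch, pattern = "^temp_")

# Remove only the anchored matches
rm(list = strict, envir = scratch)
kept <- ls(envir = scratch)
```

Here loose also picks up old_temp_backup, while strict does not, so anchoring the pattern is the safer choice when a prefix convention is intended.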

Input Environment:

| Object Name | Type |
| --- | --- |
| patient_data | data.frame |
| lab_results | data.frame |
| temp_data1 | data.frame |
| temp_data2 | data.frame |
| demographic_data | data.frame |
| vital_signs | data.frame |

Expected Output After Keeping Only Specific Objects:

> ls()
[1] "demographic_data" "vital_signs"
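
The keep-only step can also be wrapped in a small helper, loosely mirroring the SAS keep_only macro. This is a sketch rather than a standard function: the name keep_only, its arguments, and the scratch environment are all assumptions made for illustration.

```r
# Hypothetical helper: remove everything from `envir` except the names in `keep`
keep_only <- function(keep, envir = parent.frame()) {
  drop <- setdiff(ls(envir = envir), keep)
  rm(list = drop, envir = envir)   # rm() accepts an empty character vector
  invisible(drop)                  # return the removed names for logging
}

# Demonstrate on a scratch environment instead of the global workspace
scratch <- new.env()
for (nm in c("patient_data", "lab_results", "demographic_data", "vital_signs")) {
  assign(nm, data.frame(), envir = scratch)
}

dropped   <- keep_only(c("demographic_data", "vital_signs"), envir = scratch)
remaining <- ls(envir = scratch)
```

Passing the environment explicitly keeps the helper usable both at the console, where the default resolves to the caller's environment, and inside other functions.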

3. Object Inspection Before Removal

| Capability | SAS | R |
| --- | --- | --- |
| List object details | PROC CONTENTS data=_all_; | lapply(mget(ls()), class) |
| Find large objects | PROC SQL with dictionary tables | sort(sapply(ls(), function(x) object.size(get(x)))) |
| Identify memory hogs | Manual tracking | lobstr::obj_size() for nested objects |
| Check object type | PROC CONTENTS | class() or typeof() |

SAS Example

/* List all datasets with details */
proc contents data=_all_;
run;

/* Find large datasets in WORK library */
proc sql;
   select memname, nobs, filesize/1048576 as size_mb
   from dictionary.tables
   where libname = 'WORK'
   order by filesize desc;
quit;

Explanation (SAS):

  • PROC CONTENTS data=_all_ displays detailed information about all datasets
  • dictionary.tables provides metadata about datasets including size and number of observations
  • Size information helps identify the datasets that take the most disk space
  • SAS focuses on dataset-level information rather than all object types
  • No direct way to see nested object sizes or complex structures

R Example

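# List all objects with their classes
# (envir = .GlobalEnv keeps get() from picking up the function argument x)
object_classes <- sapply(ls(), function(x) class(get(x, envir = .GlobalEnv)))
object_classes

# Find objects by size (ascending)
object_sizes <- sort(sapply(ls(), function(x) object.size(get(x, envir = .GlobalEnv))))
# sapply() drops the object_size class, so format each byte count individually
print(sapply(object_sizes, function(b)
  format(structure(b, class = "object_size"), units = "auto", standard = "SI")),
  quote = FALSE)

# Detailed size analysis of nested objects
library(lobstr)
library(dplyr)  # provides %>%, arrange(), and desc()
obj_sizes <- sapply(ls(), function(x) as.numeric(obj_size(get(x, envir = .GlobalEnv))))
data.frame(
  object = names(obj_sizes),
  size_mb = obj_sizes / 1024^2,
  row.names = NULL
) %>% arrange(desc(size_mb))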

Explanation (R):

  • sapply(ls(), function(x) class(get(x))) returns the class of each object
  • object.size() estimates the memory allocated to an object
  • lobstr::obj_size() gives more accurate sizes for complex objects because it accounts for shared references and environments
  • Size information helps identify memory-intensive objects before removal
  • R provides detailed memory information for all object types, not just data frames
  • Understanding object relationships helps manage memory more effectively
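
When lobstr and dplyr are not available, a similar size report can be built with base R alone. The sketch below is one possible approach under that assumption; the scratch environment and the three objects in it are hypothetical stand-ins for a real workspace.

```r
# Hypothetical scratch environment with objects of clearly different sizes
scratch <- new.env()
scratch$small_vector <- numeric(10)
scratch$medium_df    <- data.frame(x = seq_len(1e4), y = rnorm(1e4))
scratch$large_matrix <- matrix(0, nrow = 500, ncol = 500)

# mget() fetches objects by name from one environment,
# which avoids the scoping pitfalls of calling get() inside sapply()
objs  <- mget(ls(envir = scratch), envir = scratch)
bytes <- vapply(objs, function(obj) as.numeric(object.size(obj)), numeric(1))

size_table <- data.frame(
  object    = names(bytes),
  class     = vapply(objs, function(obj) class(obj)[1], character(1)),
  size_mb   = round(bytes / 1024^2, 3),
  row.names = NULL
)

# Largest objects first; order on the unrounded byte counts
size_table <- size_table[order(bytes, decreasing = TRUE), ]
print(size_table)
```

Using vapply() with an explicit return type keeps the result a plain vector even when the environment is empty.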

Input Environment:

| Object Name | Type | Size |
| --- | --- | --- |
| small_vector | numeric vector | 80 bytes |
| medium_df | data.frame | 1.2 MB |
| large_df | data.frame | 45.3 MB |
| huge_model | randomForest | 120 MB |

Expected Output of Size Analysis:

> print(sapply(object_sizes, function(b) format(structure(b, class = "object_size"), units = "auto", standard = "SI")), quote = FALSE)
small_vector    medium_df     large_df   huge_model 
        80 B       1.2 MB      45.3 MB       120 MB 

> data.frame(object = names(obj_sizes), size_mb = obj_sizes / 1024^2) %>% arrange(desc(size_mb))
        object   size_mb
1   huge_model 120.00000
2     large_df  45.30000
3    medium_df   1.20000
4 small_vector   0.00008

4. Working with Multiple Environments

| Capability | SAS | R |
| --- | --- | --- |
| Multiple libraries | LIBNAME statements | new.env() |
| Copy between environments | PROC COPY | assign(), get() |
| List objects in specific env | PROC CONTENTS data=lib._all_; | ls(envir=my_env) |
| Remove from specific env | PROC DATASETS lib=templib kill; | rm(list=ls(envir=my_env), envir=my_env) |
| Global vs local objects | Library references | Global, local, and package environments |

SAS Example

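/* Create a temporary library in a subdirectory of the WORK location */
/* (LIBNAME needs a path; DCREATE returns the path of the new directory) */
%let templib_path = %sysfunc(dcreate(templib, %sysfunc(pathname(work))));
libname templib "&templib_path";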

/* Copy datasets to temporary library */
proc copy in=work out=templib;
   select patient_data lab_results;
run;

/* List datasets in temporary library */
proc contents data=templib._all_ nods;
run;

/* Remove datasets from temporary library */
proc datasets lib=templib kill nolist;
quit;

/* Remove temporary library reference */
libname templib clear;

Explanation (SAS):

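  • The LIBNAME statement assigns the library reference templib to a storage location used for temporary datasets
  • PROC COPY copies datasets between libraries; the source datasets stay in place unless the MOVE option is used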
  • PROC CONTENTS data=templib._all_ lists only datasets in the specific library
  • PROC DATASETS lib=templib kill removes only datasets in that library
  • LIBNAME templib clear removes the library reference
  • SAS manages data through library references rather than environments

R Example

# Create a new environment
temp_env <- new.env()

# Copy objects to the new environment
temp_env$patient_data <- patient_data
temp_env$lab_results <- lab_results

# Alternative method to copy objects
assign("patient_data", patient_data, envir = temp_env)

# List objects in the specific environment
ls(envir = temp_env)

# Access an object from the environment
temp_patient <- get("patient_data", envir = temp_env)

# Remove objects from the specific environment
rm(list = ls(envir = temp_env), envir = temp_env)

# Check if environment is empty
ls(envir = temp_env)

Explanation (R):

  • new.env() creates a separate environment for storing objects
  • Objects can be assigned directly using env$name notation
  • assign() function provides more control over object assignment
  • ls(envir = temp_env) lists only objects in the specified environment
  • get() retrieves objects from a specific environment
  • rm(list = ls(envir = temp_env), envir = temp_env) removes all objects from the environment
  • Environments isolate names within a single R session and live in memory, whereas SAS libraries refer to storage locations on disk
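
One behavior that differs from ordinary R objects is that environments are not copied when they are assigned or passed to a function. The sketch below illustrates the contrast with a data frame; the function and object names are hypothetical and chosen only for this example.

```r
# Data frames follow copy-on-modify: the function changes its own copy
add_flag_df <- function(df) {
  df$flag <- TRUE
  df
}

# Environments are references: the function changes the caller's object
add_flag_env <- function(env) {
  env$flag <- TRUE
  invisible(env)
}

patient_df <- data.frame(id = 1:2)
flagged_df <- add_flag_df(patient_df)
df_changed <- "flag" %in% names(patient_df)   # FALSE: original is untouched

store <- new.env()
add_flag_env(store)
env_changed <- exists("flag", envir = store, inherits = FALSE)   # TRUE

# Assigning an environment to a second name creates an alias, not a copy
alias <- store
alias$n_rows <- 42
shared_value <- store$n_rows   # 42, visible through both names
```

This is why clearing temp_env with rm(..., envir = temp_env) affects every name that refers to the same environment.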

Input Global Environment:

| Object Name | Type |
| --- | --- |
| patient_data | data.frame |
| lab_results | data.frame |
| model_fit | lm |

Expected Output After Environment Operations:

# Global environment keeps its original objects and gains temp_env and temp_patient
> ls()
[1] "lab_results"  "model_fit"    "patient_data" "temp_env"     "temp_patient"

# New environment contains copied objects
> ls(envir = temp_env)
[1] "lab_results"  "patient_data"

# After clearing the environment
> ls(envir = temp_env)
character(0)

5. Beyond Basics: Environment Management Automation

| Capability | SAS | R |
| --- | --- | --- |
| Environment snapshots | Custom macro | sessionInfo() (packages), save.image() (objects), custom functions |
| Object tracking | No built-in support | Custom tracking functions |
| Memory monitoring | OPTIONS FULLSTIMER; | gc(), pryr::mem_used() |
| Conditional cleanup | Macro with conditions | Functions with conditional logic |
| Session management | PROC DATASETS, macros | savehistory(), loadhistory() |

SAS Example

/* Advanced macro for conditional dataset management */
%macro manage_datasets(pattern=, action=DELETE, condition=);
   %local ds_list;
   %let ds_list = ;   /* stays empty when no dataset matches */
   proc sql noprint;
      select memname into :ds_list separated by ' '
      from dictionary.tables
      where libname = 'WORK' 
      %if %length(&pattern) > 0 %then %do;
         and memname like "&pattern"
      %end;
      %if %length(&condition) > 0 %then %do;
         and &condition
      %end;
      ;
   quit;
   
   %if %upcase(&action) = DELETE %then %do;
      %if %length(&ds_list) > 0 %then %do;
         proc datasets lib=work nolist;
            delete &ds_list;
         quit;
         %put Deleted datasets: &ds_list;
      %end;
      %else %do;
         %put No datasets matched the criteria for deletion.;
      %end;
   %end;
   %else %if %upcase(&action) = LIST %then %do;
      %put Matching datasets: &ds_list;
   %end;
%mend;

/* Example usage */
/* In LIKE, the percent sign matches any run of characters and _ matches any single character */
%manage_datasets(pattern=TEMP_%);
%manage_datasets(condition=nobs > 1000);

Explanation (SAS):

  • Creates a flexible macro for dataset management based on patterns or conditions
  • Uses dictionary.tables for metadata-based filtering
  • The macro supports multiple actions (DELETE, LIST)
  • Conditional logic allows for targeted operations
  • Provides feedback through the SAS log
  • SAS macros enable automation of routine cleanup tasks

R Example

# Advanced environment management functions
library(pryr)

# Create an environment tracker
env_tracker <- function() {
  # Initial snapshot of objects
  initial <- list(
    objects = ls(envir = .GlobalEnv),
    timestamp = Sys.time(),
    mem_used = mem_used()
  )
  
  # Function to check what's changed
  check_changes <- function() {
    current <- ls(envir = .GlobalEnv)
    new_objects <- setdiff(current, initial$objects)
    removed_objects <- setdiff(initial$objects, current)
    mem_diff <- mem_used() - initial$mem_used
    
    list(
      new_objects = new_objects,
      removed_objects = removed_objects,
      time_elapsed = difftime(Sys.time(), initial$timestamp, units = "mins"),
      memory_change = mem_diff
    )
  }
  
  # Function for conditional cleanup
  cleanup <- function(pattern = NULL, type = NULL, min_size = NULL) {
    objects <- ls(envir = .GlobalEnv)
    
    # Filter by pattern
    if (!is.null(pattern)) {
      objects <- grep(pattern, objects, value = TRUE)
    }
    
    # Filter by type (vapply keeps the result logical even when nothing is left)
    if (!is.null(type)) {
      objects <- objects[vapply(objects, function(x) 
        inherits(get(x, envir = .GlobalEnv), type), logical(1))]
    }
    
    # Filter by size in bytes
    if (!is.null(min_size)) {
      objects <- objects[vapply(objects, function(x) 
        as.numeric(object.size(get(x, envir = .GlobalEnv))) > min_size, logical(1))]
    }
    
    if (length(objects) > 0) {
      rm(list = objects, envir = .GlobalEnv)
      cat("Removed", length(objects), "objects:", 
          paste(objects, collapse = ", "), "\n")
    } else {
      cat("No objects matched the criteria.\n")
    }
  }
  
  # Return functions
  list(
    check_changes = check_changes,
    cleanup = cleanup,
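    reset = function() {
      # <<- updates the snapshot held in the enclosing function;
      # a plain <- would only modify a local copy
      initial$objects <<- ls(envir = .GlobalEnv)
      initial$timestamp <<- Sys.time()
      initial$mem_used <<- mem_used()
      cat("Tracker reset.\n")
    }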
  )
}

# Create a tracker
tracker <- env_tracker()

# Example usage
# Create some objects
temp_data1 <- matrix(rnorm(10000), 100, 100)
temp_data2 <- data.frame(x = 1:1000, y = rnorm(1000))
model_result <- lm(y ~ x, data = temp_data2)

# Check what's changed
changes <- tracker$check_changes()
print(changes)

# Cleanup temporary objects
tracker$cleanup(pattern = "^temp_")

# Cleanup by object type
tracker$cleanup(type = "lm")

Explanation (R):

  • Creates an environment tracking system built on closures, so the returned functions share one snapshot
  • Maintains an initial snapshot of the environment
  • Tracks new objects, removed objects, time elapsed, and memory changes
  • Provides conditional cleanup based on name patterns, object types, or sizes
  • Environment monitoring helps identify memory leaks and track object creation
  • More flexible than SAS's dataset-centric approach
  • Enables data scientists to manage complex R sessions with many object types
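
A closure-based tracker depends on how assignment works inside the returned functions: a plain <- creates a local variable, while <<- updates the variable captured from the enclosing function. The minimal sketch below isolates that behavior; make_counter and its members are hypothetical names used only for illustration.

```r
# Hypothetical counter that exposes both assignment styles
make_counter <- function() {
  count <- 0L
  list(
    # Plain <- creates a local `count`; the captured value is unchanged
    bump_local = function() {
      count <- count + 1L
      invisible(NULL)
    },
    # <<- assigns to the `count` that lives in make_counter()'s environment
    bump_shared = function() {
      count <<- count + 1L
      invisible(NULL)
    },
    value = function() count
  )
}

counter <- make_counter()

counter$bump_local()
after_local <- counter$value()    # 0: the shared state did not change

counter$bump_shared()
after_shared <- counter$value()   # 1: the shared state was updated
```

The same rule applies to any function in a tracker that is meant to refresh a stored snapshot.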

Input Environment:

| Object Name | Type | Size |
| --- | --- | --- |
| original_data | data.frame | 1.5 MB |
| temp_data1 | matrix | 0.8 MB |
| temp_data2 | data.frame | 0.2 MB |
| model_result | lm | 0.1 MB |

Expected Output of Tracking and Cleanup:

> changes <- tracker$check_changes()
> print(changes)
$new_objects
[1] "model_result" "temp_data1"   "temp_data2"   "tracker"

$removed_objects
character(0)

$time_elapsed
Time difference of 0.25 mins

$memory_change
1.1 MB

> tracker$cleanup(pattern = "^temp_")
Removed 2 objects: temp_data1, temp_data2

> tracker$cleanup(type = "lm")
Removed 1 objects: model_result

> ls()
[1] "changes"       "env_tracker"   "original_data" "tracker"

6. Best Practices for R Environment Management

  • Plan Your Memory Usage

    • Estimate memory requirements before loading large datasets
    • Use object.size() and gc() to monitor memory consumption
    • Consider using packages like data.table for memory-efficient operations
  • Clean Up Regularly

    • Remove temporary objects when no longer needed
    • Use naming conventions for temporary objects (e.g., prefix with "temp_")
    • Create cleanup functions for routine operations
  • Structure Your Environment

    • Use separate environments for different analysis components
    • Consider package development for complex projects with many functions
    • Use R projects to isolate different analyses
  • Optimize Large Data Handling

    • Process large datasets in chunks when possible
    • Use connection objects for streaming data
    • Consider database connections instead of loading everything into memory
    • Leverage packages designed for big data like disk.frame or arrow
  • Save and Restore Sessions Strategically

    • Save only essential objects, not entire environments
    • Document dependencies between objects
    • Use .Rprofile for environment setup, not data loading
  • Automate Environment Management

    • Create helper functions for common tasks
    • Implement environment snapshots at critical points
    • Track memory usage over time during long analyses
  • Document Your Environment

    • Use sessionInfo() to record package versions
    • Save package dependencies with renv or packrat
    • Include environment setup in analysis documentation
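
As an illustration of saving only essential objects, a single object can be written to disk with saveRDS() and restored under any name with readRDS(). This is a sketch under assumptions: the data frame essential and the temporary file path are hypothetical, and a real analysis would use a stable project path.

```r
# Hypothetical result worth keeping between sessions
essential <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))

# Write just this object; tempfile() stands in for a real project path
path <- tempfile(fileext = ".rds")
saveRDS(essential, path)

# The in-memory copy can now be dropped
rm(essential)

# readRDS() returns the object, so it can be bound to any name
restored <- readRDS(path)

# Clean up the temporary file used in this example
unlink(path)
```

Unlike save() and load(), which restore objects under their original names, this approach makes it explicit which name the restored data ends up with.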
