Using BFI in SAS

Hassan Pazira

2024-04-25

Introduction

The BFI package is a powerful tool in R designed to execute the Bayesian Federated Inference (BFI) methodology, supporting a wide range of regression models including linear, logistic, and Cox regression. While SAS offers robust statistical capabilities, it currently lacks a dedicated package for implementing the BFI method. Consequently, this vignette serves to bridge that gap by illustrating how to utilize the R package BFI within the SAS environment. By seamlessly integrating R’s BFI package with the analytical prowess of SAS, users gain access to a comprehensive suite of statistical techniques, enhancing their ability to conduct sophisticated data analyses, particularly when working with small datasets.

In this guide, we’ll explore how you can leverage SAS to effectively utilize the BFI package for your data analysis needs.

To utilize R within SAS, it’s assumed that you have access to a SAS server. This access allows you to establish a connection to a SAS session by providing the necessary connection parameters, such as the SAS server hostname, port number, and authentication credentials.

More information about configuring the SAS system to call functions in the R language is documented in the SAS Online Help.

Accessing SAS Services

To access SAS services, you typically need to connect to a SAS server. Here’s how you can do it using SAS Studio, which is a web-based interface for SAS:

Open a web browser and navigate to the URL provided by your SAS administrator for accessing SAS Studio. Enter your credentials to log in to SAS Studio. Once logged in, you can access SAS services such as analysis and reporting through the SAS Studio interface.

Install R and Configure with SAS

SAS requires two configuration options in order to communicate with R. First the RLANG option must be set when SAS is started. This may be set either in a custom configuration file or on the SAS command line. Second, SAS needs an R_HOME environment variable to point it to the correct, available version of R.

The RLANG System Option

The RLANG system option determines whether you have permission to call R from the SAS system. You can determine the value of the RLANG option by submitting the following SAS statements:

proc options option=RLANG;
run;

The result is one of the following statements in the SAS log:

If the SAS log contains this statement, it means that R integration is disabled, and you do not have permission to call R from the SAS system :( You may need to consult with your SAS administrator or IT department to enable it.

If the SAS log contains this statement, it means that R integration is enabled, and you can call R from the SAS system :)

Install R

Download and install R from the official R website (https://www.r-project.org/). Follow the installation instructions provided for your operating system.

Install SAS/IML Interface to R

The SAS/IML Interface to R allows you to call R functions from within PROC IML (Interactive Matrix Language). Check if the interface is installed by running the following code within SAS:

proc options option=R_HOME;
run;

If the path to your R installation directory is displayed, the SAS/IML Interface to R is installed. If not, you may need to install or reinstall it.

Using PROC IML and Rsubmit

You can use R inside SAS through the use of the PROC IML procedure. PROC IML allows you to execute R code within a SAS session, enabling integration between SAS and R for data analysis and statistical modeling.

Installing R packages from CRAN and GitHub in SAS:

To install an R package from CRAN and GitHub, you can use the base, stats and remotes packages, respectively. It can be done in R or in SAS. Here’s how you can do it within SAS:

proc iml;

  rsubmit;

    /* First install and load 'base', 'stats' and 'remotes' from CRAN */
    install.packages("base")
    install.packages("stats")
    install.packages("remotes")
    library(base)
    library(stats)
    library(remotes)
  
    /* Then install and load BFI from GitHub */
    remotes::install_github("hassanpazira/BFI", force = TRUE)
    library(BFI)

  endrsubmit;

quit;

Now that you have the BFI package installed and configured, let’s explore its functionality through the following example.

Example

Now we generate two datasets independently from Gaussian distribution, and then apply main functions in the BFI package to these datasets:

Simulate data for two local centers

First generate 30 samples randomly from Gaussian distribution N(0, 1) with p=3 covariates:

proc iml;

    /****************************************************************/
    /* Center 1: Data simulation for local center 1 with 30 samples */
    /****************************************************************/

    p     = 3;   /* Number of variables */
    n1    = 30;  /* Number of samples for center 1 */
    theta = {1, 2, 2, 2, 1.5}; /* Define theta values directly */

    X1  = j(n1, p); /* Initialize matrix X1 */
    mu1 = j(n1, 1); /* Initialize vector mu1 */
    y1  = j(n1, 1); /* Initialize vector y1 */

    /* Generate data for center 1 */
    call randseed(1123);
    X1  = randfun(n1 || p, "Normal", 0, 1);
    mu1 = theta[1] + X1 * theta[2:4];
    y1  = randfun(n1 || 1, "Normal", mu1, sqrt(theta[5]));

    /* Create dataset for center 1 */
    create y1 var {"y1"};
    append;
    close y1;

    create X1 from X1[colname={"X1_1" "X1_2" "X1_3"}];
    append from X1;
    close X1;

    call ExportMatrixToR(X1, "X1");
    call ExportMatrixToR(y1, "y1");

quit;

Now generate 50 samples randomly from N(0, 1) with 3 covariates:

proc iml;

    /****************************************************************/
    /* Center 2: Data simulation for local center 2 with 50 samples */
    /****************************************************************/
    p     = 3;   /* Number of variables */
    n2    = 50;   /* Number of samples for center 2 */
    theta = {1, 2, 2, 2, 1.5}; /* Define theta values directly */
  
    X2  = j(n2, p);
    mu2 = j(n2, 1);
    y2  = j(n2, 1);

    /* Generate data for center 2 */
    call randseed(1123);
    X2  = randfun(n2 || p, "Normal", 0, 1);
    mu2 = theta[1] + X2 * theta[2:4];
    y2  = randfun(n2 || 1, "Normal", mu2, sqrt(theta[5]));

    /* Create dataset for center 1 */
    create y2 var {"y2"};
    append;
    close y2;

    create X2 from X2[colname={"X2_1" "X2_2" "X2_3"}];
    append from X2;
    close X2;

    call ExportMatrixToR(X2, "X2");
    call ExportMatrixToR(y2, "y2");

quit;

We have transferred SAS data to the R session and are currently initiating an analysis using the BFI method in R. All communications with R are facilitated through SAS’s PROC IML. It’s important to note that capitalization matters in R, and character variables are automatically converted into factors.

MAP estimates at the local centers

The following compute the Maximum A Posterior (MAP) estimators of the parameters for center 1:

proc iml;
  
  rsubmit;

    #---------------------------
    # Inverse Covariance Matrix
    #---------------------------
    # Creating the inverse covariance matrix for the Gaussian prior distribution:
    Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian)

    #--------------------------
    # MAP estimates at center 1
    #--------------------------
    fit1       <- MAP.estimation(y1, X1, family=gaussian, Lambda)
    theta_hat1 <- fit1$theta_hat # intercept and coefficient estimates
    A_hat1     <- fit1$A_hat     # minus the curvature matrix
    summary(fit1, cur_mat=TRUE)

  endrsubmit;

quit;

Obtaining the MAP estimators of the parameters for center 2 using the following:

proc iml;

  rsubmit;
    
    # Creating the inverse covariance matrix for the Gaussian prior distribution:
    Lambda <- inv.prior.cov(X2, lambda=0.05, family=gaussian)

    #--------------------------
    # MAP estimates at center 2
    #--------------------------
    fit2       <- MAP.estimation(y2, X2, family=gaussian, Lambda)
    theta_hat2 <- fit2$theta_hat
    A_hat2     <- fit2$A_hat
    summary(fit2, cur_mat=TRUE)

  endrsubmit;

quit;

BFI at central center

Now, you can utilize the primary function bfi() to acquire the BFI estimates:

proc iml;

  rsubmit;
  
    # Creating the inverse covariance matrix for central server:
    Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian) # the same as other centers
  
    #----------------------
    # BFI at central center
    #----------------------
    A_hats     <- list(A_hat1, A_hat2)
    theta_hats <- list(theta_hat1, theta_hat2)
    bfi        <- bfi(theta_hats, A_hats, Lambda)
    summary(bfi, cur_mat=TRUE)

  endrsubmit;

  call ImportMatrixFromR(bfi, "bfi");
  
quit;

Datasets included in the BFI package

In order to find and use the datasets available from the BFI package, use the following codes:

proc iml;

  rsubmit;
  
    # To find a list of all datasets included in the package
    print(data(package = "BFI"))  
  
    # To use the 'Nurses' data
    BFI::Nurses
    cat("Dimension of the 'Nurses' data: \n", dim(Nurses))
    cat("Colnames of the 'Nurses' data: \n", colnames(Nurses))

    # To use the 'trauma' data
    BFI::trauma
    cat("Dimension of the 'trauma' data: \n", dim(trauma))
    cat("Colnames of the 'trauma' data: \n", colnames(trauma))

  endrsubmit;

quit;

Importing the data from R

R objects and data may be brought back into SAS as well, for any manipulation you might want to do in SAS. Here, we just grab the bfi object and the Nurses data from R and print the data in SAS.

proc iml;
  submit / R;  * 'rsubmit' is equivalent to 'submit / R' ;

   # Export 'bfi' object
   ExportDataSetToSAS(bfi)

   # Export dataset 'Nurses'
   ExportDataSetToSAS(Nurses)
   
  endsubmit;
run;
  
proc print data=Nurses;
run;

BFI as a SAS Package

In the near future, we will be releasing the SAS/IML package for BFI, which can be installed by the PACKAGE INSTALL statement in the SAS environment.

Contact

If you find any errors, have any suggestions, or would like to request that something be added, please file an issue at issue report or send an email to: .