---
title: "Creating a Singularity Container to Run HuggingFace Transformers Models in R"
output: github_document #rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Singularity_Container}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

[Singularity](https://apptainer.org/) is a container engine and an alternative to Docker. Singularity containers are well suited to High Performance Computing (HPC) workloads. A container bundles the code together with *all* of its dependencies, so that an application runs reliably on different computers (or in different computing environments). Containers can be used to run code on servers, or to ensure computational reproducibility (that is, that the code runs on other systems and will still run in the future). For an introduction to the concept of containers, see [Computational Reproducibility via Containers in Psychology](https://open.lnu.se/index.php/metapsychology/article/view/892/).

Below is code to build a Singularity container that sets up transformers language models from HuggingFace and runs the `text` package.

## Code to build a Singularity container with HuggingFace models in R

```
Bootstrap: docker
From: ubuntu:20.04

%environment
    export LANG=C.UTF-8 LC_ALL=C.UTF-8
    export XDG_RUNTIME_DIR=/tmp/.run_$(uuidgen)

%post
    # Update the package index
    apt-get -y update
    export R_VERSION=4.2.2
    echo "export R_VERSION=${R_VERSION}" >> $SINGULARITY_ENVIRONMENT

    # Install R
    apt-get update
    apt-get install -y --no-install-recommends software-properties-common dirmngr wget uuid-runtime
    wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
        tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
    add-apt-repository \
        "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
    apt-get install -y --no-install-recommends \
        r-base=${R_VERSION}* \
        r-base-core=${R_VERSION}* \
        r-base-dev=${R_VERSION}* \
        r-recommended=${R_VERSION}* \
        r-base-html=${R_VERSION}* \
        r-doc-html=${R_VERSION}* \
        libcurl4-openssl-dev \
        libharfbuzz-dev \
        libfribidi-dev \
        libgit2-dev \
        libxml2-dev \
        libfontconfig1-dev \
        libssl-dev \
        libfreetype6-dev \
        libpng-dev \
        libtiff5-dev \
        libjpeg-dev

    # Add a default CRAN mirror
    echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site

    # Fix R package libpaths (helps RStudio Server find the right directories)
    mkdir -p /usr/lib64/R/etc
    echo "R_LIBS_USER='/usr/lib64/R/library'" >> /usr/lib64/R/etc/Renviron
    echo "R_LIBS_SITE='${R_PACKAGE_DIR}'" >> /usr/lib64/R/etc/Renviron

    # Clean up
    rm -rf /var/lib/apt/lists/*

    # Install python3
    apt-get -y install python3 wget
    apt-get -y clean

    # Install Miniconda
    cd /
    wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b -p /miniconda

    # Run the remaining steps in bash, because `source` is not available in plain sh.
    # The here-document runs up to the EOF line at the bottom of this section.
    /bin/bash <<EOF
    rm Miniconda3-latest-Linux-x86_64.sh
    source /miniconda/etc/profile.d/conda.sh
    conda update -y conda

    # Install reticulate and text
    Rscript -e 'install.packages("pkgdown")'
    Rscript -e 'install.packages("ragg")'
    Rscript -e 'install.packages("textshaping")'
    Rscript -e 'install.packages("reticulate")'
    Rscript -e 'install.packages("devtools")'
    Rscript -e 'install.packages("glmnet")'
    Rscript -e 'install.packages("tidyverse")'
    # Rscript -e 'install.packages("text")'
    Rscript -e 'devtools::install_github("oscarkjell/text")'

    # Create the Conda environment at a system folder
    Rscript -e 'text::textrpp_install(prompt = FALSE, rpp_version = c("torch==1.11.0", "transformers==4.19.2", "numpy", "nltk"))'
    Rscript -e 'text::textrpp_initialize(save_profile = TRUE, prompt = FALSE, textEmbed_test = TRUE)'
    Rscript -e 'text::textEmbed("hello", model = "distilbert-base-uncased", layers = 5)'
    Rscript -e 'text::textEmbed("hello", model = "roberta-base", layers = 11)'
EOF
```
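
The definition file above still has to be turned into an image before it can be used. The following is a minimal sketch of that step, not part of the original vignette. It assumes the definition has been saved as `text_container.def` (a hypothetical file name), that the image should be written to `text_container.sif` (also hypothetical), and that your system permits `--fakeroot` builds. On systems that do not, the build typically has to be run as root instead.

```shell
#!/bin/sh
# Hypothetical file names; adjust them to your own setup.
DEF_FILE="text_container.def"   # the definition file from this vignette, saved to disk
SIF_FILE="text_container.sif"   # the image file that the build will produce

# Report which container CLI is on PATH: apptainer, singularity, or none.
pick_engine() {
    if command -v apptainer >/dev/null 2>&1; then
        echo apptainer
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo none
    fi
}

ENGINE="$(pick_engine)"

if [ "$ENGINE" = "none" ]; then
    echo "Neither apptainer nor singularity was found on PATH; nothing to build."
elif [ ! -f "$DEF_FILE" ]; then
    echo "Definition file $DEF_FILE not found; save the definition first."
else
    # Build the image from the definition file (assumes --fakeroot is permitted).
    "$ENGINE" build --fakeroot "$SIF_FILE" "$DEF_FILE"

    # Sanity check: confirm that the text package is installed inside the image.
    "$ENGINE" exec "$SIF_FILE" Rscript -e 'packageVersion("text")'
fi
```

The final `exec` call only checks that the `text` package is installed in the image. Whether embeddings can be computed at run time without first calling `text::textrpp_initialize()` again depends on how your site mounts home directories into the container, so treat that as something to verify locally.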