After my previous posts about using Seurat for single-cell RNA-seq data (single sample or two samples), it became clear to me that many people will have trouble with their computing resources. My desktop runs Windows 10 with 64 GB of RAM, and I was reaching its limits (with a few other programs running in the background) when I tried to combine four 10x Genomics datasets!
It seemed reasonable to use our HPC (Perceval or Amarel at Rutgers, run by OARC) environment to take advantage of a much more robust hardware environment. I quickly learned that there was no way to install all of the required system libraries or even R packages that I needed. I wish I had a fully-customizable “desktop-like” environment on my HPC!
Naturally, I’m not the only person who had this thought. The HPC staff recommended Singularity, which is a “container” technology designed exactly for my needs: to run computationally intensive jobs on an HPC while keeping my custom installation environment isolated from the system. Singularity maintains your user identity and the system’s security inside the container, yet it provides an environment that acts like a combined version of your Linux desktop and the HPC system.
I am only a beginner to the use of Singularity. There’s much more to it than I’m going to describe. This description will be limited to how I got Singularity to allow me to run R and Seurat on a single compute node on my cluster. You can also submit batch jobs using SLURM and use it for many other customized workflows, but I’ll skip all that for now.
Install Singularity on a local computer
Following the “Quick Start” directions on the Sylabs.io site, I used git to clone singularity into a new directory on my Ubuntu 16.04 desktop. Follow the Quick-Start directions to make and install. You’ll need sudo permission to install into a local system location.
Build a Sandbox Container
The first step is to build an empty container and “install” an OS. I’m going to use Ubuntu, since this matches my local Linux computer’s system. After trying a few things and reviewing the documentation, I chose to create a very basic recipe file first. Here’s my initial recipe file:
Bootstrap: docker
From: ubuntu:16.04
%runscript
    echo "This is what happens when you run the container.."
%post
    echo "Hello from inside the container"
    apt-get -y update
    apt-get -y install vim
%environment
    export LC_ALL=C
    export PATH=$PATH
All this was saved into a file named “Singularity” in a new directory that I named “containers”. The recipe tells Singularity to pull Ubuntu 16.04 from the Docker repository, print a message (“Hello from inside the container”), and run some apt-get commands (just to prove to myself that this works). I copied the environment settings from an example I found; I still need to research these and perhaps modify them.
To start, call singularity as root and tell it you want a writable “sandbox” container, which is implemented as a new subdirectory named “myubuntu/“.
sudo singularity build --sandbox myubuntu/ Singularity
Once this runs, you should have a new subdirectory in your containers folder. You can start an interactive shell in this new environment to install anything you want. Make sure to start it using sudo and use the --writable flag (if you leave this out, you can “test-drive” the shell, but nothing you do will be saved).
sudo singularity shell --writable myubuntu/
After giving your sudo password, you should see:
Singularity: Invoking an interactive shell within container...
Singularity myubuntu:~>
Since you pre-installed vim, you should be able to run it now from command-line. You can now use apt-get to install anything you want to use within your container.
Fix your Recipe
Before moving on to R, we’ll need to install several libraries we’re going to need. One of these is for installing R from the secure CRAN repository, others are needed for Seurat prerequisites. You can run these lines manually from the shell command-prompt:
apt-get -y install libssl-dev
apt-get -y install libcurl4-openssl-dev
apt-get -y install libhdf5-dev
apt-get -y install apt-transport-https
Or, if you prefer, you can add these commands to your Singularity recipe file (after the vim install command). Just delete your myubuntu container, and re-run the sudo singularity build command with the new recipe file.
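As a rough sketch of what the extended recipe could look like, the script below writes a new “Singularity” file into the current directory using a shell heredoc. The four library installs are combined into one apt-get line, which is a stylistic choice rather than a requirement, and the indentation inside each section is only for readability.

```shell
#!/bin/sh
# Sketch: write an extended recipe file named "Singularity" into the current directory.
# The quoted 'EOF' keeps $PATH literal, so it is expanded inside the container, not here.
cat > Singularity <<'EOF'
Bootstrap: docker
From: ubuntu:16.04

%runscript
    echo "This is what happens when you run the container.."

%post
    echo "Hello from inside the container"
    apt-get -y update
    apt-get -y install vim
    # Libraries needed for the CRAN repository and the Seurat prerequisites
    apt-get -y install libssl-dev libcurl4-openssl-dev libhdf5-dev apt-transport-https

%environment
    export LC_ALL=C
    export PATH=$PATH
EOF
```

After writing the file, remove the old myubuntu/ directory and run the same sudo singularity build --sandbox command shown earlier.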
Installing R Inside Your Container
This gets a little tricky, since the Ubuntu 16.04 default r-base package is too old to work with Seurat. To use a CRAN/Ubuntu repository for installation, we need to do some customization. First, edit your /etc/apt/sources.list file (meaning the file inside your Singularity container, so make sure you have already used the “sudo singularity shell…” command from above). Open it with a text editor such as nano (or the vim you installed earlier) and add this line at the end:
deb https://cloud.r-project.org/bin/linux/ubuntu xenial/
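If you would rather not open an editor, the same line can be appended from the shell. The sketch below is one hedged way to do that: add_cran_repo is a hypothetical helper name, and it only appends the line when it is not already present, so running it twice does no harm. The demonstration writes to a scratch file; inside the container you would call it with no argument to target /etc/apt/sources.list.

```shell
#!/bin/sh
# Sketch: append the CRAN repository line to an apt sources file only if it is missing.
# add_cran_repo is a hypothetical helper; the default target is /etc/apt/sources.list,
# but a different path can be passed for a dry run.
add_cran_repo() {
    list_file="${1:-/etc/apt/sources.list}"
    repo_line="deb https://cloud.r-project.org/bin/linux/ubuntu xenial/"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$repo_line" "$list_file" 2>/dev/null; then
        echo "$repo_line" >> "$list_file"
    fi
}

# Dry run against a scratch file instead of the real sources list
scratch_list="$(mktemp)"
add_cran_repo "$scratch_list"
add_cran_repo "$scratch_list"   # second call finds the line and does nothing
cat "$scratch_list"
```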
Before you can use this repository, however, you need to download and install its public key. I found that lots of websites list the older, outdated key; here’s the command that works:
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9
If that seems to work (no error messages), you’re ready to start. Update your install repositories:
apt-get update
And then install R:
apt-get install r-base r-base-dev
This’ll take a while. When it’s done, you should be able to open R:
Singularity myubuntu:~> R
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Install R Packages
Just to be sure everything is working, start with an easy one.
install.packages("dplyr")
You should be able to install dplyr into your local user library without any problem. If everything works correctly, go on to the big one:
install.packages("Seurat")
This will run for a while. There are a lot of dependencies and everything needs to be downloaded and compiled. If everything is working, you will be able to load the library:
library(Seurat)
If that works, you’re ready to finalize your container.
Convert your Container
Quit R (“q()”, save history if you like) and leave your container’s shell (control-d). Now you’re back at the prompt for your local Linux system. Convert your sandbox to a single file.
sudo singularity build --writable production.simg myubuntu/
Doing this keeps your sandbox (myubuntu/) and creates a new ext3 image file named production.simg (mine is ~1.5 GB). If you remove the --writable flag from this command, you get a smaller squashfs file that works but cannot be modified. Once you’re done testing everything, this is a good way to make a static, working container.
Move this file to your home space on the HPC:
scp production.simg firstname.lastname@example.org:/home/username/.
Singularity on Perceval
Next, ssh into your account on Perceval. Singularity is installed as a module but it does not show up on the module spider command list because it is still in “test” mode and they have it hidden. Load it like this:
module load singularity/.2.5.1
The dot in front of the version number indicates that it’s hidden. Now, before starting to run your container, request a node from the cluster so you’re not using compute resources on the headnode:
srun --partition=main --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=8g --time=00:30:00 --export=ALL --pty bash -i
It may take a minute until you’re assigned to an available node, but when you are, you’ll see a change in your prompt similar to this:
[username@node073 ~] $
Start up your container:
singularity shell production.simg
You’ll see the same Singularity prompt as before. You can now start R and load library(Seurat), just as you did on your desktop!
Dynamically Adding Mount Points
One issue I noticed was that I didn’t have access to the /scratch/username volume in my container. With some input from Josh and some trial and error, I found the solution. You can specify additional mounts when you invoke the container, but each must be bound to a mount point that already exists inside the container. I realized that my barebones Ubuntu already had an empty /mnt in its directory tree. So here’s the new command:
singularity shell --bind /scratch/username:/mnt production.simg
Now your /scratch/username space is found at /mnt. Easy!
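If you would rather see the scratch space under its usual name, one option is to create a matching empty directory in the sandbox before converting it to an image. The following is a minimal sketch and not part of the steps above: add_mount_point is a hypothetical helper, and it is demonstrated on a throwaway directory standing in for the sandbox. On a real sandbox built with sudo, the mkdir will likely need root as well.

```shell
#!/bin/sh
# Sketch: create an empty directory inside a sandbox so it can serve as a bind target.
# add_mount_point is a hypothetical helper name, not a Singularity command.
add_mount_point() {
    sandbox_dir="$1"    # e.g. myubuntu
    mount_path="$2"     # e.g. /scratch
    # Strip a trailing slash from the sandbox and a leading slash from the path
    mkdir -p "${sandbox_dir%/}/${mount_path#/}"
}

# Demonstration on a throwaway directory standing in for the sandbox
demo_sandbox="$(mktemp -d)"
add_mount_point "$demo_sandbox" /scratch
ls -d "$demo_sandbox/scratch"
```

With such a directory in place and the image rebuilt from the sandbox, the bind target in the command above could be /scratch instead of /mnt.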
A Smaller Production Container
Once you’ve got everything built and you’ve tested your container (made with the --writable flag), you can make a smaller version of the same container for production. For example, go back to your local system (where you built the myubuntu/ sandbox) and do this:
sudo singularity build final.simg myubuntu/
The file size for this is about half of the writable one. Copy it to the HPC system and run it or start a shell as before. The only difference is that you can’t modify this container.
If you try to start your singularity container on the HPC and see this error:
ERROR : Failed to resolve path to /var/singularity/mnt/container
ABORT : Retval = 255
You’re probably trying to run from the headnode, which isn’t allowed. Use the srun command to gain access to a compute node.
- Notice the settings on the srun command above for requesting resources. To test the container, I specified only 1 node, 1 CPU and 8g RAM, as well as setting a time limit of 30 minutes. Seurat is single-threaded so there’s probably no reason to set more CPUs, but certainly you’ll want to increase the RAM and maybe the time. Asking for more resources may delay granting your request.
- This example allows for interactive shell usage of R only. Once you set up an R script file with all your commands, there’s no reason why you can’t automate the process with an Rscript file.R command. You can even put that in your recipe file and use “singularity run” instead of “shell” (put it under %runscript). But that’s for another post…
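As a starting point for that automation, here is a hedged sketch of a SLURM batch script that runs an R script inside the container with singularity exec. The file names seurat_job.sh and analysis.R are hypothetical, the resource values simply mirror the srun test request, and the bind option reuses the /scratch/username example from earlier. Treat it as an outline to adapt, not a tested recipe for Perceval.

```shell
#!/bin/sh
# Sketch: write a SLURM batch script that runs an R script inside the container.
# seurat_job.sh and analysis.R are hypothetical file names; the resource values
# mirror the interactive test request and will need to be raised for real data.
cat > seurat_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8g
#SBATCH --time=00:30:00
#SBATCH --export=ALL

# Hidden module version, the same one loaded for interactive use
module load singularity/.2.5.1

# Run the R script non-interactively inside the container
singularity exec --bind /scratch/username:/mnt production.simg Rscript analysis.R
EOF
```

The generated file would then be submitted from the login node with sbatch seurat_job.sh.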