The CNIO HPC cluster

These are some guidelines to using our HPC cluster and the Slurm workflow system.

Cluster resources

Compute nodes

The cluster currently features 12 compute nodes with the following configurations:

Count  Node names  CPU cores  RAM    GPUs
1      bc001       24         32Gb   -
6      bc00[2-7]   52         512Gb  -
3      bc00[8-10]  128        1Tb    -
1      hm001       224        2Tb    -
1      gp001       112        768Gb  2 x Nvidia A100 80Gb


Storage

The cluster features 120Tb of standard storage (where users' home directories are mounted) and 512Tb of high-performance storage (for the input and output of compute jobs).

Paragraph for adding to grant applications, etc.

The CNIO HPC cluster currently consists of: two login nodes working in active/passive mode (2x 40 core/392Gb RAM); a total of 728 compute cores distributed across 9 "standard" compute nodes (6x 52 core/512Gb RAM and 3x 64 core/768Gb RAM) and a high-memory node (1x 224 core/2Tb RAM); and a dedicated GPU node featuring 2 x Nvidia A100 80Gb GPUs. Local storage consists of 48 4TB disks in a dual-channel enclosure organized into a RAID-10 unit with 172TB of effective available storage, which hosts the home directories, plus a high-performance Lustre storage system with 512TB of effective available storage for computation. The cluster runs a Slurm queuing system with fairshare priority management.

Cluster usage

Introduction to HPC

See this tutorial for an introduction to HPC concepts and usage (highly recommended if you have little or no experience in using HPC environments).

Requesting access

Please fill out the access request form in order to request access to the cluster.

Accessing the cluster

The cluster's hostname is cluster1.cnio.es (cluster1 for short), and it can be accessed via SSH.
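
For example, assuming your cluster username is myuser (replace it with your own):

ssh myuser@cluster1.cnio.es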

Note

If you'd like to SSH into the cluster through the CNIO VPN you will need to ask IT for VPN access.

Mailing list

Important notifications about the cluster are sent to the CNIO HPC mailing list. Make sure to subscribe using this link to stay up to date. The list has very low traffic.

Note

Do subscribe to the list!

Storage

Warning

There are two types of storage in the cluster: your home directory and scratch space.

Your home directory is well suited for storing software (e.g. a conda installation), configuration files, etc. It is, however, not fast enough to hold the input or output files of your analyses: using your home directory for this will likely hang your jobs and corrupt your files. Use the scratch space instead.

Warning

Scratch space is to be considered "volatile": you should copy your input files, execute, copy your output files, and delete everything.

Home directories are not volatile, but data safety is not guaranteed in the long term (so keep backups).

Home directories

You have an allocation of 300Gb in your home directory, located in /home/<yourusername>/.

Scratch space

User space

You have an allocation of 500Gb in your "scratch" directory, located at /storage/scratch01/users/<yourusername>/.

This space is meant to be used for testing and ephemeral compute operations.
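
As a minimal sketch, you could prepare a per-project working directory inside your scratch space before launching jobs (the directory name myproject is just a placeholder):

mkdir -p /storage/scratch01/users/$(whoami)/myproject
cd /storage/scratch01/users/$(whoami)/myproject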

Project space

To request high-performance scratch space for a new project, please fill out the storage request form.

Checking your quotas

You can check your current quotas with the following commands:

$ zfs get userquota@$(whoami) homepool/home # check your home quota

$ quotr quota list # check your scratch quotas

Copying data to/from the cluster

The most efficient way of copying large amounts of data is to use rsync, which allows you to resume a transfer in case something goes wrong.

To transfer a directory to the cluster using rsync you would do something like:

rsync -avx --progress mydirectory/ cluster1:/storage/scratch01/users/myuser/mydirectory/

Note

Pay attention to the trailing slashes, which completely change rsync's behaviour if missing.

For small transfers you could also use scp if you prefer:

scp -r mydirectory/ cluster1:/storage/scratch01/users/myuser/

In both cases you can reverse the order of the local and remote directories to copy from the cluster to your local computer instead.
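
For example, a sketch of copying results back from the cluster (the paths are placeholders; adjust them to your own scratch and local directories):

rsync -avx --progress cluster1:/storage/scratch01/users/myuser/mydirectory/ mydirectory/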

Submitting jobs

Warning

As a general rule, no heavy processes should be run directly on the login nodes. Instead you should use the "sbatch" or "srun" commands to send them to the compute nodes. See examples below.

The command structure to send a job to the queue is the following:

sbatch -o <logfile> -e <errfile> -J <jobname> -c <ncores> --mem=<total_memory>G \
    -t<time_minutes> --wrap "<command>"

For example, to submit the command bwa index mygenome.fasta as a job:

sbatch -o log.txt -e error.txt -J index_genome -c 1 --mem=4G -t120 \
    --wrap "bwa index mygenome.fasta"

Job resources

-c, --mem, and -t tell the scheduler how many cores, how much RAM, and how much time your job will need.

Time limits

Note

Ideally, you should aim for jobs lasting between 1 and 8 hours. Shorter jobs cause too much scheduling overhead, and longer jobs make it harder for the scheduler to fill the queues optimally.

If no time limit is specified, a default of 30 minutes will be assigned automatically.

The "short" queue has a limit of 2 hours per job and the highest priority (i.e. jobs in this queue will run sooner compared to other queues).

The "main" queue has a limit of 24 hours per job and medium priority.

The "long" queue has a limit of 168 hours (7 days) per job and 4 concurrent jobs, and the lowest priority.

Note

You can specify the partition (queue) you want to use with the -p argument to sbatch or srun.
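
For example, a hypothetical job that fits in the "short" queue (the fastqc command and the resource values are placeholders):

sbatch -p short -o log.txt -e error.txt -J quick_test -c 4 --mem=8G -t 60 \
    --wrap "fastqc sample.fastq.gz"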

Resources and their effect on job priority

The resources you request for a job influence how soon it can start compared to other jobs: the more resources you request, the longer you may have to wait for those resources to become available.

In addition, the future priority of your jobs is also influenced by the resources you request (request, not use: they count against your priority even if your job never uses them): the more you request, the lower your priority for future jobs.

If one of your jobs has lower priority than another, but its running time would not delay that higher-priority job from starting and there are enough free resources, your job may start first (this is known as backfilling). That's why it's important to assign reasonably accurate time limits (a.k.a. walltimes) to your jobs.

You can check how efficiently a job used its assigned resources with the seff <jobid> command (once the job is finished).
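
For example (the job ID is a placeholder; use the one reported by sbatch when you submitted the job):

seff 123456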

Snakemake executor for Slurm

Warning

The previous way of having Snakemake send jobs to the nodes using a profile (--profile $SMK_PROFILE_SLURM) is still functional but deprecated. You should ideally move to the native executor described below, and report any issues to the list.

Snakemake features a native Slurm executor, which will send jobs to Slurm instead of running them locally. To use it, simply add the --slurm argument to your snakemake command.

Note

In order to indicate the resources required by each Snakemake rule you should use the threads and resources options.

Note

The snakemake command will remain active while your jobs run, so it's recommended that you launch it inside a detachable terminal multiplexer (e.g. GNU Screen) so you can disconnect from the cluster and keep your jobs running.
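
As a minimal sketch, launching a workflow from inside a Screen session could look like this (the session name and the -j concurrent-job limit are arbitrary examples):

screen -S smk                # start a detachable session (detach with Ctrl-a d)
snakemake --slurm -j 50      # submit up to 50 concurrent jobs to Slurm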

Interactive sessions

During development and testing you may want to be able to run commands interactively. You can request an interactive session, which will run on a compute node, using the srun command with the --pty argument.

For example, to request a 2-hour session with 4Gb RAM and 2 CPUs, you would do:

srun --mem=4096 -c 2 -t 120 --pty /bin/bash

Note

Interactive sessions are limited to a maximum of 120 minutes.

GPUs

Warning

GPU usage is "experimental". Please expect changes in the setup (which will be announced through the mailing list).

The cluster features two Nvidia A100 GPUs with 80Gb of VRAM each.

The GPUs are split into instances of different capacities in order to increase usability. Available instances are:

  • 1 x 7g.80gb: 80Gb of GPU VRAM and 7 streaming multiprocessors
  • 3 x 2g.20gb: 20Gb of GPU VRAM and 2 streaming multiprocessors
  • 1 x 1g.20gb: 20Gb of GPU VRAM and 1 streaming multiprocessor

To run a command using GPUs you need to specify the gpu partition and request the GPU instances you require using the --gres=gpu:<instance_name>:<number_of_gpus> argument of sbatch. Here's an example that runs a Python script on two 2g.20gb instances:

sbatch -p gpu --gres=gpu:2g.20gb:2 --wrap "python train_my_net.py"

Installing software

Software management is left up to the user; we recommend installing mambaforge and using the bioconda channels.

If you're completely unfamiliar with conda and bioconda, we recommend following this tutorial. Notice that the tutorial uses miniconda, which is equivalent to the recommended mambaforge.

Note

Remember to use your home directory to install software (including conda), as Lustre performs poorly when small files are read frequently.
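
As a rough sketch, once mambaforge is installed in your home directory, setting up the bioconda channels and creating an environment could look like this (the environment name and tools are just examples):

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
mamba create -n mapping bwa samtools
conda activate mapping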