5. Compute on Spider

Tip

This is a quickstart for computing on the platform. On this page you will learn:

  • how to prepare and run your workloads

  • about job types, partitions and Slurm constraints

The table below lists the available* Spider node types.

| Number of nodes | Node name | CPU SKU | CPU cores per node | Total memory per node (per core) | Other characteristics | Included in the partition |
|---|---|---|---|---|---|---|
| 25 | wn-ca-[01-25] | AMD Rome, 64 cores/socket | 64 | 480 GB (7.5 GB) | Local scratch 6 TB SSD | normal, infinite, short, interactive |
| 18 | wn-dc-[01-18] | AMD Rome, 64 cores/socket | 64 | 960 GB (15 GB) | Local scratch 12 TB SSD | normal |
| 5 | wn-ha-[01-05] | AMD Rome, 64 cores/socket | 64 | 950 GB (14.85 GB) | Local scratch 12 TB SSD | normal, infinite |
| 5 | wn-hb-[01-05] | AMD Naples, 64 cores/socket | 64 | 950 GB (14.85 GB) | Local scratch 12 TB SSD | normal |
| 2 | wn-gp-[01,02] | Intel Cascade Lake, 22 cores/socket | 22 | 720 GB (32.77 GB) | Local scratch 6 TB SSD | gpu_v100 |
| 3 | wn-ga-[01-03] | AMD Rome (2x), 7 cores/socket | 14 | 229 GB (16.38 GB) | Local scratch 6 TB SSD | gpu_a100_7c |
| 5 | wn-gb-[01-05] | Intel Ice Lake (2x), 22 cores/socket | 44 | 353 GB (8 GB) | Local scratch 6 TB SSD | gpu_a100_22c |

* Updated on 29 November 2023.

5.1. Prepare your workloads

When you submit jobs to the batch system, you create a job script where you specify the resources that your programs need from the system to execute successfully.

Before submitting your jobs, it is a good practice to run a few tests of your programs locally (on the login node or other system) and observe:

  1. the time that your programs take to execute

  2. the number of cores that your software needs to execute these tasks

  3. the maximum memory used by the programs during execution

We suggest that, where possible, you first debug your job template on the login node. In doing so, please take into account that the login node is a shared resource, and hence any job testing should consume as few resources as possible. If you have high resource demands, please contact our helpdesk for support in testing your jobs.
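One lightweight way to obtain these estimates is to wrap a small test run with GNU time and read off the elapsed time and peak memory. This is a generic Linux example, not a Spider-specific tool, and the script name is only a placeholder:

/usr/bin/time -v python my_analysis.py
# In the report, look at:
#   "Elapsed (wall clock) time"          -> estimate for the -t walltime
#   "Maximum resident set size (kbytes)" -> estimate for the memory (and hence cores) you need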

Once you get a rough estimate of the resources above, you are set to go. Create your job script to request from the scheduler the estimated resources.

In the current setup of Slurm on Spider, we ask you to specify at least the following attributes:

| SBATCH directive | Functionality | Usage example |
|---|---|---|
| -N <number> | the number of nodes | #SBATCH -N 1 (the job will run on a single node) |
| -c <number> | the number of cores | #SBATCH -c 2 (the job will use 2 cores coupled to 16000 MB memory) |
| -t HH:MM:SS | the wall-clock time | #SBATCH -t 1:00:00 (the job will run for at most 1 hour) |
| -p <partition> | partition selection | #SBATCH -p normal (the job will run for at most 120 hours) |
| -p <partition> | partition selection | #SBATCH -p infinite (the job will run for at most 720 hours) |
| -p <partition> | partition selection | #SBATCH -p short (the job will run for at most 12 hours) |
| -p <partition> | partition selection | #SBATCH -p interactive (the job will run for at most 12 hours) |
| -p <partition> | partition selection | #SBATCH -p gpu_v100 (the job will run on V100 nodes for at most 120 hours) |
| -p <partition> | partition selection | #SBATCH -p gpu_a100_22c (the job will run on A100 nodes for at most 120 hours, with at most 22 cores per GPU) |
| -p <partition> | partition selection | #SBATCH -p gpu_a100_7c (the job will run on A100 nodes for at most 120 hours, with at most 7 cores per GPU) |

The specifics of each partition can be found with scontrol show partitions, the information per machine with scontrol show node NAME (where NAME is the name of the worker node), and for a simple overview use sinfo.
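Putting the attributes above together, a minimal job script header could look like the sketch below; the resource values are illustrative only:

#!/bin/bash
#SBATCH -N 1         # run on a single node
#SBATCH -c 2         # 2 cores, coupled to 16000 MB memory
#SBATCH -t 1:00:00   # at most 1 hour of wall-clock time
#SBATCH -p normal    # normal partition (max 120 hours)

# your commands go here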

5.2. Run your jobs

5.2.1. Running a local Job with srun

The srun command creates an allocation and executes an application on a cluster managed by Slurm. It comes with a great deal of options for which help is available by typing srun --help on the login node. Alternatively, you can also get help at the Slurm documentation page.

When used directly on the command line, the srun command is executed locally by Slurm; an example is given below. A Python script, hello_world.py, has the following content:

#!/usr/bin/env python
print("Hello World")

This Python script can be executed locally as follows:

srun python hello_world.py
#Hello World

Typically, srun should be used within a job script that is submitted with sbatch to the Slurm-managed job queue.

5.2.2. Running an interactive Job with srun

You can start an interactive session on a worker node. This helps when you want to debug your pipeline or compile some software directly on the node. You will have direct access to your home and project space files from within your interactive session.

Interactive jobs are also scheduled for resources along with batch jobs, so they may not always start immediately.

The example below shows how to start an interactive session on a normal partition worker node with a maximum time of one hour, one core and one task per node:

srun --partition=normal --time=01:00:00 -c 1 --ntasks-per-node=1 --pty bash -i -l

To stop your session and return to the login node, type exit.

The example below shows how to start an interactive session on a single core of a specific worker node:

srun -c 1 --time=01:00:00 --nodelist=wn-db-02 --x11 --pty bash -i -l

5.2.3. Submitting a Job Script with sbatch

The sbatch command submits a batch script (job description script) containing one or more srun commands to the batch queue. This script is written in bash and requires #SBATCH header lines that define all of your job's global parameters. Slurm then manages this queue and schedules the individual srun job steps for execution on the available worker nodes. Slurm takes into account the global options specified with #SBATCH <options> in the job description script, as well as any local options specified for individual srun <options> commands.

Below we provide an example of sbatch job submission with options. Here we submit the above-mentioned hello_world.py script to the queue via sbatch and provide the options -N 1 to request 1 node, -c 1 to request 1 core (coupled to 8000 MB memory) and -t 1:00 to request a maximum run time of 1 minute. The job script, hello_world.sh, is an executable bash script with the following code:

#!/bin/bash
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 1:00
srun python /home/[USERNAME]/[path-to-script]/hello_world.py

You can submit this job script to the Slurm-managed job queue as follows:

sbatch hello_world.sh
#Submitted batch job 808

The job is scheduled in the queue with job ID 808, and the stdout output of the job is saved in the plain-text file slurm-808.out.

more slurm-808.out
#Hello World
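While a job is in the queue or running you can monitor it, and if needed cancel it, with the standard Slurm commands below; the job ID is the one from this example:

squeue -u $USER   # list your jobs and their state (PD = pending, R = running)
scancel 808       # cancel the job with job ID 808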

More information on sbatch can be found at the Slurm documentation page.

5.2.4. Using local scratch

If you run jobs that require intensive I/O, we advise you to use scratch: the local SSD on every compute node of Spider. This is temporary storage that can be used only during the execution of your job and may be removed at any point after your job has finished running.

In order to access the scratch filesystem within your jobs, you should use the $TMPDIR variable in your job script. We advise the following job workflow:

  • At the start of your job, copy the necessary input files to $TMPDIR

  • Run your analysis and produce your intermediate/output files on $TMPDIR

  • Copy the output files at the end of the job from $TMPDIR to your home directory

$TMPDIR points to /tmp, which is a 'bind mount' of /scratch/slurm.<JOBID>, so you will only see your own job's files in /tmp, and all files will be removed after the job finishes.

Tip

The $TMPDIR variable can only be used within Slurm jobs. It cannot be used or tested on the UI (login node), because there is no scratch space there.

Here is a job script template for $TMPDIR usage:

#!/bin/bash
#SBATCH -N 1      #request 1 node
#SBATCH -c 1      #request 1 core and 8000 MB RAM
#SBATCH -t 5:00   #request 5 minutes jobs slot

mkdir "$TMPDIR"/myanalysis
cp -r $HOME/mydata "$TMPDIR"/myanalysis
cd "$TMPDIR"/myanalysis

# = Run your analysis here =

#when done, copy the output to your /home storage
tar cf output.tar output/
cp "$TMPDIR"/myanalysis/output.tar $HOME/
echo "SUCCESS"
exit 0

5.3. Job types

5.3.1. CPU jobs

  • For regular jobs we advise you to always use only 1 node per job script, i.e. -N 1. If you need multi-node job execution, consider using an HPC facility instead.

  • On Spider we provide 8000 MB RAM per core.

    • This means that your memory requirements can be specified via the number of cores without an extra directive for memory

    • For example, by specifying -c 4 you request 4 cores and 32000 MB RAM

  • On Spider we provide 80 GB scratch disk per core.

    • This means that your scratch disk requirements can be specified via the number of cores without an extra directive for storage

    • For example, by specifying -c 2 you request 2 cores and 160 GB scratch disk

    • When you specifically target our fat nodes with 12 TB of scratch, the scratch disk provided per requested core is 200 GB (see the example job header after this list)
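As an illustration of the rules above, the job header below requests 4 cores on a regular node; the memory and scratch amounts in the comments follow from the per-core numbers in this section rather than from extra directives, and the walltime is only an example:

#!/bin/bash
#SBATCH -N 1       # single node, as advised for regular jobs
#SBATCH -c 4       # 4 cores -> 4 x 8000 MB = 32000 MB RAM and 4 x 80 GB = 320 GB scratch
#SBATCH -t 2:00:00
#SBATCH -p normal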

5.3.2. GPU jobs

  • For more information on using GPUs on Spider, see the dedicated section.

  • For jobs that require GPU resources a specific partition is available (see partitions for all the different partitions).

  • Access to the GPU partitions needs to be requested and granted.

5.4. Slurm partitions

We have configured four CPU partitions and three GPU partitions on Spider, as shown in the table above:

  • If no partition is specified, jobs will be scheduled on the normal partition, which has a maximum walltime of 120 hours and can run on any worker node.

  • Jobs in the infinite partition have a maximum walltime of 720 hours. Please note that you run on this partition at your own risk: jobs running on this partition can be killed without warning for system maintenance, and we are not responsible for data loss or loss of compute hours.

  • The short partition is meant for testing jobs. It allows 2 jobs per user, with at most 8 cores per job and a maximum walltime of 12 hours.

  • The interactive partition is meant for testing jobs and has a maximum walltime of 12 hours.

  • The gpu_v100 partition contains 1 NVIDIA V100 (32 GB) card per node.

  • The gpu_a100 partitions contain 2 NVIDIA A100 (40 GB) cards per node. (You can query the partition limits yourself, as shown after this list.)
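To inspect the partition limits yourself, you can query Slurm directly; the sinfo format string below is just one possible choice:

sinfo -o "%P %l %D %N"           # partition, time limit, number of nodes, node list
scontrol show partition normal   # full configuration of the normal partition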

5.5. Slurm constraints

5.5.1. Regular constraints

The Slurm scheduler will schedule your job on any compute node that can fulfil the constraints that you provide with your sbatch command upon job submission.

The minimum constraints that we ask you to provide with your job are given in the example above.

Many other constraints can also be provided with your job submission. However, by adding more constraints it may become more difficult to schedule and execute your job. See the Slurm manual (https://slurm.schedmd.com) for more information and please note that not all constraint options are implemented on Spider. In case you are in doubt then please contact our helpdesk.

5.5.2. Spider-specific constraints

In addition to the regular sbatch constraints, we have also introduced a number of Spider-specific constraints that are tailored to the hardware of the compute nodes on the Spider platform.

These constraints are passed to sbatch at job submission as constraint labels via the option --constraint=<constraint-label-1>,<constraint-label-2>,...,<constraint-label-n>

Here, a comma-separated list implies that all constraints in the list must be fulfilled before the job can be executed.

In terms of Spider-specific constraints, we support the following constraints to select specific hardware:

| SBATCH directive | Functionality | Worker node |
|---|---|---|
| --constraint=skylake | CPU architecture | wn-db-[01-06] |
| --constraint=napels | CPU architecture | wn-hb-[01-05] |
| --constraint=rome | CPU architecture | wn-ca-[01-25], wn-ha-[01-05] |
| --constraint=ssd | local scratch | all nodes |
| --constraint=amd | CPU family | wn-ca-[01-25], wn-ha-[01-05], wn-hb-[01-05] |
| --constraint=intel | CPU family | wn-db-[01-06], wn-gb-[01-04], wn-gp-[01-02] |

As an example, we provide below a bash shell script hello_world.sh that executes a compiled C program called hello. In this script the #SBATCH line specifies that the script may only be executed on a node with 2 CPU cores, where the node must have a Skylake CPU architecture and SSD (solid state drive) local scratch disk space.

#!/bin/bash
#SBATCH -c 2 --constraint=skylake,ssd
echo "start hello script"
/home/[USERNAME]/[path-to-script]/hello
echo "end hello script"

From the command line interface the above script may be submitted to Slurm via:

sbatch hello_world.sh

Please note that not all combinations are supported. If you submit a combination that is not available, you will receive the following error message:

sbatch: error: Batch job submission failed: Requested node configuration is not available
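To see which constraint labels (features) are defined on a given node before submitting, you can inspect its feature list with the standard Slurm commands below; the node name is only an example:

scontrol show node wn-ca-01 | grep Features   # AvailableFeatures for a single node
sinfo -o "%N %f"                              # node names with their feature labels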

5.6. Querying compute usage

5.6.1. Overview

sacct and sreport are Slurm tools that allow users to query their usage from the Slurm accounting database. Both sacct and sreport are documented on the Slurm documentation page.

These queries report the total usage for a user. The sum of the raw CPU times divided by 3600 gives the total core usage (in core hours) for the defined period. The -d option produces delimited results for easier exporting and reporting.

5.6.2. Examples

# look into the details of your usage by job (-X reports allocations only, not individual job steps)
sacct \
   -X \
   -S 2020-07-01 -E 2020-07-30 \
   --format=jobid,jobname,cputimeraw,user,alloccpus,state,partition,account,exitcode

# view the project usage (e.g. spexone) and your user's usage
sreport \
   -t second \
   -T cpu cluster \
   AccountUtilizationByUser \
   Start="2020-07-01" \
   End="2020-07-30"

See also

Still need help? Contact our helpdesk