Maintenance instructions
New Spider release Feb 2021
Highlights in this release:
OS update of the cluster to CentOS 8
SLURM scheduler software update to version 20.11.3
Refactoring our resource provisioning and software deployment practices
Optimising our external network interfaces
Increasing the cluster capacity with two new worker nodes
Adding a short queue for small testing jobs
Testing phase
Dates: 03/02/2021 - 12/02/2021
During the testing phase you can test your pipelines on two new worker nodes, fully configured with the upgrades outlined above and confirm whether or not your work will be impacted by the upgrade. This is optional but highly recommended to ensure that your jobs will be running as expected after the upgrade of the production cluster.
Note
Any jobs running on the upgraded nodes during the testing phase will be killed when the testing phase ends, on 12/02/2021 end of the day.
Note
Any jobs running on the production nodes (i.e. excluding the upgraded nodes) during the testing phase will not be affected.
How to test your pipelines
You may submit your jobs on the new nodes by using any of the following directives. Any of the options below will make sure that your jobs run on an upgraded node:
SBATCH directive |
Functionality |
Usage example |
---|---|---|
|
access to the upgraded nodes |
|
|
access to cluster nodes running Centos 8 |
|
|
access to cluster nodes of cpu familiy ‘rome’ |
|
Useful commands
sinfo -a
: check all available partitions and the associated nodes
scontrol show node "wn-ca-01"
or scontrol show node "wn-ca-02"
: check the specifications of the upgraded nodes
Known issues
CentOS 8 does not provide to users a python interpreter, only python3. If you still use python2 in your code, you may want to consider decommissioning your old environment in favor of python3 or use conda or Singularity containers to create an execution environment tailored to your preferred python version.
The grid software stack including proxy authentication is not available on the upgraded nodes. We are working on a solution. -> SOLVED
The
TMPDIR
variable is not available on the upgraded nodes. We are working on a solution. -> SOLVED
Feedback
Please report any issues or other feedback from testing your pipelines on the upgraded nodes to our Service Desk.
Downtime phase
Dates: 15/02/2021 - 19/02/2021
During the downtime phase the login node will not be reachable and new jobs won’t be accepted. Your data on dCache will still be accessible from any other computer/platform outside Spider.
Note
On 12/02/2021 end of the day, the cluster will stop accepting new jobs. Any running jobs on the production worker nodes will not be affected. Any jobs running on the upgraded nodes or the infinite
partition will be killed.
Getting in touch
Please contact our Service Desk in case of any questions regarding the downtime.