Cluster administration
Rack layout
Diagram can be edited with diagrams.net using the source code available here
Checking and restarting nodes
For a full list of all available nodes, type:
scontrol show node
A given node status can be checked out with:
scontrol show node NODENAME
Sometimes, a node state will change to DRAIN for multiple reasons: i.e. slurm could not gracefully kill a job on a given node. Drained nodes won't handle incoming jobs until they are manually reset. To do so, the following command must be typed with admin privileges:
scontrol update NodeName=NODENAME State=RESUME