If I have a running job with JOBID=12345, what command attaches to that job so that I can run top on the compute node and get an idea of the resources it is using?
To attach to a running job (job $JOBID), start a shell as a new step of that job:
srun --pty --jobid $JOBID /bin/bash
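The shell runs inside the cgroup of the running job, so it shares that job's CPU and RAM limits. On Slurm 20.11 and later, add --overlap if the command hangs because the job's resources are already fully allocated to its existing steps.
If you only want to see the processes, you can run top directly: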
srun --pty --jobid $JOBID top
For jobs that span multiple nodes, you first need to find the node you want to attach to:
scontrol show job $JOBID |grep NodeList
Then use the -w switch to specify the node ($NODE):
srun --pty --jobid $JOBID -w $NODE /bin/bash
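The two steps above can be sketched as a pair of small shell helpers. This is only an illustration: the function names job_nodelist and attach_cmd are made up here, and the parsing assumes the usual `scontrol show job` layout, in which NodeList= starts its own line. Note that a plain `grep NodeList` also matches the ReqNodeList= and ExcNodeList= fields, which is why the sketch anchors the match at the start of the line.

```shell
# Sketch only: job_nodelist and attach_cmd are hypothetical helper names.

# Read `scontrol show job` output on stdin and print the value of the
# NodeList= field. Anchoring at the start of the line skips the
# ReqNodeList= and ExcNodeList= fields that a plain grep would also match.
job_nodelist() {
    sed -n 's/^[[:space:]]*NodeList=\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Print (do not run) the srun command that attaches to job $1 on node $2.
attach_cmd() {
    printf 'srun --pty --jobid %s -w %s /bin/bash\n' "$1" "$2"
}
```

With these defined, `scontrol show job "$JOBID" | job_nodelist` prints the job's node list. That value may be in compressed form such as node[01-02]; `scontrol show hostnames` expands such a list to one host name per line, and any one of those names can be passed to attach_cmd as $NODE.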
Alternatively, our cluster has the pam_slurm_adopt module installed, which lets users ssh from the submit host directly to any node on which they have a running job.