Ask.Cyberinfrastructure

How to attach to a running job to run top on compute node

If I have a running job with JOBID=12345, what command attaches to that job so that I can run top on the compute node and get an idea of the resources it is using?

To attach to a running job (job $JOBID), run:

srun --pty --jobid $JOBID /bin/bash

This shell runs inside the cgroup of the running job, so it is confined to the job's allocated resources (CPUs, RAM, etc.).
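A small wrapper makes this repeatable. The sketch below is a hypothetical helper (`attach_job` is not a Slurm command). It adds `--overlap`, which, as far as I know, is needed on Slurm 20.11 and later: since that release a new step does not share the CPUs and memory already held by the job's running steps unless told to, so the attach can sit waiting. Older releases do not have the flag, so drop it there. The `SRUN` variable is also an assumption, used only so the command can be printed instead of executed.

```shell
# attach_job JOBID  (hypothetical helper, not part of Slurm)
# Opens an interactive shell inside the cgroup of a running job.
# Set SRUN=echo to print the srun arguments instead of running them.
attach_job() {
  if [ -z "$1" ]; then
    echo "usage: attach_job JOBID" >&2
    return 1
  fi
  # --overlap: share resources with the job's existing steps (Slurm >= 20.11 only)
  "${SRUN:-srun}" --pty --jobid "$1" --overlap /bin/bash
}
```

Usage: `attach_job 12345`, then run top (or anything else) in the shell that opens.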

If you only want to see the processes, you can run top directly:

srun --pty --jobid $JOBID top

For jobs with multiple nodes, you first need to find the node you want to attach to:

scontrol show job $JOBID | grep NodeList
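Note that `grep NodeList` typically also matches the line carrying the `ReqNodeList=` and `ExcNodeList=` fields. The sketch below, a hypothetical awk-based helper, keeps only the `NodeList=` field itself; `scontrol show hostnames` can then expand a compact list such as `node[01-02]` into one host name per line. (`squeue -h -j $JOBID -o %N` is another way to get the compact list.)

```shell
# nodelist_from_scontrol  (hypothetical helper)
# Reads `scontrol show job` output on stdin and prints the job's compact NodeList,
# skipping the ReqNodeList= and ExcNodeList= fields that grep would also match.
nodelist_from_scontrol() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^NodeList=/)
        print substr($i, length("NodeList=") + 1)
  }'
}

# Usage on a Slurm submit host:
#   NODELIST=$(scontrol show job "$JOBID" | nodelist_from_scontrol)
#   scontrol show hostnames "$NODELIST"    # one node name per line
```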

Then use the -w switch to specify the node ($NODE):

srun --pty --jobid $JOBID -w $NODE /bin/bash
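To check every node of a multi-node job without opening an interactive shell on each, a loop can take one top snapshot per node. In the sketch below (hypothetical helper `top_on_nodes`), `top -b -n 1` runs top in batch mode for a single refresh so the output can be read or saved, and `-N1 -n1` limits each step to one task on the chosen node. `--overlap` assumes Slurm 20.11 or later (older releases lack it), and `SRUN` is an assumed override for printing the commands instead of running them.

```shell
# top_on_nodes JOBID NODE...  (hypothetical helper)
# Prints a one-shot top snapshot from each listed node of a running job.
# Set SRUN=echo to print the srun arguments instead of running them.
top_on_nodes() {
  if [ $# -lt 2 ]; then
    echo "usage: top_on_nodes JOBID NODE..." >&2
    return 1
  fi
  local jobid=$1 node
  shift
  for node in "$@"; do
    printf '=== %s ===\n' "$node"
    # -N1 -n1 -w NODE: one task on this node; top -b -n 1: batch mode, one refresh
    "${SRUN:-srun}" --jobid "$jobid" --overlap -N1 -n1 -w "$node" top -b -n 1
  done
}

# Usage, with NODELIST holding the job's compact node list (e.g. node[01-02]):
#   top_on_nodes "$JOBID" $(scontrol show hostnames "$NODELIST")
```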

Alternatively, our cluster has the pam_slurm_adopt module installed, which lets users simply ssh from the submit host to any node on which they have a running job.
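With pam_slurm_adopt, the ssh session is adopted into one of your running jobs on that node. Interactive top needs a terminal, so pass `-t` when running it as the remote command. A minimal sketch, assuming your site has pam_slurm_adopt configured; `ssh_top` and the `SSH` override are hypothetical:

```shell
# ssh_top NODE  (hypothetical helper)
# Runs interactive top on a compute node reached via ssh (requires pam_slurm_adopt
# and a running job of yours on that node). Set SSH=echo to print the arguments.
ssh_top() {
  if [ -z "$1" ]; then
    echo "usage: ssh_top NODE" >&2
    return 1
  fi
  # -t forces a pseudo-terminal; interactive top does not start without one
  "${SSH:-ssh}" -t "$1" top
}
```

Usage: `ssh_top $NODE`, with the node name taken from the job's NodeList.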