How to check if a job has been completed in SLURM

Hi,

Do you know how to check if a job has been completed in SLURM? Thank you.

You can use the sacct command: “sacct is a scheduler command used to display accounting data for all jobs and job steps in the SLURM job accounting log or SLURM database”

It displays information about the job including its status. The ‘State’ gives the status of the job; job states include COMPLETED, FAILED, CANCELLED, or RUNNING.

Use the --format option to control the output of sacct:

  • Example 1:
    sacct -j <job_id> --format=JobID,JobName,State
    This displays only the JobID, JobName, and State fields for the specified job ID.

  • Example 2:
    sacct -j <job_id> -o jobid,submit,start,end,state
    -o is short for --format.

Run sacct --helpformat to get the list of available fields.
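
If you want a script to wait for a job instead of checking by hand, the State field can be polled. The following is a minimal sketch, not an official SLURM tool: the function names job_state, is_finished, and wait_for_job are made up for illustration, and the list of terminal states is an assumption that covers only the common cases.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the function names are not part of SLURM.

# Print the state of the job allocation only (-X), with no header (-n),
# in pipe-delimited form without padding (-P).
job_state() {
    sacct -j "$1" -X -n -P -o State | head -n 1
}

# Succeed (return 0) if the given state string is a terminal state.
# Assumption: this list covers the common cases; sacct can report others.
is_finished() {
    case "$1" in
        COMPLETED|FAILED|CANCELLED*|TIMEOUT|OUT_OF_MEMORY) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll every $2 seconds (default 30) until the job reaches a terminal state,
# then print that state.
wait_for_job() {
    local state
    state=$(job_state "$1")
    until is_finished "$state"; do
        sleep "${2:-30}"
        state=$(job_state "$1")
    done
    echo "$state"
}
```

For example, wait_for_job <job_id> 60 would check once a minute. CANCELLED is matched with a wildcard because sacct reports cancelled jobs as "CANCELLED by <uid>".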

Actually, the job status can simply be checked with the squeue command and the -u option followed by your user name. It prints a summary of your pending and running jobs, so a job that no longer appears has left the queue. Note that squeue does not tell you whether the job succeeded or failed; use sacct for that.

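You can also have SLURM email you when the job changes state. Add this to your job script:
#SBATCH --mail-user=<your email>
Then set the type of notification you want (pick one, or combine several as a comma-separated list):
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=REQUEUE
#SBATCH --mail-type=ALL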

Another option is to use sacct to get the history of your submissions since a given date:

sacct --starttime 2023-06-01 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
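
When the history is long, a per-state count can be easier to read than the full table. Here is a rough sketch, assuming pipe-delimited sacct output (-P) with JobID and State as the two fields; count_states is a hypothetical helper name, not a SLURM command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "JobID|State" lines on stdin and prints a count per state.
count_states() {
    awk -F'|' '
        NF >= 2 {
            split($2, words, " ")   # "CANCELLED by 1000" -> "CANCELLED"
            count[words[1]]++
        }
        END { for (s in count) print s, count[s] }
    ' | LC_ALL=C sort
}
```

It could be used like this: sacct --starttime 2023-06-01 -X -n -P -o JobID,State | count_states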

Here are a few ways to check if a job has been completed:

  1. squeue: The squeue command displays information about jobs in the Slurm queue, including their status. To check the status of a specific job, use the -j option followed by the job ID.
     squeue -j <job_id>

     Once the job has finished and left the queue, it no longer appears in the output (squeue may instead report that the job ID is invalid).

  2. sacct: The sacct command provides detailed accounting information for jobs. You can use it to check the status of finished jobs as well.
     sacct -j <job_id>

     This command shows various details about the job, including its completion status.

  3. scontrol: The scontrol command allows you to query and modify job and job step attributes. You can use it to query the job status directly.
     scontrol show job <job_id>

     This command provides detailed information about the job, including its current status. Note that scontrol only knows about a finished job for a short time after it ends; after that, use sacct.
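
The scontrol output is verbose, so it can help to pull out just the state. This is a small sketch that assumes the output contains a field of the form JobState=<STATE>; the function names extract_job_state and scontrol_state are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the JobState value from `scontrol show job` output on stdin.
extract_job_state() {
    sed -n 's/.*JobState=\([A-Z_]*\).*/\1/p' | head -n 1
}

# Print only the state of the given job ID (prints nothing if scontrol
# no longer knows the job).
scontrol_state() {
    scontrol show job "$1" 2>/dev/null | extract_job_state
}
```

If scontrol_state <job_id> prints nothing, the job has probably aged out of the controller's memory, and sacct -j <job_id> is the place to look.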


I like to set a generic alias that any user can run to check the status of their current jobs. I set the alias to run the command:

squeue -u $USER

$USER is predefined in almost all shells, so setting an alias to this command is a generic way to see what jobs you have running.

Here is an example:

[nucci@p-sc-2340 ~]$ alias sq='squeue -u $USER'
[nucci@p-sc-2340 ~]$ sq
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           4241048      open test-tes    nucci PD       0:00      1 (BeginTime)
           4241049      open test-tes    nucci PD       0:00      1 (BeginTime)
           9849012  sla-prio RoarColl    nucci  R    1:36:25      1 p-sc-2340
[nucci@p-sc-2340 ~]$
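
If you want to pass extra options as well, shell functions are a common alternative to an alias. The sketch below uses made-up names (myq, myq_running, myq_pending), chosen so they do not clash with the sq alias above; -t is squeue's option for filtering by job state.

```shell
# Hypothetical function names; put these in ~/.bashrc to keep them across logins.

# All of your jobs, plus any extra squeue options you pass along.
myq() { squeue -u "$USER" "$@"; }

# Only your running jobs (shown as R in the ST column).
myq_running() { myq -t RUNNING; }

# Only your pending jobs (shown as PD in the ST column).
myq_pending() { myq -t PENDING; }
```

For instance, myq_pending shows only the jobs that are still waiting to start.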

For SLURM directives, I like to use the following in my job script:

#SBATCH --mail-user=<your email here>
#SBATCH --mail-type=ALL,TIME_LIMIT_80,TIME_LIMIT_90

--mail-type=ALL is not really 'ALL' (it does not include the TIME_LIMIT notifications), so I like to add TIME_LIMIT_80 and TIME_LIMIT_90 to also get a message when a job reaches 80% and 90% of its allocated wall time limit.
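
For context, here is roughly where those directives sit in a complete job script. This is only a sketch: the job name, the one-hour time limit, and the echo workload are placeholders I made up, and the #SBATCH lines need to come before the first command in the script.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=mail-demo
# Wall time limit that the 80% and 90% warnings are measured against.
#SBATCH --time=01:00:00
#SBATCH --mail-user=<your email here>
#SBATCH --mail-type=ALL,TIME_LIMIT_80,TIME_LIMIT_90

# Placeholder workload; replace with your own commands.
echo "Job ${SLURM_JOB_ID:-unknown} started on $(uname -n)"
```

You would submit it with sbatch <script name>, and the emails then tell you when the job begins, ends, fails, or gets close to its time limit.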