Hi,
Do you know how to check if a job has been completed in SLURM? Thank you.
You can use the sacct command: "sacct is a scheduler command used to display accounting data for all jobs and job steps in the SLURM job accounting log or SLURM database."
It displays information about the job, including its status. The "State" field gives the status of the job; job states include COMPLETED, FAILED, CANCELLED, and RUNNING.
Use the --format option to control the output of sacct.
Example 1:
sacct -j <job_id> --format=JobID,JobName,State
Displays only the JobID, JobName, and State information for the specified job ID.
Example 2:
sacct -j <job_id> -o jobid,submit,start,end,state
-o is short for --format
Run sacct --helpformat to get the list of available fields
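If you want to use this in a script, the same query can be cut down to a single word of output. Here is a minimal sketch, assuming job accounting is enabled on your cluster. The function names job_state and is_finished are made up for this example, and the list of end states is illustrative rather than complete.

```shell
#!/usr/bin/env bash

# Print the state of a job allocation as a single word, e.g. COMPLETED.
#   -X  report the allocation only, not the individual job steps
#   -n  suppress the header line
#   -P  parsable output: '|'-delimited, no column padding
# awk keeps only the first word, so "CANCELLED by <uid>" becomes "CANCELLED".
job_state() {
    sacct -j "$1" -X -n -P -o State | head -n 1 | awk '{print $1}'
}

# Return success (exit status 0) if the given state means the job has ended.
# Illustrative list of end states; extend it for your site if needed.
is_finished() {
    case "$1" in
        COMPLETED|FAILED|CANCELLED|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (replace <job_id> with a real job ID):
#   state=$(job_state <job_id>)
#   is_finished "$state" && echo "job ended with state $state"
```

Note that "finished" here covers failures too, so check for COMPLETED specifically if you only care about successful runs.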
Actually, you can simply check the job status with the squeue command and the -u option followed by your user name. It prints a summary of your pending and running jobs with their current state, so you can see directly whether a job is done: a job that no longer appears in the list has left the queue.
You can also have Slurm email you when the job changes state. Add this to your job script:
#SBATCH --mail-user=<your email>
Set different types of notifications:
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=REQUEUE
#SBATCH --mail-type=ALL
Another option is to use sacct to get the history of your submissions:
sacct --starttime 2023-06-01 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
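The start date above is hard-coded. If you run this often, a relative date may be handier. A small sketch, assuming GNU date is available (as on most Linux clusters); days_ago and job_history are hypothetical helper names, and the field list is just a shorter selection from the command above.

```shell
#!/usr/bin/env bash

# Print the calendar date N days ago as YYYY-MM-DD (default 7; needs GNU date).
days_ago() {
    date -d "${1:-7} days ago" +%F
}

# Show your job allocations from the last N days (default 7).
#   -X  one line per job, without the individual job steps
job_history() {
    sacct -X --starttime "$(days_ago "${1:-7}")" \
          --format=JobID,JobName%30,Partition,State,Start,End,Elapsed
}

# Example use:
#   job_history 3    # jobs from the last three days
```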
Here are a few ways to check if a job has been completed:
The squeue command displays information about jobs in the Slurm queue, including their status. To check the status of a specific job, use the -j option followed by the job ID:
squeue -j <job_id>
If the job has completed, you won't see it in the output.
The sacct command provides detailed accounting information for jobs. You can use it to check the status of finished jobs as well:
sacct -j <job_id>
This command shows various details about the job, including its completion status.
The scontrol command lets you query and modify job and job step attributes. You can use it to query the job status directly:
scontrol show job <job_id>
This command provides detailed information about the job, including its current status. Note that the controller only keeps finished jobs for a short time, so for older jobs use sacct.
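The squeue and sacct approaches can be combined into a simple "wait until done" helper. This is only a sketch: wait_for_job is a made-up name, and the 30-second default poll interval is an arbitrary choice. It also treats any empty squeue output as "job gone", so a temporary scheduler outage would end the loop early, and the accounting record may lag a little behind the queue.

```shell
#!/usr/bin/env bash

# Block until the job has left the queue, then print its final state.
# Usage: wait_for_job <job_id> [poll_seconds]
wait_for_job() {
    local job_id=$1 interval=${2:-30}

    # squeue -h omits the header, so the output is empty once the job is gone.
    # Error messages (e.g. for a job ID the controller no longer knows)
    # are discarded.
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done

    # Ask the accounting database how the job ended (allocation line only).
    sacct -j "$job_id" -X -n -P -o State | head -n 1
}

# Example use (replace <job_id> with a real job ID):
#   final=$(wait_for_job <job_id> 60)
#   echo "job finished with state: $final"
```

Please keep the poll interval reasonably long; hammering the scheduler with squeue every second is unfriendly on a shared cluster.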
I like to set a generic alias that any user can run to check the statuses of their current jobs. I set the alias to run the command:
squeue -u $USER
$USER is predefined in almost all shells, so setting an alias to this command is a generic way to see what jobs you have running.
Here is an example:
[nucci@p-sc-2340 ~]$ alias sq='squeue -u $USER'
[nucci@p-sc-2340 ~]$ sq
  JOBID PARTITION     NAME  USER ST    TIME NODES NODELIST(REASON)
4241048      open test-tes nucci PD    0:00     1 (BeginTime)
4241049      open test-tes nucci PD    0:00     1 (BeginTime)
9849012  sla-prio RoarColl nucci  R 1:36:25     1 p-sc-2340
[nucci@p-sc-2340 ~]$
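If you have many jobs in the queue, a per-state count can be easier to read than the full list. A possible companion to the alias, as a sketch only; job_summary is a name I made up for this example.

```shell
#!/usr/bin/env bash

# Count the current user's queued jobs per state (one line per state).
#   -h     no header line
#   -o %T  print only the job state, in its long form (PENDING, RUNNING, ...)
job_summary() {
    squeue -u "${USER:-$(id -un)}" -h -o '%T' | sort | uniq -c
}

# Example use:
#   job_summary
```

For the session above it would report two PENDING jobs and one RUNNING job. An empty result means you have nothing left in the queue.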
For SLURM directives, I like to use the following in my job script:
#SBATCH --mail-user=<your email here>
#SBATCH --mail-type=ALL,TIME_LIMIT_80,TIME_LIMIT_90
--mail-type=ALL is not really "ALL": it does not cover the time-limit events. So I like to add TIME_LIMIT_80 and TIME_LIMIT_90 to also get a message when a job reaches 80% and 90% of its allocated wall time limit.