Ask.Cyberinfrastructure

My submission to the "bigmem" node violates account policies

slurm
sherlock
#1

I have a job that I want to run on the “bigmem” node, and so I am trying to submit it like this:

sbatch --partition=bigmem parseEncoding.sh

But I immediately get this error:

sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

In the old Sherlock we had to set --qos, but setting it here is incorrect and also results in an error (I found this in the Sherlock 2.0 documentation).


#2

To make sure that jobs that don’t need the bigmem node are not submitted to it, we require that the memory requested be at least 128 GB. Since you don’t request any memory, the default is assumed and the request is denied (your job could run on a regular node).

To fix this and have the job allocated to a bigmem node, add a memory request with --mem:

sbatch --partition=bigmem --mem 130000 parseEncoding.sh

Note that the unit is MB (130 GB is 130,000 MB).
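If you submit to bigmem often, a small wrapper can catch a missing or too-small --mem request before the scheduler rejects it. This is a minimal sketch: the function name bigmem_sbatch_cmd and the BIGMEM_MIN_MB threshold are hypothetical, and it assumes the 128 GB floor can be compared as 128000 MB, following the 130 GB = 130,000 MB convention above; the scheduler's actual check may use a different value.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or Sherlock): build the sbatch
# command for the bigmem partition, refusing requests that ask for too
# little memory.
# Assumption: the 128 GB floor is compared in MB as 128000, following the
# "130 GB is 130,000 MB" convention; the real check may differ.
BIGMEM_MIN_MB=128000

bigmem_sbatch_cmd() {
  # $1: memory in MB (sbatch's unit for --mem when no suffix is given)
  # $2: the batch script to submit
  if [ "$1" -lt "$BIGMEM_MIN_MB" ]; then
    echo "bigmem needs --mem >= $BIGMEM_MIN_MB MB, got $1" >&2
    return 1
  fi
  # Print the command rather than running it, so it can be checked first
  echo "sbatch --partition=bigmem --mem $1 $2"
}
```

For example, `cmd=$(bigmem_sbatch_cmd 130000 parseEncoding.sh) && $cmd` runs the submission only when the check passes. Printing the command instead of calling sbatch directly keeps the check easy to test away from the cluster.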

How to see the QOS limits

Running sacctmgr show qos on its own prints a very wide table. For a more readable version, try:

sacctmgr show qos format=Name,MaxTRESPerUser,MaxSubmitJobsPerUser,MaxJobsPerUser,MaxWall,MaxTRESPA,MaxSubmitJobsPerAccount -r normal,owners,gpu,dev,bigmem,long,owner

      Name     MaxTRESPU MaxSubmitPU MaxJobsPU     MaxWall     MaxTRESPA MaxSubmitPA 
---------- ------------- ----------- --------- ----------- ------------- ----------- 
    normal       cpu=512        1000            2-00:00:00      cpu=1024        2000 
       dev  cpu=2,mem=8G           2              02:00:00     cpu=99999          32 
      long        cpu=32          20        16  7-00:00:00                        40 
    bigmem        mem=3T          10            1-00:00:00        mem=6T          20 
       gpu    gres/gpu=8          50            2-00:00:00   gres/gpu=24         100 
     owner     cpu=99999        3000            7-00:00:00     cpu=99999        5000 
    owners      cpu=2048        3000            2-00:00:00      cpu=4096        5000 
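To use these limits in a script, sacctmgr's -n (no header) and -P (pipe-delimited) options give output that is easier to split than the aligned table, where empty cells such as most of the MaxJobsPU column would shift whitespace-separated fields. The sketch below assumes that pipe-delimited form; qos_field is a hypothetical helper that prints one column for one QOS, with columns numbered in the order of the format= list.

```shell
#!/bin/sh
# Hypothetical helper: print one column for one QOS from pipe-delimited
# sacctmgr output read on stdin, e.g. produced by
#   sacctmgr -n -P show qos format=Name,MaxTRESPerUser,MaxSubmitJobsPerUser,MaxJobsPerUser,MaxWall
# $1: QOS name (first column), $2: 1-based column number in the format= list
qos_field() {
  # Empty cells stay as empty fields between pipes, so column numbers line up
  awk -F'|' -v qos="$1" -v col="$2" '$1 == qos { print $col }'
}
```

For instance, piping the sacctmgr command in the comment into `qos_field bigmem 5` should print the bigmem MaxWall (1-00:00:00 in the table above).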

Node information with sinfo

Another way to inspect the nodes in a partition is sinfo. For example (the MEMORY column is in MB):

$ sinfo -N -p bigmem --long
Fri Nov 23 03:31:27 2018
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
sh-02-13       1    bigmem        idle   32    4:8:1 153600        0 108491 CPU_GEN: none                
sh-02-14       1    bigmem        idle   32    4:8:1 153600        0 108491 CPU_GEN: none                
sh-112-01      1    bigmem        idle   56   4:14:1 307200        0 109661 CPU_GEN: none                
sh-112-02      1    bigmem        idle   32   2:16:1 512000        0 106461 CPU_GEN: none  
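Because MEMORY is reported in MB, the same information can tell you which bigmem nodes could hold a given --mem request. A minimal sketch, assuming sinfo's -h (no header) flag and the output format -o "%N %c %m" (node name, CPUs, memory in MB); nodes_with_mem is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list nodes whose memory (in MB) is at least $1.
# Expects "NODE CPUS MEMORY_MB" lines on stdin, e.g. from
#   sinfo -h -N -p bigmem -o "%N %c %m"
nodes_with_mem() {
  # "+ 0" forces a numeric comparison (and drops a trailing "+" if present)
  awk -v min="$1" '$3 + 0 >= min + 0 { print $1 }'
}
```

With the four nodes listed above, `sinfo -h -N -p bigmem -o "%N %c %m" | nodes_with_mem 300000` would print sh-112-01 and sh-112-02.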