Ask.Cyberinfrastructure

How do you handle emergency resource requests with the SGE scheduler, while still maintaining fair access to all users?

sge
scheduler

#1

We occasionally get ‘emergency’ requests from users asking for additional resources or, more often, a bump in priority. How can we help without being unfair to other users?


#2

There are two parts to this question: 1) how to manipulate the system to allow a user or group to run their jobs sooner and 2) how to do so fairly or equitably.

Manipulating things to favor a group or user:
Depending on your setup, you can make (temporary) adjustments to the user’s queuing priority or to the queues they can access, possibly at the project level if you’re using projects:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_priority.html
with http://gridscheduler.sourceforge.net/htmlman/htmlman1/qalter.html if the target job was already submitted.
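As a rough sketch of the qalter route: the helper below wraps `qalter -p`, which takes a priority between -1023 and 1024. The function name `bump_priority`, the `DRY_RUN` switch and the job ID in the usage note are made up for illustration. It also assumes you are running it as an SGE operator or manager, since regular users can only lower the priority of their own jobs.

```shell
#!/bin/sh
# Sketch only: raise (or lower) the priority of an already-submitted job.
# DRY_RUN defaults to 1, so the qalter command is printed rather than run.
DRY_RUN="${DRY_RUN:-1}"

# bump_priority JOB_ID PRIORITY   (hypothetical helper, not part of SGE)
bump_priority() {
    job_id="$1"
    prio="$2"

    # Accept an optional leading minus sign followed by digits only.
    digits="${prio#-}"
    case "$digits" in
        ''|*[!0-9]*)
            echo "priority must be an integer" >&2
            return 1
            ;;
    esac

    # qalter -p only accepts values in the range -1023..1024.
    if [ "$prio" -lt -1023 ] || [ "$prio" -gt 1024 ]; then
        echo "priority out of range (-1023..1024)" >&2
        return 1
    fi

    if [ "$DRY_RUN" = "1" ]; then
        echo "qalter -p $prio $job_id"
    else
        qalter -p "$prio" "$job_id"
    fi
}
```

For example, `bump_priority 12345 100` prints the command it would run. Set `DRY_RUN=0` once you are happy with it, and remember to put the priority back afterwards if the bump was meant to be temporary.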

Or if you are using projects:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/project.html

If you want to increase the concurrent job limits, look at maxujobs. There are also many other options, such as the weighting parameters:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html
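One possible way to change maxujobs without the interactive editor is to dump the scheduler configuration, rewrite the one line, and load it back. This is a sketch under assumptions: the helper names `set_maxujobs` and `raise_maxujobs` are made up, and it relies on `qconf -ssconf` and `qconf -Msconf`, so check the qconf man page for your Grid Engine version before using it. Note that maxujobs is a cluster-wide per-user limit (0 means unlimited), so raising it applies to everyone, not just the user who asked.

```shell
#!/bin/sh
# Sketch only: change maxujobs in the scheduler configuration.

# set_maxujobs LIMIT  (hypothetical helper)
# Reads a scheduler configuration on stdin and writes it to stdout with
# the maxujobs line replaced; all other lines pass through unchanged.
set_maxujobs() {
    awk -v n="$1" '$1 == "maxujobs" { print "maxujobs", n; next } { print }'
}

# raise_maxujobs LIMIT  (hypothetical helper)
# Dumps the live configuration, rewrites maxujobs, and loads it back.
# Needs SGE manager rights; nothing runs until you call it.
raise_maxujobs() {
    limit="$1"
    case "$limit" in
        ''|*[!0-9]*)
            echo "limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    tmp="$(mktemp)" || return 1
    qconf -ssconf | set_maxujobs "$limit" > "$tmp" && qconf -Msconf "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

You can test the rewrite step on its own with `qconf -ssconf | set_maxujobs 20` and compare the output with the current settings before loading anything.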

If X forwarding is enabled on your system, qmon wraps most of these in a GUI tool:

http://gridscheduler.sourceforge.net/htmlman/htmlman1/qmon.html

Doing so “equitably”
…is problematic.
(Just my two cents, here)

In most cases, a given HPC system either has enough spare capacity to handle the jobs, in which case it is a non-issue, or it doesn’t, in which case any additional resources the emergency request receives are effectively taken from other users.

While this may be appropriate in a given case, if the normal settings are equitable, I doubt most users would consider it fair to add a bias in someone’s favor.

The only exceptions are edge cases where there is enough room for everyone AND the job needs administrator-level help to run (though this might also be a sign of a configuration/use-case mismatch).