Managing make -j on login nodes

MaximilianKing · October 1, 2018, 3:57pm

Hey folks,

We have an issue where users run “make -j” on our system’s login node, which causes massive performance problems for everyone else connected. I know that if the -j option is given without an argument, make does not limit the number of jobs that can run simultaneously.

How do you handle users running make -j without a specified number of jobs? Do you simply document, as standard operating procedure, that users must always specify a number of jobs? Do you have people build on compute or interactive nodes rather than the login node? Do you use a wrapper? Are there other, more effective methods people are using?

Max

iki · November 2, 2018, 4:23pm

We don’t do anything to control this on our systems: it’s a problem that only comes up very, very rarely (usually when people are compiling C++ and icpc devours all the node’s RAM).

I guess in principle you could set MAKEFILES in the default system profile to point at a file containing only the .NOTPARALLEL special target, which would inhibit parallel builds unless the user reset that variable. That is pretty opaque and surprising, though, given the principles @jpessin1 mentioned.
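A minimal sketch of that idea, assuming GNU make and bash. The function name install_notparallel and the directory it writes to are hypothetical, not an existing site tool. It writes a makefile fragment containing only .NOTPARALLEL and prints the export line a site would append to its system-wide login profile.

```bash
#!/bin/bash
# Hypothetical helper: write a makefile fragment that contains only the
# .NOTPARALLEL special target, then print the profile line that points
# MAKEFILES at it. The target directory is an assumption chosen by the caller.
install_notparallel() {
  local dir="$1" frag
  [ -n "$dir" ] || return 1
  # MAKEFILES is a whitespace-separated list, so the path must not contain spaces.
  frag="$dir/notparallel.mk"
  mkdir -p "$dir" || return 1
  # With no prerequisites, .NOTPARALLEL makes GNU make run every target serially,
  # even when -j is given.
  printf '.NOTPARALLEL:\n' > "$frag" || return 1
  printf 'export MAKEFILES=%s\n' "$frag"
}
```

For example, install_notparallel /etc/make >> /etc/profile.d/notparallel.sh (both paths are assumptions). A user can still opt out for a single command with MAKEFILES= make -j 8, which illustrates the opacity concern: the serial behavior comes from an environment variable the user never set.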

I tend to use something like make -l $(( $(nproc) / 2 )) -j when I’m building on shared nodes myself: that starts as many jobs as it wants, but won’t start new ones while the load average is at or above half the number of cores.
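A small sketch wrapping that command in shell functions, assuming bash, GNU make and the coreutils nproc command; half_cores and shared_make are hypothetical names. half_cores also clamps the limit to at least 1, so integer division on a single-core machine never yields a limit of 0.

```bash
#!/bin/bash
# Print half the number of available cores, but never less than 1.
half_cores() {
  local n
  n=$(( $(nproc) / 2 ))
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# Hypothetical helper: -j with no count lets make start as many jobs as it
# wants, while --load-average stops it from starting new jobs whenever the
# load average is at or above half the cores. Any extra targets or options
# are passed through to make.
shared_make() {
  make -j --load-average="$(half_cores)" "$@"
}
```

Usage: shared_make all. Placing -j before the load option, and spelling the load limit with =, keeps make from mistaking a following argument for a job count.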

jpessin1 · November 2, 2018, 5:08pm

+1 for make -l

jpessin1 · October 2, 2018, 2:52pm

In a shared environment, the typical procedure is to run “make” on the compute nodes.

Most uses of “make” on a cluster are either compiling larger builds or, occasionally, long-running tasks, e.g. using make as a workflow manager. Neither is appropriate for a typical shared login node.
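As a sketch of building on a compute node instead, assuming the scheduler is Slurm (the thread doesn't say which one is in use) and with placeholder resource values, a batch script can size -j from the CPUs actually allocated rather than leaving it unbounded. build_jobs is a hypothetical helper; Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested, hence the fallback to 1.

```bash
#!/bin/bash
#SBATCH --job-name=build
#SBATCH --cpus-per-task=8
#SBATCH --time=00:30:00
# Hypothetical Slurm batch script; the resource values above are placeholders.
# Submit with: sbatch build.sh

# Number of CPUs Slurm allocated to this task, or 1 outside a job.
build_jobs() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Only build inside an allocation, so the script is harmless if someone runs
# it directly on a login node.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  make -j "$(build_jobs)"
else
  echo "Not inside a Slurm job; submit with: sbatch build.sh" >&2
fi
```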

On heterogeneous systems there is also the consideration of having the compiled program match the system it runs on, which is part of a larger compatibility versus optimization question. (Modulefiles/Lmod are usually used to manage what runs where.)

jpessin1 · October 2, 2018, 2:55pm

JM2C:

Capping the number of make jobs with a wrapper seems at odds with the ‘Unix Philosophy.’ It changes a behavior that is common to many systems, explicit in the documentation, and relied on by external programs such as cmake and automake/autoconf.

If you silently hide it behind a wrapper, it’s problematic in terms of ‘Least Surprise’ and ‘Transparency,’ whereas making the wrapper verbose to let folks know can interfere with automated functions and piping.
https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond’s_17_Unix_Rules

vsoch · November 9, 2018, 8:35pm

If you want to control users running make, you have three options here.

1. Allow them to run it (meaning, do nothing).
2. Try to allow them to run it with some filter for catching “special cases.”
3. Don’t allow them to run it.

Option 1 is obviously not ideal, hence your question in the first place. Option 2 undermines (the user’s perception of) reliability and consistency: running make appears to be an okay thing to do, but then “uh-oh, this one time…”, so I don’t think it’s ideal either. Option 3 is not perfect, because we should be helping users compile and do everything they need for the task at hand.

Let’s step back, though: how can we implement option 3 (solving the issue) in a way that also supports and educates? In the simplest case, we hide the binary, the user is upset that “make” cannot be found, and we get a ticket. But what if we let them find it, and used it to tell them how we want make to be used on our cluster? For example:

cd mysciencething/
make
# --- A Message from Research Computing ---#
# We provide make for you on an interactive node! Please run:
# $ sdev
# to launch your node and try this command again

From a very practical standpoint, this handles users running make without hurting them, while also educating them to make the best decision in the future. The next time this comes up, they would likely launch sdev without thinking.
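One way to implement that message, sketched under several assumptions: the script is installed as make in a directory that comes before the real binary in PATH on login nodes, the real binary is at /usr/bin/make, the scheduler is Slurm (which sets SLURM_JOB_ID inside a job, including an interactive one), and sdev is the site’s interactive-session command. allowed_here and main are hypothetical names.

```bash
#!/bin/bash
# Hypothetical login-node wrapper, installed as "make" ahead of the real one in PATH.
# REAL_MAKE must be an absolute path to the real binary (assumed /usr/bin/make),
# otherwise the wrapper would end up exec'ing itself.
REAL_MAKE="${REAL_MAKE:-/usr/bin/make}"

# Succeed inside a Slurm allocation; otherwise print the site message to
# stderr and fail.
allowed_here() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    return 0
  fi
  cat >&2 <<'EOF'
# --- A Message from Research Computing ---#
# We provide make for you on an interactive node! Please run:
# $ sdev
# to launch your node and try this command again
EOF
  return 1
}

main() {
  allowed_here || exit 1
  exec "$REAL_MAKE" "$@"
}

# Only dispatch when invoked under the name "make", so the file can also be
# sourced (e.g. for testing) without running anything.
if [ "${0##*/}" = "make" ]; then
  main "$@"
fi
```

The message goes to stderr, so it stays out of any piped stdout, and the wrapper exits non-zero so automated callers see a failure rather than a silent no-op.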