How can I see the names of the nodes my multi-node MPI job is using on our SGE cluster?

sge
mpi
programming-for-hpc
scheduler
parallelization
question

#1

I am running an MPI job using 8 nodes with 16 cores each. When I execute qstat -u username command it shows only the master node. How can I view all the nodes that are used for my job?

Curator: Katia


#3

This will not give you a precise answer if you have several multi-node MPI jobs running at once, but it will show all the nodes that all your jobs are running on:

http://moo.nac.uci.edu/~hjm/qbetta

It’s a quick-and-dirty Perl script that merges the output of ‘qstat -s r’ and ‘qhost -h (host) -q’ and does some rough math on the result to show which nodes are under- or overloaded.

grep the result for anything you want (usually hostnames or usernames).

Here’s a sample of the output. The script takes no options; just grep for what you want.

Shows most of the info shown from 'qhost -q' and 'qstat -s r' but in one
line.  Also shows whether a node is over (+) or under(-) loaded.  At the end
of each line is the status of all Qs that use this node.  Only compute nodes
are shown in this output.
           under/    CPUs           RAM        (Assigned/Total)
HOSTNAME    over  USED/TOTAL     USED/TOTAL    Queue    v  [flags] users,jobs
compute-1-10    64.06 /  64      3.5G / 126.2G  free64(64/64) vturlo,64  tw(0/64) 
compute-1-11    64.03 /  64      3.9G / 126.2G  free64(64/64) vturlo,64  tw(0/64) 
compute-1-12  - 27.01 /  64      1.8G / 126.2G  free64(0/64)[S] tw(24/64) frankes,24  
compute-1-13  -  0.05 /  64      3.8G / 252.4G  
compute-1-14  -  0.53 /  24      7.2G /  94.7G  free24i(0/24)[S] gpu(3/24) staimour,2  yoshitom,1  
compute-1-2   -  3.99 /  64      3.9G / 252.4G  abio(0/64) free64(64/64) jfarran,61  vojh1,3  sf(0/64) 
compute-1-3     64.04 /  64      6.2G / 252.4G  free64(64/64) meganjm1,64  
compute-1-4     64.07 /  64      4.2G / 252.4G  air(0/32) chem(0/32) free64(64/64) vturlo,64  
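If you only want the hostnames for one user, a small filter over that listing may be easier than eyeballing grep output. This is a minimal sketch, assuming the layout shown above (hostname in the first column, users listed as name,slots preceded by a space). The function name hosts_for_user is made up for illustration, and the way the script is invoked on your cluster is an assumption too.

```shell
# Print the hostname (first column) of every line that lists the given user.
# Assumes users appear as " name,slots", as in the sample output above.
hosts_for_user() {
    # $1: user name to look for; the listing is read from stdin
    awk -v user="$1" 'index($0, " " user ",") { print $1 }'
}

# Hypothetical usage on the cluster, assuming the script is installed as qbetta:
#   qbetta | hosts_for_user username
```

Matching on " name," rather than the bare name avoids picking up users whose names merely contain the one you asked for.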

To run it, you’ll also need scut


#2

By default, qstat aims for readability and prints one line per job, but there are several ways to customize the output. See $ man qstat or
http://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html
for the available output options.

In this case the -g flag should do the trick: add -g t and qstat prints one line per parallel task (process/slot), with a bit more detail, including the queue instance each task runs in.

For example if you want to keep a snapshot of this for later parsing:
$ qstat -g t -u username >> myQstatOutput

As the man page explains:

With -g t parallel jobs are displayed verbosely in a one line per parallel job task fashion. By default, parallel job tasks are displayed in a single line. Also with the -g t option, the function of each parallel task is displayed, rather than the jobs slot amount (see section OUTPUT FORMATS).
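To boil the -g t output down to a plain list of node names, something like the following may help. It is a rough sketch, assuming the queue column has the usual queue@hostname form; the function name job_hosts is made up for illustration, and if you have several jobs running it merges their hosts into one list.

```shell
# Print the unique hostnames found in `qstat -g t` output read from stdin.
# Assumes each task line carries a queue instance of the form queue@hostname.
job_hosts() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /@/) {          # first field containing "@" is taken as queue@host
                sub(/^.*@/, "", $i)  # drop the queue name, keep the host
                print $i
                break
            }
    }' | sort -u
}

# Usage on the cluster (username is a placeholder):
#   qstat -g t -u username | job_hosts
```

Header and separator lines contain no "@", so they are skipped without special handling.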

** EDIT: Additionally, qhost -j lists the running jobs grouped by host.
Both commands can select jobs by user name with the -u flag.