Ask.Cyberinfrastructure

How can one determine the amount of RAM a node has in an HPC environment?

memory
zeta:for-sc18

#1

How can one determine the amount of RAM a node has in an HPC environment?

CURATOR: torey


#2

The best way is to consult your site’s documentation.

On Slurm, this takes two steps:

  1. Run sinfo to list the nodes and their states.

  2. Run srun to execute the free command on the compute node you are interested in. For example, for a node named r001: srun -w r001 free

Here is a real example from PSC Bridges supercomputer:

$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
RM*           up 2-00:00:00      1 drain* r242
RM*           up 2-00:00:00      1   comp r400
RM*           up 2-00:00:00      1  drain r668
RM*           up 2-00:00:00      9   resv r[405-412,670]
RM*           up 2-00:00:00    702  alloc r[006-241,243-399,401-404,413-667,669,671-719]
RM-shared     up 2-00:00:00     21    mix r[720-721,733-747,749-752]
RM-shared     up 2-00:00:00      3  alloc r[723-724,748]
RM-shared     up 2-00:00:00      9   idle r[722,725-732]
...

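Nodes in the “alloc”, “drain”, and “resv” states are fully allocated, drained, or reserved, so a new job step cannot start on them right away. Let’s look at the memory of r720 instead, which is in the “mix” state (only partly allocated). It is in the non-default partition RM-shared, so we have to specify the partition: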

$ srun -p RM-shared -w r720 free
              total        used        free      shared  buff/cache   available
Mem:      131734464    13434632   107366248      560604    10933584   116069696
Swap:      17591292     2375984    15215308

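By default, free reports sizes in kibibytes, so the total of 131734464 KiB is about 125.6 GiB. That is consistent with a node that has 128 GB of RAM installed, since part of the memory is reserved by the kernel and firmware and is not counted in the total. It also matches the system configuration page (r720 is one of the regular memory nodes): https://www.psc.edu/bridges/user-guide/system-configuration .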
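The unit conversion can be scripted. Here is a minimal sketch, assuming free is run with its default unit (kibibytes) and prints a "Mem:" row laid out as in the output above. The function name free_total_gib is made up for this example.

```shell
# Hypothetical helper: read `free` output on stdin and print total RAM in GiB.
# Assumes the default unit of `free` (kibibytes) and a row that starts with "Mem:".
free_total_gib() {
    awk '/^Mem:/ { printf "%.1f\n", $2 / 1024 / 1024 }'
}

# Possible use on a cluster (partition and node names come from the example above):
#   srun -p RM-shared -w r720 free | free_total_gib
```

Reading from standard input keeps the helper independent of how the free output was obtained, so it can also be tried on saved output.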


#4

On Grid Engine family systems, qhost shows the values the queue scheduler itself is using. The MEMTOT column lists the total memory of each execution host.
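The qhost column layout differs between Grid Engine versions, so a script is safer if it finds MEMTOT from the header line rather than from a fixed position. This is a rough sketch that assumes the first line of output is the header and that fields are separated by whitespace. The function name qhost_memtot is invented for this example.

```shell
# Hypothetical helper: read `qhost` output on stdin and print "<host> <MEMTOT>".
# Assumes line 1 is the header; skips the dashed separator and the "global" row.
qhost_memtot() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "MEMTOT") col = i; next }
        col && $1 !~ /^-+$/ && $1 != "global" { print $1, $col }
    '
}

# Possible use:
#   qhost | qhost_memtot
```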


#3

The method depends on the resource manager or scheduler in use at your site. With Slurm you can quickly see this in the output of

scontrol show node $NODENAME

Replace $NODENAME with the actual name of the node you’re interested in. The RealMemory field gives the memory configured for that node, in megabytes. If you leave the node name off, the listing includes all nodes.
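To pull just the memory figure out of that listing, something like the following may help. It is a sketch that assumes the usual Key=Value layout of scontrol show node output, with NodeName= appearing before RealMemory= in each node record. The function name node_real_memory is invented for this example.

```shell
# Hypothetical helper: read `scontrol show node` output on stdin and print
# "<node> <RealMemory in MB>" for every node record found.
node_real_memory() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^NodeName=/)   { split($i, a, "="); name = a[2] }
            if ($i ~ /^RealMemory=/) { split($i, b, "="); print name, b[2] }
        }
    }'
}

# Possible use for one node, or for all nodes if the name is omitted:
#   scontrol show node "$NODENAME" | node_real_memory
#   scontrol show node | node_real_memory
```

Note that RealMemory is the value the administrators configured for the scheduler, so it can be somewhat lower than what free reports on the node itself.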


#5

If you are inspecting a node directly and need more detail than free gives, most systems will also let you run other tools, such as lshw, on the compute node as part of a job.