Ask.Cyberinfrastructure

How do I know if my job is parallelizing when submitted as a batch script?

python
scripting
parallelization
programming-for-hpc

#1

I am trying to parallelize my Python job, but I am not sure whether it is actually running on multiple cores. The execution time is only a little shorter than when I submit the same job without parallelization, but I expected it to be about 4 times shorter since I am using 4 cores. How do I know if the program is actually using all 4 cores?

Curator: Katia


#2

ANSWER: One quick and easy way is to run top on the node where the job is running and examine the %CPU column. If %CPU exceeds 100%, multiple cores are working on that process. Remember that your loops have to be time-consuming enough to make multiple cores worthwhile. The setup and teardown for parallel loops is 1 or 2 orders of magnitude slower than memory access operations. That means you need a fairly beefy processing problem to really rack up parallel processing time.
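Since a batch job usually runs where you cannot watch top interactively, a rough equivalent is to have the script report its own ratio of CPU time to wall-clock time, which is approximately the run-averaged value of what top shows as %CPU. The following is only a minimal sketch under some assumptions: it relies on a Unix-like system (where os.times() reports child-process CPU time), it assumes the parallelism comes from multiprocessing, and the names busy, run_parallel and cpu_to_wall_ratio are hypothetical placeholders for your own code.

```python
import os
import time
from multiprocessing import Pool


def busy(n):
    """CPU-bound placeholder work (hypothetical stand-in for the real task)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cpu_to_wall_ratio(func, *args):
    """Run func(*args) and return (result, cpu_seconds / wall_seconds).

    CPU time covers this process plus any child processes that have
    finished and been waited for by the time func returns.
    """
    t0 = os.times()
    w0 = time.perf_counter()
    result = func(*args)
    w1 = time.perf_counter()
    t1 = os.times()
    cpu = ((t1.user - t0.user) + (t1.system - t0.system)
           + (t1.children_user - t0.children_user)
           + (t1.children_system - t0.children_system))
    return result, cpu / (w1 - w0)


def run_parallel(n_workers, chunks):
    """Map busy() over chunks using a pool of worker processes."""
    pool = Pool(n_workers)
    try:
        results = pool.map(busy, chunks)
    finally:
        pool.close()
        pool.join()  # wait for the workers so their CPU time is counted
    return results


if __name__ == "__main__":
    results, ratio = cpu_to_wall_ratio(run_parallel, 4, [1_000_000] * 8)
    # A ratio near 1 suggests serial execution; near 4 suggests 4 busy cores.
    print(f"CPU/wall ratio: {ratio:.2f}")
```

If the printed ratio stays close to 1 even with 4 workers, the work per task is probably too small relative to the setup and teardown cost mentioned above, or the job was not given 4 cores by the scheduler.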


#3

ANSWER: You can also use the ‘time’ command. It looks like this:

time myexecutable --myparameters

At the completion of the execution you’ll get a report that looks something like this:

real 81m59.485s
user 1707m42.779s
sys 9m31.001s

The ratio between user and real shows how efficiently you were using the processors. In this case, the roughly 21:1 ratio means that, on average, about 21 of the 24 cores associated with this run were kept busy.
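The arithmetic behind that ratio can be sketched in a few lines of Python. This is only an illustration: it assumes the bash-style "NmS.SSSs" format shown in the report above, and the helper names to_seconds and core_usage are made up for this example.

```python
import re


def to_seconds(stamp):
    """Convert a bash-style time value such as '81m59.485s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def core_usage(real, user, cores):
    """Return (average number of busy cores, fraction of allocated cores used)."""
    busy = to_seconds(user) / to_seconds(real)
    return busy, busy / cores


# Values taken from the report above; 24 is the core count of that run.
busy, efficiency = core_usage("81m59.485s", "1707m42.779s", cores=24)
print(f"{busy:.1f} cores busy on average, {efficiency:.0%} of 24")
# prints: 20.8 cores busy on average, 87% of 24
```

For the 4-core job in the question, a user/real ratio close to 1 would indicate that the program is effectively running serially.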


#4

COMMENT:

@lwhitsel Actually, with many R parallelization packages, instead of 1 line with the %CPU column close to 400%, the top command will show 4 lines with R processes, each close to 100%.
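Python's multiprocessing module is process-based as well, so the same pattern should appear in top for the job in the original question: several python lines rather than one line above 100%. As a hedged sketch (the function name work and the task sizes are arbitrary choices for illustration), the script can print the worker PIDs so that they can be matched against the PID column in top:

```python
import os
from multiprocessing import Pool


def work(n):
    """Hypothetical CPU-bound task; returns the PID of the process that ran it."""
    total = 0
    for i in range(n):
        total += i * i
    return os.getpid(), total


if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(work, [1_000_000] * 16)
    # Collect the distinct worker PIDs that actually handled tasks.
    worker_pids = sorted({pid for pid, _ in results})
    print("parent PID: ", os.getpid())
    print("worker PIDs:", worker_pids)
```

Seeing only one worker PID, or seeing the workers in top well below 100% each, would be a hint that the tasks are not being spread across the cores as intended.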