What is the most efficient Python package to run a parallel job over multiple nodes?

ktrn · April 13, 2018, 2:58pm

I would like to parallelize my Python script over multiple nodes on the cluster. Which Python package is most efficient for this purpose?

CURATOR: Katia

aculich · August 20, 2018, 9:44pm

In the Python ecosystem, consider using Dask, which provides advanced parallelism for analytics. Why use Dask instead of (or along with) other options? Dask integrates with NumPy, pandas, and scikit-learn, and it also:

scales up to clusters with multiple nodes
deploys on job queuing systems like PBS, Slurm, MOAB, SGE, and LSF
scales down to parallel use of a single node such as a server or laptop; modern laptops often have a multi-core CPU, 16-32 GB of RAM, and flash-based storage that can stream through data several times faster than the HDDs or SSDs of even a year or two ago
supports the map-shuffle-reduce pattern popularized by Hadoop, and is a smaller, more lightweight alternative to Spark
works with MPI via the mpi4py library (see @jpessin1’s suggestion in this thread) and is compatible with InfiniBand or other high-speed networks
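For readers who have not met the map-shuffle-reduce pattern mentioned in the list above, here is a minimal sketch of the idea in plain Python. It does not use Dask's API at all; the function names and the word-count input are made up for illustration, and a real framework would run each phase across many workers rather than in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: bring together all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input, standing in for lines read from many files.
lines = ["dask scales up", "dask scales down"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

In a distributed setting the map and reduce phases run independently on each worker, and the shuffle is the step that moves data between nodes, which is why it tends to dominate the cost.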

Here is an example using Dask Jobqueue on a PBS cluster:

from dask_jobqueue import PBSCluster
cluster = PBSCluster()
cluster.scale(10)         # Ask for ten workers

from dask.distributed import Client
client = Client(cluster)  # Connect this local process to remote workers

# wait for jobs to arrive, depending on the queue, this may take some time

import dask.array as da
x = ...                   # Dask commands now use these distributed resources

jpessin1 · May 25, 2018, 10:14pm

If you are looking for parallel processing, the traditional (and still very valid) approach is to use an MPI library.
mpi4py is an example of a Python wrapper for MPI, and https://mpi4py.readthedocs.io/en/stable/intro.html
includes a good overview of the concepts and related methods. (Not an endorsement, just not reinventing the wheel here.)
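To give a feel for what the MPI approach looks like, here is a minimal sketch of a scatter/reduce job with mpi4py. The helper names (split_work, partial_sum_of_squares, run_with_mpi) and the sum-of-squares workload are invented for illustration; only MPI.COMM_WORLD, Get_rank, Get_size, scatter, and reduce come from mpi4py itself. The import sits inside run_with_mpi so the two plain helpers can be tried without MPI installed.

```python
def split_work(items, n_parts):
    """Split items into n_parts roughly equal chunks, one per MPI rank."""
    return [items[i::n_parts] for i in range(n_parts)]


def partial_sum_of_squares(chunk):
    """The work each rank does on its own chunk (a toy workload)."""
    return sum(x * x for x in chunk)


def run_with_mpi(n_items=1000):
    """Scatter chunks from rank 0, compute locally, reduce the total on rank 0."""
    from mpi4py import MPI  # imported here so the helpers above work without MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Only rank 0 prepares the data; the other ranks pass None to scatter.
    chunks = split_work(list(range(n_items)), size) if rank == 0 else None
    chunk = comm.scatter(chunks, root=0)

    local = partial_sum_of_squares(chunk)
    total = comm.reduce(local, op=MPI.SUM, root=0)  # None on ranks other than 0

    if rank == 0:
        print(f"sum of squares below {n_items}: {total}")
    return total
```

To try it, add a call to run_with_mpi() at the bottom of the script and start it through your MPI launcher, for example mpiexec -n 4 python script.py. The launcher name and how nodes are assigned depend on the cluster, so check the local documentation.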

Some other things to consider:
Would it be less work to make the job fit on a single node, using tools like concurrent.futures (or the underlying multiprocessing and threading modules), or mixed tools like numpy/scipy/pandas together with Cython?
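As a rough illustration of the single-node route, here is a sketch that spreads a CPU-bound function over the local cores with concurrent.futures. The prime-counting task and the run_in_pool helper are made up for the example; substitute your own function and inputs.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor  # thread pool is the I/O-bound alternative


def count_primes_below(n):
    """CPU-bound toy task: count the primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_in_pool(func, inputs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Apply func to every input using a pool of workers; results keep input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module on start-up.
    limits = [50_000, 60_000, 70_000, 80_000]
    print(run_in_pool(count_primes_below, limits))
```

Processes are the usual choice for CPU-bound pure-Python work because threads share one interpreter lock; for I/O-bound work, passing executor_cls=ThreadPoolExecutor is typically cheaper.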

Would a faster Python implementation (like PyPy) provide enough speed?

These options don't go away when you move to multiple nodes, but they are often (though not always) sufficient on their own, and less demanding of the user's or developer's time.

jpessin1 · August 18, 2018, 8:35pm

@ktrn As an aside: if you are trying to bring more hardware resources to bear but the jobs don’t require true parallelization, message queuing is an alternative, asynchronous approach (queue is in the standard library). There are several message queuing (MQ) systems out there that can be used from Python; off the top of my head, ZeroMQ (also written ØMQ) and RabbitMQ.
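The queuing pattern can be sketched with the standard-library queue module and a few worker threads. Note the assumption: queue only works inside a single process, so this shows the producer/worker shape rather than a multi-node setup, where a broker such as RabbitMQ or ZeroMQ sockets would carry the messages between machines. The squaring task and the function names are invented for the example.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker there is no more work


def worker(tasks, results):
    """Pull items off the task queue until the sentinel arrives."""
    while True:
        item = tasks.get()
        if item is _STOP:
            tasks.task_done()
            break
        results.put((item, item * item))  # toy job: square the number
        tasks.task_done()


def process_all(items, n_workers=4):
    """Feed items to n_workers threads and collect {item: result}."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker
    tasks.join()          # block until every queued item has been handled
    for t in threads:
        t.join()

    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected


squares = process_all(range(5))
print(sorted(squares.items()))
```

Workers pick up jobs whenever they are free, so uneven job lengths balance out on their own, which is the main attraction when the jobs are independent.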