What is the most efficient Python package to run a parallel job over multiple nodes?



I would like to parallelize my Python script over multiple nodes on the cluster. Which Python package is most efficient for this purpose?



For the Python ecosystem, consider using Dask, which provides advanced parallelism for analytics. Why use Dask versus (or along with) other options? Dask integrates with NumPy, pandas, and scikit-learn, and its distributed scheduler scales from a single laptop to a multi-node cluster.

Here is an example using Dask-Jobqueue on a PBS cluster:

from dask_jobqueue import PBSCluster
cluster = PBSCluster()
cluster.scale(10)         # Ask for ten workers

from dask.distributed import Client
client = Client(cluster)  # Connect this local process to remote workers

# wait for jobs to arrive, depending on the queue, this may take some time

import dask.array as da
x = ...                   # Dask commands now use these distributed resources


If you are looking for parallel processing, the traditional (and still very valid) approach is to use an MPI library. mpi4py is a Python wrapper for MPI, and its documentation includes a good overview of the concepts and related methods. (Not an endorsement, just not reinventing the wheel here.)

Some other things to consider:
Would it be less work to make the job fit on a single node, using tools like concurrent.futures (or the underlying multiprocessing and threading modules), or by speeding up hot spots with NumPy/SciPy/pandas and Cython?
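For reference, the single-node route needs nothing beyond the standard library. A sketch (the simulate function is a made-up stand-in for a CPU-bound task):

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(n: int) -> int:
    # Stand-in for a CPU-bound task
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [10_000, 20_000, 30_000, 40_000]
    # One worker process per core by default; sidesteps the GIL
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate, inputs))
    print(results)
```

If this saturates the node and is still too slow, that is the point where multi-node tools like Dask or MPI start to pay off.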

Would a faster Python implementation (like PyPy) provide enough speed?

These questions don't go away when you move to multi-node, but single-node solutions are often (though not always) sufficient and less demanding of the user's or developer's time.


@katia As an aside, if you are trying to bring more hardware resources to bear but the jobs don't require true parallelism, message queuing is an alternative, asynchronous approach (queue is in the standard library). There are several message-queuing (MQ) systems out there for or in Python; off the top of my head, ZeroMQ (ØMQ) and RabbitMQ.
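The queuing pattern can be sketched entirely with the standard library (the squaring "work" is a placeholder); ZeroMQ or RabbitMQ play the same role across machines, with sockets or a broker in place of the in-process queue:

```python
import queue
import threading

tasks = queue.Queue()    # work items go in here
results = queue.Queue()  # completed results come out here

def worker() -> None:
    while True:
        item = tasks.get()
        if item is None:            # sentinel: shut this worker down
            tasks.task_done()
            break
        results.put(item * item)    # stand-in for real work
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)                 # one sentinel per worker
tasks.join()                        # block until all items are processed

total = sum(results.get() for _ in range(10))
print(total)
```

Workers pull jobs whenever they are free, so this naturally load-balances without any coordination between the tasks themselves.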