If you are looking for parallel processing, the traditional (and still very valid) approach is to use an MPI library.
mpi4py is an example of a Python wrapper for MPI, and https://mpi4py.readthedocs.io/en/stable/intro.html
includes a good overview of the concepts and related methods. (Not an endorsement, just not reinventing the wheel here.)
Some other things to consider:
Would it be less work to make the job fit on a single node, using tools like concurrent.futures (or the underlying multiprocessing & threading modules), or a mix of tools like numpy/scipy/pandas with Cython?
Would a faster Python implementation (like PyPy) provide enough speed?
These questions do not go away when you move to multi-node, but the single-node options are often (though not always) sufficient, and they are less demanding of the user's/developer's time.
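
As a rough illustration of the single-node route, here is a minimal sketch using concurrent.futures. The task (sum_of_squares), the helper names, and the chunking scheme are all made up for the example; a real job would substitute its own CPU-bound function and a chunk size tuned to the workload.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(bounds):
    """Hypothetical CPU-bound task: sum i*i for i in [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for k in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + step + (1 if k < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_sum_of_squares(n, workers=4, executor_cls=ProcessPoolExecutor):
    """Fan the chunks out to a pool and combine the partial results."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, split_range(n, workers)))


if __name__ == "__main__":
    # The guard matters on platforms that start worker processes by
    # spawning a fresh interpreter, which re-imports this module.
    print(parallel_sum_of_squares(100_000))
```

Both executors share the same interface, so passing executor_cls=ThreadPoolExecutor switches to threads with no other changes. For pure-Python CPU-bound work, processes are usually the better fit because of the GIL; threads tend to pay off when the work is I/O-bound or spends its time in code that releases the GIL.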