We have a user trying to run multiple containers under MPI on our Slurm cluster, but the job hangs and produces no output.
Here’s a minimal reproducible example:
Running this works fine:
$ srun -n 5 apptainer exec docker://alpine cat /etc/alpine-release
However, this hangs:
$ srun --mpi=pmix -n 5 apptainer exec docker://alpine cat /etc/alpine-release
The behavior is the same with the self-built container our researcher is using, except that running it without --mpi produces the expected MPI errors, whereas running it with --mpi=pmix hangs with no output.
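For reference, as far as I know the PMI plugin types that srun supports on a given cluster can be listed with:
$ srun --mpi=list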
Can anyone provide some insight as to why this is happening?
I am able to run this at PSC on Bridges-2, except that I swap in --mpi=pmi2, as there is no pmix option on Bridges-2:
$ srun --mpi=pmi2 -n 5 apptainer exec docker://alpine cat /etc/alpine-release
srun: job 30687221 queued and waiting for resources
srun: job 30687221 has been allocated resources
INFO: Using cached SIF image
INFO: Using cached SIF image
INFO: Using cached SIF image
INFO: Using cached SIF image
INFO: Using cached SIF image
3.21.3
3.21.3
3.21.3
3.21.3
3.21.3
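If pmi2 is available on your cluster, it may be worth trying the same swap to see whether the hang is specific to the pmix plugin:
$ srun --mpi=pmi2 -n 5 apptainer exec docker://alpine cat /etc/alpine-release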
On the PSU RC cluster, I cannot even get your minimal reproducible test case to run from an interactive desktop session:
$ srun --mpi=pmi2 -n 5 apptainer exec docker://alpine cat /etc/alpine-release
srun: warning: can't honor --ntasks-per-node set to 1 which doesn't match the requested tasks 5 with the maximum number of requested nodes 1. Ignoring --ntasks-per-node.
srun: error: Unable to create step for job 37007252: More processors requested than permitted
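I assume this is just the interactive desktop session's allocation being too small for 5 tasks; grabbing a larger allocation first should at least let the step be created (assuming the partition permits it), e.g.:
$ salloc -N 1 -n 5
$ srun --mpi=pmi2 -n 5 apptainer exec docker://alpine cat /etc/alpine-release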
I suspect it may have to do with the way Slurm handles exporting the environment.
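If so, one crude check (assuming a single pmix task launches at all on your cluster) would be to compare the PMI-related variables a plain task sees with what the same task sees inside the container:
$ srun --mpi=pmix -n 1 env | grep -i pmi
$ srun --mpi=pmix -n 1 apptainer exec docker://alpine env | grep -i pmi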
Since your minimal reproducible test case runs fine with pmi2 at PSC, I suspect the PSU RC cluster has a configuration issue either with Slurm or with the default MPI.
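One place to start might be the cluster's configured default MPI plugin; scontrol show config should report MpiDefault and any MPI-related parameters:
$ scontrol show config | grep -i mpi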