
Access to /dev/infiniband from user space

Hi,

I’m trying to get a Singularity container to run using the InfiniBand network on a cluster I have access to. I can get it running with MPI fine, but it only uses TCP/IP, so the MPI performance is about 10x slower than it should be.
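
(One quick way to confirm the transport, as a sketch assuming OpenMPI — the image and benchmark names here are just placeholders: if the run fails once the TCP BTL is excluded, MPI was silently falling back to TCP/IP rather than using verbs.)

$ # fails fast if the openib (InfiniBand) BTL is not usable:
$ mpirun --mca btl ^tcp -np 2 singularity exec app.sif ./osu_latency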

Tracing through where things go wrong, it looks like it’s failing when it tries to write to /dev/infiniband/uverbs0. It appears not to have permission to write there, although the same call works fine for applications run outside Singularity (for debugging, all I’m running is ibv_devinfo inside and outside Singularity and stracing what happens).
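
(For concreteness, the debugging commands look roughly like this — a sketch assuming strace is installed both on the host and in the container, whose image name here is made up:)

$ strace -f -e trace=open,openat,write ibv_devinfo 2>&1 | grep uverbs
$ singularity exec app.sif strace -f -e trace=open,openat,write ibv_devinfo 2>&1 | grep uverbs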

Anyone any ideas why this would happen or what I should do to get around this issue?

thanks

Hi Adrian,

The Singularity documentation explicitly says "to support InfiniBand, the container must support it". That means you have to install the InfiniBand libraries inside the container and link MPI against them.

IMHO, bind-mounting external libraries can be a dangerous approach and must be done carefully. Bind-mounting host libraries inside the container is only safe when those libraries maintain API/ABI compatibility.

Can you tell us which library you are overriding, i.e. which one is incompatible between the host and container versions?
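
(A quick way to check, as a sketch — the image name and library paths are assumptions and depend on the distributions involved:)

$ ls -l /usr/lib64/libibverbs.so.*                                           # host
$ singularity exec app.sif ls -l /usr/lib/x86_64-linux-gnu/libibverbs.so.*   # container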

Here is a Singularity recipe I have for installing the InfiniBand libraries. It’s old stuff and there are probably more up-to-date recipes elsewhere:
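
(The original recipe is not preserved here; as a rough, hypothetical sketch of what such a %post section can look like on a RHEL/CentOS base image:)

Bootstrap: docker
From: centos:7

%post
    # illustrative only, not the original recipe: userspace verbs/RDMA
    # libraries plus basic diagnostics
    yum install -y libibverbs libibverbs-utils librdmacm librdmacm-utils \
        rdma-core infiniband-diags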

Here is a solution from one of the Singularity GitHub issues:

Finally, containers are usually built outside of the HPC system, and libraries/applications must be compiled against the OpenMPI/PMIx/InfiniBand stack before moving the container to the infrastructure. If you decide to bind-mount host libraries, do not forget to install the corresponding packages into the container first.
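
(As a rough, hypothetical illustration of that build step — the application name and paths are invented — the point is that mpicc inside the container must come from the MPI installation you set up there:)

%post
    # hypothetical: build the application against the container's own OpenMPI
    cd /opt/myapp
    ./configure CC=mpicc --prefix=/opt/myapp/install
    make && make install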

Hope it helps!

Víctor


In addition, I would also like to post another (Debian-based) piece of a Singularity recipe where I install the InfiniBand dependencies:

...
# InfiniBand / RDMA userspace libraries, diagnostics and tools (Debian/Ubuntu)
apt-get install -y dkms infiniband-diags libibverbs* ibacm librdmacm* \
    libmlx4* libmlx5* mstflint libibcm* libibmad* libibumad* opensm \
    srptools libmlx4-dev librdmacm-dev rdmacm-utils ibverbs-utils \
    perftest vlan ibutils

Here I’m installing too many things; the installation of some of these libraries can probably be avoided.

Best,
Víctor

Hi Victor,

Thanks. I think the issue I encountered here was a difference between the versions of the InfiniBand drivers on the host and those installed inside the container. Getting the correct InfiniBand drivers installed in the container is not always possible, e.g.:
http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
The only drivers available are the latest version and don’t match those installed on the host system. In this case it seems the only way to get the container to use InfiniBand rather than TCP/IP is to bind in the host’s InfiniBand drivers.


I should have said: this is Singularity 3.0.3, I’ve installed the InfiniBand drivers inside the container, and strace shows they are being found. It is likely the InfiniBand libraries inside the container are not exactly the same version as on the system. At the point I’m getting this error I’ve not yet touched MPI; I’m still just trying to get the InfiniBand tools working (i.e. ibv_devinfo, which should just print out detailed information about the InfiniBand devices in the system). ibstat does work, so the container can see that the InfiniBand device is there, but it cannot access it to get detailed information (which I can do outside the container).
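
(This split is consistent with ibstat only reading device state from sysfs, while ibv_devinfo opens /dev/infiniband/uverbs0 through the userspace verbs provider, so a host/container provider mismatch only breaks the latter. Illustration, with a made-up image name:)

$ singularity exec app.sif ibstat        # reads /sys/class/infiniband — works
$ singularity exec app.sif ibv_devinfo   # opens /dev/infiniband/uverbs0 — fails here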

This is a response from one of the users on the Singularity list after some discussion, posted with permission and summarized.


On our HPC cluster, I bind the InfiniBand-related libraries and folders from the host into the container, and I’m able to run ibv_devinfo correctly. Here we first start on the host. LD_LIBRARY_PATH is set as:

$ export LD_LIBRARY_PATH=$MY_LD_LIBRARY_PATH:.:/host/lib:$LD_LIBRARY_PATH

And then we can write a quick testing script.

[wang@c17-04 osu-bench]$ cat run-test2.sh
#!/bin/bash
img=/beegfs/work/public/singularity/ubuntu-18.10.simg
# start the bind list with the verbs provider config directory
ib=/etc/libibverbs.d
# add each host library as a src:dest pair, bound under /host/lib in the container
for lib in /opt/slurm/lib64/lib*.so* /usr/lib64/libosmcomp.so.3* /usr/lib64/libmlx*.so* /usr/lib64/libi40iw-rdmav2.so* /lib64/libib*.so* /usr/lib64/libnl.so*; do
    ib="$lib:/host/lib/$(basename $lib),$ib"
done
singularity exec --bind /opt/slurm,/usr/bin/ibv_devinfo,$ib $img ibv_devinfo

Notice how the final command executes ibv_devinfo inside the container, once all the binds are in place.
This is part of the script above, but I’ll repeat it for clarity:

$ singularity exec --bind /opt/slurm,/usr/bin/ibv_devinfo,$ib $img ibv_devinfo

And then, to put it all together, here is the output of running the script on the host:

[wang@c17-04 osu-bench]$ sh run-test2.sh 
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.16.1020
        node_guid:                      7cfe:9003:0026:9360
        sys_image_guid:                 7cfe:9003:0026:9360
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       DEL2180110032
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 194
                        port_lid:               102
                        port_lmc:               0x00
                        link_layer:             InfiniBand

The setup and details might of course vary by cluster, but this could be a good start for testing.

Thanks Adrian,

I will take this into account for the next infrastructure I test!

Best,
Víctor

Thanks. In the end, this is what I did:

# bind the host's verbs provider libraries into the container
export SINGULARITY_CONTAINLIBS=/lib64/libmlx5-rdmav2.so,/lib64/libibverbs.so,/lib64/libibverbs.so.1,/lib64/libmlx4-rdmav2.so

# pass the variable through to the remote ranks and select the openib BTL
mpirun -x SINGULARITY_CONTAINLIBS --prefix /lustre/home/z04/adrianj/openmpi/2.1.0 --mca btl openib --hostfile $PBS_NODEFILE …

This was after installing the InfiniBand libraries in the container and building OpenMPI correctly in both places.
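
(If it helps anyone else, the list can also be built from globs so additional provider libraries are picked up automatically — a sketch; the paths are assumptions and will differ per system:)

$ libs=$(ls /lib64/libmlx*-rdmav2.so /lib64/libibverbs.so* 2>/dev/null | paste -sd, -)
$ export SINGULARITY_CONTAINLIBS="$libs"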
