Access to /dev/infiniband from user space

This is a summarized response from one of the users on the Singularity mailing list, posted with permission after some discussion.


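On our HPC cluster, I bind InfiniBand-related libraries and folders from the host into the container, and with that I’m able to run ibv_devinfo correctly. We start on the host, where LD_LIBRARY_PATH is set as follows (/host/lib is the directory inside the container where the script below binds the host libraries):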

$ export LD_LIBRARY_PATH=$MY_LD_LIBRARY_PATH:.:/host/lib:$LD_LIBRARY_PATH
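Note that glibc's dynamic loader treats an empty LD_LIBRARY_PATH entry (a leading or trailing colon) as the current directory, so the line above picks up extra empty entries whenever MY_LD_LIBRARY_PATH or LD_LIBRARY_PATH is unset. Here that only duplicates the explicit `.`, but a minimal sketch that builds the same path while skipping empty parts could look like this (join_path is a hypothetical helper, not part of the original setup):

```shell
#!/bin/bash
# Hypothetical helper: join the arguments with ':' and skip empty ones,
# so the result never gets a stray leading, trailing or doubled colon.
join_path() {
    local out="" part
    for part in "$@"; do
        [ -n "$part" ] || continue      # skip unset/empty components
        out="${out:+$out:}$part"        # add ':' only between entries
    done
    printf '%s\n' "$out"
}

# Same order as the original line: site paths, '.', the container bind dir, existing path
export LD_LIBRARY_PATH="$(join_path "$MY_LD_LIBRARY_PATH" . /host/lib "$LD_LIBRARY_PATH")"
```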

Next, we write a quick test script:

[wang@c17-04 osu-bench]$ cat run-test2.sh 
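#!/bin/bash
# Singularity image to test
img=/beegfs/work/public/singularity/ubuntu-18.10.simg
# The bind list starts with the host's verbs provider config directory
ib=/etc/libibverbs.d
# Prepend a host_path:/host/lib/<name> bind for each matching host library
for lib in /opt/slurm/lib64/lib*.so* /usr/lib64/libosmcomp.so.3* /usr/lib64/libmlx*.so* /usr/lib64/libi40iw-rdmav2.so* /lib64/libib*.so* /usr/lib64/libnl.so*; do
    ib="$lib:/host/lib/$(basename "$lib"),$ib"
done
# Bind Slurm, the host's ibv_devinfo binary and the libraries, then run ibv_devinfo in the container
singularity exec --bind /opt/slurm,/usr/bin/ibv_devinfo,$ib $img ibv_devinfo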
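One caveat with the loop above: if a pattern such as /usr/lib64/libi40iw-rdmav2.so* matches nothing on a node, bash keeps the pattern literally and it ends up in the bind list. A sketch of the same bind-list construction with nullglob enabled and a bash array (build_binds is a hypothetical helper; the library paths are the ones from the script):

```shell
#!/bin/bash
# Drop glob patterns that match nothing instead of passing them through literally.
shopt -s nullglob

# Hypothetical helper: print a comma-separated bind spec that maps each host
# file given as an argument to <dest_dir>/<basename> inside the container.
build_binds() {
    local dest_dir=$1; shift
    local -a binds=()
    local lib
    for lib in "$@"; do
        binds+=("$lib:$dest_dir/$(basename "$lib")")
    done
    local IFS=,                 # "${binds[*]}" joins elements with the first char of IFS
    printf '%s\n' "${binds[*]}"
}

ib=$(build_binds /host/lib /opt/slurm/lib64/lib*.so* /usr/lib64/libosmcomp.so.3* \
    /usr/lib64/libmlx*.so* /usr/lib64/libi40iw-rdmav2.so* /lib64/libib*.so* /usr/lib64/libnl.so*)
# ${ib:+,$ib} avoids a trailing comma when no library matched at all
printf 'bind spec: %s\n' "/etc/libibverbs.d${ib:+,$ib}"
```

The resulting spec can then be used as in the original script, e.g. `singularity exec --bind "/opt/slurm,/usr/bin/ibv_devinfo,/etc/libibverbs.d${ib:+,$ib}" "$img" ibv_devinfo`.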

Notice that the final command runs ibv_devinfo inside the container once all the binds are in place.
It is the last line of the script above, repeated here for clarity:

$ singularity exec --bind /opt/slurm,/usr/bin/ibv_devinfo,$ib $img ibv_devinfo
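Since the bind list is assembled from host paths, a node that lacks one of them will typically make Singularity refuse to start the container before ibv_devinfo ever runs. A hypothetical pre-flight check (check_binds is an illustrative name, not a Singularity command) that reports missing host paths in a comma-separated bind spec of src[:dest[:opts]] entries:

```shell
#!/bin/bash
# Hypothetical pre-flight check: report every source path in a comma-separated
# bind spec (entries of the form src[:dest[:opts]]) that is missing on the host.
check_binds() {
    local spec=$1 entry src missing=0
    local -a entries
    IFS=, read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        [ -n "$entry" ] || continue      # tolerate an empty entry
        src=${entry%%:*}                 # host side is everything before the first ':'
        if [ ! -e "$src" ]; then
            printf 'missing bind source: %s\n' "$src" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_binds "/opt/slurm,/usr/bin/ibv_devinfo,$ib" && singularity exec --bind /opt/slurm,/usr/bin/ibv_devinfo,$ib $img ibv_devinfo`.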

Putting it all together, here is the output of running the script on the host:

[wang@c17-04 osu-bench]$ sh run-test2.sh 
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.16.1020
        node_guid:                      7cfe:9003:0026:9360
        sys_image_guid:                 7cfe:9003:0026:9360
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       DEL2180110032
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 194
                        port_lid:               102
                        port_lmc:               0x00
                        link_layer:             InfiniBand
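For scripted testing across many nodes it can help to turn that output into a pass/fail result. A minimal sketch, assuming ibv_devinfo's "key: value" layout shown above; the function name ib_ports_active is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read ibv_devinfo output on stdin, print
# "<hca_id> port <n>: <state>" for each port, and succeed only if
# at least one port reports PORT_ACTIVE.
ib_ports_active() {
    awk '
        $1 == "hca_id:" { hca = $2 }
        $1 == "port:"   { port = $2 }
        $1 == "state:"  { print hca " port " port ": " $2; if ($2 == "PORT_ACTIVE") ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}
```

For example, `bash run-test2.sh | ib_ports_active` in a shell where the function is defined prints one line per port and returns non-zero if no port is active.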

The setup and details will of course vary by cluster, but this should be a good starting point for testing.