What are some ways to organize the modules created to manage multiple versions and combinations of compilers, tools, drivers and libraries on a shared cluster?



Modules created to handle multiple versions of compilers, tools, drivers and libraries on a shared cluster need to be presented in a manner that promotes efficient loading to the user’s environment. Interdependence is prevalent among individual modules; for a given application, for example, the user may need specific libraries and compiler environments to successfully execute a calculation. What are the most effective ways to handle these?

CURATOR: torey



I would suggest using Lmod, TACC’s environment module system. Lmod requires dependencies, whether compilers or libraries, to be loaded before you can see the software built with those combinations. This eliminates the problem of having the wrong library in your path when running software linked against it. As they put it, "Lmod is a Lua based module system that easily handles the MODULEPATH Hierarchical problem."
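
To make the hierarchy idea concrete, here is a minimal Python sketch of which modulefile directories become visible as a compiler and then an MPI library are loaded. It is not Lmod code. The function name `visible_module_dirs`, the root `/apps/modulefiles`, and the Core/Compiler/MPI directory layout are assumptions for illustration; in a real Lmod setup the compiler and MPI modulefiles themselves extend MODULEPATH when loaded.

```python
import posixpath


def visible_module_dirs(root, compiler=None, mpi=None):
    """Return the modulefile directories visible for the loaded toolchain.

    root     -- top of a hypothetical hierarchical module tree
    compiler -- (name, version) tuple, or None if no compiler is loaded
    mpi      -- (name, version) tuple, or None if no MPI library is loaded
    """
    # Core modules (compilers, standalone tools) are always visible.
    dirs = [posixpath.join(root, "Core")]
    if compiler is not None:
        c_name, c_ver = compiler
        # Loading a compiler exposes packages built with that compiler.
        dirs.insert(0, posixpath.join(root, "Compiler", c_name, c_ver))
        if mpi is not None:
            m_name, m_ver = mpi
            # Loading an MPI library on top exposes packages built with
            # that specific compiler/MPI pair.
            dirs.insert(
                0, posixpath.join(root, "MPI", c_name, c_ver, m_name, m_ver)
            )
    return dirs


if __name__ == "__main__":
    for d in visible_module_dirs(
        "/apps/modulefiles", ("gcc", "6.1.0"), ("openmpi", "1.10.2")
    ):
        print(d)
```

Because the MPI-dependent directory only appears once both a compiler and an MPI library are chosen, a package linked against a different combination is simply not listed, which is how the wrong-library problem is avoided.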



As syockel stated, Lmod is a good solution.

Alternatively, we have had success with standard (Tcl) modules using .modulerc files. We would use multiple path components in the module name, specifying the compiler, MPI library, etc. that the package depended on. The .modulerc files would then detect which compiler, MPI library, etc. (if any) was already loaded and default the module path accordingly. E.g., for package foo version 1.2.3 built with gcc/6.1.0 and openmpi/1.10.2 and netcdf/, we might have the following modules:

foo/1.2.3/netcdf/ and
(These would typically be either symlinks to each other or, more commonly, tiny stub files that just define variables for the versions of foo, netcdf, compiler and MPI, and then include a common module file for all “foo” modules.)

The .modulercs under foo and foo/1.2.3 would default to netcdf.
The .modulercs under foo/netcdf and foo/1.2.3/netcdf would default to the version of netcdf previously loaded (or, if none was previously loaded, to the most recent version in the directory).
The .modulercs under the netcdf/ dirs would default to the compiler family (e.g. gcc)
The .modulercs under the netcdf/ dirs would default to the gcc version.
The .modulercs under netcdf/ would default to the MPI family
The .modulercs under netcdf/ would default to the OpenMPI version previously loaded.
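
The selection rule these .modulerc files apply (use whatever version is already loaded, otherwise the most recent one in the directory) can be sketched in Python as follows. This is only an illustration of the logic, not the Tcl we actually used. The helper names are hypothetical, and raising an error when the loaded version has no matching build is an assumption about how a mismatch might be reported. `LOADEDMODULES` is the colon-separated environment variable that the modules system maintains.

```python
import os


def version_key(version):
    """Sort key so that '1.10.2' ranks above '1.6.5' (numeric, not lexical)."""
    return tuple(
        (int(piece) if piece.isdigit() else -1, piece)
        for piece in version.split(".")
    )


def loaded_version(package, loaded_modules):
    """Return the loaded version of `package`, or None if it is not loaded."""
    prefix = package + "/"
    for name in loaded_modules.split(":"):
        if name.startswith(prefix):
            return name[len(prefix):]
    return None


def pick_default(package, available, loaded_modules=None):
    """Choose the default sub-directory for `package`.

    available      -- version strings present in the module directory
    loaded_modules -- colon-separated module list; read from the
                      LOADEDMODULES environment variable when omitted
    """
    if loaded_modules is None:
        loaded_modules = os.environ.get("LOADEDMODULES", "")
    wanted = loaded_version(package, loaded_modules)
    if wanted is None:
        # Nothing loaded yet: fall back to the most recent version.
        return max(available, key=version_key)
    if wanted in available:
        return wanted
    # Assumption: a loaded version with no matching build is an error.
    raise LookupError(
        "no build available for %s/%s" % (package, wanted)
    )
```

For example, with gcc/4.8.5 already loaded, `pick_default("gcc", ["4.8.5", "6.1.0"], "gcc/4.8.5")` keeps 4.8.5 rather than jumping to the newer compiler.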

Because the .modulerc files that select the default family or version of the compiler, MPI, netcdf, etc. are identical across many packages, they can all be symlinks to stock scripts (modulerc.select_compiler_family, etc.) in a utilities directory.

This approach is more flexible than the Lmod approach in some respects (e.g. one could use .modulerc files to default SIMD support levels based on hostname), but it is also more work to maintain. It also does not support “module swap” well, and it requires a patch to the old Tcl modulefiles code: a bug present in many versions always evaluated .modulerc files in “load” mode, which caused errors about compiler/app mismatches in the .modulerc files to be displayed during module avail, etc. And although it looks like someone has started supporting Tcl modules once again, I am not sure that all of this works with the new updates.

In short, we have used the above successfully, but I expect we will be switching to Lmod for our next cluster. If anyone wants more detail on the above, feel free to contact me.