There seem to be three main places to load modules into one’s compute environment: the .bash_profile or .bashrc file, within a modulefile itself, and in the script one uses to submit a job to the cluster. What are the pros and cons of each?
I generally recommend the use of all three locations, but for different purposes:
- Inside a modulefile:
I recommend this for “toolchain” or “meta-module” modules, e.g. a module you create to load your standard compilation environment (your preferred compiler, MPI library, linear algebra packages, etc.). (With Lmod, this might be better done with “User Collections”, i.e. module save/restore.)
I think this is also useful when the software made available by a particular modulefile requires certain other packages. E.g., if a module for “foo” requires a particular compiler or MPI library, it might be good to module load that compiler or MPI library in the foo modulefile. (With Lmod, it is probably better to put the foo module in the appropriate module path, so that it only becomes available once the required compiler, MPI library, etc. have been loaded.)
- In .bashrc or other dot files:
This is good for the user’s default set of modules. I would strongly recommend that these modules only get loaded for interactive shells (e.g. inside an “if [ ! -z "$PS1" ]; then” block or similar). This way, if a user always wants a particular version of MATLAB and gcc to be loaded, they do not have to type module load manually every time they log in.
- In job scripts:
I recommend that job scripts explicitly load every module they need and, in general, explicitly name the versions of the packages. This is for documentation and reproducibility.
If a user needs to return to a job and re-run it after 9 or 10 months, they might run into unexpected and undesired problems if the version of a package they were using was left to the default and that default has since changed, or if the user changed their default compiler (loaded by .bashrc) in the meantime.
This is also useful for benchmarking or for testing new versions of an application. E.g., if I submit lots of MATLAB jobs and want to test whether the latest version works with my code, I can keep my 25 production runs on the previous version and submit a test job with the new MATLAB version at the same time. This would be more complicated if the version were set in .bashrc.
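
As a rough illustration of the .bashrc recommendation above, the interactive-only guard might look something like the following. This is only a sketch: the module names and versions, and the variable names default_modules and defaults_loaded, are made up for the example and are not taken from any particular site.

```shell
# Fragment for ~/.bashrc (a sketch; module names and versions are hypothetical).
default_modules="gcc/12.2.0 matlab/R2023b"
defaults_loaded=no

# bash sets PS1 only in interactive shells, so non-interactive shells
# (for example, remote commands run over ssh) skip this block.
# The second check avoids errors on hosts where "module" is not defined.
if [ -n "${PS1:-}" ] && command -v module >/dev/null 2>&1; then
    for m in $default_modules; do
        module load "$m"
    done
    unset m
    defaults_loaded=yes
fi
```

A job script would then not rely on these defaults at all and would issue its own versioned module load lines instead (e.g. module load matlab/R2023b, again a hypothetical version), so that changing the defaults in .bashrc later does not change what the job runs with.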