CU Boulder Research Computing is in the planning stages of re-thinking how we manage and install software on our HPC cluster. To date, we have largely installed software packages “by hand” (as opposed to using package managers such as Spack, or models such as OpenHPC) and then made them available as modules via Lmod. This approach has had the advantage of allowing us to optimize the software for our system, and to work around some of the unique quirks that exist on any given HPC cluster. The downside is that custom software installations are time-consuming, and possibly unsustainable as a way to maintain our software stack, given the need to support an increasing number of applications and to ensure the software is robust to the growing heterogeneity of our incrementally funded HPC cluster.
We are presently reaching out to peer institutions in order to better understand what others are doing and inform planning. Would you be willing to share with us how you manage software on your system and opinions on what has worked and what hasn’t?
Thanks in advance for any insight you can provide!
We use a hybrid model: Spack-managed software alongside manually installed and maintained codes.
The Spack-based software is persistent or semi-persistent: versions that change slowly over time (one or two new versions a year) and form the core of our install base. Think compilers, libraries, R, MATLAB, etc.
The hand-installed codes are usually special-case codes: restricted software, experimental builds, or codes that require multiple versions. Example codes here might be VASP, Quantum ESPRESSO, Gaussian, custom-built R, etc. These are usually installed and maintained by one or more subject matter experts (SMEs) who are responsible for the entire lifecycle.
Included in this model is a third tier of software availability, in which PI groups install and maintain software within their own group disk space. This is useful for groups that maintain their own Anaconda environments, for example. A combination of our own SMEs and PI group members then maintains each group's local software stack.
This approach is messy and could use some optimization/improvement, but it provides a lot of flexibility and spreads the workload out. It is especially useful to have SMEs tending to their areas of expertise. It also gives us a chance to roll out (updated, beta, etc.) versions for testing without impacting production work.
Everything is managed by Lmod, even access to group-local software. Key to making sense of all of this is requiring good documentation, especially from the SMEs. We also have a local GitLab repo for maintaining sources, modules, and documentation. This way anyone can go in and build/rebuild a particular set of software.
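For anyone curious how group-local software can be surfaced through Lmod, here is a minimal sketch of a modulefile. The group name, paths, and version are hypothetical; the Lmod Lua functions (`help`, `whatis`, `prepend_path`, `pathJoin`) are standard.

```lua
-- Hypothetical modulefile, e.g. /projects/mylab/modules/mylab-stack/1.0.lua,
-- exposing a PI group's own software tree via Lmod.
help([[Group-local software stack for the (hypothetical) mylab group.]])
whatis("Name: mylab-stack")
whatis("Version: 1.0")

local base = "/projects/mylab/sw"  -- group disk space; path is an assumption
prepend_path("PATH",            pathJoin(base, "bin"))
prepend_path("LD_LIBRARY_PATH", pathJoin(base, "lib"))
-- Optionally expose further modules nested inside the group tree:
prepend_path("MODULEPATH",      "/projects/mylab/modules")
```

Users would then see the group stack with an ordinary `module avail` once the group's module path is on their `MODULEPATH`.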
We plan to ditch the Spack-based model and instead have mentors among the RISE staff be responsible for particular packages in the central stack. That will reduce our footprint to one central stack under human control, plus local packages for research groups on an as-needed basis.
We plan to support the current and one version back for the central stack, but no software will be removed once it is installed unless it is a security risk.
We’re a small school, so up to this point we’ve just installed packages manually and used Lmod to provide access. This works well for most things, but we’ve seen an uptick in researchers who need a consistent environment to ensure reproducible science — identical versions of compilers, libraries, etc. This can cause problems when we need to patch or upgrade systems; of course, this is not a novel problem, and numerous options are available to address it.
We are considering Spack as a way to facilitate “frozen” environments but are just starting down this road. We are also looking at Singularity containers but, again, being so small, it takes time to work through everything. If we do use Spack, we may provide documentation to let users create and manage their own environments, rather than have us create global environments. Any insight would be welcome…
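For what it's worth, a "frozen" per-project environment in Spack is typically captured in a `spack.yaml` file. A minimal sketch (package names and versions here are purely illustrative):

```yaml
# Hypothetical spack.yaml for a per-project "frozen" environment.
# A user activates it with:  spack env activate /path/to/project
spack:
  specs:
    - gcc@12.3.0
    - openmpi@4.1.5 %gcc@12.3.0
    - hdf5@1.14.3 +mpi
  concretizer:
    unify: true   # resolve one consistent version of every dependency
  view: true      # project a merged filesystem view of the installs
```

Because the concretized versions are recorded (in the accompanying `spack.lock`), a user can rebuild the identical environment later or on another system, which addresses the reproducibility concern directly.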
We switched from an in-house tool to using EasyBuild a few years ago. It has been such a time-saver. There are tons of existing recipes available, it generates module files automatically, and the toolchain system helps ensure the pieces of software are compatible with one another.
I don’t have experience with Spack, but I can’t recommend EasyBuild enough.
I’m going to look into this EasyBuild product. Have any tips before a new user gives it a whirl? I’m assuming this is the product? https://easybuild.io
I’m curious @langford whether you considered Spack (and @nucci whether you looked at EasyBuild). It’s unclear to me exactly why some select one over the other. I am always confused why there would be two tools that largely do the same thing, so I’m trying to sort out why one would ultimately be preferable to the other. Both have their adherents, but I can’t seem to find where people actively compared both and identified reasons for going with one over the other.
@jsimms The transition happened right as I was joining our team, so I wasn’t really part of the down-select process. I believe we looked at Spack, but I’m not sure what the pros/cons were at that time. Do you guys use Spack? What do you like/dislike about it?
@ChuckP That’s the right one, and this is their repository of software recipes (called easyconfigs). You can look to see which versions of your commonly used software have existing recipes. Of the software we install, I would say ~80% uses community-developed easyconfig files. I recently built a new apps tree for our newest cluster and installed several hundred modules in a few weeks. EasyBuild generates the module files automatically as well.
While EasyBuild puts out new toolchains twice a year (sets of GCC/zlib/etc versions), we only deploy two toolchains at a time on a rolling basis. Right now we use 2018b and 2020b, but we’re in the process of deprecating 2018b and rolling out 2022b.
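For anyone new to EasyBuild, the day-to-day workflow looks roughly like this (the easyconfig file name below is illustrative — check the repository for the exact versions available):

```shell
# Search the easyconfig repository for recipes matching a name:
eb --search R-4.2

# Dry run: show which dependencies would be built for a given recipe
# (--robot tells EasyBuild to resolve and build missing dependencies):
eb R-4.2.1-foss-2022a.eb --robot --dry-run

# Actually build the package, its dependencies, and the module files:
eb R-4.2.1-foss-2022a.eb --robot
```

The `foss-2022a` part of the file name is the toolchain mentioned above (GCC, OpenMPI, BLAS/LAPACK, etc.), which is how EasyBuild keeps a stack internally consistent.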
I asked the folks who maintain our main software stack why they had chosen Spack at the time. I’ve summarized their responses:
"The main reasoning was that we didn’t want to continue to build RPMs, we wanted something more flexible. We evaluated the other build systems out there at the time and spack showed the most promise and had the convenience of mainly using python, which is much more ubiquitous. I have no particular regrets about using spack. I wouldn’t say we ever used it as intended, but it certainly worked for what we needed.
Spack allows for a complex chain of dependencies and compilers to be specified. You can ask Spack to build OpenMPI version X with Intel version Y, making the dependency foo version Z using compiler Bar, and on and on."
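To make that concrete, Spack's spec syntax expresses exactly those constraints on the command line. A sketch (versions here are illustrative): `@` selects a version, `%` selects a compiler, and `^` constrains a dependency, which can carry its own compiler choice.

```shell
# Show how Spack would concretize OpenMPI built with the Intel compiler,
# while forcing its hwloc dependency to be built with GCC instead:
spack spec openmpi@4.1.5 %intel@2021.9.0 ^hwloc@2.9.1 %gcc@12.3.0

# Once the concretized spec looks right, build it:
spack install openmpi@4.1.5 %intel@2021.9.0 ^hwloc@2.9.1 %gcc@12.3.0
```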
I’ve seen a variety of methods used. I have liked EasyBuild and Spack about equally when I’ve used them.
Hand-building software is generally the fastest method, but when a build is difficult, the problems tend to persist throughout the deployment.
Writing RPMs was a great, self-documenting process, but it required more training and took the longest time to go from request to installation.
Spack/EasyBuild are great as long as you only need the software built in the way the recipes were written. Using proprietary software in the software stack is possible, but it is not always the easiest. I think the choice often comes down to which tool has recipes for the software you need to install. After that initial set of installations, it becomes easiest to keep using the same framework to install more software.
I think the easiest is to use containers, but there is a learning curve for the users and for the staff to master their usage.
Yes, I have more or less come to the same conclusion. I do think Spack and EasyBuild are great, as long as you can use their “recipes” essentially “out of the box.” Ultimately I think a container solution is optimal, but as you say, there is a learning curve, and I’m concerned that there aren’t good, basic introductions and examples, at least not that I’ve come across. The benefit to authors of such content is, I understand, minimal, but what a benefit to the community it would be…
Hi,
We at the University of East Anglia use modules (Lmod), but we are looking to automate the module creation process using a Jenkins pipeline/script and a container-first approach.
We are also doing a proof of concept/testing of Mii (GitHub - stanford-rc/mii: A smart search engine for module environments) and looking to integrate it with our Jira Service Desk instance, so that if a user types a binary/executable name in the terminal, they will be presented with a list of modules that provide that binary. If the module/software is not available on the system, they can then create a Jira Service Desk request from the command line.