The terms “cluster” and “HPC” are about as useful as “cloud” at this point: they can describe pretty much anything someone wants them to describe. If we strip away all the fluff:
- cluster: two or more hosts that cooperate on a computational problem.
- HPC cluster: two or more hosts that cooperate very efficiently on a fine-grained computational problem.
Two hosts with an OS installed, copies of the same applications in the same locations, and ssh that works between them are a cluster. So a basic setup for learning is simple to build from any two of anything: VMs, Pis, an open-box special from Best Buy, cloud instances, those old laptops you can’t bear to throw away… all of them can be used to learn “cluster” computing. It’s also entirely possible to learn everything you need to know about parallel computing on a single system, now that everything under the sun has multiple cores and you can get a CUDA-capable GPU in a laptop.
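To make the “two hosts plus working ssh” idea concrete, here is a minimal sketch in Python that fires the same command at every host in parallel. The host names `node1` and `node2` and the helper names are made up for illustration, and it assumes key-based ssh login is already set up. It is just one way to poke at a two-box setup, not how any real scheduler does it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; replace with machines you can actually ssh into.
HOSTS = ["node1", "node2"]


def build_ssh_command(host, remote_command):
    """Return the argv list that runs remote_command on host over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a missing key shows up as an error rather than a hang.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_everywhere(hosts, remote_command, timeout=30):
    """Run the same command on every host at once.

    Returns a dict mapping host -> (return code, stripped stdout).
    """
    def run_one(host):
        result = subprocess.run(
            build_ssh_command(host, remote_command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return host, (result.returncode, result.stdout.strip())

    # One thread per host is plenty: the threads only wait on ssh.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(pool.map(run_one, hosts))
```

Calling `run_everywhere(HOSTS, "hostname && nproc")` should give back each host's name and core count if ssh really does work between them. If it does, everything else on a cluster is layered on top of roughly this.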
From that basic starting point it’s all turd polishing by adding as much extra stuff as desired:
- Scheduler (Slurm, Moab/Maui/Torque, LSF, PBS, Condor…)
- Shared $HOME (I like NFS but the next item can work for this)
- Shared Parallel Filesystem (BeeGFS, Lustre, GPFS,…)
- Common software stack (Lmod/modules, EasyBuild, Spack, etc.)
- Provisioning tool (Warewulf, xCAT, a gazillion others)
- Configuration management (SaltStack, Ansible, CFEngine, Puppet, Chef,…)
- Interconnect (Ethernet, InfiniBand, proprietary foo)
- Grouchy HPC Sysadmin to tell users “NO!”
There’s nothing magical about clusters, although the marketing would have us believe otherwise. IMHO the most important thing is to keep a good grasp on the high-level view of “what do I want to accomplish with this?”, because the problem being solved should drive the cluster, not the other way around. If the goal is to become a cluster sysadmin, then hit every bullet point hard and try multiple tools for each. If the goal is to learn parallel programming, skip it all and just run MPI or whatever interests you on your daily driver system.
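For the “skip it all” route, here is a rough sketch of what single-machine parallel programming can look like. It uses Python's standard `multiprocessing` module instead of MPI, purely so it runs with nothing extra installed. The prime-counting problem and the function names are just an illustrative choice. The pattern is the same one you would use across nodes: split the work, farm it out, combine the results.

```python
import math
import os
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in the half-open range [start, stop) by trial division."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks covering it exactly."""
    step = max(1, math.ceil(limit / parts))
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=None):
    """Count primes below `limit`, one chunk per worker process."""
    workers = workers or os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        # Each worker counts its own chunk; the parent adds up the partial counts.
        return sum(pool.map(count_primes, split_range(limit, workers)))


if __name__ == "__main__":
    print(parallel_count_primes(100_000))
```

The equal-sized chunks are deliberately naive: the upper ranges take longer to check, so the workers finish at different times. Noticing that and fixing the load balance is exactly the kind of thing you can learn on a laptop before you ever touch a scheduler.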
I think the most important thing we can take home from that list is how critical it is that we keep my boss convinced that the last item there is the one that matters the most.