Does anyone have experience with mixing AMD and Intel CPUs in their HPC clusters? Do you have to partition them differently, or can they coexist well for most jobs? What about MPI: will a job have any issues if it's placed across both Intel and AMD nodes?
Hi
At the University of Arizona, the supercomputer we bought in 2016 has Intel Haswell and Broadwell processors, and the one we bought in 2020 has AMD Rome. After about a year of running them separately, we created a common install image on an AMD Rome-based VM and compiled our 100+ modules using GCC on another AMD Rome VM. All the compute nodes, whether Broadwell, Haswell or Rome, run the same image and access the same modules. We don't have any compatibility issues. If our 2024 system has Genoa, we will likely compile everything with AMD's LLVM-based compiler, targeting the highest AVX extensions available. So we might end up with a general set of modules that run on both Genoa and Rome, and a performant set of modules compiled with AVX-512 etc.
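One way to serve a "general" and a "performant" module set from the same image is to pick the module tree per node based on the CPU features it reports. The sketch below is a hypothetical illustration, not the setup described above: it assumes x86 Linux nodes (flags read from `/proc/cpuinfo`), and the tree paths and the `AVX512_REQUIRED` feature list are made-up placeholders.

```python
from pathlib import Path

# Hypothetical module roots; a real site would use its own layout.
GENERIC_TREE = "/opt/modules/generic"  # built for the common baseline, runs on every node
AVX512_TREE = "/opt/modules/avx512"    # built with AVX-512 enabled, for Genoa-class nodes only

# Assumed set of features the hypothetical AVX-512 build relies on
# (names as Linux reports them in /proc/cpuinfo).
AVX512_REQUIRED = {"avx512f", "avx512bw", "avx512vl"}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


def select_module_tree(flags: set[str]) -> str:
    """Use the AVX-512 tree only if every required feature is present."""
    if AVX512_REQUIRED <= flags:
        return AVX512_TREE
    return GENERIC_TREE


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    flags = parse_cpu_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()
    print(select_module_tree(flags))
```

A site could call something like this from a login or job prologue script and prepend the result to `MODULEPATH`; falling back to the generic tree keeps Haswell, Broadwell and Rome nodes on builds they can execute.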
As long as the base OS is the same and you compile for the least common denominator of the instruction sets, you should be OK. That said, we've avoided that combination on every cluster I have designed or contributed to. If your jobs are OpenMP (single node), you should be able to containerize them and make them “bare-metal agnostic”. As for Open MPI, we've gotten it to work inside containers, but not 100% consistently, because the base OS libraries and drivers can cause havoc. Hope that helps.
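To find that least common denominator, one approach is to intersect the CPU feature flags each node type reports and check a candidate baseline against the result. This is a sketch under stated assumptions: the per-node flag sets below are abbreviated and illustrative (collect the real ones from `/proc/cpuinfo` on each node type), and `X86_64_V3_FLAGS` is a hypothetical partial list of AVX2-era features from the x86-64-v3 level, not a complete definition.

```python
# Partial, assumed list of x86-64-v3 features, spelled as Linux reports them
# in /proc/cpuinfo ("abm" is how LZCNT shows up there).
X86_64_V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}


def common_flags(node_flags: dict[str, set[str]]) -> set[str]:
    """Intersect the flag sets of every node type: the least common denominator."""
    if not node_flags:
        return set()
    return set.intersection(*node_flags.values())


def missing_for_baseline(common: set[str], baseline: set[str]) -> set[str]:
    """Features a baseline build would use that at least one node type lacks."""
    return baseline - common


if __name__ == "__main__":
    # Abbreviated, illustrative flag sets, not real cpuinfo dumps.
    nodes = {
        "haswell": X86_64_V3_FLAGS | {"sse4_2"},
        "rome": X86_64_V3_FLAGS | {"sse4_2", "sha_ni"},
        "genoa": X86_64_V3_FLAGS | {"sse4_2", "sha_ni", "avx512f"},
    }
    common = common_flags(nodes)
    # An empty result means every node type has all the features listed above.
    print(missing_for_baseline(common, X86_64_V3_FLAGS))
```

If the check comes back empty, a baseline such as GCC's `-march=x86-64-v3` (available in recent GCC releases) is a plausible target for the shared module set; anything that shows up as missing points to an instruction that would crash on at least one node type.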