Reasoning and Approach
To summarize liquid cooling strategies, I analyzed the email discussion among institutional IT and HPC leaders regarding their approaches, experiences, and considerations for implementing liquid cooling in data centers. The summary below highlights the main strategies, their rationale, and practical examples.
Does your institution approach this differently? Add to the conversation.
Summary of Liquid Cooling Strategies
1. Direct-to-Chip Liquid Cooling (DLC):
- What: DLC involves circulating coolant directly to the CPU (and sometimes GPU) sockets to remove heat efficiently.
- Why: It enables higher density computing and is increasingly necessary as air cooling becomes insufficient for modern, high-power systems.
- How: Some institutions have used DLC for several years and plan to continue for future upgrades. DLC is especially valued for its effectiveness in central machine rooms.
- Example: “Some institutions report using direct liquid cooling (DLC) on their CPU sockets for 6 years and now love it.”
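For a rough sense of scale, the heat a coolant loop carries away follows the standard relation Q = m_dot * c_p * dT (heat load equals mass flow times specific heat times temperature rise). The Python sketch below applies that relation to water. It is a back-of-the-envelope estimate only: the function name is hypothetical, the water properties are approximate, and the 40 kW load and 10 K rise in the usage note are illustrative assumptions, not figures from the discussion.

```python
# Back-of-the-envelope coolant flow estimate from Q = m_dot * c_p * dT.
# All inputs in the usage note are assumed values for illustration only.

WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate, liquid water
WATER_DENSITY_KG_PER_L = 1.0             # approximate


def coolant_flow_l_per_min(heat_load_w: float, delta_t_k: float) -> float:
    """Estimate the water flow (litres per minute) needed to remove
    heat_load_w watts with a coolant temperature rise of delta_t_k kelvin."""
    if heat_load_w < 0:
        raise ValueError("heat_load_w must be non-negative")
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    # Rearranged heat balance: m_dot = Q / (c_p * dT), in kg/s.
    mass_flow_kg_per_s = heat_load_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)
    # Convert kg/s to litres per minute.
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    # Hypothetical rack: 40 kW of heat, 10 K coolant temperature rise.
    print(f"{coolant_flow_l_per_min(40_000, 10):.1f} L/min")
```

Under these assumed inputs the estimate comes out to roughly 57 L/min. Halving the allowed temperature rise doubles the required flow, which is one reason loop design and CDU sizing matter as rack density grows.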
2. Rear Door Heat Exchangers (RDHX):
- What: RDHX systems use chilled water to cool air as it exits the rear of server racks, reducing the load on room air conditioning.
- Why: RDHX allows for effective cooling without major changes to room layout (e.g., hot aisle/cold aisle containment) and can be retrofitted to existing racks.
- How: One university data center has used RDHX since 2012, with clusters from multiple vendors. Its new cluster will also use RDHX, but vendors indicate that future systems may require DLC.
- Example: From an IT professional: “Our systems have had rear door heat exchangers RDHX since then, either Coolcentric or Motivair… Vendors told us the next generation will have to be DLC.”
3. Rack-Level Coolant Distribution Units (CDU):
- What: CDUs manage and distribute coolant at the rack level, rather than at the row or room level.
- Why: Rack-level CDUs offer flexibility for growth and reduce vendor lock-in, making it easier to add specialty clusters or upgrade without major infrastructure changes.
- How: When considering DLC, some institutions prefer rack-level CDUs to manage expansion and avoid being tied to a single vendor’s ecosystem.
- Example: One IT professional reports: “When we reviewed DLC, we considered a rack level CDU rather than row level, which would better help us manage growth. And this reduces vendor lock.”
4. Planning and Concerns:
- Exploratory Phase: Some institutions are still evaluating liquid cooling, with implementation planned in the next 3 to 5 years. Concerns include vendor lock-in, cost, in-house expertise, and the need for broader infrastructure updates.
- Vendor Lock-in: There is apprehension about being tied to specific vendors for connectors, components, and support, especially among smaller institutions.
- Power Delivery: As cooling becomes more efficient, delivering enough power to high-density racks is also a growing concern.
5. Broader Strategies:
- Integration with New Builds or Retrofits: Institutions are considering liquid cooling as part of new data center builds, retrofits, or augmentations of existing spaces.
- Standardization vs. One-Off Solutions: Some are accommodating near-term, self-contained, or vendor-managed solutions, while others seek more standardized, scalable approaches.
Key Takeaways
- DLC and RDHX are the primary strategies in use or under consideration.
- Flexibility and future-proofing (e.g., using rack-level CDUs) are important to avoid vendor lock-in.
- Institutions are at different stages: some have mature deployments, others are in the exploratory phase.
- Concerns include vendor lock-in, cost, expertise, and power delivery.
These strategies reflect a mix of practical experience and forward-looking planning as institutions adapt to the increasing cooling demands of modern computing hardware.