The Magazine for Developers of Open Communication, Industrial, and Rugged Systems
 Thermal management
Posted: October 2006

Liquid cooling embedded computing initiative

By Tahir Cader, Eric Grabowski, Joe Pavlat, and John Peters

In last month’s CompactPCI and AdvancedTCA Systems, Tahir, Eric, Joe, and John introduced the objectives of the Liquid-Cooled Embedded Computing (LCEC) initiative, a privately funded group tasked with creating an open standard for liquid-cooled embedded computing architectures. Here they detail development of an architecture that deploys liquid cooling for the entire board, also known as full board cooling.

The full board cooling solution being developed consists of a number of enclosed boards. Each board is envisioned to be a Line Replaceable Unit (LRU) housed in a common chassis. This system, to be discussed in greater detail, is being targeted to achieve a greater power density than any competitive air-cooled system, and will possess all the inherent advantages discussed earlier.

In the early stages of the system development, the LCEC committee recognized a number of key technology risks associated with implementing liquid cooling. As a result, it adopted a two-pronged development approach. The first involves the development of what is called the Static Display Model (SDM). The second approach leverages a prototype system, developed under a Defense Microelectronics Activity (DMEA) program by member company ISR. The second development activity has resulted in a live system that will simply be referred to as the prototype system. The live prototype system was developed to address key technology risks, to prove feasibility, and to provide a reference design. Even though the SDM and the prototype systems have some differences, the core engineering development tasks apply to all LRUs that are completely cooled by a dielectric coolant, such as Fluorinert. These core tasks include: 

  • Heat acquisition and rejection
  • Hermetic sealing
  • Manufacturability
  • Reliability
  • Serviceability/maintainability

Static display model
Figure 1 shows the static display model subrack with disassembled enclosure and printed circuit board. It is a nonfunctional mechanical model built as an example of a potential architecture that could be realized from the standard. Building the SDM has provided critical insight into what it will take to create a practical standard, and is playing an important role as the committee works towards a standard. The model is intended to roughly represent key subsystems and their placement with respect to each other, and will serve as a baseline as LCEC members gather marketplace requirements. The SDM aims to achieve a physical representation of the ideal packaging for a liquid-cooled LRU subrack.

Figure 1: The static display model subrack with disassembled enclosure and printed circuit board

The SDM shown in Figure 1 is 9U with 6U blades completely enclosed within a sealed container. With this packaging, a dielectric fluid completely cools the blades within a hermetic structure that encapsulates the PCB. This enclosure could be made of several materials, including aluminum or plastic. The material selected will depend upon the final form factor, operating limits, and pitch, among other key drivers. The blades utilize hermetic electrical connectors that transmit high-speed signals (minimum 10 GBps) and power between the dry and wet environments. Member company Yamaichi Electronics has prototyped these connectors. The connectors are currently undergoing leakage testing. Promising preliminary results have shown leakage levels well under maximum specifications for a system using a dielectric as the coolant. The blades simultaneously plug into the backplane and the fluid manifold as hot swappable units. When a blade is removed, special shut-off fluid connectors will prevent the coolant from spilling and the fluid system will remain closed.

The current version of the SDM contains fourteen slots on a 1.2-inch pitch. While the pitch can be reduced, the current pitch provides extra room for 3D stacking of components, denser mezzanine cards, and other general 3D structures that can all be made possible with liquid cooling. A fine balance between maximum modularity, ability to go in the vertical direction on a board, and maximum thermal power density must be considered in order to provide a standard foundation that can be built on for years to come.

During normal operation, the fluid supply manifold in the SDM will supply cold fluid, while the return manifold will remove hot fluid from each sealed blade. Fluid will travel from the return manifold to a 3U Thermal Management Unit (TMU) located in the bottom of the subrack. The TMU will contain redundant pumps, control electronics, a liquid-to-liquid heat exchanger, and a reservoir. The fluid it provides will remove heat from the blades and transfer it to a facility water system via the liquid-to-liquid heat exchanger. The system will be capable of rejecting its heat to either chilled or condenser (cooling tower) water. Employing this latter form of heat rejection will allow the user to bypass or remove the facility chillers, leading to higher facility energy efficiencies. Alternatively, using chilled water could reduce the size of the heat exchanger, reduce the amount of water flow required, and lower component temperatures. Whether the waste heat is removed to chilled or condenser water, it is important to note that this heat is not being rejected into the facility air. This approach to cooling will enable a dramatic reduction in airflow management challenges that data center/telecom centers face today.
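The link between water temperature rise and required water flow can be illustrated with a simple energy balance, Q = m × cp × ΔT. The sketch below is illustrative only: the 7,000 W load is the upper subrack target quoted later in this article, while the two water temperature rises (5 K and 10 K) and the function name are assumptions chosen for the example, not LCEC figures.

```python
# Water properties near room temperature (standard textbook values).
WATER_CP_J_PER_KG_K = 4186.0   # specific heat capacity
WATER_DENSITY_KG_PER_L = 1.0   # approximate density


def water_flow_l_per_min(heat_w, delta_t_k):
    """Facility water flow needed to absorb heat_w with a rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("water temperature rise must be positive")
    mass_flow_kg_s = heat_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0


# Hypothetical temperature rises; 7,000 W is the upper subrack target.
for rise_k in (5.0, 10.0):
    flow = water_flow_l_per_min(7000.0, rise_k)
    print(f"{rise_k:.0f} K rise: {flow:.1f} L/min")
```

Under these assumptions, doubling the allowable water temperature rise halves the required flow, which is the mechanism by which colder supply water could reduce the amount of water flow needed.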

Each slot in a subrack may contain both Front and Rear Transition Modules (FTMs and RTMs) for maximum I/O breakout. Figure 2 shows a close-up of the interface between an FTM, blade, and a backplane. These FTMs and RTMs are envisioned to be relatively low power so that only modest convective air cooling infrastructure is necessary. As long as the amount of air cooling per slot is kept low, the fans and ducting should be able to be packaged in the same volume as the rest of the system. The lower third of the chassis, which primarily houses the TMU, will likely share space with the Power Entry Modules (PEMs) and fans for cooling the RTMs and FTMs. Figure 3 shows a rear view of the subrack with the lower third extending beyond the card-cage. The backplane will primarily contain the mesh and deliver power. The open space above it could be utilized for user specific I/O to the RTM, similar to the AdvancedTCA standard. The backplane will need to be mechanically coupled to the coolant manifold, as shown in Figure 2, to accommodate the hot swapping action.

Figure 2: A close-up of the interface between an FTM, blade, and a backplane

Figure 3: A rear view of the subrack with the lower third extending beyond the card-cage

The design intention is for each blade to have 350 W to 500 W of power dissipation and cooling capacity for a chassis total of 5,000 W to 7,000 W per subrack. The chassis, built around the same blade building block, is envisioned to accommodate either 300 mm (12 inches) or 600 mm (24 inches) depths. The 300 mm design will most likely have lower performance than a 600 mm design. For an overall power density calculation, one can assume an average depth of 20 inches (500 mm). This depth, coupled with a chassis width of 17 inches (430 mm), and a subrack height of 9U (15.75 inches or 400 mm) will provide the following system density:
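As a minimal sketch of that arithmetic, using only the power targets and dimensions quoted in the paragraph above (the function name is ours, for illustration), the density works out as follows:

```python
def power_density_w_per_in3(power_w, depth_in, width_in, height_in):
    """Power density of a rectangular subrack envelope, in W per cubic inch."""
    volume_in3 = depth_in * width_in * height_in
    return power_w / volume_in3


# Dimensions quoted in the text: 20 in deep, 17 in wide, 9U = 15.75 in high.
DEPTH_IN, WIDTH_IN, HEIGHT_IN = 20.0, 17.0, 15.75

low = power_density_w_per_in3(5000.0, DEPTH_IN, WIDTH_IN, HEIGHT_IN)
high = power_density_w_per_in3(7000.0, DEPTH_IN, WIDTH_IN, HEIGHT_IN)
print(f"{low:.2f} to {high:.2f} W per cubic inch")  # 0.93 to 1.31 W per cubic inch
```

The envelope volume is 5,355 cubic inches, so the 5,000 W to 7,000 W target range corresponds to roughly 0.93 to 1.31 W per cubic inch.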

As will be shown in the prototype system description, cooling a blade with a dielectric fluid should enable system densities of 1.4 W per cubic inch or better. In comparison, the densest air-cooled systems can achieve about 0.5 W per cubic inch. Table 1 shows the assumed 19-inch rack dimension and power for each standard.

| Standard | Power | Dimensions |
|---|---|---|
| CompactPCI | 1,500 W in 12U | 15 inches (D) by 17.5 inches (W) |
| AdvancedTCA | 2,800 W in 12U | 15 inches (D) by 17.5 inches (W) |
| ATX | 350 W in 1U | 24 inches (D) by 17.5 inches (W) |

Table 1
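The air-cooled densities implied by Table 1 can be checked with a short calculation. This is a sketch under one stated assumption: each rack unit (U) is taken as the standard 1.75 inches of height; the powers, depths, and widths are the Table 1 values, and the names in the code are ours.

```python
RACK_UNIT_IN = 1.75  # height of 1U in inches (standard rack unit)

# name: (power in W, height in U, depth in inches, width in inches), from Table 1
STANDARDS = {
    "CompactPCI": (1500.0, 12, 15.0, 17.5),
    "AdvancedTCA": (2800.0, 12, 15.0, 17.5),
    "ATX": (350.0, 1, 24.0, 17.5),
}


def density_w_per_in3(power_w, height_u, depth_in, width_in):
    """Power divided by the rectangular volume the system occupies."""
    volume_in3 = height_u * RACK_UNIT_IN * depth_in * width_in
    return power_w / volume_in3


densities = {name: density_w_per_in3(*row) for name, row in STANDARDS.items()}
for name, value in densities.items():
    print(f"{name}: {value:.2f} W per cubic inch")
```

With these inputs the results are about 0.27 (CompactPCI), 0.51 (AdvancedTCA), and 0.48 (ATX) W per cubic inch, consistent with the figure of about 0.5 W per cubic inch cited for the densest air-cooled systems.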

At the blade level, VITA 48 holds the potential to achieve a similar density to what LCEC is in the midst of demonstrating (500 W). Public disclosure from VITA 48 to date indicates that 200 to 500 W is their target for cooling. Current disclosed methods for liquid cooling in VITA 48 are liquid flow through (LFT) and Dry Spraycooling (DSC). Both of these methods appear to be different versions of full-board cold plates. Full-board cold plates will be faced with the challenge of maintaining low thermal resistances from component to fluid.

Besides the ability to increase system density by two to five times, there are several other benefits associated with removing heat via liquid rather than air, such as energy savings, noise reduction, and freedom to place racks in any location.

The prototype system
The prototype system has been designed, built, and tested by ISR under a DMEA contract. The system was not specifically designed for LCEC; however, as mentioned earlier, many of the technologies developed can be leveraged by LCEC to reduce risk, prove feasibility, and act as a reference design. As this system is further refined for military applications, it will continue to assist LCEC as a platform for proving out key technologies at the component, board, and system levels.

Design features
Figure 4 shows the prototype system in a 19-inch transport case. At the heart of this platform is a 7U chassis with 18 slots. Each slot is sized to deliver and cool up to 500 W. The system utilizes atomizers to directly spray the components of interest on the board. The maximum thermal capacity is under evaluation, but earlier prototypes proved successful at achieving over 400 W of cooling per Line Replaceable Unit (LRU). Each LRU currently encloses a live dual Opteron board with a maximum of 16 GB of DDR1 memory (333 MHz) per board. The current functional board dissipates over 200 W with two dual-core 2.2 GHz (95 W thermal design power), 90 nm AMD Opterons. This demonstration system is targeting the following system density:

This shows a two to five times increase in density in comparison to the current air-cooled CompactPCI, AdvancedTCA, and ATX standards as highlighted earlier.

A 3U thermal management unit is combined with the 7U chassis. As described earlier for the static display model, the TMU serves to pump the dielectric fluid to and from the LRUs and reject the heat to facility water. The TMU has supply and return water connections in the rear. The heat exchanger has been tested and shown to reject over 10 kW to 35 ºC Propylene Glycol Water (PGW) within specified operating conditions on both the dielectric coolant and water sides. The TMU shown in the prototype system (Figure 4) accommodates redundant hot swappable pumps, power supplies, and control boards.

Figure 4: The TMU shown in the prototype system accommodates redundant hot swappable pumps, power supplies, and control boards

The TMU includes a hardware controller and shelf manager that will be Intelligent Platform Management Interface (IPMI) compliant (note that the static display model will utilize open standards for its choice of shelf management hardware and software). The controller controls the pressure of the fluid system regardless of environmental or load conditions. In addition, it monitors the pressure and temperature sensors to ensure that the coolant is being delivered within specified limits. The firmware will force an automatic and seamless switch to the redundant pump in the event that it senses a loss of control or pump failure. In the event of a pump failure or system limit being reached, the firmware will provide an appropriate signal to the shelf or system manager, which will dictate an appropriate response to the event. While the prototype system currently houses the shelf manager, it is possible to locate this at any other appropriate location within the rack or facility.
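The monitoring and failover behavior described above might be sketched as follows. This is a hypothetical illustration, not the prototype's firmware: the pressure limits, the use of 57 ºC as a coolant temperature limit, the event names, and the treatment of an out-of-range pressure as a "loss of control" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Hypothetical limits, for illustration only.
    min_pressure_kpa: float = 100.0
    max_pressure_kpa: float = 300.0
    max_coolant_temp_c: float = 57.0


def evaluate(pressure_kpa, coolant_temp_c, pump_ok, active_pump, limits=Limits()):
    """Return (pump to run, events to report to the shelf manager)."""
    events = []
    in_range = limits.min_pressure_kpa <= pressure_kpa <= limits.max_pressure_kpa
    if not pump_ok or not in_range:
        # Pump failure or loss of pressure control: switch to the redundant pump.
        active_pump = "B" if active_pump == "A" else "A"
        events.append("PUMP_FAILOVER")
    if coolant_temp_c > limits.max_coolant_temp_c:
        # A system limit was reached: report it and let the manager decide.
        events.append("COOLANT_TEMP_LIMIT")
    return active_pump, events


print(evaluate(200.0, 45.0, pump_ok=False, active_pump="A"))
```

The controller in this sketch only switches pumps and reports events; deciding what to do about an event is left to the shelf or system manager, mirroring the division of responsibility described in the text.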

The chassis power supply and switches are off-the-shelf 1U subsystems. The current prototype utilizes a 10 kW power supply to convert AC to -48 VDC. There are two 24-port switches – one InfiniBand and the other 1 Gigabit Ethernet (GbE). In addition to 1 GbE and InfiniBand, the prototype board has the capability of sending SATA, PCI Express, AMD’s HyperTransport, and other general I/O through the backplane. These high-speed signals point to achieving large bandwidths through a narrow-pitch, liquid-cooled LRU. All of this functionality has been tested and shown to work at a basic level. Ongoing testing will evaluate all of these system features in greater depth.

Figure 5 shows the prototype system’s LRU next to the SDM’s concept LRU. Both LRUs have their power and signal connectors along the same edge as the fluid connectors. Figure 6 shows the printed circuit boards with the enclosure either fully or partially removed. The primary difference between the two LRUs is that the prototype system requires the enclosure to hermetically seal against the printed circuit board, while the SDM encloses the entire board.

Figure 5: The prototype system’s LRU next to the SDM’s concept LRU

Figure 6: The printed circuit boards with the enclosure either fully or partially removed

Figures 7 and 8 show the front and rear of the prototype system’s chassis. As in the SDM, the chassis contains a backplane mechanically coupled to the fluid manifold. An alignment pin, similar to the one utilized in the AdvancedTCA standard, is the primary centering feature to align the LRU connectors to the backplane and manifold. The dimensions of all the mechanical and electrical components have been selected to allow proper alignment of all connectors within industry-acceptable tolerances. This ensures that the design can be produced at a reasonable cost in large volumes. The rear of the chassis shows the RTM space, redundant filtered power entry, and chassis fluid I/O.

Figure 7: The front of the prototype system’s chassis

Figure 8: The rear of the prototype system’s chassis

This prototype development effort has played a significant role in technical risk reduction. Initial testing of the mechanical and thermal systems has shown promise.

Results from testing
The system testing to date has covered mechanical, thermal, electrical, and other functions.

Mechanical testing
The ability to inject and eject a blade without losing fluid or breaching the fluid system has been achieved. This interface is being further optimized and studied for reliability. The fluid couplings are under continued test and evaluation for two-phase pressure drop performance, leakage, and reliability.

Minimizing LRU leakage is an important area for the development effort. In helium leak testing, the LRUs have been shown to leak far below their specification of four grams per year per LRU. The testing has been conducted for isolated LRUs, and not at the system level. The initial leak testing results are promising and provide confidence that the full system leak specification will be met. Testing is ongoing and will focus on long-term leakage.

Thermal performance testing
Measuring the exact power draw of the various components, for example, CPUs and memory, is extremely difficult when dealing with live boards. Knowledge of the exact power draw of the CPUs in this environment is critical if one wants to accurately determine the thermal resistance from the CPU to the final heat rejection medium (water or air). To overcome these difficulties, ISR conducted the thermal testing in two separate prototype setups. The first setup isolated the LRU from the chassis and measured the exact power draw of the CPUs along with other critical items such as fluid temperature and fluid flow rate. ISR used AMD Opteron Thermal Test Kits (TTKs), which are live die certified by AMD to a high and exact power rating (around 90 W at a specific case temperature, for example, 70 ºC). The initial setup tested TTKs with the dielectric coolant as high as 57 ºC and all dissipated in excess of 90 W. In all cases, the specified case temperature of 70 ºC was not exceeded.
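The figures from this first setup imply an upper bound on the case-to-coolant thermal resistance. The sketch below uses the standard definition (temperature rise divided by power) with the values quoted above for the warmest coolant condition; the function name is ours, and the result is only a bound derived from the stated limits, not a measured value reported by ISR.

```python
def thermal_resistance_k_per_w(case_temp_c, coolant_temp_c, power_w):
    """Case-to-coolant thermal resistance: temperature rise per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (case_temp_c - coolant_temp_c) / power_w


# Figures from the isolated-LRU test: 70 C case limit, 57 C coolant, 90 W.
r_max = thermal_resistance_k_per_w(70.0, 57.0, 90.0)
print(f"case-to-coolant resistance below about {r_max:.3f} K/W")
```

Because the TTKs dissipated more than 90 W and stayed at or below the 70 ºC case temperature, the actual resistance at the 57 ºC coolant condition must have been lower than this value of roughly 0.144 K/W.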

In the second setup, ISR tested the LRUs as part of the full prototype system. Primary focus has been placed on the ability to maintain all the high power and high heat flux components within manufacturers’ specified temperatures. In this environment, it was not possible to measure the power dissipation of the CPUs. Due to some technicalities that have since been worked out, it was also not possible to read the CPU diode temperatures with a quantifiable level of accuracy. Results for the measured case temperatures of the other components of concern are shown in Table 2. The coolant was delivered at approximately 45 ºC for this testing. Also shown in Table 2 is a conservative (linear) extrapolation of the case temperatures to a higher dielectric coolant temperature of 57 ºC (simple addition of 12 ºC to all case temperatures). To put the 57 ºC dielectric coolant temperature in perspective, testing and analysis show that facility water delivered at 35 ºC to the TMU would allow it to deliver the dielectric to the electronics at 57 ºC or lower. Research shows that it is not likely that water delivered directly from a cooling tower (with chiller plant bypassed) will exceed 35 ºC at any point during the year, for the vast majority of locations nationwide (if not all locations).

| Component | Specified maximum case temperature (ºC) | Measured case temperature at 45 ºC coolant (ºC) | Conservative prediction of case temperature at 57 ºC coolant (ºC) |
|---|---|---|---|
| BCM5780 – GbE (U4) | No spec given | 56 | 68 |
| IRF1310N MOSFET | 170 | 54 | 66 |
| Power Diode1 | 105 | 54 | 66 |
| Power Diode2 | 105 | 54 | 66 |
| SiS965 – Northbridge (U1) | No spec given | 63 | 75 |
| SiS965 – Southbridge (U2) | 91 | 54 | 66 |
| Tessera Memory | 90 | 57 | 69 |
| Vicor1 – 48 V to 12 V | 110 | 54 | 66 |
| VT1103 Power Reg | 122 | 56 | 68 |

Table 2
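The extrapolation in Table 2, and the margin it leaves against each specification, can be reproduced with a few lines. This sketch uses only the Table 2 values and the 12 ºC shift described in the text; the variable and function names are ours, and components without a published specification are skipped when computing margins.

```python
COOLANT_MEASURED_C = 45
COOLANT_TARGET_C = 57

# component: (specified max case temp in C, or None; measured case temp in C)
TABLE_2 = {
    "BCM5780 GbE (U4)": (None, 56),
    "IRF1310N MOSFET": (170, 54),
    "Power Diode1": (105, 54),
    "Power Diode2": (105, 54),
    "SiS965 Northbridge (U1)": (None, 63),
    "SiS965 Southbridge (U2)": (91, 54),
    "Tessera Memory": (90, 57),
    "Vicor1 48 V to 12 V": (110, 54),
    "VT1103 Power Reg": (122, 56),
}


def predict_case_temp_c(measured_c):
    """Shift the case temperature by the increase in coolant temperature."""
    return measured_c + (COOLANT_TARGET_C - COOLANT_MEASURED_C)


# Margin to specification at 57 C coolant, where a specification exists.
margins = {
    name: spec_c - predict_case_temp_c(measured_c)
    for name, (spec_c, measured_c) in TABLE_2.items()
    if spec_c is not None
}
tightest = min(margins, key=margins.get)
print(f"tightest margin: {tightest}, {margins[tightest]} C")
```

On these numbers the smallest predicted margin is 21 ºC, for the Tessera memory, with every other specified component further from its limit.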

The results show that all the critical components were well below their maximum case temperatures (where manufacturers’ specifications were available). These promising results make practical thermal design for individual boards feasible. In other words, the thermal designer will have the ability to grossly wet noncritical components and pinpoint extra fluid momentum onto hot components as needed. The phase-change nature of evaporative cooling with a low-boiling-point fluid, such as Fluorinert, provides an efficient method for transferring heat with low thermal resistances and lower fluid volumes as compared with single-phase heat transfer.

Electrical and other testing
The current prototype is undergoing extensive testing in all the areas mentioned earlier. In addition to these areas, other areas of evaluation include:

  • The user interface of an LRU
  • Serviceability issues
  • Performance under a variety of power levels
  • Manufacturability of core hardware components
  • Stress testing of the liquid enclosure
  • EMI/EMC emissions/susceptibility

All of this testing, to date and in the future, will help pave the way, reduce risk, and attract more advocates for a liquid-cooled standard that has the ability to revolutionize the computer industry. Data centers are already moving towards liquid cooling the air inside racks, which is a step in the right direction. However, LCEC views this current industry trend as only a small stepping stone toward the future potential of full board liquid cooling.

Design upgrades
The results shown for temperatures of components on the prototype dual processor board have shown feasibility of cooling high power and heat flux components in a liquid cooled LRU. It is expected that effective thermal management of the next generation of components is feasible. LCEC is evaluating upgrades to the prototype board. These upgrades could include the latest processors, memory, chipsets, and power converters.

LCEC future activity
A dense, environmentally isolated, liquid-cooled system solution that leverages desktop silicon (AMD Opteron) has been demonstrated with the prototype system. As discussed earlier, the prototype system was developed in order to prove out several key technologies/capabilities to be used in an upcoming upgrade to a functional SDM – to be renamed the Functional Display Model (FDM). Key among the technologies/capabilities developed were a hermetic connector, the ability to effectively seal and cool a functional and high power blade in a narrow pitch, and the ability to hot swap a functional blade while maintaining proper alignment and with minimal loss of coolant.

To enable rapid development of the prototype system, a number of constraints such as vertical height and depth were relaxed. The prototype system is 10U, which is 1U higher than the envisioned height for the upgrade to the upcoming FDM. Future activity involving the FDM will incorporate the technologies developed with the prototype system, and will seek to constrain the FDM to 9U and 300 mm or 600 mm in depth. The first iteration of the FDM will have limited functionality, but will be designed for high reliability and efficient manufacturing (for lower cost). Future activities will emphasize developing the sealed enclosure that will house the boards, as well as integrating the TMU with the chassis, power supplies, and power entry modules. Effective EMI/RFI shielding afforded by the sealed enclosure is a key benefit, and this will be evaluated.

There are currently seven LCEC member companies. A key goal for the near term is to expand the membership. Finally, significant thought will go towards alignment of the LCEC under a trade group.

©MMVIII CompactPCI and AdvancedTCA Systems. An OpenSystems Publishing, LLC publication.
