Originally Published MEM Fall 2005
IMAGING
A Modular Multiaccelerator Platform for High-Performance Imaging and Visualization
A standards-based system offers all the benefits of a blade server in bringing data-intensive medical imaging functions to the office desktop.
Philippe Roy and Robert Murphy
Conventional blade server clusters are a well-established technology popular in high-performance computing. Their tall and thin packaging form factor is ideal for flat motherboards housing multiple central processing units (CPUs) and memory. However, if the application requires high-end imaging and visualization, the conventional blade server form factor falls short; it cannot accommodate the packaging and cooling requirements of current extremely high-performance graphics processing units (GPUs).
In fact, designers of devices with graphics-intensive applications have been trying for years to take advantage of the compelling density and modularity of the blade form factor, but with little success. A standards-based cluster that has all the benefits of a blade server and can accommodate high-performance GPUs has long been the elusive Holy Grail of high-end visualization.
Clusters offer a combination of computing and networking to achieve performance levels previously attainable only with very-high-end computers, such as supercomputers or mainframes. Over time, mainstream technology, driven mostly by Intel Corp. (Santa Clara, CA) and AMD (Sunnyvale, CA), has brought high performance to the desktop level, along with a high level of integration at the chip set level among processors, memory subsystems, and input/output (I/O) interfaces.
Most recently, interconnect fabrics have been commoditized. These high-speed networking interfaces can be connected via switches to link processors into mesh topologies with very high bandwidth, handling multiple simultaneous streams of data passing between processors at speeds of up to 800 Mbyte/sec. As a result, unprecedented connectivity between processors within the cluster has become available. Examples are InfiniBand and RapidIO.
Commodity servers certainly offer a great price-performance point. However, they do not necessarily provide the optimized hardware and software components needed to take advantage of the significant computing and networking power available through various subsystems. The packaging is generally based on 3U or 4U 19-in. rack-mount chassis, which can be repacked into workstations offering a less-than-optimal density factor. Clusters often require significant facilities extension, large amounts of power, and increasingly strong air-conditioning units.
The most popular option, certainly, continues to be to put together as many rack-mount servers as needed, including the appropriate interconnect switch fabric and local or shared storage subsystems, and to plan for the increasingly scarce space. The upside to this approach is that it is an inexpensive system to acquire. However, it is not necessarily the least expensive to own, which becomes clearer when all of the other costs required to make it work are added in. The apparent downsides are substantial: mediocre computing density, a high noise level (the system must be in a computer room), high-power air-conditioning, restricted I/O capabilities, and limited flexibility due to space constraints.
Another option is now available: a scalable, modular, high-performance computing platform designed specifically to deliver all the benefits of blade server computing while accommodating multiple GPUs.
Figure 1. A modular platform is engineered around various standards, including the PICMG 1.3 specification, and the PCI-X and PCI Express standard connectors.
Small-Form-Factor Computing Power
Scalable systems enable small-form-factor modules to be engineered around mainstream technology (those Intel or AMD processors). Yet, in their modularity, the systems are highly configurable and provide significant flexibility in the choice of I/O and of accelerators, such as GPUs or field-programmable gate arrays (FPGAs).
This type of platform is engineered around the PCI Industrial Computer Manufacturers Group (PICMG) 1.3 specification, with PCI-X- and PCI Express–standard connectors (see Figure 1). A base plane card allows up to three modules to be plugged in: a single-board computer (SBC), a PCI Express GPU or a PCI-X board, and a possible second PCI-X board. The platform can be configured with a single-processor SBC and a single GPU, and it can be upgraded easily with pretested configurations that are ready for Windows or Linux system software.
The SBC plugs into a standard PICMG 1.3-compliant connector and can be configured with one or two processors, as much as 8 Gbyte of double data rate 2 (DDR2) 400-MHz memory, and gigabit Ethernet and InfiniBand ports. The base plane is equipped with a dedicated processor running system management programs. One such program is automatic noise-level control, which makes each module a quiet unit suitable for normal office environments.
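The slot rules described above can be captured in a small configuration check. The following Python sketch is illustrative only: the class names, the slot labels (`pcie_gpu`, `pcix`), and the `validate` function are hypothetical and are not part of any vendor software. Only the limits themselves (one or two processors, as much as 8 Gbyte of memory, and the permitted board types) come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SBC_PROCESSORS = 2    # "one or two processors" per SBC
MAX_SBC_MEMORY_GBYTE = 8  # "as much as 8 Gbyte" of DDR2 per SBC


@dataclass(frozen=True)
class SingleBoardComputer:
    processors: int
    memory_gbyte: int


@dataclass(frozen=True)
class BasePlaneConfig:
    """One base plane: an SBC plus up to two expansion boards (hypothetical model)."""
    sbc: SingleBoardComputer
    slot2: Optional[str] = None  # "pcie_gpu", "pcix", or empty
    slot3: Optional[str] = None  # optional second PCI-X board, or empty


def validate(config: BasePlaneConfig) -> List[str]:
    """Return the rule violations; an empty list means the layout fits."""
    problems = []
    if not 1 <= config.sbc.processors <= MAX_SBC_PROCESSORS:
        problems.append("SBC takes one or two processors")
    if not 0 < config.sbc.memory_gbyte <= MAX_SBC_MEMORY_GBYTE:
        problems.append("SBC memory must be between 1 and 8 Gbyte")
    if config.slot2 not in (None, "pcie_gpu", "pcix"):
        problems.append("second slot takes a PCI Express GPU or a PCI-X board")
    if config.slot3 not in (None, "pcix"):
        problems.append("third slot takes only a PCI-X board")
    return problems


# Entry configuration: single-processor SBC with a single GPU.
entry = BasePlaneConfig(SingleBoardComputer(processors=1, memory_gbyte=2), slot2="pcie_gpu")
print(validate(entry))  # []
```

Returning a list of violations, rather than raising on the first one, lets a configuration tool report every problem with a proposed layout at once.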
Each system module can be assembled in a standard 4U-high 19-in. rack-mountable chassis or into a workstation assembly capable of handling up to four modules (see Figure 2). Each 4U subsystem can be configured easily at the module level, optimizing cost, space, and power requirements. And each subsystem can be interconnected with other subsystems via InfiniBand or gigabit Ethernet switches, depending on the application bandwidth requirements.
An application example is the minicluster system shown in Figure 3, which demonstrates the combination of density, with respect to both the CPU and the GPU/FPGA, and modularity being sought for high-end imaging and visualization applications. The minicluster is a 35-in.-tall (90-cm) cabinet that accommodates up to 24 low-voltage Intel Xeon 2.8-GHz EM64T processors, 24 high-performance PCI Express accelerators such as GPUs or specialized processors, and as much as 128 Gbyte of DDR2 400-MHz memory. It includes a full InfiniBand mesh based on 12 or 16 InfiniBand 4x ports, providing aggregate bandwidth of up to 6.4 Gbyte/sec between processors. An additional 800 Mbyte/sec is available between the storage server and compute-visualization nodes. The InfiniBand storage server installed in the lower part of the cabinet can be customized to provide up to 9.6 Tbyte of shared storage in a highly compact form factor that is natively connected to the InfiniBand mesh infrastructure. The equivalent configuration using standard commodity rack-mount systems would require at least two 42U cabinets side by side to handle the extra units.
Increasingly, manufacturers need to combine high-performance computation, the manipulation of increasingly large data sets, and better visualization of increasingly complex data in a personal resource available at the desktop. The platform's modularity, in association with several software options, can provide users with a personal high-performance visualization computing (HPVC) cluster workstation. This innovative cluster can be used in the office environment.
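To put the bandwidth figures in perspective, the short Python sketch below estimates ideal transfer times. It assumes decimal units (1 Gbyte = 1000 Mbyte), peak rates sustained for the whole transfer, and no protocol overhead, so real times would be longer. The 4-Gbyte data set is a hypothetical example, not a figure from the system described.

```python
STORAGE_LINK_MBYTE_PER_SEC = 800     # storage server to compute-visualization nodes
MESH_AGGREGATE_MBYTE_PER_SEC = 6400  # 6.4 Gbyte/sec aggregate across the mesh


def transfer_seconds(dataset_mbyte: float, rate_mbyte_per_sec: float) -> float:
    """Ideal transfer time: size divided by peak rate, ignoring overhead."""
    if rate_mbyte_per_sec <= 0:
        raise ValueError("rate must be positive")
    return dataset_mbyte / rate_mbyte_per_sec


dataset_mbyte = 4000  # hypothetical 4-Gbyte study (assumption)
print(transfer_seconds(dataset_mbyte, STORAGE_LINK_MBYTE_PER_SEC))    # 5.0
print(transfer_seconds(dataset_mbyte, MESH_AGGREGATE_MBYTE_PER_SEC))  # 0.625
```

The second figure is a lower bound only: the 6.4 Gbyte/sec is an aggregate across the mesh, so a single pair of nodes would not necessarily see that rate.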
Figure 4. A personal HPVC cluster station is shown here.
Personal HPVC cluster stations can be configured to be as small as two processors and two GPUs or scaled up to as many as eight processors and eight GPUs within the same enclosure (see Figure 4). Obviously, each workstation can be integrated into a larger InfiniBand or gigabit Ethernet network with its own storage potential of 1.6 Tbyte.
Medical Imaging
Such a high-performance system can handle highly complex analytical and visualization applications in a variety of fields, including medical imaging.
The system of modular processors, storage subsystems, and high-speed interconnects creates a scalable end-to-end infrastructure to address the massive and increasingly data-intensive challenges in medical imaging. Multislice computed tomography (CT) scanners, modality fusion, and innovative reconstruction techniques have generated an explosion of image data to process and an expanded diagnostic work flow to manage. But with its capability to house several GPUs in a module, the standards-based platform enables modules to be connected to deliver reconstruction and volume-rendering acceleration.
Figure 5. A medical imaging scanner infrastructure composed of processing and storage modules.
In the case at hand, modules are integrated with medical reconstruction and visualization software as shown in Figure 5. Medical image data are acquired by a scanner when a source passes x-ray, microwave, magnetic, nuclear, or other energy through, or applies it to, the patient's body. Detectors convert the energy readings into digital signals. These signals are then reconstructed into a viewable 2- or 3-D image, typically with the help of a special-purpose accelerator capable of orders of magnitude more calculations than a conventional CPU.
A control console synchronizes the activity of all the energy sources and detectors in the scanner to ensure that the requisite data sets are acquired properly. Finally, the image data are viewed and manipulated on a diagnostic console to optimize visualization of abnormalities in the image for diagnosis. This step, too, requires accelerator hardware to support fast, interactive 3-D visualization techniques, which even become 4-D with the added dimension of time.
This medical image data flow can be carried throughout the hospital enterprise. Remotely located clinicians may require instant access to the same data and same visualization capabilities as those available at the local diagnostic console. Here, advanced visualization functions are still calculated on the local accelerator hardware within the platform, with the results sent over the network to minimally configured thin clients equipped with only the simplest of display capability. All networked computers throughout the enterprise thus have advanced visualization capability without there being any need to install complex and expensive special accelerator hardware in each machine.
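A rough size comparison suggests why sending rendered results, rather than the underlying data, suits minimally configured clients. The Python sketch below uses hypothetical dimensions (a 512 x 512 x 1000 volume at 2 bytes per voxel and a 1280 x 1024 screen at 3 bytes per pixel) and assumes no compression; none of these numbers come from the system described.

```python
def volume_bytes(width: int, height: int, slices: int, bytes_per_voxel: int) -> int:
    """Uncompressed size of a stack of image slices."""
    return width * height * slices * bytes_per_voxel


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Uncompressed size of one rendered screen."""
    return width * height * bytes_per_pixel


# Hypothetical study and display sizes (assumptions for illustration).
study = volume_bytes(512, 512, 1000, 2)
screen = frame_bytes(1280, 1024, 3)

print(study)   # 524288000
print(screen)  # 3932160
print(round(study / screen))  # 133
```

Under these assumptions, one rendered screen is roughly two orders of magnitude smaller than the data set it was computed from, and the thin client needs no accelerator to display it.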
Finally, the platform provides a pool of inexpensive high-speed InfiniBand processors that have a storage capacity of many terabytes. This pool holds the thousands of images generated in this data-intensive, high-activity work flow. As images are viewed and diagnoses are made, these data eventually migrate to hierarchically arranged storage pools for long-term archiving. These pools constitute the hospital's picture archiving and communications system, known as PACS. To keep moving, the image-data traffic travels through a high-bandwidth InfiniBand fabric.
The software used in the system depicted in Figure 5 consists of several components specialized for image reconstruction, volume rendering, and visualization. The image reconstruction software, in conjunction with the accelerator hardware, speeds up reconstruction of 3-D images from medical scanning devices. Its advanced reconstruction techniques are optimized to exploit the power available with the accelerator cards. The volume-rendering software processes multigigabyte data sets with accelerators for multiplanar reconstruction, maximum-intensity projection, shaded and classical volume rendering, and surface-shaded display. A thin-client server enables thin clients to act as fully functional medical workstations. All medical-image data processing and visualization are done on the server; only the resulting screens are streamed to the thin client via a standard network connection. Thus, legacy hardware is supported, and users have consistent access to data on centralized, scalable resources.
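Two of the rendering operations named above, maximum-intensity projection and multiplanar reconstruction, can be illustrated on a tiny volume. The following is a plain-Python sketch of the general techniques, not the accelerated software used in the system. The function names and the `volume[z][y][x]` indexing convention are assumptions made for the example.

```python
from typing import List

Volume = List[List[List[int]]]  # indexed as volume[z][y][x] (assumed convention)
Image = List[List[int]]


def mip_along_z(volume: Volume) -> Image:
    """Maximum-intensity projection: keep the brightest voxel along each z ray."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [max(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def reslice_fixed_y(volume: Volume, y: int) -> Image:
    """Multiplanar reconstruction: cut the slice stack along a plane of constant y."""
    return [list(plane[y]) for plane in volume]


volume = [
    [[0, 1], [2, 3]],  # z = 0
    [[9, 0], [1, 1]],  # z = 1
    [[4, 4], [0, 8]],  # z = 2
]
print(mip_along_z(volume))         # [[9, 4], [2, 8]]
print(reslice_fixed_y(volume, 0))  # [[0, 1], [9, 0], [4, 4]]
```

Both operations reduce a 3-D data set to a 2-D image. In the arrangement described, only an image of that kind is streamed to the thin client.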
Conclusion
The packaging and cooling limitations that have inhibited the adoption of blade server–like computing in applications that require GPU and FPGA acceleration have been overcome with the development of standards-based modular platforms. Designers and users of graphics- and visualization-intensive systems can now take advantage of the same density, modularity, and cost benefits that have been available for general-computation applications for years.
More important, the availability of the standards-based platform can facilitate a paradigm shift in medical imaging and visualization like that which occurred in software for general computational applications. That is, application software can migrate from expensive and proprietary monolithic graphics architectures to standard scalable graphics clusters.
Lowering the cost of hardware and application software can lead to the introduction of many new high-end visualization applications and the general adoption of productivity tools previously restricted to only those few who could accommodate their high cost and complexity.
Philippe Roy is director of visualization and simulation systems for Mercury Computer Systems (Chelmsford, MA). He can be contacted at proy@mc.com or 978-967-1769. Robert Murphy is director of marketing for life sciences for Mercury. He can be contacted at rmurphy@mc.com or 858-794-1600.
Copyright ©2005 Medical Electronics Manufacturing








