
Originally Published MEM Fall 2004

MEDICAL IMAGING

General-Purpose CPUs for Medical Image Processing

CPU flexibility, along with increasing performance and functional capability, makes general-purpose processors more than competitive with DSPs and ASICs for medical applications.

Andrew Alleman

Image reconstruction and processing is a fundamental function performed by medical imaging systems. While many aspects of this processing function are well known and are common to different imaging modalities and system manufacturers, there is nevertheless a large degree of variation among image-processing elements that can be attributed to cost, performance, and architectural constraints. Further, intensifying competition and market demands have recently accelerated the rate of feature and performance upgrades and the rate of new product introductions.

A critical question in designing a medical imaging system architecture is whether general-purpose central processing units (cpus), special-purpose digital signal processors (DSPs), or dedicated application-specific integrated circuits (ASICs) are better suited for the core data-path processing tasks these systems must carry out. Whichever is chosen must satisfy the demands of users in this dynamic market environment.

Medical Imaging System Architecture

A wide variety of system architectures underlie the different types of medical imaging systems. The architecture used depends on the system designer and on the application, whether ultrasound, magnetic resonance imaging, computed tomography, positron emission tomography, digital x-ray, or some other technology. However, some reasonable assumptions can be made about system requirements that are common across multiple modalities and vendors. These include scalable performance, scalable cost, a long system life cycle, optimized lifetime cost, a high mean time between failures (MTBF) and first-time fix rate (FTFR), and a roadmap for performance improvement over time. Additionally, requirements for graphics display, network interfaces, disk storage, and other subsystems may apply, depending on the system.

Figure 1. Generic data flow model for a medical imaging system.

Data flows and data rates also vary according to the specific system, but the flow model that generally applies begins with data acquisition via sensors or detectors, which is followed by preprocessing and packaging of the data, image reconstruction processing, postprocessing of the reconstructed images, and finally by displaying or archiving images (see Figure 1). While this model represents all imaging modalities in a general way, the amount of work done in each of the image-processing stages may vary greatly among modalities and systems. Some cases require significant processing for image reconstruction, for example, while in others the detector produces the image directly.
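The generic flow of Figure 1 maps naturally onto a chain of software stages. The sketch below is a minimal illustration only; the stage names and their toy arithmetic are hypothetical placeholders, not any vendor's algorithms.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]

def run_pipeline(data: Any, stages: Sequence[Stage]) -> Any:
    """Pass data through each processing stage in order, as in Figure 1."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical placeholder stages; a real system substitutes its own algorithms.
def preprocess(raw):
    """Toy offset correction: subtract a fixed detector offset of 1."""
    return [x - 1 for x in raw]

def reconstruct(samples):
    """Toy reconstruction: arrange samples into 2-pixel image rows."""
    return [samples[i:i + 2] for i in range(0, len(samples), 2)]

def postprocess(image):
    """Toy postprocessing: scale every pixel by 2."""
    return [[2 * p for p in row] for row in image]

image = run_pipeline([1, 2, 3, 4], [preprocess, reconstruct, postprocess])
print(image)  # [[0, 2], [4, 6]]
```

A modality in which the detector produces the image directly would simply omit the reconstruction stage from the list.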

Data rates are highest at the beginning of the flow, before raw sensor data are processed into an image, and fall once the data have been reconstructed into an image stream. The aggregate sensor data rate ranges from 4 Mbyte/sec (32 Mb/sec) to more than 250 Mbyte/sec (2 Gb/sec), while image streams range from 4 Mbyte/sec (32 Mb/sec) to more than 60 Mbyte/sec (480 Mb/sec). The upper ends of these ranges rise over time as technology advances.
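To show roughly where such figures come from: the aggregate raw rate is the product of channel count, per-channel sample rate, and bytes per sample. The sketch below uses hypothetical detector parameters (the 128-channel, 1-MHz, 2-byte example is illustrative and does not describe any particular system). It also shows the Mbyte/sec-to-Mb/sec conversion used in the text, which multiplies by 8.

```python
def sensor_rate_mbyte_s(channels, sample_rate_hz, bytes_per_sample):
    """Aggregate raw sensor data rate in Mbyte/sec (1 Mbyte = 10**6 bytes)."""
    return channels * sample_rate_hz * bytes_per_sample / 1_000_000

def mbyte_to_mbit(rate_mbyte_s):
    """Convert Mbyte/sec to Mb/sec (8 bits per byte)."""
    return rate_mbyte_s * 8

# Hypothetical detector: 128 channels sampled at 1 MHz, 16-bit (2-byte) samples.
raw = sensor_rate_mbyte_s(128, 1_000_000, 2)
print(raw, mbyte_to_mbit(raw))  # 256.0 2048.0
```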

Imaging-system providers regard their specific algorithms for image reconstruction and processing as valuable intellectual property, but the fundamental operations are similar in most cases. These operations include Radon transforms, 1-D and 2-D discrete Fourier transforms, simple pixel manipulation, and convolution filters. Also, to meet performance requirements, especially at the high end of the range, it is typically necessary to consider a distributed implementation of the algorithms, that is, running the processing in parallel on multiple processors.
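A convolution filter and its distributed implementation can be illustrated together. The sketch below applies a generic 3×3 filter (not any vendor's algorithm) to row strips of an image in parallel. It applies the kernel without flipping (correlation), which equals convolution for the symmetric kernels typical of smoothing filters. Python threads are used here only to show the decomposition, because CPython threads do not speed up pure-Python arithmetic. A real system would map strips onto separate processors or nodes. Each node would then also need the one-row "halo" bordering its strip, which this sketch gets for free because every thread shares the image.

```python
from concurrent.futures import ThreadPoolExecutor

def convolve_rows(img, kernel, row_start, row_stop):
    """Apply a 3x3 kernel to rows [row_start, row_stop) of img.

    Pixels outside the image are treated as zero. The kernel is applied
    without flipping (correlation), which equals convolution for
    symmetric kernels.
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(row_start, row_stop):
        row = []
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr + 1][dc + 1] * img[rr][cc]
            row.append(acc)
        out.append(row)
    return out

def parallel_convolve(img, kernel, workers=4):
    """Split the image into row strips and filter each strip in a worker."""
    h = len(img)
    if h == 0:
        return []
    step = -(-h // workers)  # ceiling division: rows per strip
    bounds = [(s, min(s + step, h)) for s in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(lambda b: convolve_rows(img, kernel, *b), bounds))
    # map() preserves input order, so the strips reassemble top to bottom
    return [row for strip in strips for row in strip]

img = [[1] * 4 for _ in range(4)]
box = [[1] * 3 for _ in range(3)]  # unnormalized 3x3 box kernel
print(parallel_convolve(img, box, workers=2)[0])  # [4, 6, 6, 4]
```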

Processing Technologies

The cpu is the main processing element in computing systems ranging from desktop computers to embedded systems to supercomputers. Early cpus were built as subsystems of discrete components, but integrated-circuit (IC) technology enabled Intel to develop the first general-purpose microprocessor, the 4004, in 1971. That device contained 2300 transistors and executed about 0.06 million instructions per second. Since then, Intel, Motorola, and other manufacturers have steadily advanced the state of the art in both performance and capability.

Today, there are myriad microprocessor architectures and variations, serving an even greater number of applications. Most of these are general-purpose microprocessors in the sense that they can run any kind of program. They are also general-purpose in a stricter sense: they are not tuned or targeted toward a narrow set of applications. Hence, a cpu with a general-purpose architecture may be found in end products that span a variety of markets, such as workstations, servers, telecommunication systems, and medical imaging systems. This does not mean, however, that a general-purpose cpu cannot have architectural features well suited to certain kinds of processing.

The DSP was introduced in the early 1980s as a special-purpose microprocessor targeted specifically at applications such as speech recognition, telecommunication protocol processing, signal processing, and image processing. To serve these applications, DSPs were designed as low-power, highly pipelined devices with extensive data-movement capabilities and specialized instructions for fixed-point arithmetic and bit manipulation. Texas Instruments introduced a highly successful family of DSPs in 1983, and subsequent innovations by TI, Analog Devices, and other companies have steadily improved the performance and usefulness of DSPs.

ASICs have existed throughout the microprocessor era, providing application-specific functionality in a custom IC. They are typically fixed-function products and offer much higher performance levels than a software-programmable solution can. Dramatic improvements in field-programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs) have meant that ASIC-like fixed-function circuits can be implemented in programmable hardware with only minor performance degradation.

Cpus, DSPs, and ASICs (including FPGAs and CPLDs) are all viable options for supplying the core data-path processing functionality that medical imaging systems require. Cpus are the most flexible, followed by DSPs and then ASICs. The remainder of this article examines these technologies with respect to some of the key architectural considerations for image processing systems. General-purpose cpus receive particular emphasis because all of the factors discussed, except performance, overwhelmingly favor their use.

Performance Considerations

The main reason a system designer would specify a less flexible, more narrowly focused processing architecture instead of a more versatile general solution is to improve performance. In many applications, that choice is justified. (Cost can be another justification, as discussed below.) If the required performance cannot reasonably be achieved with a general-purpose cpu architecture, then the only alternative is to use DSPs or ASICs. Custom ASICs or FPGAs almost always outperform any software-programmable solution. However, because ASICs are proprietary designs, generalizations about their performance cannot really be made.

DSP architectures are inherently well suited to image reconstruction and processing operations. They typically feature simplified, flat memory models and single-cycle instruction execution. They contain extensive bit-manipulation capabilities, including barrel shifting and masking, as well as optimized multiply and accumulate, or MAC, functions. Instruction and data memory spaces often are split (the Harvard architecture), with caching available for the instruction memory and multichannel direct memory access (DMA) available for the data memory. Also, the processor architecture offers several levels of parallelism, including both multiple execution cores and multiple functional units, such as multiple-data and multiple-address arithmetic logic units.
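The multiply-accumulate pattern mentioned above is the inner loop of a finite impulse response (FIR) filter. This is a minimal sketch that assumes Q15 fixed-point format (16-bit signed integers representing fractions in [-1, 1)), a common convention on fixed-point DSPs. A DSP would execute each MAC in hardware, typically one per cycle. This Python version only shows the arithmetic.

```python
def to_q15(x):
    """Convert a float in [-1, 1) to Q15 fixed point, saturating at the limits."""
    return max(-32768, min(32767, int(round(x * 32768))))

def fir_q15(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * samples[n - k].

    Inputs and outputs are Q15 integers; each tap is one multiply-accumulate.
    """
    out = []
    for n in range(len(samples)):
        acc = 0  # Python ints never overflow; DSPs use accumulators with guard bits
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one MAC per tap
        y = acc >> 15  # Q15 x Q15 = Q30; arithmetic shift back to Q15
        out.append(max(-32768, min(32767, y)))  # saturate to 16 bits
    return out

half = to_q15(0.5)                        # 16384
print(fir_q15([half] * 3, [half, half]))  # [8192, 16384, 16384]
```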

Throughout the 1980s and much of the 1990s, general-purpose cpu architectures lacked many of the attributes that made DSPs attractive for image reconstruction and processing applications. DSP performance in these applications was thus substantially better than that of cpus for a long while. However, recent generations of cpus have borrowed some of the most generally applicable DSP concepts to improve their own performance in applications such as digital image, audio, and video processing.

As a result, cpus now offer performance better suited to the requirements of medical image processing applications. Recent improvements include deeper pipelines, which in many cases sustain a throughput of one instruction per clock cycle; bit-manipulation instructions and other media, video, and graphics instructions; and parallelism provided through single-instruction multiple-data (SIMD) processing. Further, cpus have added numerous other architectural enhancements, such as multiprocessor support, multithreading support, deeper and larger caches, 64-bit processing extensions, and significantly faster clock rates, all of which have improved performance for a wide range of applications.
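SIMD media instructions operate on several narrow values packed into one wide register. The sketch below emulates the idea in software for four 8-bit pixels held in a 32-bit word, a technique sometimes called SIMD within a register (SWAR). Hardware instructions such as MMX PADDB perform the equivalent lane-wise wraparound add directly. The function names here are illustrative only.

```python
LANE_LOW = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
LANE_MSB = 0x80808080  # top bit of each 8-bit lane

def pack4(pixels):
    """Pack four 8-bit pixels into one 32-bit word (pixel 0 in the low byte)."""
    word = 0
    for i, p in enumerate(pixels):
        word |= (p & 0xFF) << (8 * i)
    return word

def unpack4(word):
    """Split a 32-bit word back into four 8-bit pixels."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def add4_wrap(a, b):
    """Add four 8-bit lanes with a few word operations (modulo 256 per lane).

    Adding only the low 7 bits of each lane cannot carry into the next lane;
    the lane top bits are then folded in with XOR, discarding each lane's carry-out.
    """
    low = (a & LANE_LOW) + (b & LANE_LOW)
    return low ^ ((a ^ b) & LANE_MSB)

a = pack4([10, 200, 255, 0])
b = pack4([20, 100, 1, 0])
print(unpack4(add4_wrap(a, b)))  # [30, 44, 0, 0]
```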

Figure 2. BDTImark2000 processor scores. The BDTImark2000 is a summary measure of DSP speed calculated from a suite of benchmarks based on many different DSP algorithms. For more information and scores, go to http://www.bdti.com.

DSPs are typically benchmarked against other DSPs, and cpus against cpus, but a number of cross-comparison studies have been performed. One compared the fast Fourier transform performance of the Texas Instruments TMS320C40 DSP, IBM/Motorola PowerPC 604, and Intel Pentium P5 processor and found them to be comparably efficient (Reference 1). A more comprehensive comparison involves multiple algorithms spanning different applications. Berkeley Design Technology Inc. (BDTI; Berkeley, CA) has developed a scoring system that compares various processors by means of a set of algorithm kernels reflecting key DSP operations. Figure 2 graphs a selection of these scores for several modern DSPs and one cpu. The intent of this illustration is not to highlight any specific comparison, but rather to show that the performance of general-purpose cpus can indeed be satisfactory for medical image processing applications.

Another important aspect of processor performance is scalability: medical imaging systems must be able to scale between low and very high data rates and to scale up to support technology improvements over time. ASICs, DSPs, and cpus may all be designed into a parallel processing architecture, but the flexibility and dynamic scalability of the architecture will vary. Cpus have the advantage of being able to scale at the chip, board, or box level, using standard commercial off-the-shelf (COTS) hardware. Further, COTS software environments for distributed computing can be used to the advantage of standard cpu-based solutions.
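Scaling at the board or box level ultimately comes down to a sizing calculation. The sketch below assumes a hypothetical sustained throughput per processing node (60 Mbyte/sec) and an assumed parallel efficiency of 80% to account for communication and synchronization overhead. Both numbers are placeholders, not measurements.

```python
import math

def nodes_needed(data_rate_mbyte_s, per_node_mbyte_s, efficiency=0.8):
    """Number of processing nodes needed to keep up with an input stream.

    efficiency (0 < efficiency <= 1) derates each node for parallel overhead;
    0.8 is an assumed placeholder, not a measured value.
    """
    return math.ceil(data_rate_mbyte_s / (per_node_mbyte_s * efficiency))

# Hypothetical node sustaining 60 Mbyte/sec:
print(nodes_needed(4, 60))    # 1 (low end of the raw-data range)
print(nodes_needed(250, 60))  # 6 (high end of the raw-data range)
```

With interchangeable COTS boards, moving between these two points means changing the node count rather than redesigning the data path.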

Processor memory capacity also factors into performance scalability. Large-memory support is necessary to handle the highest raw data rates and image sizes. Cpus support the widest range of memory capacity today, scaling to 64 Gbyte and beyond. They are able to take advantage of the most recent advances in memory technology, for example, multiple banks of double-data-rate synchronous dynamic random-access memory (DDR SDRAM) running at an effective 400 MHz or Rambus DRAM (RDRAM) at an effective 1200 MHz. DSPs and ASICs typically lag behind the leading edge of mass-memory technology because that technology is driven primarily by the general-purpose computing market.
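Memory capacity requirements follow directly from the data rates quoted earlier. Suppose, hypothetically, that a system buffers a 60-second acquisition in main memory at the top of the raw-data range (250 Mbyte/sec). The buffer alone then exceeds what a 32-bit address space can reach, which is one reason the 64-bit extensions mentioned above matter.

```python
def buffer_bytes(rate_mbyte_s, seconds):
    """Bytes needed to hold an acquisition in RAM (1 Mbyte = 10**6 bytes)."""
    return int(rate_mbyte_s * 1_000_000 * seconds)

ADDRESS_SPACE_32BIT = 2 ** 32  # 4 GiB: the most a 32-bit pointer can address

need = buffer_bytes(250, 60)  # hypothetical 60-s acquisition at 250 Mbyte/sec
print(need)                        # 15000000000
print(need > ADDRESS_SPACE_32BIT)  # True
```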

While all silicon technology benefits from Moore's Law (Intel cofounder Gordon Moore's observation that the number of transistors per IC roughly doubles every couple of years), it is not always possible to translate this continual improvement into performance enhancements for existing systems. Here again, cpus have the advantage in medical imaging systems, because many standard COTS cpu hardware designs exist at the board level (mezzanine modules or blades) and the box level, and they run standard COTS software environments. This allows cpu subsystems to be upgraded with little effort to improve system performance.
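Stated as a formula, with N_0 transistors per IC in year t_0 and an assumed doubling period T_d of about two years (a rule of thumb, not an exact law), the idealized projection is:

```latex
N(t) \approx N_0 \cdot 2^{(t - t_0)/T_d}, \qquad T_d \approx 2\ \text{years}
\qquad\Longrightarrow\qquad
\frac{N(t_0 + 10)}{N_0} \approx 2^{10/2} = 32
```

That is, each decade brings roughly a 32-fold increase in transistor budget, headroom that reaches an existing system only if its processing elements can be upgraded.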

System Cost Considerations

Performance alone is rarely the deciding factor in selecting a system architecture; cost is usually weighed as well. Cost analyses should address not just silicon cost, but also software, development, opportunity, and life-cycle costs. Cost is therefore intimately linked with the other decision factors.

ASICs are very expensive to develop but relatively inexpensive to manufacture in large volumes. Many medical imaging systems, however, are not produced in quantities sufficient to achieve optimal ASIC costs. FPGAs are a reasonable alternative, but they significantly increase the cost of goods, and developing the intellectual property embodied in large, complex FPGAs is also expensive. Because new features and performance enhancements must be introduced throughout a product's life cycle, that development expense recurs as a significant ongoing cost. For well-defined, stable functionality, FPGAs or ASICs may be a good fit, but the development and life-cycle expenses discourage designing the entire data path around them.
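The volume argument can be made concrete with a simple break-even model in which total cost equals nonrecurring engineering (NRE) plus unit cost times volume. The dollar figures below are hypothetical placeholders. The model deliberately ignores the recurring redesign costs described above, which would have to be added to each option's NRE.

```python
def total_cost(nre, unit_cost, volume):
    """Nonrecurring engineering cost plus per-unit manufacturing cost."""
    return nre + unit_cost * volume

def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Production volume above which the ASIC becomes the cheaper option."""
    if unit_fpga <= unit_asic:
        return float("inf")  # the ASIC's higher NRE is never recovered
    # Clamp at 0: if the ASIC's NRE is not higher, it wins at any volume.
    return max(0.0, (nre_asic - nre_fpga) / (unit_fpga - unit_asic))

# Hypothetical figures: ASIC $1M NRE at $20/unit vs. FPGA $200k NRE at $300/unit.
print(round(break_even_volume(1_000_000, 20, 200_000, 300)))  # 2857
print(total_cost(1_000_000, 20, 1_000), total_cost(200_000, 300, 1_000))  # 1020000 500000
```

At a hypothetical production run of 1000 units, well under the break-even point, the FPGA option costs about half as much in this model.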

Because of its specialized nature, a DSP-based image processing design often can be realized for a lower silicon cost than a cpu-based design of comparable performance. However, when the complete bill-of-materials costs for the module and system, including all common equipment, are considered, the difference is not that significant. Software development environments for DSPs have improved, but nevertheless, comparing software development costs still greatly favors cpus because of their standard operating environments and tools and widely available expertise in programming them. Finally, relative to DSPs, standard cpu-based COTS hardware and software products offer less risk and a faster time to market. Because they also provide greater scalability and flexibility, they keep total system life-cycle costs lower.

System Architecture

The overall system architecture also determines the suitability of cpu, DSP, or ASIC as the core data-path processing element. Key aspects of the architecture include the processing interconnect, input/output (I/O) interfaces, storage interfaces, and coprocessing options, all of which can be addressed adequately through the use of cpus, DSPs, or ASICs. However, cpus offer opportunities to leverage industry standards and high-volume technologies, thus minimizing system costs and development efforts.

There are many different processing interconnects, from proprietary switch fabrics to standard buses such as VMEbus (VERSAmodule Eurocard) or peripheral component interconnect (PCI). The most widely deployed standard interconnects are the PCI bus and Ethernet. Current Ethernet switches scale comfortably from 10 Mb/sec to 1 Gb/sec, and 10 Gb/sec will be readily available in the near future. PCI has expanded from a 32-bit width at 33 MHz to 64 bits at 66 MHz, and higher still with PCI-X.
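The peak bandwidth of a parallel bus such as PCI is its width in bytes times its clock rate. A serial link's peak is its data rate divided by 8. The sketch below uses the nominal clock figures from the text. Actual PCI clocks are 33.33 and 66.66 MHz, which gives the often-quoted 133 and 533 Mbyte/sec, and sustained throughput is lower because of protocol overhead. Compared with the raw sensor rates cited earlier (up to and beyond 250 Mbyte/sec), neither a 32-bit/33-MHz PCI bus nor a single Gigabit Ethernet link can carry the high end of the range by itself.

```python
def parallel_bus_mbyte_s(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus: one width_bits transfer per clock."""
    return width_bits / 8 * clock_mhz

def serial_link_mbyte_s(line_rate_mbit_s):
    """Theoretical peak of a serial link carrying line_rate_mbit_s of data."""
    return line_rate_mbit_s / 8

print(parallel_bus_mbyte_s(32, 33))  # 132.0 (32-bit/33-MHz PCI)
print(parallel_bus_mbyte_s(64, 66))  # 528.0 (64-bit/66-MHz PCI)
print(serial_link_mbyte_s(1000))     # 125.0 (Gigabit Ethernet)
```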

Most recently, PCI-Express, a switched high-speed serial interconnect, has been introduced. Both PCI and Ethernet standards are well supported by a multitude of cpu chipsets and COTS products, offering a large set of choices to the system architect. Also, by using such standards, designers can introduce a limited number of specialized processing elements while maintaining a cpu-based architecture in the core data path.

I/O interfaces are even more diverse than processing interconnects. Standard cpu chip sets and COTS products offer integrated support for some of the most versatile and widely used interfaces, including Ethernet, universal serial bus (USB), and the accelerated graphics port (AGP). Ethernet interfaces can be used for connecting to a processor interconnect switch or as external interfaces connecting to a variety of other types of equipment. USB is a flexible interface that can be used to connect to a multitude of peripheral devices, including input, storage, and control devices. AGP is a widely adopted high-performance graphics interface (scaling up to 2.1 Gbyte/sec) that allows high-speed data movement between main memory and graphics controllers.

Several different storage interfaces are available in cpu chip sets and standard adapter modules, including Fibre Channel, advanced technology attachment (ATA), and small computer system interface (SCSI). All are supported in standard COTS software environments, making software design straightforward and insulating the application software from the storage infrastructure.

Many COTS cpu-based products—both modules and systems—support standard interfaces that can be used to incorporate specialized coprocessing logic without losing the advantages of using a standard cpu-based architecture. Relevant interfaces include PCI slots and PCI mezzanine card (PMC) sites. These add significant flexibility and scalability to the system architecture, allowing specialized hardware (even DSPs and ASICs), I/O interfaces, or storage to be incorporated into a cpu-based data path.

System Flexibility

Flexibility in medical imaging system architectures is important. Not only does it accommodate ever-changing and advancing feature and performance requirements, but it also makes it possible to use one common processing architecture across multiple imaging modalities. Cpu-based architectures provide flexibility through upgradable and scalable processing elements (via COTS hardware), industry-standard coprocessing interfaces, and general-purpose software programmability. To emphasize the last of these: cpus offer the ultimate software flexibility; the range of operating systems, tools, and commercial and open-source software available for general-purpose cpus far exceeds what is available for DSPs.

The flexibility of cpus also supports the long life cycles expected of medical imaging systems. Embedded COTS cpu products offer long life guarantees, and further insurance is provided by the ability to upgrade a processing element without revamping the system or software architecture. A given generation of DSP or ASIC silicon may have a longer life than a cpu, but upgrading to a newer part often requires a new hardware or software architecture, or both, which substantially increases the cost of change.

The Time-to-Market Imperative

Time to market can override all other requirements in determining a system architecture. If a product cannot be delivered through the appropriate market window, the quality of its architecture is irrelevant. In this regard, cpu-based architectures are once again superior. Use of COTS hardware and software products for the basic data-path architecture allows the medical system manufacturer to focus immediately on the overall system architecture and software algorithms. As mentioned, the operating systems and tools available for general-purpose cpus are expansive and mature.

Software for a general-purpose environment (the cpu) is much faster to develop and much easier to maintain than software for a special-purpose environment (the DSP), let alone specific hardware designs (the ASIC). Although DSP software environments have improved and no longer require hand-coded assembly language, they still require careful management of the specialized silicon resources. The data-movement features of cpus and DSPs illustrate the contrast: DSPs provide integrated DMA controllers that efficiently move large data blocks from one memory location to another or between memory and the processor, whereas cpus provide large hierarchical data caches. The DMA approach yields higher performance but requires explicit software control; cache mechanisms, by contrast, are managed implicitly by the hardware.
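The extra programming burden of explicit data movement can be sketched by emulating the DSP approach in software. Below, a hypothetical dma_copy function stands in for a DMA transfer into a small local buffer. A single background thread plays the DMA engine, fetching the next block while the current one is processed (double, or "ping-pong," buffering). The buffer size and function names are illustrative assumptions. On an actual DSP, the transfers would be programmed through the DMA controller, not a thread.

```python
from concurrent.futures import ThreadPoolExecutor

LOCAL_BUF = 4  # hypothetical on-chip buffer size, in samples

def dma_copy(src, start, length):
    """Stand-in for a DMA transfer: copy one block into 'local' memory."""
    return list(src[start:start + length])

def process_block(block, gain):
    """Toy per-sample computation on a block already in local memory."""
    return [gain * x for x in block]

def double_buffered(src, gain, block=LOCAL_BUF):
    """Process src block by block, prefetching the next block during compute."""
    out = []
    with ThreadPoolExecutor(max_workers=1) as dma:  # one worker plays the DMA engine
        pending = dma.submit(dma_copy, src, 0, block)  # prime the first buffer
        for start in range(0, len(src), block):
            current = pending.result()  # wait for this block's transfer to finish
            if start + block < len(src):
                # start fetching the next block before computing on this one
                pending = dma.submit(dma_copy, src, start + block, block)
            out.extend(process_block(current, gain))
    return out

data = list(range(10))
print(double_buffered(data, 2) == process_block(data, 2))  # True
```

On a cache-based cpu, the same computation is simply process_block(data, 2): the hardware decides what to move and when, which is the implicit management described above.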

It should be noted that recent generations of cpu chip sets have been providing DMA engines as an additional data-movement option, continuing the trend of borrowing selected features from DSP architectures.

Benefits of CPU Advancement

In view of all the requirements and constraints governing medical imaging system design, cpu-based architectures clearly provide the most comprehensive coverage. It is fundamentally a question of specialization versus generalization: DSPs and ASICs provide the highest performance for a stable and constrained set of requirements, while cpus provide high performance for a wider variety of less-predictable requirements. As general-purpose cpus have been deployed in more, and more diverse, applications, the technological advances made to serve those many industries have accumulated. They have raised cpu performance and capability to the point where cpus are sufficient even for demanding applications such as medical image processing.

Once basic performance requirements are met, market demands drive system architecture development. General-purpose cpus offer the most cost-effective, scalable, flexible, future-proof, and development-friendly architecture. The great body of COTS cpu hardware and software can and should be used to reduce time to market and leverage a wide variety of technologies from other industries, such as enterprise computing and telecommunications systems.

As medical systems begin to demand higher service availability, COTS distributed-computing and high-availability middleware running on COTS cpu hardware subsystems can be incorporated into medical imaging systems without significant design effort. These COTS capabilities allow medical system manufacturers to focus on other areas, such as sensor data acquisition, overall system ergonomics, and image reconstruction and processing software.

Conclusion

The arguments presented here obviously cannot apply to every medical imaging system, but the trend is unmistakable. An architecture based on general-purpose cpus should be the first choice for system structure. That choice should be abandoned only if it can clearly be shown to be inadequate. And even in those cases, hybrid possibilities should be considered. Cpus could be used for the majority of the data-path processing, assisted by DSPs or ASICs for well-defined tasks. Throughout the life cycle of a system, the advantages of a cpu-based data-path architecture will be apparent.

Reference

1. JN Barkdull and SC Douglas, "General-Purpose Microprocessor Performance for DSP Applications," in Proceedings of the 30th Asilomar Conference on Signals, Systems and Computers (Pacific Grove, CA: IEEE Computer Society, 1997).

Andrew Alleman is principal system architect at RadiSys Corp. (Hillsboro, OR), leading product definition and architecture development for the company's system products.

Copyright ©2004 Medical Electronics Manufacturing