MEDICAL IMAGING

Using FPGAs to Implement New Medical Imaging Capabilities

Field-programmable gate arrays provide data acquisition and coprocessing support for scalable cpu platforms, making more-sophisticated imaging possible.

Charles Jenkins

Medical imaging techniques are assuming an increasingly important role in healthcare. That is because the healthcare industry is striving to detect, and even predict, disease at an earlier stage and promote treatment using noninvasive means, while at the same time lowering diagnostic and therapeutic costs. The primary factors driving the design of new equipment to achieve these goals are the fusion of diagnostic imaging modalities and advances in imaging algorithm development.

To provide the functionality needed to meet these healthcare industry aims, equipment developers are turning to scalable, commercial off-the-shelf (COTS) central processing unit (cpu) platforms with field-programmable gate array (FPGA) support for data acquisition and coprocessing. They must consider several factors in developing flexible, scalable medical imaging equipment efficiently. These factors are the development of imaging algorithms, the use of several imaging technologies synergistically (fusion of modalities), and platform scalability.

The development of imaging algorithms calls for the application of high-level intuitive modeling tools for the continual improvement of digital signal processing (DSP) algorithms. These advanced algorithms require scalable system platforms that offer significant increases in image processing performance. The scalable platforms should enable implementation of smaller and more-accessible portable equipment.

For near-real-time analysis to be possible, system platforms have to scale with both software (the cpus) and hardware (the amount of configurable logic). These processing platforms must meet various performance price points and must be capable of bridging the requirements between multiple imaging techniques. FPGAs can be easily integrated into multicore cpu platforms, providing the DSP horsepower for very flexible systems offering the highest performance.

System architects and design engineers need to quickly partition algorithms on these platforms, and then debug them using high-level development tools and libraries of intellectual property (IP). This process accelerates platform deployment and thus maximizes profitability for the manufacturer.

This article examines some current trends in medical imaging algorithms, the fusion of modalities, and scalable platforms to implement the algorithms.

Algorithm Developments

The discussion should begin with an examination of trends in imaging algorithms for each modality, including consideration of how FPGAs and IP are used.

MRI. Magnetic resonance imaging (MRI) produces cross-sectional images of the human body. Three FPGA-based functions are used to reconstruct a three-dimensional volume from the cross sections. First, the fast Fourier transform (FFT) generates 2-D slices in gray level, typically matrices, from the frequency domain data. Second, reconstruction of the 3-D volume involves interpolation between slices to create an interslice distance approximating the interpixel distance, so that images can be viewed from any 2-D plane. Finally, iterative resolution sharpening takes place. This function uses a spatial deblurring technique based on an iterative inverse filtering procedure that reduces noise while the image structure is refocused. As a result, the overall visual diagnostic resolution of the cross section is significantly improved.
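The first two MRI steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the frequency-domain data is taken to be a centered 2-D array per slice, the interpolation is plain linear blending between neighboring slices, and the function names are hypothetical. The iterative sharpening step is not shown.

```python
import numpy as np

def reconstruct_slice(kspace):
    """Convert one slice of centered frequency-domain data to a gray-level image."""
    # Undo the centering, apply the inverse 2-D FFT, and keep the magnitude.
    image = np.fft.ifft2(np.fft.ifftshift(kspace))
    return np.abs(image)

def interpolate_slices(volume, factor):
    """Linearly interpolate along axis 0, inserting factor - 1 slices per gap."""
    n = volume.shape[0]
    # Fractional positions of the output slices on the original slice axis.
    dst = np.linspace(0, n - 1, (n - 1) * factor + 1)
    lo = np.floor(dst).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = (dst - lo)[:, None, None]          # blend weight toward the upper slice
    return (1 - w) * volume[lo] + w * volume[hi]
```

Choosing `factor` so that the interslice spacing roughly matches the in-plane pixel spacing gives the near-isotropic volume the text describes.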

Ultrasound. The granular appearance of ultrasound images is a phenomenon called speckle. Speckle is caused by the interaction of various independent scatterers (similar to multipath radio-frequency reflections in the wireless domain) and is multiplicative by nature. Ultrasound images can be despeckled by means of lossy compression. First, the logarithm of the image is taken; speckle noise becomes additive to the desired signal. Noise can then be minimized via lossy wavelet compression using a JPEG2000 encoder.
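The log-then-wavelet idea can be illustrated with a short NumPy sketch. Note the substitution: instead of a JPEG2000 encoder, it uses a one-level Haar transform with soft thresholding of the detail bands, which is a simpler stand-in for lossy wavelet compression. The function names and the threshold value are assumptions, and the image is assumed to have even dimensions and strictly positive pixels.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar2d(x):
    """One-level orthonormal 2-D Haar transform (even dimensions assumed)."""
    a = (x[0::2] + x[1::2]) / SQRT2        # vertical averages
    d = (x[0::2] - x[1::2]) / SQRT2        # vertical details
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2], x[1::2] = (a + d) / SQRT2, (a - d) / SQRT2
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def despeckle(image, threshold):
    """Log transform makes speckle additive; shrink details; transform back."""
    ll, lh, hl, hh = haar2d(np.log(image))
    lh, hl, hh = (soft_threshold(b, threshold) for b in (lh, hl, hh))
    return np.exp(ihaar2d(ll, lh, hl, hh))
```

With a threshold of zero the image is returned unchanged, so the threshold directly controls how much detail (and noise) is discarded.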

X-ray. Motion correction of coronary x-ray images is an algorithm used to minimize the effects of the cardiac-respiratory cycle—breathing and heart pumping—on imaging. The motion of a 3-D-plus-time coronary model is projected onto the 2-D x-ray images, allowing for the computation of a dewarping function (translate and zoom) that corrects for the motion and results in clearer images.

Molecular Imaging. Molecular imaging is the characterization and measurement of biological processes at the cellular and molecular levels. Its purpose is to detect, capture images of, and monitor abnormalities that can herald disease. For example, x-ray imaging, positron-emission tomography (PET), and single-photon-emission computed tomography (SPECT) can be combined to map functional, cellular, and molecular images at low resolution onto corresponding anatomical features at resolutions down to 0.5 mm. The trend toward greater miniaturization and the exploration of new algorithms push performance beyond the capabilities of multicore cpus and make the incorporation of FPGA technology necessary in these compact systems.

Modality Fusion. Earlier disease detection and noninvasive treatment are the imperatives driving the combination of imaging techniques seen, for example, in PET/computed tomography (CT) systems and x-ray-treatment/CT equipment. The higher levels of image resolution needed to satisfy current performance demands require fine-geometry microarray detectors coupled with FPGAs for the preprocessing of photonic and electronic signals. Once preprocessed, the signals are integrated and processed by a combination of cpus and FPGA coprocessors to create detailed images of the body.

Non-real-time (NRT) image fusion, or registration, is used routinely to align functional and anatomical images taken at different times. NRT image registration is problematic, however, because of variations in patient positioning, differences in scanner bed profiles, and the involuntary movement of the patient's internal organs. The real-time fusion of PET and CT using FPGA processing allows both functional and anatomical images to be acquired and blended during a single imaging session, rather than the images being superimposed, as in the past, post hoc. The fused images provide much better definition and location accuracy for surgical treatment.

Image processing during surgery for the guidance of doctors involves the registration of preoperative CT or MRI images with real-time 3-D ultrasound or x-ray images to facilitate the application of noninvasive therapies such as ultrasound, magnetic-resonance interventional, and x-ray treatment. In this area, various algorithms are being developed to provide optimal registration results for particular combinations of imaging modalities and therapies.

In fused combination systems like these, FPGAs with high-speed serial interconnects can reduce the interconnection requirements of linking the data acquisition functions to the postprocessing portion of the systems, significantly lowering total system costs by eliminating the need for additional boards and cables.

Imaging Algorithms

Several different imaging algorithms are commonly implemented in FPGAs. These include enhancement, stabilization, wavelet analysis, and distributed vector processing.

Image enhancement uses convolution, or linear, filtering. The linear combination of the high-pass and low-pass filtered images, weighted by a mask via matrix multiplication, produces an image in which detail has been enhanced while noise has been reduced.
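One way to picture this is the following NumPy sketch. It assumes a 3 x 3 box filter as the low-pass stage, takes the high-pass image as the residual, and applies the mask as an element-wise weight on the high-pass term. These choices and the function names are illustrative assumptions, not the specific filters used in any product.

```python
import numpy as np

def box_blur(img, k=3):
    """Low-pass filter: k x k moving average with edge replication."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def enhance(img, mask):
    """Combine low-pass and high-pass images, weighting detail by the mask."""
    img = img.astype(float)
    low = box_blur(img)
    high = img - low                       # high-pass residual
    # mask > 1 boosts detail, mask < 1 suppresses it (e.g. in noisy regions).
    return low + mask * high
```

A mask of all ones returns the original image, so the mask values express enhancement relative to the unprocessed input.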

Stabilization of video images involves normalizing rotational and zooming effects in video data sequences to average out noise among successive frames. This algorithm additionally smooths jagged edges found in still images extracted from video and can correct image jitter to about a tenth of a pixel.
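The registration step at the heart of stabilization can be sketched with phase correlation. This simplified example estimates only whole-pixel, circular translation between frames; the rotation, zoom, and subpixel correction described above are not covered. Function names are hypothetical.

```python
import numpy as np

def estimate_shift(ref, moved):
    """Estimate the (dy, dx) translation of `moved` relative to `ref`."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.fft.ifft2(cross).real             # peak marks the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks in the upper half of the range to negative shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)

def align_and_average(frames):
    """Register every frame to the first one, then average to reduce noise."""
    ref = frames[0]
    aligned = [ref]
    for frame in frames[1:]:
        dy, dx = estimate_shift(ref, frame)
        aligned.append(np.roll(frame, (-dy, -dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)
```

Averaging only after alignment is what lets noise cancel across frames without blurring the underlying structure.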

Designed to help acquire event information within signals, wavelet analysis uses a windowing technique—with the size of the windows varying—to analyze small sections of the signal at a time. Wavelet analysis allows the use of long time intervals for more-precise low-frequency information and of shorter regions for high-frequency information. Wavelet applications include detecting discontinuities and breakdown points, detecting self-similarity, suppressing signals, denoising signals or images, compressing images, and performing fast multiplication of large matrices.

The recently developed S transform combines features of the FFT and wavelet transforms. It reveals frequency variation over both space and time. Applications of this function include texture analysis and noise filtering. The S transform is computationally intensive, however, making conventional cpu implementations too slow. Distributed vector processing addresses this problem by combining vector and parallel computations within FPGAs, resulting in a 25-fold reduction in processing time.
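To show where the computational load comes from, here is a minimal 1-D discrete S transform in NumPy, written in the usual frequency-domain form: one FFT of the signal, then one Gaussian-weighted inverse FFT per frequency row. It is a sketch for illustration rather than an FPGA design, and the function name is an assumption.

```python
import numpy as np

def s_transform(h):
    """Discrete S transform: rows are frequencies 0..N/2, columns are time."""
    h = np.asarray(h, dtype=float)
    N = h.size
    H = np.fft.fft(h)
    m = np.fft.fftfreq(N) * N              # frequency offsets, symmetric about 0
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = h.mean()                     # zero-frequency row is the signal mean
    for n in range(1, N // 2 + 1):
        # Gaussian window whose width scales with the analysis frequency n.
        gauss = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)
        # Shift the spectrum so frequency n sits at zero, window it, invert.
        S[n, :] = np.fft.ifft(np.roll(H, -n) * gauss)
    return S
```

Each row requires its own inverse FFT and is independent of the others, which is why the work grows quickly with signal length and why it lends itself to the vector and parallel hardware described above.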

One method of early cancer detection exploits cancer's ability to recruit a new blood supply. A digital sensor detects the infrared energy emitted from the patient's body, so it can sense the minute variations from normal that are associated with increased blood flow due to cancer. A typical application of this capability is based on a programmable systolic array implemented via a general-purpose workstation and a special-purpose hardware engine based on FPGAs. The FPGA engine can accelerate the core algorithm to nearly 1000 times the rate achievable by a state-of-the-art workstation.

Multiple FPGA building-block functions are required for these sophisticated imaging algorithms. CT reconstruction, for example, calls for interpolation, FFT, and convolution functions. In ultrasound, processing methods include color flow processing, convolution, beam forming, and elasticity estimation. General imaging algorithms encompass such numerous functions as color space conversion, graphic overlay, 2-D-median-temporal filtering, scaling, frame and field conversions, contrast enhancement, sharpening, edge detection, thresholding, translation, polar and Cartesian conversion, nonuniformity correction, and pixel replacement.

Scalable Platforms

Historically, many imaging systems have been built as proprietary computational systems. But with the current availability of high-powered commercial off-the-shelf (COTS) cpu boards, system engineers can take a more out-of-the-box approach to implementation. Although NRT processing of many algorithms is acceptable in terms of software alone, real-time image processing requires a hardware assist. Today's FPGAs, with their built-in DSP blocks, high-bandwidth memory blocks, and large arrays of programmable elements, are devices perfectly suited to provide this hardware assistance.

Figure 1. System diagram of the XD 1000 coprocessor daughtercard from XtremeData Inc. (Schaumburg, IL), which plugs directly into an AMD Opteron socket 940 of a multi-Opteron motherboard. The daughtercard uses the motherboard's existing cpu infrastructure.

Altera Corp. (San Jose) has worked closely with partners to provide fast, reliable integration of FPGA coprocessing resources with COTS cpu solutions. For single-board computers (SBCs) from Intel Corp. and Advanced Micro Devices Inc. (AMD), Altera's Stratix II GX FPGAs with built-in serializers-deserializers can directly implement PCI Express–compliant coprocessor boards for algorithm off-loading. For AMD SBCs with dual sockets, XtremeData Inc. (Schaumburg, IL) offers a coprocessor daughtercard that plugs into one of the AMD Opteron processor sockets directly, providing an elegant cpu-plus-FPGA processing solution (see Figure 1). A quad-socket AMD SBC can provide several cpu-plus-FPGA coprocessor combinations (1+3, 2+2, or 3+1) to deliver higher performance in algorithm-intensive applications. But the ultimate platform scalability can be provided through the use of multiple 1U server blades, each implementing the cpu-plus-FPGA coprocessor solution.

Application acceleration of these platforms depends on the algorithm: the more parallel calculations in an algorithm that can be off-loaded to the FPGA, the faster the overall execution. For example, an FPGA-based hardware acceleration of a CT imaging algorithm executes the overall application 10 times faster when each 3-GHz cpu is coupled with an FPGA coprocessor. The result is significant system-level savings in terms of power, space, and cost.
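The dependence on the off-loaded fraction can be made concrete with the standard Amdahl's-law model, which is an assumption introduced here for illustration and is not named in the article. The function names and the sample figures in the comments are hypothetical.

```python
def overall_speedup(offload_fraction, accel_factor):
    """Amdahl's law: application speedup when a fraction of the run time
    is accelerated by accel_factor and the rest stays on the cpu."""
    serial = 1.0 - offload_fraction
    return 1.0 / (serial + offload_fraction / accel_factor)

def required_fraction(target_speedup, accel_factor):
    """Fraction of run time that must be off-loaded to hit a target speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / accel_factor)

# Hypothetical example: if the FPGA runs the off-loaded kernels 100 times
# faster, a 10-times overall gain needs about 91% of run time off-loaded.
fraction_needed = required_fraction(10.0, 100.0)
```

Under this model, the part of the algorithm left on the cpu sets a ceiling on the overall gain, no matter how fast the coprocessor is.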

Development Methodology

This discussion naturally concludes with a consideration of the methodology for developing algorithms and the corresponding tools for implementing them.

Algorithm Tools. Imaging-system architects use high-level software tools to model various algorithms and evaluate the results obtained. The leading general-purpose tools for digital signal processing are the MATLAB processing engine and Simulink simulator graphical user interface from The MathWorks Inc. (Natick, MA). Most original equipment manufacturers and medical design houses use MATLAB to develop fast, accurate algorithms for tasks such as digital image processing, quantitative image analysis, pattern recognition, digital image coding and compression, forensic image processing, and 2-D wavelet transforms. In addition to algorithm development, MATLAB can be used to simulate the fixed-point arithmetic commonly employed in FPGAs and, with optional tools, can generate C code to run on a general-purpose cpu or inside an FPGA.
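What simulating fixed-point arithmetic involves can be shown with a small sketch, written here in Python with NumPy rather than MATLAB. It assumes a signed 16-bit format with 15 fractional bits (Q15) and a simple FIR filter as the test case. The format, the function names, and the rounding scheme are illustrative assumptions.

```python
import numpy as np

def to_fixed(x, frac_bits=15, total_bits=16):
    """Quantize floats to signed fixed-point integers with saturation."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return np.clip(np.round(np.asarray(x) * scale), lo, hi).astype(np.int64)

def from_fixed(q, frac_bits=15):
    """Convert fixed-point integers back to floats."""
    return q / float(1 << frac_bits)

def fixed_fir(x, taps, frac_bits=15):
    """FIR filter using integer multiply-accumulate, as FPGA DSP blocks would."""
    acc = np.convolve(to_fixed(x, frac_bits), to_fixed(taps, frac_bits))
    # Products carry 2 * frac_bits fractional bits; round and shift back.
    # The result is not re-saturated here; a real design would need to be.
    rounded = (acc + (1 << (frac_bits - 1))) >> frac_bits
    return from_fixed(rounded, frac_bits)
```

Comparing `fixed_fir` against a floating-point `np.convolve` of the same inputs shows how much precision a given word length costs before any hardware is built.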

Partitioning and Debugging. Once the algorithms are developed, system architects must decide how to partition the functionality between the cpus and FPGAs to provide the best overall solution—the one that optimally balances performance, cost, reliability, and longevity. Equipment architects lament that partitioning algorithms among elements of a high-performance hardware system, and debugging them, is a challenge. Historically, many designs have used an assembly line approach within the FPGAs. That is, the algorithms are split into functions and executed in a sequential pipeline. Debugging pipeline operations can constitute as much as 90% of the integration effort. The difficulty arises from the fact that the execution time for each function must be balanced for maximum throughput, and that the visibility of local memories and delays is restricted.
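The balancing problem can be seen in a simple model of a sequential pipeline, sketched below with hypothetical stage times. It assumes each stage handles one item at a time and passes it on with no buffering, so the slowest stage sets the rate for the whole chain.

```python
def pipeline_throughput(stage_times_us):
    """Items per second: limited by the slowest stage."""
    return 1e6 / max(stage_times_us)

def pipeline_latency(stage_times_us):
    """Microseconds for one item to pass through every stage."""
    return sum(stage_times_us)

def stage_utilization(stage_times_us):
    """Fraction of time each stage is busy when the pipeline runs at full rate."""
    slowest = max(stage_times_us)
    return [t / slowest for t in stage_times_us]

# Hypothetical stage times in microseconds for three functions in sequence.
times = [2.0, 5.0, 3.0]
busy = stage_utilization(times)
```

In this example the first and third stages sit idle more than 40% of the time, which is the imbalance that designers must tune away to reach maximum throughput.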

Figure 2. Software-centric design of imaging system architecture (distributed processing) removes the difficulty of partitioning and debugging that is common with assembly-line processing.

The solution is a more software-centric method of system design (see Figure 2). Such a system is based on a distributed-coprocessor computing model in which each function in the coprocessor is an execution machine (i.e., a functional subprocessor) with a message-based capability for passing control and data between subprocessors. Full switching between all memory, cpus, and subprocessors provides complete observability and facilitates debugging. Message passing scales internally between the FPGA subprocessors and externally to other cpus and coprocessors within the system.

Altera's Avalon switch fabric inside the FPGA and a system-on-a-programmable-chip (SOPC) integration tool automatically build a flexible crossbar switch fabric between all functional elements. Pretested IP provides interfaces from the FPGA to the host cpu and from FPGA to dual in-line memory module (DIMM) memory. Pretested message-based infrastructure provides control communication between the host cpu, FPGA subprocessors, and FPGA memory controllers. A simplified debug methodology is achieved by combining messaging and full switching, which enables maximum flexibility during development. Finally, the data path can be soft-defined (redefined) during execution, while data can be intercepted or redirected to enhance observability during system integration and debug.
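The message-passing model can be illustrated in software with a toy sketch. This is not the Avalon fabric or the SOPC tooling: the `Fabric` and `Message` classes, their methods, and the unit names are all hypothetical, and a single-threaded queue stands in for the hardware switch. It shows three ideas from the text: functions addressed by messages, a data path that can be redefined at run time, and a trace that makes every transfer observable.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Message:
    dest: str          # name of the receiving subprocessor
    payload: object    # data passed along with control

class Fabric:
    """Toy model of a fully switched, message-based coprocessor."""

    def __init__(self):
        self.units = {}      # name -> function implementing the subprocessor
        self.route = {}      # name -> next destination, or None for output
        self.trace = []      # every delivered message, for debugging
        self.queue = deque()

    def register(self, name, func, next_dest=None):
        self.units[name] = func
        self.route[name] = next_dest

    def send(self, dest, payload):
        self.queue.append(Message(dest, payload))

    def run(self):
        results = []
        while self.queue:
            msg = self.queue.popleft()
            self.trace.append((msg.dest, msg.payload))
            out = self.units[msg.dest](msg.payload)
            nxt = self.route.get(msg.dest)
            if nxt is None:
                results.append(out)       # end of the data path
            else:
                self.send(nxt, out)       # forward to the next subprocessor
        return results

fabric = Fabric()
fabric.register("scale", lambda x: x * 2, next_dest="offset")
fabric.register("offset", lambda x: x + 1)
fabric.send("scale", 3)
first = fabric.run()
```

Changing an entry in `fabric.route` between runs redirects the data without touching the units themselves, which mirrors the soft-defined data path used during integration and debug.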

Design Tools and IP. While tools such as MATLAB may be optimized for algorithm development in software, they are not sufficient for implementing algorithms in FPGAs. Designers can accelerate their FPGA implementations using electronic design automation (EDA) tools and IP.

Figure 3. Video- and image-processing suites provide building blocks for development of imaging algorithms.

Video- and image-processing suites and DSP libraries provide IP building blocks that can accelerate the development and implementation of sophisticated imaging algorithms. Video- and image-processing blocksets, along with other IP modules and reference designs (including in-phase/quadrature (IQ) modems, JPEG2000 compression, FFT/inverse FFT, and edge detection) give designers a broad range of IP they can use to complete FPGA implementations of computationally intensive tasks quickly (see Figure 3).

Conclusion

Aging baby boomers are seeking greater access to new diagnostics and therapies for such highly common afflictions as heart disease and cancer, including methods of earlier detection and minimally invasive surgical treatments. Advances in the combining of various diagnostic imaging technologies and in developing their associated algorithms are driving the creation of new equipment to answer these patients' demands. Advanced algorithms require scalable system platforms offering significant increases in image-processing performance.

Integrated into COTS multicore cpu platforms, FPGAs provide the digital signal processing horsepower for the most flexible, highest-performance systems. To help accelerate the implementation of sophisticated imaging algorithms onto these platforms, high-level development tools and IP implementation libraries are needed. Tools and IP libraries that deal with these concerns have been developed.

Charles Jenkins is a senior technical marketing manager, test and medical business unit, at Altera Corp. (San Jose, CA). He can be contacted at chjenkins@altera.com.

Copyright ©2006 Medical Electronics Manufacturing