A Medical Electronics Manufacturing Spring 1998 Feature
DIGITAL SIGNAL PROCESSING
DSPs: A Growing Option for Medical Applications
Gabriel Spera
Improved digital signal processor (DSP) architectures handle more instructions at higher speeds, minimizing board space, development time, and overall project cost.
The average high-tech consumer probably touches a DSP-enabled product every 10 minutes. Forward Concepts (Tempe, AZ), a market research firm specializing in microprocessors, expects the market for DSP chips to grow at a rate of about 50% annually, passing the $12-billion mark by 2001 and potentially reaching $50 billion in about 10 years. The real impetus for such growth is, of course, wireless communications and personal electronics (CD players, cell phones, telecom switches, HDTV, and the Internet), but medical equipment manufacturers have not been left behind. More powerful chips will have a profound impact on medical applications, both those that currently employ DSPs and those that so far haven't been able to.
The Basics
Very generally speaking, a DSP is a programmable semiconductor chip designed to take a stream of digital data and perform filtering, compression, encoding, error correction, and other mathematical algorithms. The design, or "architecture," of a DSP takes advantage of the repetitive nature of signal processing and employs specialized arithmetic hardware and memory-access schemes to speed up common signal-processing routines.
What's the difference between a DSP and a general-purpose processor, such as a Pentium? "I could take all week to answer that one," quips Ed Young, business development manager for Loughborough Sound Images, Ltd. (LSI; Loughborough, UK, and Lexington, MA). For one thing, he says, DSPs are optimized for real-time applications, capturing and processing signal data at incredible rates. Part of this speed comes from "pipelining" the instruction set, that is, starting a second (or third or more) operation before the first is actually completed. While it is true that many modern microprocessors also use pipelining and can perform several instructions in parallel, they can't do as many as a top-notch DSP can. What's more, architectures used in general-purpose processors often require several instructions to perform operations accomplished with just one instruction on a DSP.
For example, according to the DSP-consulting firm Berkeley Design Technology, Inc. (BDTI; Berkeley, CA), the most common single task for a DSP is called a multiply-accumulate (MAC), which is particularly useful in digital filtering. A MAC entails multiplying two numbers and adding the result to the previous total. DSP architecture allows the processing core to access the memory several times during a single instruction cycle, enabling it to load several operands while simultaneously loading an instruction. As a result, a typical DSP can perform one MAC while loading the data sample and filter coefficient for the next. A DSP can perform a MAC in one clock cycle, whereas a typical microprocessor might take much longer. Dedicated address-generation units, which operate in the background (without using the main data path of the processor), form the addresses required for operand accesses in parallel with the execution of arithmetic instructions. In contrast, general-purpose processors often require extra cycles to generate the addresses needed to load operands, according to BDTI.
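To make the role of the MAC in digital filtering concrete, here is a minimal sketch of a direct-form FIR filter in Python. It is illustrative only: the function name `fir_filter` and the sample values are assumptions, not taken from BDTI or any vendor. The inner loop body is the single multiply-accumulate that a DSP would typically complete in one cycle while also fetching the next sample and coefficient.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter built from multiply-accumulate (MAC) steps.

    Hypothetical helper for illustration; a real DSP would run the inner
    loop in hardware, one MAC per cycle.
    """
    # Delay line holding the most recent len(coeffs) input samples.
    history = [0.0] * len(coeffs)
    outputs = []
    for x in samples:
        # Shift the newest sample into the delay line.
        history = [x] + history[:-1]
        acc = 0.0
        for h, s in zip(coeffs, history):
            acc += h * s  # one MAC: multiply, then add to the running total
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Two-tap moving average applied to a short ramp.
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

Each output sample costs one MAC per filter tap, which is why single-cycle MAC hardware matters so much for filtering workloads.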
DSPs typically include specialized I/O interfaces or on-chip peripherals that allow efficient communication with other system components such as ADCs and memory. Most DSPs incorporate one or more serial and parallel I/O interfaces and specialized mechanisms for handling I/O, such as direct memory access (DMA), to allow data transfers to proceed with little or no intervention from the rest of the processor. "Another important point," says Young, "is that DSPs are much more deterministic: you know exactly how many cycles something will take."
Microprocessors, on the other hand, are relatively unpredictable because of extensive caching, branch prediction, and out-of-order execution. Times could vary by a factor of 10 or more, says Young. "That may be OK for a database, but it's not much good at getting the maximum performance on real-time processing, because you have to allow a big margin of safety."
Fixed Point and Floating Point
DSPs come in two varieties, fixed point and floating point, based on the type of math employed. Fixed-point DSPs represent numbers as fractions ranging from -1 to 1; for floating-point DSPs, each value is expressed through a mantissa and an exponent. Both methods have advantages and disadvantages. Floating-point chips, for example, have a wider dynamic range than fixed-point chips, which is particularly important in scientific simulation and certain imaging applications. Dynamic range, the spread between the largest and smallest numbers that can be represented, affects both programming and performance. For instance, an extensive series of MACs can generate numbers that are too large or too small for representation within the limited range of the fixed-point processor. To compensate, programmers have to modify their algorithms and scale their data.
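The scaling problem can be sketched in a few lines of Python. This is an assumption-laden illustration, not any vendor's arithmetic: it assumes a 16-bit "Q15" fractional format (one sign bit, 15 fraction bits) with a saturating accumulator, and the helper names `to_q15`, `from_q15`, `saturate`, and `q15_sum` are hypothetical. It shows a running sum that leaves the fractional range, and how pre-scaling the data keeps it representable.

```python
Q15_ONE = 1 << 15        # 32768 represents 1.0 (itself not representable)
Q15_MAX = Q15_ONE - 1    # largest value, just under +1.0
Q15_MIN = -Q15_ONE       # smallest value, exactly -1.0


def saturate(value):
    """Clamp an integer to the assumed 16-bit fractional range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Convert a float in roughly [-1, 1) to the assumed Q15 integer form."""
    return saturate(int(round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_sum(values, prescale_shift=0):
    """Accumulate Q15 values, optionally right-shifting each one first."""
    acc = 0
    for v in values:
        acc = saturate(acc + (v >> prescale_shift))
    return acc


if __name__ == "__main__":
    data = [to_q15(0.5)] * 4                  # true sum is 2.0, out of range
    print(from_q15(q15_sum(data)))            # clipped just below 1.0
    print(from_q15(q15_sum(data, 2)))         # pre-scaled by 1/4: 0.5 == 2.0 / 4
```

With the pre-scaled data the result is exact but carries an implied factor of four that the programmer must track, which is the bookkeeping a floating-point chip handles automatically through its exponent.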
Magnetic resonance imaging systems use digital signal processors for improved performance in image preprocessing.
Most floating-point DSPs use a 32-bit data word, while most fixed-point chips use a 16-bit data word. (One notable exception is the 24-bit fixed-point DSP family manufactured by Motorola in Austin, TX.) Longer data words simplify programming, but they also place greater burdens on internal buses, data caches, and external memory devices, resulting in a larger, slower, more expensive chip. "Floating point is more convenient to use, and a lot of applications need to use it," says Young, "but fixed point tends to be faster and cheaper." Applications involving complex 3-D computations, such as ECG analysis, catheter signal processing, magnetic resonance imaging (MRI), and computed tomography (CT), typically use floating-point chips. Applications focused more on pixel processing, such as x-ray and ultrasound equipment, can more easily use fixed-point processors. "And if they can easily do so," adds Young, "then the system will be lower cost."
Processing Speed
Of course, even a "slow" DSP can process information at an amazing rate. Processing power today is measured in the tens or even hundreds of MIPS (millions of instructions per second) and MFLOPS (millions of floating-point operations per second). In fact, the 'C67x, part of a planned processor family from Texas Instruments (TI; Houston), will soon raise the benchmark to 1 GFLOPS, and the company hopes to reach 3 GFLOPS by the year 2000. Analog Devices Inc. (ADI; Norwood, MA) expects that its third-generation SHARC processor, due out at the end of 1999, will perform 5 billion operations per second. Such dramatic increases in processing power will naturally benefit equipment designers, who will be able to achieve higher performance with fewer chips, thereby reducing board space, part count, and assembly costs.
For high-performance imaging, digital signal processors are designed for high-speed pixel and other fixed-point processing.
MIPS and MFLOPS are derived from a processor's instruction-cycle time, which BDTI defines as the amount of time required to execute the unit's fastest instruction. For most current processors, which execute one instruction at a time, the reciprocal of the instruction-cycle time divided by 1 million gives the execution rate in MIPS. For example, ADI's SHARC performs one instruction at a time and has a 20-nanosecond clock cycle, the equivalent of 50 MIPS. Some high-end processors, however, will execute more than one instruction per clock cycle (up to eight for TI's 'C6x, for example). So, operating at 200 MHz, the 'C6x can perform eight instructions every 5 nanoseconds, which works out to 1600 MIPS.
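The arithmetic behind those two figures can be checked with a short Python sketch. The function name `peak_mips` and its parameters are illustrative assumptions; the result is a peak rate only and says nothing about how much work each instruction does.

```python
def peak_mips(cycle_time_ns, instructions_per_cycle=1):
    """Peak execution rate in MIPS from the instruction-cycle time.

    Hypothetical helper: reciprocal of the cycle time, divided by
    1 million, times the number of instructions issued per cycle.
    """
    cycles_per_second = 1e9 / cycle_time_ns
    return instructions_per_cycle * cycles_per_second / 1e6


if __name__ == "__main__":
    print(peak_mips(20))      # one instruction per 20-ns cycle
    print(peak_mips(5, 8))    # eight instructions per 5-ns cycle (200 MHz)
```

The first call reproduces the 50 MIPS quoted for the SHARC and the second the 1600 MIPS quoted for the 'C6x.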
Still, talking about MIPS and MFLOPS can be misleading. As David Jackson, marketing manager for ADI, explains, "MIPS, MOPS, MFLOPS, and GFLOPS are not transferable across manufacturers." ADI uses the IEEE standard definition for MFLOPS, says Jackson. "Unfortunately," he notes, "some manufacturers have started using inventive definitions for MIPS. They are using a data movement as an instruction or operation versus an actual mathematical computation." One of BDTI's core projects has been the development of a useful system for comparing DSPs with different architectures from different suppliers. A simple MIPS figure doesn't account for processor efficiency, the group says, and an instruction on one processor may accomplish far more work than an instruction on another, especially given the highly specialized nature of the instruction sets typically used for DSPs. In fact, in reviewing the latest-generation DSP from Lucent Technologies (Allentown, PA), BDTI found that the newer chip operating at 100 MIPS was substantially faster than the older chip operating at 120 MIPS. In comparing DSPs, Jackson says, "it's really the performance benchmark for a given application that matters, as well as bandwidth, development support, power, etc."
Input/output bandwidth is related to the speed of the external bus or other I/O peripherals, explains LSI's Young. "Generally, a faster DSP will need a higher I/O bandwidth because it can do more processing. Different applications also need more or less I/O versus processing." Wide I/O bandwidth, combined with a fast core and large on-chip memory, is partly responsible for the success of ADI's SHARC family of DSPs. "Using an analogy of a car," Jackson says, "a big V-8 doesn't do much good without tight steering and good brakes, just as a fast code with a bottleneck in the I/O and a small on-chip memory results in the core waiting for information rather than screaming along."
ADI's floating-point SHARC is something of an anomaly in that it offers up to 4 Mbit of on-chip SRAM, whereas most floating-point chips offer relatively little (although they compensate by having large external data buses). The SHARC architecture provides I/O bandwidth up to 300 MByte/sec and has four buses, enabling it to read an instruction from the instruction cache, read two operands, and perform nonintrusive DMA all in one cycle.
Development Tools
DSP suppliers emphasize processing time, but medical device manufacturers are just as concerned with a fast time to market. Fortunately, both chip suppliers and third-party vendors are taking the hassle out of DSP system development. Of course, this adds to the complexity of choosing a DSP partner. Development tools are by no means standardized. Jeff Bier, cofounder of BDTI, explains that "for a given processor, there's a baseline set of tools offered by the chip vendor, and then maybe some third-party independent tools, so there are lots of things to compare. All vendors have their own tools, but some have much better tools than others." Important tools to consider include software assemblers, compilers, linkers, debuggers, simulators, and libraries as well as hardware development boards and emulators. A simulator can be used to run the program code without the hardware, though not in real time, revealing, for example, places where the program algorithms need scaling. Some debuggers and emulators are more convenient to use than others, especially where a high-level program language is involved.
"DSP is a reasonably young industry," explains Young, "and software development support has lagged behind the PC industry, for example. Lots of people are making efforts to catch up, producing the right tools to make DSP systems easy to develop." LSI, for example, recently released its IDE6000, a fully integrated design environment with all the board configuration and debugging tools in one package, along with the project files, editor, and compiler; "really a 'Visual C for DSP,'" Young calls it. IDE6000 was developed under Windows and UNIX platforms and provides a common interface for both PCs and workstations. Currently available for the SHARC and 'C6x boards, the software suite will eventually be released for all of LSI's DSP ranges. ADI itself is reportedly completing beta testing of its integrated GUI tools suite known as Visual DSP.
LSI is also one of the few third-party vendors with a dedicated medical division. While noting that size and quality are especially important to medical equipment designers, Young says that medical customers are also looking for long-term supply and maintenance capabilities, much more so than customers in the research and general electronics fields. "Most of these people are moving from designing things in-house, where they have control over everything. So they need to feel comfortable with their supplier," he says. Not surprisingly, a documented quality system, preferably ISO 9001 certified, is a high priority for device manufacturers, but so also is technology access (the contacts that third-party vendors maintain with chip manufacturers), because they want to be sure of getting the latest technology.
Programming Ease
Using the latest technology has become dramatically simpler in recent years as more DSP manufacturers are producing chips that can be programmed in C, a high-level language. "It's recognized that customers don't want to spend all their time developing in assembly language, so you'll see a shift," says Young. "Nowadays, the vast majority, if not all, of development will be in C code, with only small bits that need to be optimized" through assembly language coding. Most development tools will include a C compiler, but not all compilers are created equal. As BDTI's Bier points out, "DSPs that allow programming in C are one thing. DSPs that do it efficiently are another thing entirely." For most cost-sensitive and performance-demanding applications, he says, the programmer typically still has to write a significant amount of the code in assembly language and optimize it by hand. "Compilers are improving," he says, "but you're still a ways away from being able to sit down and write a code completely in C and get it efficiently compiled for you."
Texas Instruments is aggressively promoting its C compiler, part of the package it has built up around the new 'C6x. As Will Strauss, president of Forward Concepts, puts it, "They're betting the farm that people will want it for that high-level language capability and compiler efficiency." Indeed, TI claims that its compiler is three times more efficient than competing high-end compilers, based on a series of benchmark algorithms. According to Cheryl Shepard, of 'C62x marketing for TI, the boost in efficiency is a direct result of developing the compiler in conjunction with the chip. "In the past," she says, "you'd write your code in C, compile it, and your compiler wouldn't be able to take advantage of the architecture; you'd have to go into the code and optimize it. We have a C compiler that can take advantage of our architecture because it was developed concurrently."
The architecture is known as VelociTI, a proprietary setup based on advanced VLIW (very long instruction word) technology. VelociTI is what allows the 'C6x to achieve such high instruction-level parallelism. "With VLIW," says Shepard, "we have eight functional units, which are orthogonal, operating independently of each other, and within that, we have a deterministic pipeline," which lets the compiler know exactly how long an instruction will take to implement. As a result, the compiler can schedule the different instructions for greatest efficiency. Instruction packing, conditional branching, variable-width instructions, and prefetched branching are just some of the benefits of VelociTI architecture.
TI's package of development tools also includes an assembler that automatically schedules and parallelizes instructions from serial, in-line assembly code. "We've put a lot of emphasis on our C compiler," Shepard says, "but sometimes a customer wants to optimize even more. With our assembly optimizer, you can write linear assembly, run it through the assembly optimizer, and have it schedule everything for you; you don't have to worry about which functional unit is available." By simplifying the programming phase, DSPs such as the 'C6x expedite the development cycle, always a plus for the medical manufacturer. As Young puts it, "customers don't have time to spend on lots of assembler programming; they've got devices to build."
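The general idea of scheduling for a wide, deterministic machine can be illustrated with a toy Python sketch. This is not TI's compiler or the VelociTI instruction format: it is a simple greedy scheduler that assumes every instruction takes exactly one cycle and that any instruction can run on any of the eight units. The function `schedule` and the instruction names are hypothetical.

```python
def schedule(instructions, depends_on, width=8):
    """Greedily group instructions into packets that can issue together.

    instructions: names in program order.
    depends_on:   maps a name to the set of names whose results it needs.
    width:        number of functional units (eight, per the article).
    Toy model only: assumes single-cycle latency and identical units.
    """
    finished = set()
    remaining = list(instructions)
    packets = []
    while remaining:
        # An instruction is ready once everything it depends on has finished.
        ready = [i for i in remaining if depends_on.get(i, set()) <= finished]
        packet = ready[:width]
        if not packet:
            raise ValueError("circular dependency between instructions")
        packets.append(packet)
        finished.update(packet)
        remaining = [i for i in remaining if i not in packet]
    return packets


if __name__ == "__main__":
    program = ["load_a", "load_b", "multiply", "accumulate"]
    deps = {"multiply": {"load_a", "load_b"}, "accumulate": {"multiply"}}
    for cycle, packet in enumerate(schedule(program, deps)):
        print(cycle, packet)
```

Because the toy model knows every latency in advance, it can decide at "compile time" that the two loads share a cycle; that advance knowledge is what a deterministic pipeline gives a real scheduler.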
Commonality and Multiprocessing
The 'C6x has caused quite a stir recently, and not just for its C compiler. Strauss comments that "lots of people are looking at it if not already designing with it," and Young calls it "the biggest development in the last year." Young explains that "most DSPs only execute instructions one at a time, but they are fairly complex instructions doing several things. The 'C6x can execute up to eight instructions in parallel. These are slightly simpler, but you can see the overall benefit. This is unique among DSPs at the moment."
The basic architecture not only allows the 'C6x to achieve high processing speeds, but also lets it function as either a fixed-point ('C62x) or floating-point ('C67x) chip. "The cores are very similar," says Shepard. "We just added floating-point functionality to six of the eight functional units." The benefit, she says, is that manufacturers can easily migrate from one DSP type to the other. Commonly, designers will develop their devices using the simpler floating-point chips, then move over to cheaper fixed-point chips later in the product life cycle. "That means starting over," says Shepard, "relearning the architecture, relearning the instruction set, relearning the tools." By using one common architecture, she says, designers can start with floating-point processing and move to fixed-point computation more easily. "You're talking one development team and a much faster transition." Designers will still have to rewrite some instructions, she says, "but they're not starting over from scratch."
Many computationally intensive applications may require a number of DSPs. Within one system, a manufacturer might use a bank of floating-point chips for functions requiring higher dynamic range together with a separate set of fixed-point chips for functions requiring greater speed. Such a system would require two development teams to work with the two distinct DSP architectures. With the 'C6x, says Shepard, "they can design the whole system around the one architecture. It saves a lot, in terms of time to market, resources, and time spent climbing the learning curve."
Commonality is an important consideration for device manufacturers. Considering the projected growth in the DSP market (and the technological advances that such growth will bring), manufacturers will probably need to update their devices, moving to faster and more powerful chips as they're released. Unfortunately, most DSPs are not code-compatible across generations. "This incompatibility," says Jackson at ADI, "results in manufacturers needing to start from ground zero when moving to a faster DSP." ADI's second-generation SHARC, due out this year, will be fully code-compatible with the first generation. In contrast, a software translation tool is needed to transfer the programming from TI's 'C4x to the 'C6x. Jackson also points out that the basic SHARC architecture, like that of the 'C6x, can accommodate both fixed- and floating-point operation.
When using several DSPs in a single system, designers also need to consider how the individual chips will communicate with each other. Perhaps only two DSPs are specifically designed for multiprocessing applications: TI's 'C4x and ADI's SHARC. "In using multiple processors, where there's a whole added degree of complexity, one aspect to consider is how do they talk with each other?" says BDTI's Bier. "Analog's SHARC family has some particularly sophisticated capabilities, giving many options for connecting multiple processors for efficient communications."
LSI's PCI/66-P2 SHARC board incorporates DSP processors, a PMC module site for I/O, and a high-speed, 32-bit PCI interface.
For example, the SHARC 21061 can be grouped in clusters of up to six units, thanks in part to a glueless multiprocessor arbitration scheme, which, explains Jackson, "saves the designer from having to add external glue logic at additional cost and board space to arbitrate which of the six processors has control of the bus." In another move to capture more of the multiprocessing market, ADI released its SHARC 21062 in a plastic ball grid array package measuring 23 mm on a side, less than half the size of a plastic quad flat pack, enabling designers to double the number of DSPs on a board. Of course, switching to a more powerful chip might achieve the same goal, but that most likely would entail developing a new software program.
Medical Applications
Capital medical equipment still represents a small portion of the overall DSP market, which is largely dominated today by cell phones, pagers, and similar devices. Still, more-powerful DSPs should invigorate the market and open up new opportunities. "One of the problems we've always had with CT and MRI is enough horsepower at a reasonable price," remarks Forward Concepts' Strauss. "That's improving considerably." Nowadays, he says, designers are no longer forced to go the multiprocessor route. More-powerful DSPs will not only improve image quality (and subsequent diagnosis) but also change the way the equipment is designed. Young at LSI expects to see floating-point performance increase by a factor of 10 in the next few years, paving the way for real-time CT-scan analysis. Strauss predicts that DSPs will become more cost-effective, and therefore more prevalent, in devices such as Doppler blood-flow monitors and fetal and infant monitors as well as in MRI, CT, spectroscopy, and ultrasound machines. "The electronics are going to get cheaper," he says. "Whether or not the equipment will get cheaper is a different matter." As a case in point, ADI will begin sampling a $10 SHARC this spring with applications predicted in EEG, ECG, and portable monitoring equipment. Young notes that DSPs are starting to replace application-specific integrated circuits (ASICs) in x-ray systems because they have finally become fast and cheap enough.
Ultrasound represents a particularly promising market for DSP technology. Application engineers at TI, says Shepard, are excited by the prospects of 3-D representation, which requires intensive computational capabilities. A DSP-enabled ultrasound system, for example, could present a sharp 3-D image of a developing fetus that keeps pace with the transducer head as it is moved to different positions, giving the health-care provider a more realistic picture for analysis. Such technology is already at work in the SieScape imaging system recently introduced by Siemens Medical (Issaquah, WA), which uses a pair of TI chips collectively capable of 4 billion operations per second. SieScape imaging presents panoramic sonograms, enabling health-care providers to analyze complete images rather than have to piece together a series of separate, smaller pictures. Strauss also foresees more-portable systems, suggesting that "as more horsepower gets into a smaller space, we'll start to see handheld ultrasound equipment."
Computationally intensive 3-D processing will also assist in medical simulations for R&D, drug development, and therapy. "I've seen 3-D computing used to simulate medical situations," says Shepard, "for example, simulating a heart and then seeing what happens when you add a drug to it." Along similar lines, LSI recently helped develop a mapping system capable of constructing a 3-D image of the heart for real-time location of a cardiac catheter; the system is designed to replace less-accurate fluoroscopic techniques.
Of course, imaging is by no means the only medical sector that will benefit from better DSPs. Prosthetics, for example, also show great promise. Strauss points to the introduction last year of DSP-based hearing aids, which are orders of magnitude more efficient than traditional models. "Conventional hearing aids used an analog circuit and amplified everything, including background noise," he says. "They came out with some schemes for shaping the pass bands, cutting off the high and low ends, but it really only helped a little bit. Only DSPs can actually filter out background noise."
DSPs will also foster rapid growth in telemedicine and telepathology, Strauss predicts, an area, as he sees it, with "lots of government money thrown at it." To be useful, a telemedical system must combine real-time performance with clear transmission of video images and sound, tasks well suited to a DSP. Telepathology requires a more sophisticated interface, involving both electromechanical and video transmission, to enable a physician to examine a tissue sample through a microscope from hundreds of miles away, manipulating the instrument as needed. Strauss reports that telemedicine is currently the second-largest sector in the overall videoconferencing market.
The Market Future
The projected market for DSPs is lucrative enough to have drawn the attention of some major general-purpose processor manufacturers, who are trying to muscle their way in. "We're going to see more RISC processors that have better DSP performance," says Strauss, noting that IBM and Motorola have been collaborating on a chip based on the PowerPC architecture. Bier agrees, adding, "General-purpose microprocessors are beginning to be modified to give them serious DSP capability because vendors are recognizing that there are many applications that need DSP, and that over time, DSP may become more important." How do the DSP-mimicking chips stack up? "If you're looking at floating point, then yes, a general-purpose processor can compare very favorably," says Bier. As for fixed point, he says, "general-purpose processors with DSP enhancements can compete well on absolute performance but not on cost performance. But there are some exceptions."
The DSP market can expect to encounter new sources of competition and new kinds of competition, Bier predicts. To realize the 30 to 35% annual growth projected for the market, DSP providers will have to work on enhancing their existing products while fending off challenges from new arenas. Whatever the outcome, medical equipment manufacturers can look forward to faster, cheaper, simpler chips for both high-end and commodity equipment and instrumentation.
Photos courtesy of Loughborough Sound Images, Ltd.
Further information about DSPs can be found at the following Web sites:
- Analog Devices: www.analog.com
- Berkeley Design Technology, Inc.: www.bdti.com
- Forward Concepts: www.fwdconcepts.com
- Loughborough Sound Images: www.lsi-dsp.com
- Texas Instruments: www.ti.com



