
Pharmaceutical and Medical Packaging News Magazine

Originally Published March 2000

LANGUAGE MATTERS

We're Living in a Two-Byte World

Are your labeling data Unicode compliant? If not, you may be building obstacles to on-demand printing.

by Robert Sprung

On-demand printing has practically become a mantra in medical and pharmaceutical labeling. The term refers to technology that permits labeling data to be recalled and printed in real time, anywhere in the world, ideally in any language.

Engineers are developing sophisticated database and document management systems to help usher in this new golden age. But many companies are overlooking whether their labeling data—the actual text of packaging and inserts—are truly universal and reusable. The fact is that vast oceans of such data are currently in formats that are anything but. Many companies use:

  • Different operating systems to print in multiple target languages.
  • Different fonts to print in various languages (e.g., separate fonts to print Greek on a Macintosh and on a PC).
  • Static art file formats such as PDF, EPS, or TIFF to capture and print simple text in Japanese, Chinese, or Russian.

Why the confusion? The problem goes back to the early days of computing, when a single byte was chosen to be the "word" that the computer spoke. One byte is composed of eight bits (zeroes or ones) and can represent 256 possible combinations (2^8). Thus were born ASCII (strictly a 7-bit code of 128 characters) and the 8-bit character sets that extend it, which have remained with us to this day.

ASCII contained all common English characters (the standard was, after all, developed largely by U.S. powerhouses like IBM). Its common 8-bit extensions added diacritical marks for the major Western European languages (umlaut, acute and grave accents, tilde). If you used a computer anywhere in the world, an "A" on one machine would be an "A" on another ASCII-based machine, as long as your language was English, Spanish, or French.

But standard ASCII was simply too limited to accommodate languages such as Russian or Czech, whose letters fall outside the Western European repertoire, to say nothing of ideographic languages like Chinese or Japanese. Because different platforms adopted different single-byte encodings for Greek, for example, a simple Greek text file on a Macintosh will often be unreadable on a PC without custom software: the antithesis of an "open" system.
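To see the problem in miniature, consider this short Python sketch. It assumes the Macintosh file was saved in Apple's Mac OS Greek encoding and that the PC expects Windows code page 1253. The article names no specific encodings, so both are illustrative choices, as is the sample word.

```python
# Hypothetical label text: the Greek word for "dosage", in capitals.
label = "ΔΟΣΟΛΟΓΙΑ"

# Save it the way a Macintosh might: Mac OS Greek, one byte per letter.
mac_bytes = label.encode("mac_greek")

# Open the same bytes on a PC that assumes Windows Greek (code page 1253).
# errors="replace" substitutes a marker for any byte cp1253 leaves undefined.
seen_on_pc = mac_bytes.decode("cp1253", errors="replace")

# Decoding with the encoding the file was actually written in restores the text.
recovered = mac_bytes.decode("mac_greek")

print("written on the Mac:", label)
print("read on the PC:    ", seen_on_pc)
print("recovered with the right code page:", recovered == label)
```

The bytes never change between the two machines. Only the table used to interpret them does, which is why the same file can look correct in one place and garbled in another.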

The answer to this linguistic Babel has been known to computer scientists for years. Using two bytes instead of one would suddenly open up enough space for most of the world's languages. If one byte can store 256 characters, two bytes can store 2^16 possible characters (16 zeroes or ones in any combination), for a total of 65,536!
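The arithmetic is easy to check. The sketch below, in Python, computes both totals and then encodes one sample character from each of four scripts as a two-byte unit. UTF-16 (big-endian) is used here as one concrete two-byte Unicode encoding; that choice, and the sample characters, are assumptions made for illustration.

```python
# How many distinct values fit in one byte and in two bytes?
one_byte = 2 ** 8      # 256
two_bytes = 2 ** 16    # 65536
print(one_byte, two_bytes)

# One sample character per script (illustrative picks).
samples = {
    "Latin": "A",
    "Greek": "Ω",
    "Cyrillic": "Я",
    "Japanese": "日",
}

for script, ch in samples.items():
    encoded = ch.encode("utf-16-be")   # two-byte units, big-endian, no byte-order mark
    # Each of these characters has a code number below 65,536
    # and therefore fits in exactly two bytes.
    print(f"{script:9} {ch}  code {ord(ch):5d}  bytes {len(encoded)}")
```

Every character in the sample gets its own number in a single shared table, so no code-page switching is needed to mix them in one file. Later versions of Unicode also define characters beyond the first 65,536, which take more than two bytes; the sketch stays within the two-byte case described here.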

This is enough for all languages your company is currently using in labeling—French, Finnish, Greek, or Japanese—as well as any you might add. Unicode covers Arabic, Bengali, Korean, Tibetan. . . you get the picture. It also includes arrows, diacritical marks, math symbols, and dingbats.

If a two-byte standard like Unicode is so logical, why wasn't it adopted decades ago? Limited computing power made every byte precious, but the answer lies deeper. At midcentury, the computing world was English-centric (some felt that English would emerge as the universal language of the computer—both for written communication and for programming). Why invest heavily in a standard for Chinese or Hindi when practically no one even owned a computer in countries using those languages or when they might end up using English anyway?

Today, the computing world is decidedly international and multilingual. Given tremendous strides in computing power, booming international trade, and the World Wide Web, Unicode's day has dawned. The standard is supported by such firms as IBM, Xerox, Microsoft, Sun, and Oracle. And many products are already Unicode enabled: Microsoft Windows NT and Windows 2000, Java, IBM OS/2, Oracle 8, and Netscape Navigator.

What are the implications for those working in multilingual labeling?

  • Understand and communicate that Unicode will be the standard for multilingual character sets, at least among the cutting-edge companies that want to have maximum reusability and reliability of their data.
  • Do not invest in outdated technologies that do not support Unicode or lack a clear path to Unicode migration.
  • Develop a strategy for ensuring that all data in all languages are kept as live data elements, not as static artwork or as text that relies on proprietary operating systems or fonts.
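As a rough illustration of "live data elements," the Python sketch below keeps one label phrase in five languages as ordinary text in a single Unicode file and reads it back unchanged. The JSON format, the UTF-8 encoding, the file name, and the sample strings are all assumptions made for the example; the strings are illustrative only and are not vetted translations.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical labeling data: one phrase per language code.
labels = {
    "en": "Sterile",
    "fr": "Stérile",
    "el": "Αποστειρωμένο",
    "ru": "Стерильно",
    "ja": "滅菌済み",
}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "labels.json"

    # ensure_ascii=False writes the characters themselves rather than escape codes,
    # so the file stays readable and editable as plain text.
    path.write_text(json.dumps(labels, ensure_ascii=False, indent=2), encoding="utf-8")

    # Any Unicode-aware system can read the same file back, whatever the language.
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print("round trip intact:", reloaded == labels)
```

Because the text is stored as characters rather than as a picture of characters, it can be searched, revised, and sent to any printer or font that supports Unicode.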

To learn more about Unicode, visit http://www.unicode.org.

Robert Sprung is the chairman of Harvard Translations (Cambridge, MA). To contact him, send an e-mail to robert_sprung@htrans.com or visit http://www.htrans.com.





Copyright ©2000 Pharmaceutical & Medical Packaging News