Originally Published MEM Fall 2001
MEDICAL SOFTWARE
Software Development for Safety-Critical Embedded Medical Applications
The tools and techniques for a successful development project are outlined.
Robert J. Duquaine and David W. Murphy
Developing software for safety-critical embedded medical systems is not much different from developing software for ordinary systems in terms of development models and verification techniques. But a safety-critical system has the potential to cause harm, in its intended function as well as owing to malfunction or misuse. Harm is defined as injury to a user, a bystander, a patient, or even the environment. Because of this potential for harm, the development of safety-critical software involves special tasks that must be incorporated into the project schedule in order to ensure the safety and robustness of the finished product. This article describes the key additional requirements, how to fulfill them successfully, and where they should fit into the development schedule.
Figure 1. The flow of a typical software development project.
The process of software development is charted in Figure 1. In the illustration, the phases of development are presented as a classic waterfall. Spiral, incremental, and evolutionary models of development are offshoots of the waterfall in which the same basic phases are visited more than once, with the output of each iteration depending on the particular model chosen to follow. In these models, the additional tasks involved in a safety-critical development process should be repeated at each iteration.
Many development tasks that are part of all software projects, whether safety-critical or not (e.g., functional specifications, design documents, test plans, and other deliverables), are not discussed here except when special consideration is required. Instead, the presentation focuses on the following tasks, which are unique to safety-critical systems or have special significance in the development of safety-critical software:
- Generating a software development plan.
- Performing a risk analysis.
- Executing a fault tree analysis.
- Maintaining traceability.
- Managing change requests.
- Maintaining an accurate revision history.
The Software Development Plan
Software development projects can fail for many reasons, but successful projects have two things in common: good project organization and well-established quality procedures.
The Institute of Electrical and Electronics Engineers (IEEE) has defined a pair of documents that can help a software development team structure its processes so as to meet with project success. These are the software quality assurance plan (SQAP) and the software configuration management plan (SCMP).1 The SQAP specifies the processes, documentation, and review schedules used in developing the software, while the SCMP addresses the software release procedure, tools used, and media control issues. Rather than maintaining two documents, it may be more convenient to consolidate the contents of the two IEEE documents into a single document called the software development plan (SDP). IEEE identifies the areas that a good development plan should address (see Table I). The following paragraphs discuss important areas addressed by the SDP.
| SDP Section | Function |
| Purpose | Explains why the plan is being written. |
| Reference documents | Describes documents that were used in writing the SDP. |
| Management | Describes project responsibilities and tasks. |
| Software documentation | Lists all documents used in the design, implementation, and verification of the product. |
| Standards and practices | Identifies the standards used in developing the product and explains how adherence to these standards is monitored. |
| Reviews and audits | Defines the schedule for all reviews and explains how corrective action is to be implemented and verified. |
| Test | Describes the types of testing performed on the product, the process of executing the tests, and the evaluation of test results. |
| Problem reporting and corrective action | Addresses the procedures for documenting all problems, assessing their causes, and implementing resolution of issues. |
| Tools, techniques, and methodologies | Identifies tools and methodologies, such as the Unified Modeling Language (UML), used in development. |
| Configuration management | Describes the components of a software release (binaries, associated documentation, etc.), release procedures, media control, and supplier control. |
| Records collection, maintenance, and retention | Designates what software QA documentation is to be stored, how it is to be archived, and the duration of retention of these records. |
| Risk management | Describes how project risks are communicated. |
| Table I. The contents of a software development plan, as suggested by IEEE. | |
Software Documentation. This section of the SDP lists the set of documents used in developing and verifying the product. The list would typically include:
- A software requirements specification.
- Software design documents.
- A module-level test-plan document.
- A system-level test-plan document.
- A risk analysis.
- Fault tree analysis documentation.
- Traceability documentation.
Each listing should contain a document number and a description of the purpose of that document. In addition to providing a concise reference to documentation used throughout the project, maintaining this section of the plan forces the development team to consciously decide what documentation is required. It is a mistake to include the revision numbers of these documents in the SDP. Doing so will require constant maintenance of the SDP as revisions are made.
Reviews and Audits. The review process is of critical importance to the overall quality of the finished product. Because of this, it is common to document a review schedule in the SDP. The purpose of the review schedule is not to provide hard dates for performing reviews, but rather to place review occasions in relation to other milestones in the design process.
Table II is a template that can be used to create a review schedule.
| Review (Development Stage) | Timing | Document Completion Requirements | Participants | Governing Process |
| Software development plan | Prior to proposal signoff | Complete | Names of people involved in the review | Reference to a process document such as an ISO-related quality document |
| Risk analysis | 1. Before requirements signoff 2. During software design reviews | 90% complete | ||
| Table II. A template for a software development review schedule. | ||||
Note that the risk analysis is reviewed at multiple points in the design schedule (see Table II). Also, the risk analysis is an example of a living document that needs to be revisited throughout the development life cycle and periodically updated. It is valuable to review such documents even though they are not 100% complete. After the review at the 90% level, minimal changes are to be expected. Should a significant change occur, the document may require additional review. Therefore, it may be appropriate to show the document being reviewed after it is nearly but not quite complete, and to footnote the review schedule to explain the completion status.
Configuration Management. One of the most important responsibilities of the software developer is to manage the source code and follow a consistent procedure for releasing it. The purpose of the configuration management section of the SDP is to lay out a process for meeting these goals. At a minimum, this section must address the following areas:
- Software release set, which includes any binaries, source code, release notes, or anomaly logs that are relevant to the release.
- Software release process, which defines what is done when releasing the software and should be written so that the document remains flexible.
- Software revision control, which typically defines the method used to maintain a revision history linking different versions of the software, and commonly references tools used to facilitate this (for example, the Revision Control System, RCS, or the Concurrent Versions System, CVS).
- Software change control, which defines how change requests are tracked.
- Media control, which defines how the software is physically stored and retrieved, and indicates whether backups exist.
Developing the SDP
Putting together a solid software development plan can look like a daunting task at the beginning of a project, especially in view of the number of items that need to be addressed in the document. Fortunately, several reference sources outline ways to develop a good SDP. One is IEEE Standard 730.1-1995.1 This document describes in detail all of the areas the plan should address. Note that although all of these sections should be present in an SDP, each software development project may involve special considerations that need to be taken into account when generating the plan.
Besides taking advantage of convenient reference material, wise SDP writers will be sure to develop the SDP at the beginning of the project. Because the SDP defines the process that the development team will use for all phases of the development cycle, it is critical for this to be in place before other tasks are performed. Developing a process after the fact tends to document only what was done, which may differ significantly from what would have been done had the process been thought out ahead of time.
The document should be kept flexible. Often, developers unnecessarily impede their own work by defining their processes too strictly. One method of keeping the SDP flexible is to state only what tasks need to be done, and not how they are to be performed.2 This allows the process to be stated without it getting muddled by the details.
It is unnecessary to reinvent the wheel. If the equipment manufacturer has already developed an SQAP or SDP, that previous version might be used as a template for the current project. And of course, if this is the first time an organization has undertaken to create a document of this type, it should consider making the SDP the basis of a template that product developers can use on future projects.
Risk Analysis
Risk analysis is the process of identifying, and estimating the severity of, all potential ways that the system can cause harm. The type of harm potentially caused is called a hazard. Sometimes the term hazard analysis is used synonymously with risk analysis, but there is a subtle difference between the two that will be explained later.
A central goal of risk analysis is the design of mitigations for each hazard to either eliminate it or reduce its risk to an acceptable level. Risk analysis is a system-level activity. It should take into account the hardware-related and mechanical hazards of a system, not just those associated with the software.
Risk analysis should be started very early in the development cycle. The process should be revisited each time there is a change in the system in order to determine whether the change introduces new hazards or new causes for already existing hazards.
The Risk Analysis Matrix. Table III is a template representing the way the results of a risk analysis are documented. This table, sometimes referred to as the risk management summary, should be included in the document known as the risk management plan. The following discussion explains each component of the risk analysis matrix.
| Hazard ID | Hazard | Severity | Cause(s) | Likelihood | Risk Index | Mitigation |
| H0001 | Laceration to operator due to unexpected activation of motor. | Critical | Invalid motor position sent to control unit. | Occasional | ALARP | Design fault-detection logic that can identify errors in reported position. |
| H0002 | Patient experiences discomfort due to extended activation of device. | Marginal | Activation switch becomes shorted during operation. | Improbable | Acceptable | 1. Determine typical switch lifetime. 2. Implement software diagnostic to inform the user that maintenance is required. |
| Table III. A sample risk-analysis matrix indicating how each identified system hazard should be documented. | ||||||
The Hazard ID is a unique name or number assigned to each hazard in the table. The purpose of assigning each hazard an identifier is to allow entries in this table to be referenced in other documents such as test plans, and in the traceability matrix (discussed below). The ID should never change or be used again for another hazard, because this may lead to documentation errors. For the same reason, the use of automatic heading or list numbers for the identifiers should also be avoided.
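The rule that a hazard ID is never changed or reused can be enforced mechanically. The following is a minimal sketch, not a prescribed implementation: the `HazardRegistry` class, its method names, and the "H" plus four digits format (modeled on the IDs in Table III) are assumptions made for illustration.

```python
class HazardRegistry:
    """Hypothetical helper that hands out hazard IDs and never reuses one."""

    def __init__(self, prefix="H", width=4):
        self._prefix = prefix
        self._width = width
        self._next = 1        # counter only ever increases
        self._hazards = {}    # ID -> description, retired entries included
        self._retired = set()

    def add(self, description):
        """Assign the next unused ID to a new hazard and return it."""
        hazard_id = f"{self._prefix}{self._next:0{self._width}d}"
        self._next += 1
        self._hazards[hazard_id] = description
        return hazard_id

    def retire(self, hazard_id):
        """Mark a hazard as no longer applicable; its ID stays reserved."""
        if hazard_id not in self._hazards:
            raise KeyError(hazard_id)
        self._retired.add(hazard_id)

    def active(self):
        """Return the hazards that have not been retired."""
        return {h: d for h, d in self._hazards.items()
                if h not in self._retired}


registry = HazardRegistry()
h1 = registry.add("Laceration to operator due to unexpected activation of motor.")
h2 = registry.add("Patient experiences discomfort due to extended activation of device.")
registry.retire(h1)
h3 = registry.add("Electrical shock to user.")
print(h1, h2, h3)  # H0001 H0002 H0003
```

Because the counter is independent of the list of active hazards, retiring H0001 does not free that identifier for the next entry, which is the behavior that automatic list numbering would not provide.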
Hazards are potentially harmful effects of system operation on people, animals, or even the environment. The possible effects that must be described depend on the regulations applicable to a particular project.
The rating in the Severity column of the matrix is a qualitative judgment on the hazard's consequences. IEC 60601-1-4 defines the following severity levels:3
- Catastrophic: having potential for multiple deaths or serious injuries.
- Critical: having potential for death or serious injury.
- Marginal: having potential for injury.
- Negligible: having little or no potential for injury.
A cause is an event, or sequence of events, that introduces a hazard. More than one cause can be associated with a hazard. It is important to consider both proper and possible improper use of the system when determining causes. Tools such as fault tree analysis (see below) can be helpful in performing this function.
"Likelihood" is the probability that the listed possible cause will introduce the hazard with which it is associated. This is a qualitative estimate with a semiquantitative aspect. IEC 60601-1-4 categorizes likelihood in six levels: frequent, probable, occasional, remote, improbable, and incredible. It is up to the project team to define what these levels mean. Rather than assign numerical probabilities such as 1 chance in 1 million, 1 in 10,000, and so on, the authors define probability levels in terms of the product, as follows:
- Frequent: could happen every time the product is used.
- Probable: could happen once per year.
- Occasional: could happen once in the product's lifetime.
- Remote: could happen, but has never been reported for this type of product.
- Improbable: not likely to happen in the product's lifetime.
- Incredible: not going to happen.
The risk index derives from a combination of the severity and likelihood ratings and categorizes risk as acceptable, intolerable, or what is commonly known as ALARP (as low as reasonably practicable). This is the column in the table that distinguishes risk analysis from simple hazard analysis. Hazard analysis considers only severity; likelihood and the risk index do not come into it.
The risk index is usually a chart or table, as in Table IV. The terms in the various regions are to be determined by the project.
| Likelihood | Negligible | Marginal | Critical | Catastrophic |
| Frequent | ALARP | Intolerable | Intolerable | Intolerable |
| Probable | ALARP | ALARP | Intolerable | Intolerable |
| Occasional | ALARP | ALARP | ALARP | Intolerable |
| Remote | Acceptable | ALARP | ALARP | ALARP |
| Improbable | Acceptable | Acceptable | ALARP | ALARP |
| Incredible | Acceptable | Acceptable | Acceptable | Acceptable |
| Table IV. A risk index table. Severity increases from left to right. (Source: Medical Devices: Application of Risk Management to Medical Devices, ANSI/AAMI/ISO 14971:2000.) | ||||
Indicated in the Mitigation column of the risk analysis table is the method used to reduce the risk index of a particular cause to an acceptable level.
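A risk index table lends itself to a simple lookup. The sketch below encodes the regions of Table IV as printed; the names `RISK_INDEX`, `risk_index`, and `needs_further_analysis` are illustrative assumptions, and a real project would substitute the regions it has defined for itself.

```python
# Columns of Table IV, in order of increasing severity.
SEVERITIES = ["Negligible", "Marginal", "Critical", "Catastrophic"]

# Rows of Table IV, keyed by likelihood; entries follow SEVERITIES order.
RISK_INDEX = {
    "Frequent":   ["ALARP", "Intolerable", "Intolerable", "Intolerable"],
    "Probable":   ["ALARP", "ALARP", "Intolerable", "Intolerable"],
    "Occasional": ["ALARP", "ALARP", "ALARP", "Intolerable"],
    "Remote":     ["Acceptable", "ALARP", "ALARP", "ALARP"],
    "Improbable": ["Acceptable", "Acceptable", "ALARP", "ALARP"],
    "Incredible": ["Acceptable", "Acceptable", "Acceptable", "Acceptable"],
}


def risk_index(severity, likelihood):
    """Look up the risk index for one cause of a hazard."""
    return RISK_INDEX[likelihood][SEVERITIES.index(severity)]


def needs_further_analysis(severity, likelihood):
    """Intolerable and ALARP causes call for mitigation analysis."""
    return risk_index(severity, likelihood) != "Acceptable"


# The two sample entries from Table III.
print(risk_index("Critical", "Occasional"))   # ALARP
print(risk_index("Marginal", "Improbable"))   # Acceptable
```

Keeping the table as data rather than as nested conditionals makes it easy to review against the documented risk management plan and to change when the project revises its definitions.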
Performing Risk Analysis. The first step in conducting a risk analysis is to document a detailed plan for filling out the risk analysis table for the project. Project-specific definitions for severity, likelihood, and the risk index table should be documented in this plan, as should the techniques the project team will use to determine hazards, causes, and mitigations. This information could serve as an introduction or an appendix to the document that contains the risk analysis table.
Once the plan has been fleshed out, the team should brainstorm to discover as many system hazards as possible. These should be listed in the Hazard column in the table. A common mistake is to list causes of hazards in this column rather than the hazards themselves. The hazard is a discrete event in which someone or something is harmed; the cause is the scenario that triggers the event. An example of confusing the hazard with the cause would be to list "loose ground wire" as the hazard in a case where "electrical shock to user" is really the hazard and "loose ground wire" the cause.
Another brainstorming session should focus on coming up with as many plausible causes for each hazard as can be imagined. Again, it is important to consider both normal and incorrect use of the system when determining causes. In addition to concentrated thinking, tools such as fault tree analysis (FTA) and failure mode effects and criticality analysis (FMECA) can and should be used with the hazards that are of particular interest. These methods will usually identify a few causes that no one thought of during the brainstorming.
Once the cause-determination stage is complete, each cause is assigned a risk index based on the criteria the project team has chosen. All of the causes that fall into the Intolerable and ALARP categories need to be analyzed further for the purpose of determining mitigating actions. Deciding on courses of mitigation is not an easy task. The quality of the mitigation written depends greatly on the skill of team members.
Most agencies that review risk analyses will not specify what they do or do not consider an acceptable mitigation. This means that an equipment designer may need to argue the effectiveness of a mitigation during an audit and be able to convince the auditor of its validity. It can be tempting to write general mitigations like "prevent through proper system design and verification testing" to cover difficult cases, but the temptation should be resisted. Such language is a red flag to an auditor, signaling that the system has not had any real mitigations designed into it to reduce the risk index of a troublesome cause. Furthermore, because these risk-analysis mitigations ultimately become product safety requirements, they will need to accommodate verification tests in which the introduction of causes triggers appropriate and effective mitigations in response.
A risk analysis requires much time and effort. The schedule should include as many hours for risk analysis as are allotted to the creation of the requirements specifications. The composition of the risk analysis plan and the identification of hazards should be done concurrently with, or even prior to, the specification phase of the development project. Although the determination of causes for the hazards can begin during the specification phase, the process will extend into the design phase as tools such as FTA and FMECA are applied to the design.
Design of mitigation methods clearly is part of the project design phase, as the mitigations may take the form of new circuitry or code being added to the system. Mitigations are finally verified, naturally, during the verification phase. This final step closes the loop in terms of proving that the system is safe.
Even at this point, the risk analysis still needs to be maintained. Every time the project takes on a new requirement, or a new functionality is introduced into the system, the analysis must be revisited in order to determine whether any new hazards or causes of hazards have been generated.
Fault Tree Analysis
Once the hazards relating to a safety-critical system have been identified, an effective mitigation strategy must be implemented for each of them. Several methods are available to the developer engaged in this endeavor.
One method is to perform a fault tree analysis. In the FTA, the developer examines the hazards identified during the risk analysis phase and considers the possible ways that each hazard could be generated by the system. This top-down approach is the method generally preferred by software developers. Although an FTA can be highly exhaustive, it is customary to target only the most severe hazards for this level of analysis. The rationale behind selecting some hazards while omitting others should be documented, however.
Another approach is the failure mode effects and criticality analysis. This involves examining all system components and analyzing how a failure in each one could affect the rest of the system. The bottom-up nature of the FMECA generally is better suited for analyzing system hardware than software, however, and is not discussed here.
FTA Structure. An FTA involves determining what pathways in a system can cause a particular known hazard to occur. In the case of an FTA for software, therefore, the program flow is analyzed to see how a hazard might be generated.
Figure 2. Fault tree analysis of a system hazard with (a) a simple model and (b) a more useful model after Boolean symbolism is added.
So what does an FTA look like? In its most elementary form, it is an inverted-tree chart that traces the system hazards identified during the risk analysis phase back to all of their possible causes. Figure 2a shows the early stage of an FTA that is performed on software that controls the operation of a motor. There are two causes for the system hazard. However, one crucial question is not being answered by this diagram: Do both failures need to be present for the hazard to occur? Certainly, many hazards can result from a single failure of the system, though others will require the occurrence of multiple failures before they manifest themselves and the system poses a safety risk.
The diagram in Figure 2a can be modified to make it more informative. A common practice in this type of analysis is to use the Boolean logic gates AND and OR to denote the relationship between the fault conditions. Figure 2b uses an OR gate to connect the faults to the hazard in the example, indicating that either of these system faults can cause the hazard to occur.
Not only does the use of OR and AND gates convey more information about how a hazard can occur, it also aids in developing a hazard mitigation strategy. In cases where several faults are charted as inputs to an OR gate, all of them must be mitigated, since any one by itself could cause the hazard to occur. When an FTA shows multiple faults entering an AND gate, mitigating one of the faults should be sufficient, because all of the input conditions would have to be present for the hazard to occur. Of course, although only one fault needs to be mitigated in this circumstance, it is still good practice to mitigate all of the defined faults.
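The gate semantics described here can be made concrete with a small model. The following sketch is an illustration only: the `Fault` and `Gate` classes, the two functions, and the fault names are hypothetical and are not taken from Figure 2. `occurs` evaluates whether a hazard is triggered by a given set of active faults, and `cut_sets` lists the fault combinations that are each sufficient to trigger it (without attempting to minimize them).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fault:
    name: str


@dataclass(frozen=True)
class Gate:
    kind: str       # "AND" or "OR"
    inputs: tuple   # child Fault or Gate nodes

    def __post_init__(self):
        if self.kind not in ("AND", "OR"):
            raise ValueError(f"unknown gate kind: {self.kind}")


def occurs(node, active):
    """Return True if the node is triggered by the active fault names."""
    if isinstance(node, Fault):
        return node.name in active
    results = [occurs(child, active) for child in node.inputs]
    return all(results) if node.kind == "AND" else any(results)


def cut_sets(node):
    """List fault sets that are each sufficient to trigger the node."""
    if isinstance(node, Fault):
        return [frozenset([node.name])]
    child_sets = [cut_sets(child) for child in node.inputs]
    if node.kind == "OR":
        # Any single input suffices, so collect every child's sets.
        return [s for sets in child_sets for s in sets]
    # AND: one set from every input must hold at the same time.
    return [frozenset().union(*combo) for combo in product(*child_sets)]


# Hypothetical fault names for a motor-control hazard.
or_tree = Gate("OR", (Fault("invalid motor position"),
                      Fault("stuck activation switch")))
and_tree = Gate("AND", (Fault("invalid motor position"),
                        Fault("fault detection disabled")))

print(occurs(or_tree, {"invalid motor position"}))    # True
print(occurs(and_tree, {"invalid motor position"}))   # False
print(len(cut_sets(or_tree)), len(cut_sets(and_tree)))  # 2 1
```

The OR tree yields two single-fault sets, each of which must be mitigated, while the AND tree yields one set containing both faults, so removing either fault breaks the only path to the hazard.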
It must be understood that the FTA is intended to address only hazardous states of the software. Ways in which the software may be faulty without compromising system safety are not identified by the FTA.4 Other methods must be used to detect these deficiencies.
Developing an FTA. The assessment of system faults and determination of how system hazards can occur is one of the most important parts of the design process. Being sure to heed a few simple principles can improve the quality of both the FTA and, ultimately, the software-controlled product.
First, sufficient time in the project schedule should be allotted to reviewing the FTA. Review by someone not involved in drafting the analysis can help identify additional faults that may have been missed, or discover errors in the document.
Also, give thought to the design changes that will need to be made while the FTA process is going on. This will give insight into whether a mitigation is infeasible to implement or cost prohibitive. In such a case, a different mitigation strategy will need to be found.
Hazard mitigations should be documented in the requirements specification. Documenting them there ensures that testing for the hazard will be undertaken as part of the product verification testing process. It also provides the convenience of having all of the software requirements listed in a single document.
Traceability
At the most critical level, traceability is the ability to map system-verification test cases to a corresponding system requirement. Tracing the requirements of a system to the system-level test plan makes it relatively easy to prove that the system functions exactly as specified in the requirements. Providing a traceable link between the requirements and the test cases can expose any holes that might be present in the test plan.
Tracing requirements to test cases is the primary use of traceability, but traceability also encompasses other areas of the system. For example, system hazards identified in the risk analysis phase should be traced to the requirements specification to show that the system accounts for these hazards appropriately. It is also useful to trace items in the design document to the requirements specification, particularly the safety-critical requirements.
Why is traceability of requirements so important to safety-critical systems? Consider an engineer involved in a project to develop a Class III medical device that uses an analog-to-digital converter (ADC) to measure the output activity of another device in the system. Suppose that the system monitors the operation of this device by polling the ADC every 5 milliseconds to read the device's output voltage, and uses this information to make adjustments to maintain the device in a safe operating state. Because this functionality has been determined to be safety critical, the engineer probably will have identified any hazards caused by this device during the risk analysis, will have developed a mitigation strategy for these hazards in the fault tree analysis, and will have designed the mitigation method into the system. Furthermore, the engineer would have generated system requirements for the device and created tests in the system-level test plan to verify the functionality of this subsystem.
Suppose, however, that testing of the first prototype of the product shows that the polling algorithm used is too slow for optimal system function. The engineer decides that the system needs to poll the ADC every 2 milliseconds in order to perform in a more desirable fashion. This system change may be easy to implement, but it generates several questions that need to be taken into account: Does the change introduce any new hazards? If there are new hazards, how are they going to be mitigated? Does the change require rethinking existing hazard mitigations? What areas of the design need to change? Which system requirements need to be modified? Which tests in the verification test plan will require modifications?
Only a small part of the system has changed, in a small way, yet there are trickle-down considerations that affect a number of project areas.
Now suppose that the engineer has not established any kind of traceability tool or matrix. This poses several problems. First, the engineer does not immediately know what impact the design change has on the system requirements, test plan, or hazard mitigations. Second, and perhaps more important, the engineer will have to spend substantial time reviewing each of these documents in order to find the affected areas. This process could be quite lengthy and the opportunity for error is high.
An engineer with recourse to a traceability matrix of some sort, on the other hand, would know very quickly which areas of the system documentation required updating. Also, that engineer would be unlikely to overlook an update that needs to be made, because the links would be laid out clearly.
Implementing Traceability. So how does one go about implementing traceability? Several methods are available, some more desirable than others. One way to establish traceability is to use a simple spreadsheet or matrix to map requirement numbers, test cases, and hazard mitigation references. This is a good method if the project is small and the requirements are not likely to change much.
Table V shows how a traceability matrix for a small project might be structured. Note that whenever multiple tests cover a single requirement, the requirement number is repeated on a separate line to accommodate another test. This is useful for implementing a traceability analysis in a spreadsheet that supports database query functions.
| Requirement Number | Relevant Hazard Analysis Tag | Safety Critical? | Relevant Module Test | Relevant System-Level Test |
| R0042 | H004 | Yes | Unit-0302 | System-0060 |
| R0043 | N/A | No | Unit-0310 | System-0061 |
| R0043 | N/A | No | Unit-0311 | System-0061 |
| Table V. A segment of a simple traceability matrix. |||||
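The one-row-per-test layout of Table V is what makes the matrix easy to query. As a rough sketch, the rows can be held as records and filtered by requirement number to see what a change touches. The `Trace` record, its field names, and the `impact_of` function are assumptions for illustration; the data are the three rows of Table V.

```python
from collections import namedtuple

# One record per matrix row; field names are illustrative.
Trace = namedtuple(
    "Trace", "requirement hazard safety_critical module_test system_test"
)

MATRIX = [
    Trace("R0042", "H004", True, "Unit-0302", "System-0060"),
    Trace("R0043", None, False, "Unit-0310", "System-0061"),
    Trace("R0043", None, False, "Unit-0311", "System-0061"),
]


def impact_of(requirement, matrix=MATRIX):
    """Collect the hazards and tests linked to one requirement."""
    rows = [row for row in matrix if row.requirement == requirement]
    return {
        "hazards": sorted({row.hazard for row in rows if row.hazard}),
        "module_tests": sorted({row.module_test for row in rows}),
        "system_tests": sorted({row.system_test for row in rows}),
    }


print(impact_of("R0043"))
# {'hazards': [], 'module_tests': ['Unit-0310', 'Unit-0311'], 'system_tests': ['System-0061']}
```

A spreadsheet with database query functions performs the same filtering; the point is that repeating the requirement number on each row lets a single query return every linked test.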
While the sort of matrix exemplified in Table V works for smaller projects, use of a commercial tool is preferred for more-involved (or less-well-defined) projects. One reason for using commercially available traceability software is that such programs often automatically flag all documents that need to be updated when a specific requirement changes. This saves the developer from having to manually determine the documentation impact of a requirement change. Of course, it also reduces the potential for human mistakes.
Another benefit of using a commercial traceability solution is that report generation is much easier. These tools usually have default report templates that list only the information that is truly pertinent. By contrast, a developer working with a homegrown spreadsheet must spend time on document formatting. It is also all too easy to include too much information when developing a custom matrix for a complex project, and the reports generated with such a system will tend to be cluttered.
Managing Traceability. Establishing traceability for a design need not be as difficult as some people make it. Potential problems due to a poorly thought out traceability management scheme can be avoided if the following points are kept in mind.
Avoid using tags that contain revision history. When requirements in the specification document are numbered, the tag should not include a revision number. For example, if the requirements specification is document 2340 and its revision is labeled 011, a numbering scheme formatted along the lines of 2340.011-0024 requires all of the requirement numbers to be updated every time a document changes revisions. This would be an unnecessary headache.
Requirement numbers should not be used as test-case numbers. A very primitive method of implementing requirements-to-test traceability is to use the same tag for the test case as was used for the requirement. The problem with this is that it makes the test plan totally dependent on the structure of the requirements document. If several requirements were consolidated and others were renumbered, what would happen to the test plan? Which test would now map to a particular requirement number?
Requirements should not be deleted or renumbered in midstream. Renumbering requirements is a surefire recipe for making traceability management a nightmare, especially if the further mistake of using requirement numbers as test numbers is made. It would mean that the mappings for requirements, test cases, and hazard mitigations must all be updated. At a minimum, this would require a time-consuming traceability review. Errors could be introduced. Deleting a requirement often proves to be just as bad as renumbering one. Many requirement numbers will be shifted, and the history behind the requirements in the document is lost.
Requirements should not be restated in subdocuments. Whenever writing a document, such as a test plan, for which the actual text of the requirement is important to the reader, the developer should consider referring to the requirement rather than restating the requirement itself. There is often a temptation to create a document that "stands by itself," but such documents tend to be quite difficult to manage. If the derived document uses just a reference, then, if the requirement referred to changes, little or no change in the subdocument is required. However, if the requirements that change are stated fully in all of the subdocuments, then the derived documents have to be updated as well. Using references also reduces the likelihood of the reader being presented with erroneous information.
All causes of hazards identified in the risk analysis must be traced to a verification test. Most approval agencies state this as a requirement of risk analysis. They will be looking for it.
Traceability should not be implemented by incorporating the links to other documents into each document being traced. It is not good practice, for example, to have a bullet item in a requirements specification that names the test to which that requirement is traceable. While doing that can provide full traceability, it is cumbersome to implement and also difficult to audit in order to verify that every requirement is traced. Instead, one document should encompass all of the traceability.
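The idea of keeping all traceability in one place can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag formats (SRS-0024, HAZ-003, SVT-101) and the helper name untraced are hypothetical and not taken from any standard or tool. The requirement tags carry no revision number, the test cases are numbered independently of the requirements, and a single table of links is audited for requirements or hazard causes that no test covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceLink:
    """One row of the single traceability document."""
    source_id: str  # requirement or hazard-cause tag (hypothetical format)
    test_id: str    # test-case tag, numbered independently of requirements


def untraced(source_ids, links):
    """Return the source tags that no test case traces to, in input order."""
    traced = {link.source_id for link in links}
    return [sid for sid in source_ids if sid not in traced]


# Hypothetical tags: no revision number is embedded in any of them.
requirements = ["SRS-0024", "SRS-0025", "SRS-0026"]
hazard_causes = ["HAZ-003", "HAZ-007"]

# The one document that holds every link.
matrix = [
    TraceLink("SRS-0024", "SVT-101"),
    TraceLink("SRS-0025", "SVT-102"),
    TraceLink("HAZ-003", "SVT-102"),
]

missing_requirements = untraced(requirements, matrix)
missing_hazards = untraced(hazard_causes, matrix)
print("Untraced requirements:", missing_requirements)
print("Untraced hazard causes:", missing_hazards)
```

Because the links live in one table, an audit reduces to a single pass over that table rather than a search through every document in the set.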
Change
In nearly every project, a change will have to be made after the software has been verified and released. Handling change in a safety-critical medical equipment project needs to be a more formal process than simply coding the change into the next release. There are many ways to incorporate change into a project and document it, but any method employed must maintain the paper trail and not allow either the product design or the documentation to become sloppy.
Types of Change. Design changes are of two basic types: voluntary and involuntary. Voluntary changes are requests for new features and functionality or for modifications of existing functionality. These usually originate in a company's marketing department and take the form of new requirements. Involuntary changes are associated with defects in the software. These do not necessarily represent new requirements, but rather result from a failure to meet existing requirements.
Changes of either type have to be documented. What impact on the rest of the documentation a change will have depends on whether the change is voluntary or involuntary. Voluntary changes will affect the specification and every document derived from the specification. Involuntary changes may or may not affect the specification, design document, or other project documentation. They must, however, be captured in a defect-tracking tool.
Defect trackers are used in just about every software development project. They are not unique to safety-critical medical software. Because of this, defect-tracking tools are widely available. They usually consist of a database with a front end designed to capture relevant information about software defects. Data fields are filled with a description of the defect, the name of the person who found it, an indication of whether the problem is repeatable, the name of the person responsible for fixing it, the nature of the fix, and so on. Each defect is characterized by a status that begins with "opened" and progresses to "closed," with intermediate states such as "confirmed," "fixed," and "verified."
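As a rough sketch of how such a status progression might be enforced, the Python fragment below models a defect record with the fields and states named above. The transition table is an assumption made for illustration (for example, allowing a failed fix to return to "confirmed"); real defect-tracking tools define their own workflows.

```python
# Allowed status transitions. The exact paths are an assumption, not a standard.
ALLOWED = {
    "opened": {"confirmed", "closed"},   # closed directly if not a real defect
    "confirmed": {"fixed"},
    "fixed": {"verified", "confirmed"},  # back to confirmed if the fix fails
    "verified": {"closed"},
    "closed": set(),
}


class Defect:
    """A minimal defect record with a guarded status field."""

    def __init__(self, description, found_by, repeatable, assigned_to=None):
        self.description = description
        self.found_by = found_by
        self.repeatable = repeatable
        self.assigned_to = assigned_to
        self.fix_notes = ""
        self.status = "opened"
        self.history = ["opened"]  # every state visited, kept for the paper trail

    def move_to(self, new_status):
        """Change status only along an allowed transition."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status
        self.history.append(new_status)


# Hypothetical defect walked through the full life cycle.
defect = Defect("Alarm tone silent after restart", "test engineer", True)
for step in ("confirmed", "fixed", "verified", "closed"):
    defect.move_to(step)
print(defect.history)
```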
Many developers make the mistake of treating the two types of changes differently. Changes from one software release to the next should be captured in one document, regardless of whether they are voluntary or involuntary. With nonembedded projects, some of the information in this document would be captured in the release notes. But in the case of embedded configurations, all of the changes that are typically found in the release notes, along with some other vital items, are recorded in a document called a change request.
The Change Request Document. The change request form is a document that can be used to capture all changes made since the previous release. More than one change can be recorded in a single change request document. Each change recorded should include the following elements.
Description. The description should summarize not only what the change is, but also why it is being made. This provides an auditor (and the project team, lest they forget down the road) with an account of the motivations for the change. If the change is being made for safety-related reasons, then a more thorough analysis and regression testing of the code will be necessary to prove that the system is still safe after the change. If it is an involuntary change, then a reference to the defect tracker detailing the defect should be included.
Documents affected. All the project documents that will need to be updated or reviewed as a result of the change should be indicated. In the revision history for each of these individual documents, there should be a reference to the change request as the place where changes are detailed. This helps maintain the paper trail by providing pointers to all documentation affected by the changes.
Schedule and cost impacts. Having to focus on exactly how a change would affect the project schedule and cost is important for the person making the change request, just as the details supplied are important to the project team. Expensive changes of the wouldn't-it-be-nice variety might have to be abandoned on cost-benefit grounds. Conversely, if a lot of expensive changes absolutely have to be made, then the scope of the project would be significantly changed. This situation could indicate that regression testing would not be sufficient to verify the product adequately.
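The elements of a change request listed above could be captured in a structure along the following lines. This is a minimal sketch, not a prescribed format: the field names, the identifiers (CR-0007, DEF-0112), and the two checks are assumptions chosen to mirror the text, namely that an involuntary change should reference the defect tracker and that safety-related changes call for more thorough analysis and regression testing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Change:
    """One change recorded in a change request (field names are illustrative)."""
    description: str               # what the change is
    reason: str                    # why it is being made
    voluntary: bool                # False means the change fixes a defect
    safety_related: bool
    documents_affected: List[str]
    schedule_impact_days: float
    cost_impact: float
    defect_id: Optional[str] = None  # reference into the defect tracker

    def problems(self):
        """Return a list of omissions that a reviewer would flag."""
        issues = []
        if not self.voluntary and not self.defect_id:
            issues.append("involuntary change needs a defect-tracker reference")
        return issues


@dataclass
class ChangeRequest:
    """All changes made since the previous release."""
    request_id: str
    changes: List[Change] = field(default_factory=list)

    def needs_thorough_regression(self):
        """True if any recorded change was made for safety-related reasons."""
        return any(change.safety_related for change in self.changes)

    def total_schedule_impact_days(self):
        return sum(change.schedule_impact_days for change in self.changes)


# Hypothetical change request holding one voluntary and one involuntary change.
request = ChangeRequest("CR-0007", [
    Change("Add night-mode display", "Marketing request", True, False,
           ["Software Requirements Specification"], 5, 4000.0),
    Change("Correct alarm timeout", "Fails existing requirement", False, True,
           ["Software Design Document"], 2, 1500.0),
])
for change in request.changes:
    print(change.description, "->", change.problems())
print("Thorough regression needed:", request.needs_thorough_regression())
```

In this example the second change is flagged because it is involuntary but carries no defect reference; supplying one (for instance, the hypothetical DEF-0112) clears the flag.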
Revision History
Maintenance of an accurate software revision history is crucial to the success of a safety-critical project. (Of course, one could argue that this is true for the development of any software-controlled product.) Having an accurate revision history enables the development team to perform a code regression analysis and then retest the sections of the software that have changed since the test plan was last executed. It can also make clear at what point in the development of the code a software branch can be created. This is useful when the code under development is not ready for release, but a customer has demanded that a software fix be implemented immediately. In this situation, a branch would be created from a prior software release so that the problem can be fixed before the next major release of the software.
How is a software revision history implemented? Several elements employed in combination are key to documenting revision history. Chief among them are the software control document (SCD), source-code control, and software release notes. Change requests are an important part of the SCD.
Software Control Document. The SCD is useful for mapping a particular software release to the revisions of all of the supporting software-development documents used to generate that release. For example, it can often be helpful to see at which stage of revision the requirements specification stood when a particular release occurred. Or perhaps someone wants to know which test-results document applies to a specific release. Table VI shows how a developer might choose to implement an SCD. In the table, one release of the software is tied to a specific latest revision of each of the supporting documents. Multiple software releases are commonly listed in such a table by the end of a project, each linked to the support document revisions in effect at the time of its release.
| Software Package | Software Development Plan | Software Requirements Specification | Risk Analysis | Software Design Document | Fault Tree Analysis | Software Module Test Plan | Software Module Test Results | Software Verification Test Plan (SVTP) | SVTP Results |
| Release Name | Revision | Revision | Revision | Revision | Revision | Revision | Revision | Revision | Revision |
| 1.01 | R1.2 | R2.1 | R1.4 | R1.1 | R1.6 | R1.6 | R1.6 | R1.3 | R1.3 |
Table VI. A possible format for a software control document (SCD).
The SCD template should be created as soon as the software documentation set has been defined. Later, it should be updated every time a new version of the software is released. It is left to the discretion of the developer to decide whether or not all releases should be included in this document.
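An SCD of this kind is essentially a lookup table from a software release to the document revisions in effect at that release. A minimal Python sketch follows. The entries for release 1.01 are those shown in Table VI; release 1.02 and its revisions are invented purely for illustration, as are the function names.

```python
# Software control document as a mapping: release -> {document: revision}.
scd = {
    "1.01": {  # values from Table VI
        "Software Development Plan": "R1.2",
        "Software Requirements Specification": "R2.1",
        "Risk Analysis": "R1.4",
        "Software Design Document": "R1.1",
        "Fault Tree Analysis": "R1.6",
        "Software Module Test Plan": "R1.6",
        "Software Module Test Results": "R1.6",
        "Software Verification Test Plan (SVTP)": "R1.3",
        "SVTP Results": "R1.3",
    },
}

# Hypothetical later release: same documents, two of them revised.
scd["1.02"] = dict(scd["1.01"])
scd["1.02"]["Software Requirements Specification"] = "R2.2"
scd["1.02"]["SVTP Results"] = "R1.4"


def revision_for(release, document):
    """Which revision of a document applied to a given release?"""
    return scd[release][document]


def changed_documents(old_release, new_release):
    """Documents whose revision differs between two releases."""
    old, new = scd[old_release], scd[new_release]
    return {doc: (old.get(doc), rev) for doc, rev in new.items()
            if old.get(doc) != rev}


print(revision_for("1.01", "Software Requirements Specification"))
print(changed_documents("1.01", "1.02"))
```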
Source-Code Control. During the development of an embedded system, many distinct releases of software may need to be created. There are often releases for debugging hardware, for prototyping user interfaces, and for adding new features and bug fixes. Because of the multitude of releases potentially issued for a safety-critical system, a way of tracking the history of the source code is extremely helpful. One of the primary tools for managing revision history is a good source-code control system.
Various source-code control systems usually have in common the ability to track revisions of individual files, the ability to create a release, and the ability to create a software branch. Software branching is particularly important for safety-critical software, as it facilitates the rapid deployment of a software fix should any issues arise in the field. Additionally, it allows for the creation of special test versions of code that are derived from released code.
A product developer is generally advised to spend time setting up a revision control system as early as possible. This allows sufficient time to become comfortable with the workings of the system so that there will be no confusion later that could retard the development or diminish the quality of the product. In terms of the project schedule, code control is an ongoing task responsive to the needs of the development team. However, it usually takes no more than 10 to 20 minutes at a sitting to archive changed files, even on a moderately large project.
Software Release Notes. Engineers working on embedded software have long made it a practice to write notes pertinent to any release of new software. It is a task commonly performed to some degree, regardless of the type of product being developed. Therefore, it is not a great burden for a developer to add some revision history to the release notes. And it provides an excellent opportunity to archive useful information.
A wide range of information can be included in the software release notes. The basic minimum that ought to appear is given in Table VII. Changes from a previous revision can be extracted automatically from a revision control system if the comments in the commit logs are adequate.
| Field Name | Content |
| Date of release | Date when the release was created. |
| Symbolic name | The tag name of the release that is used in the revision control system. |
| Purpose | Why the release was created (developmental release, software fix, etc.). |
| Known issues | A listing of any issues that are present in this release. |
| Issues corrected | A list of any anomalies corrected by this release. |
| Changes from the previous version | A list of all of the changes that have been made to the software since the last release. |
Table VII. Recommended basic fields for software release notes.
Most revision control systems require the developer to add a comment to each module that is checked into a repository. A commit log is simply a listing of all the associated comments from a particular code module.
The benefit of having this information in the release notes is that it provides the developer with an easy reference from which a regression analysis can be performed. This analysis can then be used to determine which modules require retesting.
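The following Python sketch shows one way the fields of Table VII might be assembled from commit-log comments, and how the same log yields the list of modules to retest. It assumes the commit log has already been exported from the revision control system as (module, comment) pairs; the module names, the tag REL_1_02, and the issue text are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Commit:
    """One commit-log entry: the module checked in and its comment."""
    module: str
    comment: str


def modules_to_retest(commits):
    """Regression analysis in its simplest form: every changed module."""
    return sorted({commit.module for commit in commits})


def release_notes(release_date, symbolic_name, purpose,
                  known_issues, issues_corrected, commits):
    """Build plain-text release notes with the basic fields of Table VII."""
    lines = [
        f"Date of release: {release_date.isoformat()}",
        f"Symbolic name: {symbolic_name}",
        f"Purpose: {purpose}",
        "Known issues:",
    ]
    lines += [f"  - {item}" for item in known_issues] or ["  - none"]
    lines.append("Issues corrected:")
    lines += [f"  - {item}" for item in issues_corrected] or ["  - none"]
    lines.append("Changes from the previous version:")
    lines += [f"  - {c.module}: {c.comment}" for c in commits] or ["  - none"]
    return "\n".join(lines)


# Hypothetical commit log since the last release.
log = [
    Commit("pump_ctrl.c", "Clamp flow rate to configured maximum"),
    Commit("alarm.c", "Restore alarm tone after restart"),
    Commit("pump_ctrl.c", "Add range check on dose input"),
]

notes = release_notes(date(2001, 9, 1), "REL_1_02", "software fix",
                      [], ["Alarm tone silent after restart"], log)
retest = modules_to_retest(log)
print(notes)
print("Modules requiring retest:", retest)
```

A real regression analysis would also consider modules that depend on the changed ones; the sketch lists only the modules touched directly.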
Conclusion
The tools and techniques for developing safety-critical medical systems that have been discussed in this article are all important. Defining what constitutes a well-documented process in the software development plan is essential to the success of a safety-critical project, but each tool is one part of a whole development process and cannot stand alone. For instance, traceability by itself does not add value to a project that is devoid of any kind of governing process.
It should be kept in mind that many of the ideas presented here are intended as guidelines and suggestions for the development of safety-critical software. If the template for the SCD shown in Table VI, for example, does not meet an organization's needs, it is certainly acceptable to use a different format.
Processes and designs should be kept flexible to accommodate changes. It is hoped that this article has made clear how inflexibility can impede a software development project and undermine the robustness of a safety-critical product.
Robert J. Duquaine and David W. Murphy work at Plexus Technology Group Inc. (Neenah, WI) as an embedded software engineer and a design engineer, respectively.
References
1. IEEE 730.1-1995, "IEEE Guide for Quality Assurance Planning," Institute of Electrical and Electronics Engineers, Washington, DC, 1995.
2. G Hawley, "Ensure Quality with a Good Development Plan," Embedded Systems Programming 11, no. 11: 28-44.
3. IEC 60601-1-4, "Medical Electrical Equipment, Part 1-4: General Requirements for Safety," International Electrotechnical Commission, Geneva, 2000.
4. NG Leveson, SS Cha, and TJ Shimeall, "Safety Verification of ADA Programs Using Software Fault Trees," IEEE Software (July 1991): 48-59.
Copyright © 2001 Medical Electronics Manufacturing




