Special Notice

NASA/ARC plans to issue a Request for Proposals due after September 30, 2002 for co-operative agreements for research on High-Dependability Computing testbeds for software-based systems. This solicitation is related to the NSF solicitation on Highly Dependable Computing and Communication Systems Research, whose proposals are due early summer, 2002. The NASA solicitation will be limited to parties that are selected to receive an award under the NSF solicitation.

Description

NASA/ARC has a requirement for research on highly dependable software-based systems. Under a co-operative agreement with CMU (http://www.amesnews.arc.nasa.gov/releases/2002/02_03AR.html), testbeds are being set up at NASA Ames Research Park that will be accessible on-site and remotely. The testbeds are closely related to significant NASA software systems, and will provide a means to develop methods for predicting the dependability of software-based systems and to empirically validate technology for improving the dependability of such systems.

This web site will be updated periodically to provide information on the testbeds and the proposal process for cooperative agreements. A solicitation was sent out internally within NASA in February for testbed artifact proposals, and received an enthusiastic response. Two primary testbeds have been selected, representing the broad areas of dependable networks for flight and component-based software for mission-critical systems. The testbeds are expected to include artifacts that are analogues of real NASA systems, as described below. Some dependability questions that could be investigated with these testbeds are suggested, though researchers are not limited to these particular questions.

Technical Point of Contact: Michael R. Lowry

Title: High-Dependability Computing Program Manager

Phone: (650) 604-3369

Fax: (650) 604-3594

Email: mlowry@mail.arc.nasa.gov

Procurement Point of Contact: Carlos D. Torrez

Title: Contracting Officer

Phone: (650) 604-5797

Fax: (650) 604-3020

Email: ctorrez@mail.arc.nasa.gov

Dependable Networks for Flight Testbed

Overview and Importance to NASA

Distributed information networks arise in many critical aerospace contexts, including mission control for manned and unmanned missions, air traffic control, and increasingly onboard aerospace vehicles. This testbed is expected to be an analogue of a network that is currently being designed for the International Space Station (ISS). The testbed might also draw upon other aerospace networks. The ISS Program Office at Johnson Space Center (JSC) in Houston has initiated a project to define and build an On-board Information Technology (OIT) infrastructure for supporting crew and payload operations. This OIT system consists of LAN resources, servers, client devices, and commercial and custom software. It is used for mission support and has specific interfaces to Station avionics systems, but is not directly flight-critical at this time. The ISS Program hopes to evolve OIT dependability to a point where higher-criticality functions can be migrated off the current custom avionics platforms onto the OIT infrastructure. The OIT is an ideal candidate for insertion of commercial IT solutions, requiring at least enterprise-level dependability and performance levels that meet the unique requirements of the internal spacecraft environment. Both JSC Engineering and Mission Operations Directorates are involved in this project.

OIT is the primary system for performing payload operations - the major function of Station. In addition, OIT is the host for crew support applications such as scheduling, mapping, e-mail and communications, information access and vehicle diagnostic support. Given the crew resource constraint of only three people, leveraging their capabilities with good computer tools that have high dependability will ensure more of Station's potential as a unique science laboratory can be realized. The OIT also spans an international set of clients, connecting to modules of the ISS from different countries. Security is a major concern, particularly for a system that might eventually support remote telescience where ground-based science investigators interactively control and monitor on-board experiments.

The results from the OIT HDCP testbed could potentially impact the entire class of NASA on-orbit human space operations support IT systems anticipated for current and next-generation vehicles. Additionally, there is a significant possibility that the HDCP solutions provided for operations support might be applicable to general critical flight system needs for future upgrades and new vehicle designs. It is anticipated that results from the OIT HDCP testbed will generate insights into how to address the design and management of such critical systems, another pressing NASA need.

The Station OIT is a specific implementation of a general "operations support" data system. Traditionally, these operations systems have been extensions of the critical flight control avionics, but in the last ten years, commercial computing and network technology has been replacing these custom systems on both Shuttle and Station. While still requiring coordination with critical systems, these support systems require much higher performance levels, broad compatibility with commercial software development practices and much more sophisticated user applications. Therefore, it is anticipated that certain OIT solutions will be used aboard Shuttle within a year or two of their deployment aboard Station. Furthermore, next-generation spacecraft have similar needs for mission operations support, resulting in further development and deployment of these IT systems. For example, 2nd and 3rd generation space launch vehicles, Mars transit and habitation spacecraft and even multi-vehicle planetary exploration missions are potential candidates for this dependable network-based information and computing infrastructure. In addition, there is a broad class of ground-based support functions also requiring dependable enterprise-class data systems for mission control, science and payload support and even air traffic control.

HDCP Testbed

This HDCP testbed will be a rough analogue of the evolving ISS LAN with representative servers/clients. It will consist of several server computers and network components with a range of client-side computers. The testbed will provide hardware and software infrastructure suitable for running software and networking dependability experiments. This infrastructure will be accessible from both NASA Ames Research Park and remotely through the internet.

Dependability attributes whose investigations might be supported on this testbed include:

Failure tolerance
Isolation of propagating failure modes
Elimination of common-mode failures
Ease of system resource/redundancy management (transparent reconfiguration)
Computational and data communication reliability and error tolerance
Sustaining minimum performance levels
Measurable confidence of the successful propagation of high priority (i.e. critical) packets in a medium of lower priority packets
Measurement of risk and changes in risk as a function of system software configuration changes
Security

This testbed is of interest to researchers primarily for the challenges of providing dependable computing and data service using a distributed network of commercial and custom components in the on-orbit domain subject to component failures. Research in architecture, network protocols, failure tolerance and isolation, and software verification is highly relevant. Radiation-induced failures, of significant concern for spacecraft computing, will be modeled as random errors in computation and data communications.
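
As a rough illustration of what modeling radiation-induced failures as random errors in data communications could look like, the following Python sketch flips each bit of a message independently with a fixed probability and uses a CRC-32 checksum to detect the corruption. The function names, the sample message, the independent bit-flip model and the choice of CRC-32 are all assumptions made for illustration; none of them is specified by the testbed.

```python
import random
import zlib


def inject_bit_errors(payload, bit_error_rate, rng):
    """Return a copy of payload with each bit flipped independently
    with probability bit_error_rate (hypothetical error model)."""
    corrupted = bytearray(payload)
    for byte_index in range(len(corrupted)):
        for bit in range(8):
            if rng.random() < bit_error_rate:
                corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


def is_intact(payload, checksum):
    """Check a received payload against the CRC-32 computed by the sender."""
    return zlib.crc32(payload) == checksum


def detected_corruption_fraction(trials, bit_error_rate, seed=0):
    """Fraction of transmissions in which the checksum flags corruption."""
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    message = b"telemetry frame 0001"  # illustrative payload only
    checksum = zlib.crc32(message)
    detected = 0
    for _ in range(trials):
        received = inject_bit_errors(message, bit_error_rate, rng)
        if not is_intact(received, checksum):
            detected += 1
    return detected / trials


if __name__ == "__main__":
    for rate in (0.0, 1e-4, 1e-2):
        print(rate, detected_corruption_fraction(1000, rate))
```

A seeded generator is used so that a given fault-injection run can be reproduced when comparing protocols or error-tolerance mechanisms.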

Overview

Since 1995, NASA has been flying an increasing number of robotic deep-space missions with increasing reliance on software to perform mission functions. Architectures that support software reuse and component-based approaches to developing embedded mission software will be needed. This testbed will be an analogue of NASA flight software, and will enable experimentation with dependability issues in new and existing approaches to flight and ground software. It is expected that the testbed will draw artifacts for experimentation from JPL's Mission Data System project, and perhaps other mission software.

Mission Data System (MDS) is an end-to-end platform for improving the dependability of state-based mission software built from component frameworks, principally for robotic deep-space missions. MDS is both a systems engineering process and matching software architecture for developing unified flight, ground & test systems that enable missions requiring reliable, fault-tolerant, autonomous software systems. Adapting projects use the MDS analysis process and underlying frameworks to build mission software. MDS is being developed as a product line to provide a technology core for the next generation of deep-space mission customers. The Mars Smart Lander (MSL) Mission has baselined MDS. Although the process and architecture are shaped by the realities of unmanned space science missions, the concepts apply equally to applications requiring autonomous distributed monitoring and control of physical systems.

The MDS software architecture is expressed as design patterns that can be implemented in any programming language. Current MDS development efforts at JPL are focused on a C++ implementation. The MDS HDCP testbed will likely include a port to Real-time Java¹, through an industrial partnership. This section outlines the MDS products and the benefits they offer to NASA.

MDS Products

Technology Core

MDS provides reusable software building blocks and tools. This framework software consists of over 35 packages for common functionality such as event logging, time services, data management, visualization, units of measurement, state variables, components, connectors, and many more. Each package is tested and documented. The entire set is organized into a layered physical architecture.

Spacecraft Design Process

MDS provides a collaborative engineering tool for systems engineers to capture requirements in terms of familiar concepts: states, commands, measurements, estimators, controllers, and hardware devices. The SDS tool (State Database Server) is a web server that allows systems engineers to collaborate simply using a desktop web browser. Requirements are captured in a state analysis database that can be checked for validity and completeness. The resulting requirements map directly into software elements of the technology core, eliminating errors of translation and reducing cycle time.

Tools for Simpler Operations

In MDS, operators specify mission activities in terms of what rather than how, that is, in terms of goal networks rather than command sequences. A goal is simply a constraint on the value of a state variable over a time interval, and a goal network prescribes timing and parent-child dependencies among goals. Both are scripted in "GEL", the Goal Elaboration Language, providing a clear expression of operational intent. Goal-driven operation provides a level of control that is variable between purely time-scripted and fully autonomous, allowing a smooth transition to the more challenging autonomous missions.
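
The definition of a goal given above can be sketched as a small data structure. The Python below is only an illustration under stated assumptions: the class name Goal, its fields, the sampled-history representation, and the rule that a child's interval must lie inside its parent's interval are all hypothetical, and the sketch is neither GEL syntax nor the MDS framework API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: each state variable maps to (time, value) samples.
History = Dict[str, List[Tuple[float, float]]]


@dataclass
class Goal:
    """A constraint on the value of one state variable over a time interval."""
    name: str
    state_variable: str
    start: float
    end: float
    constraint: Callable[[float], bool]
    children: List["Goal"] = field(default_factory=list)

    def holds(self, history: History) -> bool:
        """True if every sample inside [start, end] meets the constraint."""
        samples = [v for (t, v) in history.get(self.state_variable, [])
                   if self.start <= t <= self.end]
        # Assumption: with no samples there is no evidence, so the goal does not hold.
        return bool(samples) and all(self.constraint(v) for v in samples)

    def achieved(self, history: History) -> bool:
        """A parent is achieved only if it holds and all of its children are achieved."""
        return self.holds(history) and all(c.achieved(history) for c in self.children)

    def timing_violations(self) -> List[str]:
        """Names of descendant goals whose interval falls outside their parent's."""
        bad = []
        for child in self.children:
            if child.start < self.start or child.end > self.end:
                bad.append(child.name)
            bad.extend(child.timing_violations())
        return bad


if __name__ == "__main__":
    history = {
        "battery_temp_c": [(0.0, 20.0), (5.0, 22.0), (10.0, 24.0)],
        "heater_on": [(0.0, 1.0), (5.0, 1.0)],
    }
    heater = Goal("heater on", "heater_on", 0.0, 5.0, lambda v: v == 1.0)
    warm = Goal("battery warm", "battery_temp_c", 0.0, 10.0,
                lambda v: 15.0 <= v <= 30.0, [heater])
    print(warm.achieved(history), warm.timing_violations())
```

In this sketch the operator states only what must be true (the battery stays between 15 and 30 degrees for ten time units); how the heater is commanded to achieve that is left to the child goal and whatever controller would elaborate it.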

Cost Estimation Model

MDS defines a model that estimates customer adaptation costs through objective metrics captured by previous adaptation efforts. These metrics are recorded automatically by a workflow management tool as work packages flow through different phases of requirements, design, development and testing. As missions build adaptations, the estimation model will benefit from additional data, improving a customer's ability to deliver on time and in budget.

Benefits

MDS was developed to address the development and operational needs of the next generation of mission systems. This section briefly surveys the benefits it offers.

Mission System Development And Testing

Until recently, JPL missions were one-of-a-kind, spaced many years apart. Each mission team developed flight software independently. No provisions for reuse were made a priori. Now we are in a new era of frequent launches. Low-cost missions cannot afford to start from scratch. System verification and validation must be both efficient and effective. The solution must provide a pre-integrated, pre-tested reusable technology core.

To satisfy these requirements, MDS has implemented tools and processes that provide the following:

Complex interactions understood at mission, system and component levels
State decomposition facilitates tracking of system scalability
Domain knowledge expressed explicitly as models
Component architecture facilitates measures of responsiveness
Architecture requires state be determined honestly from the evidence
Architecture provides a mechanism for identifying the range and impact of faults
Architecture authorizes and monitors all resource usage
Uniform architecture facilitates cost tracking associated with system complexity and system reconfiguration
Cost models based on objective development data facilitate estimation of fault-related development

Safer Software

MDS provides a component modeling tool that software engineers use to specify components and connections. Component-based software design elevates software interfaces and interactions to first-class design elements, eliminating 'invisible' interactions that cause hard-to-find problems in large software systems.

A runtime component manager enforces architectural rules, enabling early detection of design defects. Further, a component approach facilitates the use of powerful verification techniques for detecting synchronization errors that cause deadlocks and race conditions.

Verification By Design

In general, many aspects of MDS aim to reduce sources of human error in design, development and test. Here are a few ways that MDS simplifies system verification:

Goals specify explicit constraints on state and time that are continually monitored, so deviations from expected behavior are immediately reported.
Errors in units of measurement are eliminated through the SI units package.
'Smart pointers' eliminate problems of memory leaks and dangling pointers.
The component manager enforces architectural rules about legal connections.
The initialization/finalization package reports circular dependencies and improperly held resources.
The unified state architecture supports direct comparison between simulated state and estimated state.
The separation of models from reusable algorithms makes validation of mission-specific items much simpler.
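
To illustrate how a units package can rule out unit errors of the kind mentioned in the list above, here is a minimal Python sketch of dimension checking. It is not the MDS SI units package: the Quantity class, the three-exponent dimension tuple and the helper functions are hypothetical, and a real package would cover all SI base units and many more operations.

```python
from dataclasses import dataclass
from typing import Tuple

# Dimension exponents in the order (metre, kilogram, second).
Dimension = Tuple[int, int, int]
LENGTH: Dimension = (1, 0, 0)
TIME: Dimension = (0, 0, 1)


@dataclass(frozen=True)
class Quantity:
    value: float          # magnitude expressed in base SI units
    dimension: Dimension  # exponents of the base units

    def __add__(self, other: "Quantity") -> "Quantity":
        # Adding quantities of different dimensions is rejected outright.
        if self.dimension != other.dimension:
            raise TypeError("cannot add quantities with different dimensions")
        return Quantity(self.value + other.value, self.dimension)

    def __mul__(self, other: "Quantity") -> "Quantity":
        dims = tuple(a + b for a, b in zip(self.dimension, other.dimension))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        dims = tuple(a - b for a, b in zip(self.dimension, other.dimension))
        return Quantity(self.value / other.value, dims)


def metres(x: float) -> Quantity:
    return Quantity(x, LENGTH)


def feet(x: float) -> Quantity:
    # Converted to metres on construction, so all arithmetic is done in SI.
    return Quantity(x * 0.3048, LENGTH)


def seconds(x: float) -> Quantity:
    return Quantity(x, TIME)


if __name__ == "__main__":
    speed = metres(100.0) / seconds(20.0)
    print(speed)
    try:
        metres(1.0) + seconds(1.0)
    except TypeError as err:
        print("rejected:", err)
```

Because every value is converted to base SI units when it is constructed and carries its dimension with it, mixing feet with metres gives a correct sum, while adding a length to a time fails immediately instead of silently producing a wrong number.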

Mission Operations

In the past, missions have been designed for human control from Earth. Flight software has used relatively simple time-based sequencing for commanding. Except for fault protection and a few "critical sequences", spacecraft required ground intervention for unplanned events.

New missions, in contrast, depend on more in situ operations in uncertain environments and, with longer round-trip light-time delays, have less frequent opportunities for communication with Earth. These realities demand more onboard decision-making. The solution must enable selective delegation of control from ground to flight.

To satisfy these requirements, MDS has implemented tools and processes that provide the following:

Operations based on what (constraints on state) rather than how (command sequences)
Operations can migrate capability and responsibility from ground to flight system to simplify operations and reduce communication needs
Models express key functional relations among system states, command effects, and observed measurements
Disciplined architecture provides uniform fault-metric mechanisms
Fault sources are explicitly captured in goal-failure trees
Fault response is explicitly modeled as goal elaborations
Uniform data collection mechanism facilitates run-time monitoring and metrics associated with adaptivity and diagnosability

HDCP Testbed

This HDCP testbed, like an actual Mission Project, is expected to include an adaptation of MDS. This testbed could be used to address a wide range of dependability questions. MDS provides a well-disciplined architecture and system engineering approaches that facilitate experimentation with formal verification methodologies. MDS also provides tools to build reliable complex real-time control systems, thereby facilitating experimentation with real-time systems. Experimentation with improved software development methodologies for improved quality and defect reduction could also be supported.

It is likely that a significant portion of MDS will be ported from C++ to Real-time Java for the HDCP testbed. Currently, C++ is the language of choice for most OO implementations of mission-critical software. However, C++ is an extremely complex programming language. Java addresses some of the complexities of C++, and studies indicate an increase in programmer productivity and a decrease in error rate. Java incorporates many features of modern programming language theory, providing an abstraction from details such as memory allocation. For the same reason, it raises issues for embedded, real-time mission systems, including real-time capabilities and performance. With the development of the Real-Time Specification for Java (RTSJ), Java now includes technology needed to develop embedded, real-time mission systems. JPL engineers performed an evaluation of Java and determined that, with proper risk management, Java could be a suitable implementation language for a mission system. Much research remains to be done to determine the trade-offs among dependability attributes related to programming language features for real-time embedded systems, and to identify technologies that increase dependability.

The MDS testbed could be used for a wide variety of dependability experiments, as well as studies of the impact of different technologies or methodologies on dependability. Examples of categories of dependability attributes and possible measurements that might be supported are described below, under the broad headings of build/test categories, runtime categories, and operation time categories. Where applicable, it is indicated how the MDS technical approach potentially impacts a dependability category. This might provide a starting point for a hypothesis for a dependability investigation.

Researchers are not limited to these dependability categories nor to the hypothesis suggested by the MDS technical approach. However, researchers should indicate how they could make use of the testbed artifacts to support their dependability experiments.

Build/Test dependability categories address concerns associated with system definition, project management, implementation or verification.

Category	Description	MDS technical approach	Possible measures and techniques
Architectural correctness of implementation	How well does the system's implementation reflect the analysis and design? How accurate is the translation from System Engineering to Software Engineering?	State analysis provides a system analysis and design methodology. Component architecture provides a rigorous method of composing software.	Percent of erroneous component & connector specifications
Modeling of complex interactions	Does the system provide a suitable means of expressing interactions? Do certain types of defects map to certain types of software architectures? Run against different defect classes.	State analysis provides a system analysis and design methodology that exposes complex interactions between subsystems. MDS provides a model-driven architecture that makes adaptation easier.	Rate different defect classes against different architectures
Model correctness	Is there a suitable separation of concerns to express physical models independently from information models? What's the right level of model fidelity for a particular application? How well does a model capture physical behavior?	MDS architecture provides for a disciplined use of models: structural, state effects, measurement, and command effects models.	TBD
Architecture suitability	Are some architectures better suited to certain business cases?	MDS's emphasis on state and the management of physical interactions is well suited to resource-constrained systems.	Measure design and development effort needed to accommodate new requirements.
COTS suitability	Is a real-time Java implementation suitable for a flight system? Are COTS products robust or efficient enough for use on the target system? How well does a COTS product scale to a real problem? What are the integration and process costs associated with incorporating a new product?	N/A	TBD
Predictability of schedule and budget	How good is the team at meeting budget and schedule?	MDS has an iterative/incremental development methodology with clear exit points for collection of objective data.	Earned value; process feedback measures
Quality and defect reduction	Is the quality of the product improving? How do you know when the product is done?	MDS has an iterative/incremental development methodology with clear exit points for collection of objective data.	COQUALMO; defect seeding
Trade-space expressiveness	How do you establish criteria in the hardware/software trade space (performance vs. flexibility)? In the information-sharing trade space (security vs. safety)? In the system-degradation trade space (survivability vs. quality of service)?		TBD
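
Defect seeding, listed above as a possible technique, is usually paired with a simple proportional estimate: if testing finds a known fraction of deliberately seeded defects, the same detection rate is assumed for the defects that were already present. The Python sketch below shows that arithmetic. The function name and the equal-detectability assumption are illustrative and are not prescribed by this notice.

```python
def estimate_indigenous_defects(seeded_total, seeded_found, indigenous_found):
    """Estimate total and remaining indigenous (non-seeded) defects.

    Assumes seeded and indigenous defects are equally likely to be found,
    so the detection rate measured on seeded defects applies to both.
    Returns (estimated_total, estimated_remaining).
    """
    if seeded_total <= 0:
        raise ValueError("seeded_total must be positive")
    if not 0 < seeded_found <= seeded_total:
        raise ValueError("seeded_found must be in 1..seeded_total")
    detection_rate = seeded_found / seeded_total
    estimated_total = indigenous_found / detection_rate
    return estimated_total, estimated_total - indigenous_found


if __name__ == "__main__":
    # 20 defects seeded, 10 of them found, along with 15 indigenous defects.
    print(estimate_indigenous_defects(20, 10, 15))
```

With half of the seeded defects found, the 15 indigenous defects found are taken to be half of the total, giving an estimate of 30 in total and 15 still undetected.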

Runtime dependability categories describe how well the system runs.

Category	Description	MDS technical approach	Possible Measures
Durability	How tolerant is the system to environmental variation? Does the system meet its up-time criteria? How do partial failures affect the ability of the system to meet mission objectives? Can the system be reliably upgraded using COTS capabilities like Java's dynamic loading?	Goal-driven operation permits highly tolerant success criteria. Partial failures are handled at the lowest level possible, minimizing changes to the goal network and thus to mission objectives.	Accomplishment of highest-priority goals in the face of unexpected conditions.
Diagnosability	How easy is it to identify the cause of a fault? Is the system prone to a particular kind of fault?	MDS defines integral fault protection interfaces, allowing for wide range of detection & diagnosis techniques.	Percent of false positives and false negatives during scenario-based testing
Quality of service guarantees	How accurately does the system measure its state? How efficient is the system at doing the work for which it was designed?	State determination is a key architectural focus.	Precision and delay of estimated state vs. true state.
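
The false-positive and false-negative measure suggested for Diagnosability above could be computed from scenario-based test results along the following lines. This Python sketch assumes each scenario is recorded as a pair (fault injected, fault reported), and it interprets the percentages relative to fault-free and faulty scenarios respectively. Both the record format and that interpretation are assumptions, since the table does not define them.

```python
from typing import Iterable, Tuple


def diagnosis_error_percentages(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """results holds one (fault_injected, fault_reported) pair per scenario.

    Returns (false_positive_percent, false_negative_percent), where false
    positives are counted over fault-free scenarios and false negatives
    over scenarios in which a fault was injected.
    """
    false_pos = false_neg = faulty = fault_free = 0
    for injected, reported in results:
        if injected:
            faulty += 1
            if not reported:
                false_neg += 1   # a real fault went unreported
        else:
            fault_free += 1
            if reported:
                false_pos += 1   # a fault was reported where none existed
    if faulty == 0 or fault_free == 0:
        raise ValueError("need at least one faulty and one fault-free scenario")
    return 100.0 * false_pos / fault_free, 100.0 * false_neg / faulty


if __name__ == "__main__":
    scenarios = ([(True, True)] * 3 + [(True, False)]
                 + [(False, False)] * 2 + [(False, True)] * 2)
    print(diagnosis_error_percentages(scenarios))
```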

Operation Time dependability categories describe how easy the system is to operate correctly.

Category	Description	MDS technical approach	Possible Measures
Diagnosability	How easy is it for an operator to find the cause of a system fault?	Device health reported in health state variables.	Measure how well a fault is localized to a specific failure mode of a specific unit.
Ease of error-free use	How easily can operators instruct the system? How much effort goes into avoiding system-damaging mistakes?	Goals specify what, not how. Goal net elaboration takes system-level interactions into account.	Time to specify and validate a goal net versus a command sequence.
Command Verifiability	How much effort is needed to assure that the commands reflect intent? How easy is it to find command errors?	This is standard control law validation. MDS captures and reports command histories.	TBD
Level of security	How immune is the system from malicious behavior?	MDS is designed for a non-malicious community.	TBD
Level of Autonomy	Does the system provide autonomous capabilities that simplify operations? Can the system be customized to trade ease of development against ease of use?	Goal-driven operation intrinsically supports autonomy. The extent of automated goal elaboration trades ease-of-use against ease-of-development.	Measure operational load for goal-based vs. command-based operations.
Maintainability	How easy is it to maintain the system?	Explicit information in state, measurement, and command histories, as well as an event log, facilitates maintenance.	Measure how long it takes to detect and fix seeded defects.
Scalability	How easy is it to scale the system?	Explicit representation of states and modeling of interactions encourage confidence.	Measure architectural variation as a system evolves toward high-fidelity behavior.

¹ The Real-Time Specification for Java (RTSJ) was developed in response to Java Specification Request 1 (JSR 1). Java Specification Requests are managed as part of the Java Community Process (JCP), an industry consortium administered by Sun Microsystems. The specification is published as "The Real-Time Specification for Java" by Bollella et al. (Addison-Wesley, ISBN 0201703238).