ASIS Rationale

Benefits of Code Analysis

Why is something like ASIS needed? ASIS provides a basis for implementing portable tools that are aimed at analyzing static properties in Ada source code. Such code analysis capability has in general been underleveraged in software organizations; but for the Ada language in particular, it can greatly enhance the development process. Code analysis automation can harness the excellent software engineering features of Ada to facilitate code comprehension, high reliability, and high quality of the software product. The following text presents some motivational background.

Definition

Code analysis is the inspection of software source code for the purpose of extracting information about the software. Such information can pertain to individual software elements (e.g., standards compliance, test coverage), the element attributes (e.g., quality, correctness, size, metrics), and element relationships (e.g., complexity, dependencies, data usage, call trees); thus, it can support documentation generation, code review, maintainability assessment, reverse engineering, and other software development activities.

Extracted information falls into two major categories: descriptive reports which present some view of the software without judgment (e.g., dependency trees, call trees), and proscriptive reports which look for particular deficiencies -- or their absence (e.g., stack overflow, excessive complexity, unintended recursion).

Applicability

Broadly speaking, the application of automated code analysis in the software development process promises, among other things, to:

Promote discipline and consistency during development, increasing productivity and reducing unintended variation.
Provide empirical evidence and metrics for process monitoring and improvement.
Supplement code inspection and review, diversifying beyond the limitations of testing or manual checking.
Preserve architectural integrity in the software as compromises are made during development.
Avoid violations of coding standards, such as the use of inefficient language constructs.
Increase the correctness and quality of delivered software, reducing defects via comprehensive assessment.
Enhance safety and security by applying formal methods to verify assertions in program code.
Expedite program comprehension during maintenance, for engineers new to the code.
Support reengineering and reuse of legacy code, reducing costs.
Result in reduced risk to budget and schedule.

The software development life-cycle phases where code analysis can be beneficially applied include all those in which source code exists: preliminary design (software architecture and interface definition), through testing and system integration, to maintenance and reengineering. Hence, automated code analysis is a technology that primarily supports the back end of the software life cycle.

Motivation

Over the years, a wide-ranging set of commercial code analysis tools has become increasingly available [11]; examples of such tools include:

Data flow analysis and usage metrics
Invocation (call) trees and cross-reference
Dependency trees and impact analysis
Timing and sizing estimation
Test-case generation and coverage analysis
Usage counts of language constructs
Quality assessment metrics
Coding style and standards compliance
Safety and security verification
Code browsing and navigation
Documentation generation
Reverse engineering and re-engineering
Language translation and code restructuring

Unfortunately, the current state-of-the-practice in software development either omits code analysis support altogether or only incorporates it as an ancillary, undisciplined, ad hoc resource. For example, it is not uncommon to find within a given project various home-grown tools that support the above goals but which are not recognized as overtly participating in the development process. Such tools can be quite obtuse (very indirect extraction of information) and are typically incomplete (handling only a subset of the development language). Further, they tend to be project-specific (or even person-specific), and cannot be reused in another project: they are later redeveloped from scratch.

These observations corroborate that the need for code analysis is genuine, and that a common set of uniform tools could provide significant benefits to projects. But in the case of Ada software, commercial code analysis tools have historically proven to be barely adequate, manifesting a variety of problems whose nature and origin are described below.

Technology for code analysis

For Ada, why is ASIS the best approach? Code analysis tools are not new, having been available for decades; but the advent of the Ada language has exposed a variety of analysis limitations and has consequently demanded more comprehensive technology. The following text articulates various Ada-specific issues from a historical perspective: it reviews several technologies that have historically been applied -- with varying degrees of success -- to code analysis specifically targeted to Ada software. The review is not comprehensive, but it sketches the evolution of issues that have propelled the development of the ASIS concept.

Code parsers

Historically, many commercial code analysis tools have been supplied by compiler vendors in conjunction with their compiler products. But as the community of CASE tool vendors has grown, such tools are often available independent of any compiler. Tool developers have found that conventional parser technology is sufficient for most traditional languages; thus when Ada came along, most vendors expected it would suffice to simply adapt their parsers to handle Ada syntax. But for Ada, the result has held many disappointments:

Textual code editors are often sensitive to Ada syntax but not to Ada semantics.
Graphical design editors yield valid graphics, but invalid Ada designs.
Source-level tools such as debuggers are forced to understand and traverse the internal data structures of program libraries rather than the text of original source files.
Reverse engineering and test tools manifest difficulties when trying to resolve overloaded subprogram names or renamings.
Except for compilers, Ada tools do not require Ada Compiler Validation Capability (ACVC) certification; hence, such tools typically fail to handle the complete repertoire of Ada language features.

Consider the case of a toolsmith who wants to develop a call-tree analyzer. For such a tool to accurately process Ada source files, the toolsmith would be forced to build almost the entire front end of an Ada compiler -- a decidedly major undertaking that far out-scopes the original tool building effort. But CASE tool vendors are not in the compiler business; most are reluctant to make this major investment, or have tried and failed. Yet tools built on parser technology alone are not able to fully support the semantic richness of Ada.
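The difficulty with overloaded subprogram names can be illustrated with a minimal sketch. It is written in Python rather than Ada purely for brevity, and everything in it (the `Subprogram` record, the `DECLARATIONS` table, the two `Put` profiles) is a hypothetical stand-in, not part of any real tool. Real Ada overload resolution is considerably more involved (it also considers result types, expected-type context, and universal types), which is precisely why a parser alone does not suffice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subprogram:
    name: str
    param_types: tuple  # parameter type names, in declaration order


# Hypothetical declarations: two overloaded procedures named Put.
DECLARATIONS = [
    Subprogram("Put", ("Integer",)),
    Subprogram("Put", ("String",)),
]


def candidates_by_name(name):
    """Syntax-only lookup: the best a tool can do without semantic analysis."""
    return [d for d in DECLARATIONS if d.name == name]


def resolve_call(name, arg_types):
    """Semantic lookup: select the declaration whose profile matches the
    (already known) argument types. Assumes exact type-name matching."""
    matches = [d for d in candidates_by_name(name)
               if d.param_types == tuple(arg_types)]
    if len(matches) != 1:
        raise LookupError(f"cannot resolve {name}{tuple(arg_types)}")
    return matches[0]


# A name-based tool sees two possible callees for Put (X) ...
ambiguous = candidates_by_name("Put")
# ... whereas knowing that X is a String identifies exactly one.
resolved = resolve_call("Put", ["String"])
```

The key point is that `resolve_call` needs the argument types, and obtaining those for arbitrary Ada expressions requires most of a compiler front end.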

DIANA

Many Ada compilers store program units into libraries. They typically structure the information according to some proprietary internal form, such as trees of DIANA (Descriptive Intermediate Attributed Notation for Ada -- note that the following discussion applies to all internal forms, but that DIANA is singled out due to its public documentation [5, 6]: DIANA had been intended for standardization, but failed due to the unexpectedly wide variation in internal forms). Such trees thus encode both syntactic and semantic information about Ada programs. The root of a DIANA tree corresponds to a compilation unit; the nodes correspond to smaller Ada structures, such as declarations and statements. Node attributes contain descriptive information and references to other nodes.

DIANA trees offer great convenience and power to toolsmiths, and are sufficient to support the implementation of a large variety of tools (including code generators in compilers). For example, with access to DIANA, the toolsmith who wanted to develop the call-tree analyzer would have a fairly straightforward project. Furthermore, the tool would exhibit better performance, bypassing the needless regeneration (and redundant storage) of intermediate compilation results that are already available in the Ada libraries.
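To make the call-tree example concrete, here is a minimal sketch of a walk over an attributed tree. It is not DIANA: the `Node` class, the node kinds (`compilation_unit`, `procedure_body`, `call`), and the `name` and `target` attributes are all assumptions invented for illustration, and Python is used only for brevity. What it shares with the description above is the shape: a root for the compilation unit, child nodes for smaller structures, and attributes that hold both descriptive data and references to other nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                     # hypothetical node kind
    attrs: dict = field(default_factory=dict)     # descriptive info and node references
    children: list = field(default_factory=list)  # structural (syntactic) children


def build_call_tree(unit):
    """Map each procedure body's name to the names of the procedures it calls."""
    calls = {}

    def visit(node, enclosing):
        if node.kind == "procedure_body":
            enclosing = node.attrs["name"]
            calls.setdefault(enclosing, [])
        elif node.kind == "call" and enclosing is not None:
            # The semantic attribute already refers to the resolved callee,
            # so no name lookup or overload resolution is needed here.
            calls[enclosing].append(node.attrs["target"].attrs["name"])
        for child in node.children:
            visit(child, enclosing)

    visit(unit, None)
    return calls


# Toy compilation unit: Main calls Log.
log = Node("procedure_body", {"name": "Log"})
main = Node("procedure_body", {"name": "Main"},
            [Node("call", {"target": log})])
unit = Node("compilation_unit", {"name": "Demo"}, [log, main])

call_tree = build_call_tree(unit)
```

Because the tree already carries the resolved reference in each call node, the analyzer reduces to a short traversal; all the hard semantic work was done when the tree was built.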

The power of DIANA is sufficient to support the implementation of a virtually unlimited variety of tools. In general, any tool that requires the semantic information generated by an Ada compiler can benefit from access to DIANA. But as with any technology, the use of DIANA also has drawbacks:

A given implementation of DIANA by a vendor is subject to change: upgrades can obsolete tools written against previous versions, hampering maintenance.
Similarly, DIANA implementations vary from vendor to vendor: porting a tool across platforms is a risky endeavor.
DIANA is hard to use: the trees are quite complex, making it difficult to write and debug tools written against a DIANA specification.
The lack of a simple mapping to Ada makes DIANA hard to understand: as an abstracted representation of an Ada program, it does not map intuitively to concrete Ada structures.
DIANA is not extensible; but tools may need to add attributes for storing graphical or other tool-specific information.

LRM-interface

Thus a growing need arose to make tool development possible at the Ada level rather than at the internal representation level. It was these issues that drove some Ada compiler vendors to independently develop proprietary higher-level interfaces to encapsulate their Ada program libraries.

In particular, to overcome the drawbacks of DIANA while retaining all of its advantages, Rational Software Corporation developed their LRM-Interface product [8] in the late 1980s. It provided nearly the same power as DIANA, through services that extracted a variety of information from the internal DIANA trees. The LRM-Interface was also considerably easier to understand than DIANA, because it used the already-familiar terminology defined in the Ada LRM (the original Reference Manual for the Ada Programming Language [12], or its more recent version, the Ada 95 Reference Manual [7]). Furthermore, the LRM-Interface was not subject to change (or at least much less so than was the underlying DIANA), so tools written against it were easily migrated to updated implementations.

Regardless of LRM-Interface specifics, this and other similar approaches generally provide great flexibility: for example, an ad hoc tool can be easily and quickly built by in-house engineers, without funding the development or specialization of a commercial tool. But as expected, this approach also has shortcomings:

While DIANA as a data structure is not extensible, the above interfaces can be extended in the sense of user-supplied secondary functions built on the functions already provided; even so, all the functions are read-only and cannot modify any state.
Importing the subject source code into the tool environment can require edits that necessarily result in code distortion, such that original code attributes might not be preserved (e.g., line numbers or the byte sizing of data).
Tools are vendor dependent, such that a given tool cannot access Ada libraries from multiple vendors, or equivalent tools from multiple vendors cannot access a given Ada library.
Data interchange is not standardized among tools, so users can't configure their own integrated toolsets by choosing from competing or complementary vendors.
Within a software engineering environment (SEE), Ada semantic information remains isolated from, and not integrated with, other engineering data present in the environment.

ASIS

Historically, only a few Ada vendors provided access to the information contained in their proprietary Ada program libraries, and each such interface was unique. Thus began to emerge the need for an open standard that would allow uniform, vendor-independent access to that information.

Leveraging some informal efforts, the STARS program initiated the development of the Ada Semantic Interface Specification (ASIS) in 1990; but shortly thereafter, the activity became unfunded due to the STARS decision to no longer support standardization efforts. Despite this, several of the involved vendors (primarily TeleSoft) continued the ASIS work on a volunteer basis. Some time later, Rational also became an active participant and seeded the draft standard by contributing their LRM-Interface specification to ASIS.

In 1992, the Ada Board recognized the potential benefits to the Ada community of an ASIS standard, and recommended that the Ada Joint Program Office (AJPO) director support "by whatever means possible the development of an ASIS standard and its submission to ISO/WG9 for publication." The Association for Computing Machinery (ACM) Special Interest Group on Ada (SIGAda) took on this important work through volunteer effort in the ASIS Working Group (ASISWG) [2]. The ASISWG developed the interface to ISO/IEC 8652:1987. In December 1993, ASIS was viable and the director of the AJPO recommended this interface be used by tools needing information from the Ada program library. The ASISWG then turned its focus to developing ASIS for ISO/IEC 8652:1995. As the ASISWG has no standardization authority, an ASIS Rapporteur Group (ASISRG) was established on 28 April 1995 by the ISO/IEC JTC1/SC22 WG9 to standardize ASIS as an International Standard for Ada. ASISWG and ASISRG cooperated to evolve ASIS as an important interface to the Ada compilation environment.

Like its LRM-Interface predecessor, ASIS defines a set of queries and services that provide uniform programmatic access to the syntactic and semantic information contained within Ada library units (i.e., vendor independence). In addition, for each Ada vendor, ASIS clients are shielded from the evolving proprietary details that implement the vendor's library representations and internal forms (i.e., version independence). ASIS is designed for implementation on a variety of machines and operating systems, while also supporting the Ada semantic requirements of a wide range of client tools.
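The vendor-independence idea can be sketched as a common query interface with interchangeable back ends. The sketch below is in Python for brevity and does not reproduce the ASIS API: `SemanticInterface`, `VendorALibrary`, `VendorBLibrary`, and the two queries are hypothetical names chosen for illustration, as are the unit names in the sample data.

```python
from abc import ABC, abstractmethod


class SemanticInterface(ABC):
    """Hypothetical stand-in for a standardized set of library queries."""

    @abstractmethod
    def unit_names(self):
        """Return the names of all units in the library."""

    @abstractmethod
    def dependencies(self, unit_name):
        """Return the names of the units that unit_name depends on."""


class VendorALibrary(SemanticInterface):
    """Assumed internal form: a table from unit to its dependencies."""

    def __init__(self):
        self._table = {"Main": ["Text_IO", "Utils"],
                       "Utils": ["Text_IO"],
                       "Text_IO": []}

    def unit_names(self):
        return sorted(self._table)

    def dependencies(self, unit_name):
        return list(self._table[unit_name])


class VendorBLibrary(SemanticInterface):
    """Assumed internal form: a flat list of (unit, dependency) edges."""

    def __init__(self):
        self._units = {"Main", "Utils", "Text_IO"}
        self._edges = [("Main", "Utils"), ("Utils", "Text_IO"),
                       ("Main", "Text_IO")]

    def unit_names(self):
        return sorted(self._units)

    def dependencies(self, unit_name):
        return [dep for unit, dep in self._edges if unit == unit_name]


def dependency_report(library):
    """Client tool: written once against the interface, not against a vendor."""
    return {unit: sorted(library.dependencies(unit))
            for unit in library.unit_names()}


report_a = dependency_report(VendorALibrary())
report_b = dependency_report(VendorBLibrary())
```

The client function never touches `_table` or `_edges`, so either vendor can change its internal form (version independence) or be swapped for the other (vendor independence) without changes to the tool.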

ASIS services are essentially primitive, intended to support higher level, more sophisticated services that address the varied needs of specialized tools. While ASIS currently operates in a read-only mode, it could eventually be extended to support some (probably limited) update capability, enabling client tools to save application-dependent data (e.g., graphical information) within an existing Ada library. Although an ASIS implementor could readily support read-write features, members of the safety-critical community have emphasized the danger of providing a generalized write capability, since this could enable editing of the internal representation to differ from the original source code.

The long-term key is to achieve a critical mass of ASIS implementations. This will promote a new generation of semantically integrated Ada tools, which in turn will increase programmer productivity and decrease Ada development costs. In summary, the availability of ASIS implementations promises to:

Stimulate improved quality within existing Ada CASE tools; currently, these tend to be weak in supporting full Ada semantics (e.g., in preserving renamed entities, resolving overloaded subprogram names, etc.).
Enhance safety and security by providing for a new class of powerful analysis tools that apply formal methods to verify assertions in program code (e.g., using Pragma Annotate).
Eliminate the need to import Ada source code into secondary Ada compilation environments, resulting in no distortion or loss in the subject code (e.g., preserving original line numbers and the byte sizing of data).
Maximize interoperability between Ada CASE tools and Ada compilation environments, thus maximizing tool availability.
Enable the data interchange of Ada semantic information between complementary or competing Ada CASE tools, thus maximizing user choices for the best capabilities of each.
Improve the overall performance of Ada CASE tools, by eliminating the regeneration (and redundant storage) of Ada semantic information that already exists in Ada libraries.
Facilitate in-house development of informal but powerful ad hoc Ada tools, providing flexibility as needed without funding Ada CASE vendor specializations.
Promote standardization in software engineering environments (SEEs), enabling data integration of Ada semantic information with other engineering data present in the environment.
Establish enabling technology for new Ada CASE tools, by eliminating the need for tool vendor investment in proprietary Ada compiler technology; this will have a major impact on stimulating the development of new code analysis capabilities.


Last update 17 August 1998. Questions, comments to Clyde Roby (CRoby@IDA.Org)