ASIS Rationale
Benefits of Code Analysis
Why is something like ASIS needed? ASIS provides a basis for
implementing portable tools that are aimed at analyzing static
properties in Ada source code. Such code analysis capability has in
general been underleveraged in software organizations; but for the
Ada language in particular, it can greatly enhance the development
process. Code analysis automation can harness the excellent software
engineering features of Ada to facilitate code comprehension, high
reliability, and high quality of the software product. The following
text presents some motivational background.
Definition
Code analysis is the inspection of software source code for the
purpose of extracting information about the software. Such information
can pertain to individual software elements (e.g., standards
compliance, test coverage), the element attributes (e.g., quality,
correctness, size, metrics), and element relationships (e.g.,
complexity, dependencies, data usage, call trees); thus, it can
support documentation generation, code review, maintainability
assessment, reverse engineering, and other software development
activities.
Extracted information falls into two major categories: descriptive
reports which present some view of the software without judgment
(e.g., dependency trees, call trees), and proscriptive reports which
look for particular deficiencies -- or their absence (e.g., stack
overflow, excessive complexity, unintended recursion).
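The two report categories can be illustrated with a minimal sketch. It assumes the call relationships have already been extracted into a simple mapping; the graph contents and the function names `call_tree` and `recursive_subprograms` are hypothetical and are not part of ASIS or any existing tool.

```python
from typing import Dict, List

# Hypothetical call graph: each subprogram maps to the subprograms it calls.
CallGraph = Dict[str, List[str]]


def call_tree(graph: CallGraph, root: str, depth: int = 0, path: tuple = ()) -> List[str]:
    """Descriptive report: an indented call tree, presented without judgment."""
    lines = ["  " * depth + root]
    if root in path:
        # Stop expanding here so a recursive cycle does not loop forever.
        lines[-1] += " (recursive)"
        return lines
    for callee in graph.get(root, []):
        lines.extend(call_tree(graph, callee, depth + 1, path + (root,)))
    return lines


def recursive_subprograms(graph: CallGraph) -> List[str]:
    """Proscriptive report: subprograms that can reach themselves."""
    def reaches(start: str, target: str, seen: set) -> bool:
        for callee in graph.get(start, []):
            if callee == target:
                return True
            if callee not in seen:
                seen.add(callee)
                if reaches(callee, target, seen):
                    return True
        return False

    return sorted(name for name in graph if reaches(name, name, set()))


# Illustrative data only.
graph = {
    "Main": ["Parse", "Report"],
    "Parse": ["Next_Token", "Parse"],
    "Next_Token": [],
    "Report": [],
}

print("\n".join(call_tree(graph, "Main")))
print(recursive_subprograms(graph))  # -> ['Parse']
```

The first function only presents a view of the software; the second looks for a particular deficiency, here unintended recursion.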
Applicability
Broadly speaking, the application of automated code analysis in the
software development process promises, among other things, to:
- Promote discipline and consistency during development, increasing
productivity and reducing unintended variation.
- Provide empirical evidence and metrics for process monitoring and
improvement.
- Supplement code inspection and review, diversifying beyond the
limitations of testing or manual checking.
- Preserve architectural integrity in the software as compromises are
made during development.
- Avoid violations of coding standards, such as the use of inefficient
language constructs.
- Increase the correctness and quality of delivered software, reducing
defects via comprehensive assessment.
- Enhance safety and security by applying formal methods to verify
assertions in program code.
- Expedite program comprehension during maintenance, for engineers new
to the code.
- Support reengineering and reuse of legacy code, reducing costs.
- Result in reduced risk to budget and schedule.
The software development life-cycle phases where code analysis can be
beneficially applied include all those in which source code exists:
preliminary design (software architecture and interface definition),
through testing and system integration, to maintenance and
reengineering. Hence, automated code analysis is a technology that
primarily supports the back end of the software life cycle.
Motivation
Over the years, a wide-ranging set of commercial code analysis tools
has become increasingly available [11]; examples of such tools include:
- Data flow analysis and usage metrics
- Invocation (call) trees and cross-reference
- Dependency trees and impact analysis
- Timing and sizing estimation
- Test-case generation and coverage analysis
- Usage counts of language constructs
- Quality assessment metrics
- Coding style and standards compliance
- Safety and security verification
- Code browsing and navigation
- Documentation generation
- Reverse engineering and re-engineering
- Language translation and code restructuring
Unfortunately, the current state of the practice in software
development either omits code analysis support altogether or only
incorporates it as an ancillary, undisciplined, ad hoc resource. For
example, it is not uncommon to find within a given project various
home-grown tools that support the above goals but which are not
recognized as overtly participating in the development process. Such
tools can be quite obtuse (very indirect extraction of information)
and are typically incomplete (handling only a subset of the
development language). Further, they tend to be project-specific (or
even person-specific), and cannot be reused in another project: they
are later redeveloped from scratch.
These observations corroborate that the need for code analysis is
genuine, and that a common set of uniform tools could provide
significant benefits to projects. But in the case of Ada software,
commercial code analysis tools have historically proven to be barely
adequate, manifesting a variety of problems whose nature and origin
are described below.
Technology for code analysis
For Ada, why is ASIS the best approach? Code analysis tools are not
new, having been available for decades; but the advent of the Ada
language has exposed a variety of analysis limitations and has
consequently demanded more comprehensive technology. The following text
articulates various Ada-specific issues from a historical perspective:
it reviews several technologies that have historically been applied
-- with varying degrees of success -- to code analysis specifically
targeted to Ada software. The review is not comprehensive, but it
sketches the evolution of issues that have propelled the development
of the ASIS concept.
Code parsers
Historically, many commercial code analysis tools have been supplied
by compiler vendors in conjunction with their compiler products. But
as the community of CASE tool vendors has grown, such tools are often
available independent of any compiler. Tool developers have found that
conventional parser technology is sufficient for most traditional
languages; thus when Ada came along, most vendors expected it would
suffice to simply adapt their parsers to handle Ada syntax. But for
Ada, the result has held many disappointments:
- Textual code editors are often sensitive to Ada syntax but not to Ada
semantics.
- Graphical design editors yield valid graphics, but invalid Ada
designs.
- Source-level tools such as debuggers are forced to understand and
traverse the internal data structures of program libraries rather than
the text of original source files.
- Reverse engineering and test tools manifest difficulties when trying
to resolve overloaded subprogram names or renamings.
- Except for compilers, Ada tools do not require Ada Compiler
Validation Capability (ACVC) certification; hence, such tools
typically fail to handle the complete repertoire of Ada language
features.
Consider the case of a toolsmith who wants to develop a call-tree
analyzer. For such a tool to accurately process Ada source files, the
toolsmith would be forced to build almost the entire front end of an
Ada compiler -- a decidedly major undertaking that far out-scopes the
original tool building effort. But CASE tool vendors are not in the
compiler business; most are reluctant to make this major investment,
or have tried and failed. Yet tools built on parser technology alone
are not able to fully support the semantic richness of Ada.
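The gap between syntactic and semantic processing can be sketched with a toy model of overloaded subprogram names. Everything here is an assumption made for illustration: the declaration table, the identifiers, and both lookup functions are hypothetical, and real Ada overload resolution considers far more than parameter types.

```python
from typing import List, Tuple

# Hypothetical declarations: two overloaded procedures that share the name "Put".
# Each entry is (unique id, name, parameter type profile).
DECLARATIONS: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("Text_IO.Put#1", "Put", ("Character",)),
    ("Text_IO.Put#2", "Put", ("String",)),
]


def resolve_by_name(name: str) -> List[str]:
    """Syntax-only lookup: a parser can match nothing but the identifier."""
    return [uid for uid, decl_name, _ in DECLARATIONS if decl_name == name]


def resolve_by_profile(name: str, arg_types: Tuple[str, ...]) -> List[str]:
    """Semantic lookup: also match argument types, which requires type analysis."""
    return [
        uid
        for uid, decl_name, profile in DECLARATIONS
        if decl_name == name and profile == arg_types
    ]


# A call such as Put ("hello") is ambiguous by name alone...
print(resolve_by_name("Put"))
# ...and becomes unique only once the argument type is known.
print(resolve_by_profile("Put", ("String",)))
```

A call-tree analyzer built on the first lookup cannot say which `Put` is called; obtaining the argument types needed by the second lookup is exactly the work of a compiler front end.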
DIANA
Many Ada compilers store program units in libraries. They typically
structure the information according to some proprietary internal form,
such as trees of DIANA (Descriptive Intermediate Attributed Notation
for Ada). The following discussion applies to all internal forms;
DIANA is singled out because it is publicly documented [5, 6]. DIANA
had been intended for standardization, but that effort failed due to
the unexpectedly wide variation in internal forms. Such trees
encode both syntactic and semantic information about Ada programs. The
root of a DIANA tree corresponds to a compilation unit; the nodes
correspond to smaller Ada structures, such as declarations and
statements. Node attributes contain descriptive information and
references to other nodes.
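The shape of such an attributed tree can be sketched as follows. This is only an illustration of the general idea: the `Node` class and the node kind names are hypothetical and do not reproduce DIANA's actual node or attribute definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(eq=False)  # compare nodes by identity, since attributes may cross-reference
class Node:
    kind: str                                                    # e.g. declaration, statement
    children: List["Node"] = field(default_factory=list)         # structural (syntactic) links
    attributes: Dict[str, object] = field(default_factory=dict)  # descriptive info and node references


def walk(node: Node) -> Iterator[Node]:
    """Visit a node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


# The root corresponds to a compilation unit; smaller structures hang below it.
declaration = Node("procedure_declaration", attributes={"name": "Report"})
call = Node("procedure_call", attributes={"name": "Report", "definition": declaration})
unit = Node("compilation_unit", children=[declaration, Node("statement_list", children=[call])])

kinds = [n.kind for n in walk(unit)]
calls = [n for n in walk(unit) if n.kind == "procedure_call"]
print(kinds)
print(calls[0].attributes["definition"] is declaration)
```

The `definition` attribute is the semantic part: it links a use of a name directly to the node that declares it, which plain source text cannot do.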
DIANA trees offer great convenience and power to toolsmiths, and are
sufficient to support the implementation of a large variety of tools
(including code generators in compilers). For example, with access to
DIANA, the toolsmith who wanted to develop the call-tree analyzer would
have a fairly straightforward project. Furthermore, the tool would
exhibit better performance, bypassing the needless regeneration (and
redundant storage) of intermediate compilation results that are already
available in the Ada libraries.
The power of DIANA is sufficient to support the implementation of a
virtually unlimited variety of tools. In general, any tool that
requires the semantic information generated by an Ada compiler can
benefit from access to DIANA. But as with any technology, the use of
DIANA also has drawbacks:
- A given implementation of DIANA by a vendor is subject to change:
upgrades can obsolete tools written against previous versions,
hampering maintenance.
- Similarly, DIANA implementations vary from vendor to vendor: porting
a tool across platforms is a risky endeavor.
- DIANA is hard to use: the trees are quite complex, making it difficult
to write and debug tools written against a DIANA specification.
- The lack of a simple mapping to Ada makes DIANA hard to understand:
as an abstracted representation of an Ada program, it does not map
intuitively to concrete Ada structures.
- DIANA is not extensible; but tools may need to add attributes for
storing graphical or other tool-specific information.
LRM-interface
Thus a growing need arose to make tool development possible at the Ada
level rather than at the internal representation level. It was these
issues that drove some Ada compiler vendors to independently develop
proprietary higher-level interfaces to encapsulate their Ada program
libraries.
In particular, to overcome the drawbacks of DIANA while retaining all
of its advantages, Rational Software Corporation developed their
LRM-Interface product [8] in the late 1980s. It provided nearly the
same power as DIANA, through services that extracted a variety of
information from the internal DIANA trees. The LRM-Interface was also
considerably easier to understand than DIANA, because it used the
already-familiar terminology defined in the Ada LRM (the original
Reference Manual for the Ada Programming Language [12], or its more
recent version, the Ada 95 Reference Manual [7]). Furthermore, the
LRM-Interface was not subject to change (or at least much less so than
was the underlying DIANA), so tools written against it were easily
migrated to updated implementations.
Regardless of LRM-Interface specifics, this and other similar
approaches generally provide great flexibility: for example, an ad hoc
tool can be easily and quickly built by in-house engineers, without
funding the development or specialization of a commercial tool. But as
expected, this approach also has shortcomings:
- While DIANA as a data structure is not extensible, the above
interfaces can be extended in the sense of user-supplied secondary
functions built on the functions already provided; even so, all the
functions are read-only and cannot modify any state.
- Importing the subject source code into the tool environment can
require edits that necessarily result in code distortion, such that
original code attributes might not be preserved (e.g., line numbers or
the byte sizing of data).
- Tools are vendor dependent, such that a given tool cannot access Ada
libraries from multiple vendors, or equivalent tools from multiple
vendors cannot access a given Ada library.
- Data interchange is not standardized among tools, so users can't
configure their own integrated toolsets by choosing from competing or
complementary vendors.
- Within a software engineering environment (SEE), Ada semantic
information remains isolated from, and not integrated with, other
engineering data present in the environment.
ASIS
Historically, only a few Ada vendors provided access to the
information contained in their proprietary Ada program libraries, and
each such interface was unique. Thus began to emerge the need for an
open standard that would allow uniform, vendor-independent access to
that information.
Leveraging some informal efforts, the STARS program initiated the
development of the Ada Semantic Interface Specification (ASIS) in
1990; but shortly thereafter, the activity became unfunded due to the
STARS decision to no longer support standardization efforts. Despite
this, several of the involved vendors (primarily TeleSoft) continued
the ASIS work on a volunteer basis. Some time later, Rational also
became an active participant and seeded the draft standard by
contributing their LRM-Interface specification to ASIS.
In 1992, the Ada Board recognized the potential benefits to the Ada
community of an ASIS standard, and recommended that the Ada Joint
Program Office (AJPO) director support "by whatever means possible the
development of an ASIS standard and its submission to ISO/WG9 for
publication." The Association for Computing Machinery (ACM) Special
Interest Group on Ada (SIGAda) took on this important work through
volunteer effort in the ASIS Working Group (ASISWG) [2]. The ASISWG
developed the interface to ISO/IEC 8652:1987. In December 1993, ASIS
was viable and the director of the AJPO recommended this interface be
used by tools needing information from the Ada program library. The
ASISWG then focused on developing ASIS for ISO/IEC
8652:1995. As the ASISWG has no standardization authority, an ASIS
Rapporteur Group (ASISRG) was established on 28 April 1995 by
ISO/IEC JTC1/SC22 WG9 to standardize ASIS as an International
Standard for Ada. ASISWG and ASISRG cooperated to evolve ASIS
as an important interface to the Ada compilation environment.
Like its LRM-Interface predecessor, ASIS defines a set of queries and
services that provide uniform programmatic access to the syntactic and
semantic information contained within Ada library units (i.e., vendor
independence). In addition, for each Ada vendor, ASIS clients are
shielded from the evolving proprietary details that implement the
vendor's library representations and internal forms (i.e., version
independence). ASIS is designed for implementation on a variety of
machines and operating systems, while also supporting the Ada semantic
requirements of a wide range of client tools.
ASIS services are essentially primitive, intended to support higher
level, more sophisticated services that address the varied needs of
specialized tools. While ASIS currently operates in a read-only mode,
it could eventually be extended to support some (probably limited)
update capability, enabling client tools to save application-dependent
data (e.g., graphical information) within an existing Ada
library. Although an ASIS implementor could readily support read-write
features, members of the safety-critical community have emphasized the
danger of providing a generalized write capability, since this could
allow the internal representation to be edited so that it differs from
the original source code.
The long-term key is to achieve a critical mass of ASIS
implementations. This will promote a new generation of semantically
integrated Ada tools, which in turn will increase programmer
productivity and decrease Ada development costs. In summary, the
availability of ASIS implementations promises to:
- Stimulate improved quality within existing Ada CASE tools; currently,
these tend to be weak in supporting full Ada semantics (e.g., in
preserving renamed entities, resolving overloaded subprogram names,
etc.).
- Enhance safety and security by providing for a new class of powerful
analysis tools that apply formal methods to verify assertions in
program code (e.g., using Pragma Annotate).
- Eliminate the need to import Ada source code into secondary Ada
compilation environments, resulting in no distortion or loss in the
subject code (e.g., preserving original line numbers and the byte
sizing of data).
- Maximize interoperability between Ada CASE tools and Ada compilation
environments, thus maximizing tool availability.
- Enable the data interchange of Ada semantic information between
complementary or competing Ada CASE tools, thus maximizing user
choices for the best capabilities of each.
- Improve the overall performance of Ada CASE tools, by eliminating the
regeneration (and redundant storage) of Ada semantic information that
already exists in Ada libraries.
- Facilitate in-house development of informal but powerful ad hoc Ada
tools, providing flexibility as needed without funding Ada CASE vendor
specializations.
- Promote standardization in software engineering environments (SEEs),
enabling data integration of Ada semantic information with other
engineering data present in the environment.
- Establish enabling technology for new Ada CASE tools, by eliminating
the need for tool vendor investment in proprietary Ada compiler
technology; this will have a major impact on stimulating the
development of new code analysis capabilities.
Last update 17 August 1998.
Questions, comments to
Clyde Roby (CRoby@IDA.Org)