IRAC 4.1: Objects, Relationships, and Attributes

4.1 Objects, Relationships and Attributes

In order to express the requirements for the PCIS, it proved necessary to have a very precise description of the way terms were used. There are two orthogonal distinctions:

The distinction between things in the real world and their representation in the computer
The distinction between the abstract concept of a thing and its observed or manipulable information (this distinction has been known for centuries; see, for example, Aristotle, Peri Hermeneias (De Interpretatione), approximately 350 B.C.)

leading to four separate concepts:

Things in the real world, for example a person named "John Smith". Note that the concern here is with the abstract concept of this particular person. He may change his name, he may or may not be physically present when we talk about him, but the abstract concept of this particular person is quite separate from the information we may have about him.
There are facts that we can observe in the real world about these things, such as a particular person's name, age, height, weight, etc. We may know many more facts about the real world thing than are actually recorded in the computer.
The representation of the abstract concept of a thing in the computer. This is known as an "object" and may be considered as an instance of an abstract data type. It is quite separate from the data values that may be recorded about it; indeed, you can do nothing with such instances, except apply the operations relevant for the type.
The facts recorded in the computer system about objects. These are known as attributes.

Note that relationships are quite separate from the four concepts of a thing, an object to represent a thing, facts about things, and their representations as attributes.

In order to state some more specific requirements that the PCIS shall satisfy, it is necessary to specify a particular data model within which those requirements may be expressed.

The term "data model" may be used in two distinct senses:

It is used to refer to the way in which data are structured and manipulated, for example, the network model, the hierarchical model, the relational model and the Entity Relationship (ER) model.
It is used to refer to a particular schema to represent the things of interest in the environment, which are modelled within the facilities of a data model of the first sort.

The term "data model" is used in the first sense here. Within the IRAC the data storage and manipulation needs of a software project have been considered in terms of an "Object Management System".

Software projects deal with a great many collections of data, devices, people, and other things which need to be treated as single units. The representations of these in the computer (the third concept above) are given the abstract title "objects". A computer-based system for storing, naming, and manipulating objects is called an "Object Management System".

Much of the PCIS work has had the underlying assumption that the typical flat or hierarchical file system found in a modern computer system is inadequate for the needs of software development projects. This assumption originates in the fact that most projects and companies are forced to supplement the file system's facilities with additional functions, tools, and conventions to be able to do their job. For example, an attempt is often made to give files a "type" that indicates the general form of their contents by establishing a naming convention. Files containing Ada source code might be required to have a name of the form xxxx.ada, where xxxx is the user-meaningful name of the file. It may be that the Ada compiler will only accept as input files whose names are of this form. The convention may reduce the number of characters that a user can use to make a file name meaningful and is far from foolproof. The IRAC makes typing an intrinsic part of the system and thus eliminates these and other problems.
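The fragility of a naming convention can be illustrated with a small Python sketch. The names type_from_name, TypedObject, and compile_ada are hypothetical and are not part of the PCIS; the sketch only contrasts a type inferred from a file name with a type held as an intrinsic property of the object.

```python
from dataclasses import dataclass


def type_from_name(filename: str) -> str:
    """Convention-based typing: the 'type' is whatever follows the last dot."""
    return filename.rsplit(".", 1)[-1] if "." in filename else "unknown"


@dataclass
class TypedObject:
    """Intrinsic typing: the type is stored with the object, not in its name."""
    name: str
    object_type: str


def compile_ada(obj: TypedObject) -> str:
    # The check relies on the recorded type, so renaming cannot defeat it.
    if obj.object_type != "ada_source":
        raise TypeError(f"{obj.name!r} is not Ada source")
    return f"compiled {obj.name}"


# Under the convention, renaming a file silently changes its apparent type.
convention_before = type_from_name("parser.ada")
convention_after = type_from_name("parser.bak")

# With intrinsic typing the name is free-form and the type survives a rename.
source = TypedObject(name="parser", object_type="ada_source")
source.name = "parser (old draft)"
result = compile_ada(source)
```

In the second half, the whole name remains available for user-meaningful text, since none of it is needed to carry the type.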

The reader is referred to the list of definitions for the precise definitions of and rationale for the following terms which are used throughout this section: OBJECT, RELATIONSHIP, and ATTRIBUTE. An understanding of the definitions of these terms is crucial to understanding the requirements and rationale that follow.

4.1A Data. The PCIS shall provide mechanisms for representing data using:

a) Objects. An object is the PCIS unit for representing "things" which are relevant to the needs of tools.

b) Relationships. A relationship is an ordered association among objects. A relationship among N objects (not necessarily distinct) is known as an "N-ary" relationship. The PCIS may restrict relationships to be binary.

c) Attributes. An attribute is an association of an object or relationship with a value. This is generally the value of a property of the object or relationship, describing its state.

d) Components of objects. An object may be specified (through a means left undefined here) to be a component of another object. An object's components then form a set. This supports abstraction, by allowing the set to be treated as a single object.

A software project involves many kinds of objects, relationships of objects, and attributes of objects and relationships, all of which must be stored and made available. For example, a piece of program source, some test data, a document, a person or a particular assignment may all be represented by objects; the actual source text, test data, text of a document, date created, storage format and access allowed may be attributes of an object; "compiled from", "referenced in", "written by", "working on" may be relationships between objects; and "date compiled" may be an attribute of a relationship.
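The examples in this paragraph can be written down as a minimal Python sketch. The class names Obj and Relationship, the attribute keys, and all values are illustrative assumptions; the PCIS does not prescribe any such representation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Obj:
    """An object: an identity plus the attributes recorded about it."""
    oid: int
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """An ordered association among objects, with attributes of its own."""
    name: str
    members: Tuple[Obj, ...]
    attributes: Dict[str, Any] = field(default_factory=dict)

    @property
    def arity(self) -> int:
        return len(self.members)


source = Obj(1, {"text": "procedure Main is ...", "storage_format": "ascii"})
binary = Obj(2, {"storage_format": "object_code"})
author = Obj(3, {"name": "John Smith"})

# "date compiled" belongs to the relationship, not to either object.
compiled_from = Relationship(
    "compiled from", (binary, source), {"date_compiled": "arbitrary-date"}
)
written_by = Relationship("written by", (source, author))
```

Both relationships here are binary, which is the restriction that 4.1A(b) permits; the tuple of members would accommodate N-ary relationships without change.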

With more advanced systems, source text may be stored as trees of smaller-than-file units of text (for example, representing statements or lexical units); documents may be held as trees of sections, paragraphs, sentences, and references to object attributes where names and properties appear; design chart graphics may be held as graphs of individual defining objects with their graphical placement data. Because the name of an object in a repository may change, and the name may appear throughout source text and documents, the source text and documents may be implemented with relationships to the repository object's name attribute. This eliminates multiply embedded, separately located copies of the name attribute. The OMS-based system provides the facilities which can be used to build powerful application models of this sort.

Attributes represent the actual data about objects or relationships. Examples are the actual source or object code of a program, the date an object or relationship was created, or a status value to describe an object. The value associated with an object or relationship by means of an attribute is referred to as the attribute value. Values are numbers, names (enumeration values), or character sequences, or aggregates thereof.

Relationships are "associations" from one object to another. Examples are source code to object code, old to new revision of a document, and owner/user to owned object. Relationships may be functional mappings (one-to-one or many-to-one) or relational mappings (one-to-many or many-to-many) and may have attributes.
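The difference between a functional and a relational mapping can be checked mechanically. The following Python sketch assumes a binary relationship held as a list of (source, target) pairs; the function name is_functional and the sample data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def is_functional(pairs: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if every source is associated with at most one distinct target."""
    seen: Dict[Hashable, Hashable] = {}
    for source, target in pairs:
        # setdefault records the first target seen for this source.
        if seen.setdefault(source, target) != target:
            return False
    return True


# Many-to-one: each owned object has exactly one owner.
owned_by = [("report.txt", "alice"), ("notes.txt", "alice")]

# One-to-many: the same association read in the opposite direction.
owns = [("alice", "report.txt"), ("alice", "notes.txt")]
```

Here is_functional(owned_by) holds while is_functional(owns) does not, although both lists describe the same underlying association.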

At least some objects represent the concept of those collections of data which we normally think of as files or data sets. For these objects the attribute that contains the data that would (in a conventional file system) be held in a file is of particular importance. The data in this attribute may consist of multiple records in a given format or of undifferentiated sequences of characters or bits. Examples are source text, test results and cross references. Other objects may represent hardware devices (either abstract or virtual), groupings of objects for purposes like naming, and users of the system.

Within the software engineering domain it is often desirable to operate on data at different levels of abstraction. The ability to decompose an object into a number of components allows the levels of abstraction to be modelled within the OMS. This has a number of important benefits including potential performance gains. For example, in a tool which operates on Ada program libraries it may be sufficient to regard the program library as a single object, whereas in other tools it may be necessary to examine the components of the program library, that is, the source files, compilation units and their inter-relationships. An object that can have components is often referred to as a "composite object".

4.1B Attribute Values. The PCIS:

a) Shall provide mechanisms for attributes whose values are at least integer and enumeration types.

This is the minimum requirement. In addition to general mechanisms for enumeration types, specialized mechanisms (such as for Boolean and character) may also be provided (for example, if efficiency concerns warrant).

b) Should provide mechanisms for attributes whose values are real numbers. The PCIS may provide mechanisms for non-scalar data, but it is not required to do so.

It is left to the PCIS designer to enumerate the additional attribute value types that are supported, for example, fixed and floating point types, array and record types.

Several communities are striving to produce standards giving portability and interoperability to applications. The choice of attribute types to be supported is crucial to those goals. So this, of all areas, is one in which the PCIS designers must give careful consideration to the choices made through other standards, to maximize interoperability.

c) Shall provide mechanisms for attributes whose values can be used as bulk data.

Bulk data may correspond to files in a conventional operating system. The definition permits, but does not require, an ability for a single object to have any number of such bulk data attributes.
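As an illustration of the minimum set in 4.1B(a) together with bulk data in 4.1B(c), the following Python sketch classifies candidate attribute values. The function check_value and the enumeration Status are hypothetical, and representing bulk data as a byte string is an assumption of the sketch. Real numbers are rejected here only to show the minimum set; 4.1B(b) recommends supporting them.

```python
from enum import Enum


class Status(Enum):
    """A user-defined enumeration type (illustrative)."""
    DRAFT = "draft"
    APPROVED = "approved"


def check_value(value: object) -> str:
    """Classify a value as integer, enumeration, or bulk, else reject it."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as a specialized
        # enumeration, as the rationale for 4.1B(a) suggests for Boolean.
        return "enumeration"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, Enum):
        return "enumeration"
    if isinstance(value, (bytes, bytearray)):
        return "bulk"
    raise TypeError(f"unsupported attribute value: {value!r}")
```

For example, check_value(42) yields "integer", check_value(Status.DRAFT) yields "enumeration", and check_value(b"source text") yields "bulk".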

4.1C Shared Components. The PCIS shall provide that any objects may share components. The PCIS shall also provide that objects may constrain the number of composite objects of which they are a component, down to one.

This requirement allows for objects that are composites of overlapping sets of components, for example, two different Ada program libraries might share a source file. The ability to constrain the number of objects of which one is a component is necessary in order to model basic data structures, such as trees. There is no intent in the requirement to suggest whether the expression of the constraint is to be done on a per-object basis, or instead on a class-basis through the object type. Either approach has benefits, though the latter has the uniformity of being consistent with the expression of bounds on the number of incoming and outgoing relationships for a given object (see Requirement 4.2B(c)).
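A minimal Python sketch of sharing with an optional bound follows. It expresses the constraint on a per-object basis purely for brevity; as noted above, the requirement does not favour that choice over a per-type one. The class Node and its max_parents parameter are hypothetical.

```python
from typing import List, Optional


class Node:
    """An object that may contain components and be a component of others."""

    def __init__(self, name: str, max_parents: Optional[int] = None):
        self.name = name
        self.max_parents = max_parents  # None means unconstrained sharing
        self.parents: List["Node"] = []
        self.components: List["Node"] = []

    def add_component(self, child: "Node") -> None:
        limit = child.max_parents
        if limit is not None and len(child.parents) >= limit:
            raise ValueError(
                f"{child.name} may belong to at most {limit} composite(s)"
            )
        child.parents.append(self)
        self.components.append(child)


lib1, lib2 = Node("library_1"), Node("library_2")

# An unconstrained source file shared by two program libraries.
shared_source = Node("shared_source")
lib1.add_component(shared_source)
lib2.add_component(shared_source)

# A tree-style component: constrained to a single containing composite.
tree_leaf = Node("leaf", max_parents=1)
lib1.add_component(tree_leaf)
try:
    lib2.add_component(tree_leaf)
    rejected = False
except ValueError:
    rejected = True
```

The second attempt to attach tree_leaf is refused, which is what allows a strict tree to be modelled.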

4.1D Context Sensitive Interpretation of Relationships. The PCIS shall provide the ability to interpret different relationships involving shared components in the contexts of the objects containing those components.

This is best explained diagrammatically (see below). C1 and C2 are composite objects with the common component S, each composite having a component A. Navigation from S to A in the context of C1 arrives at the package A that is correct in the context of C1, while navigation in the context of C2 arrives at the package A that is correct for C2.

  +===============================================+
  ||                                             ||
  ||   +-----------+      +======================++=======================+
  ||   | package A | 'with(A)   +-----------+    ||                  C2  ||
  ||   | .         +------------+ with A;   | 'with(A)   +-----------+   ||
  ||   | -- vers 1 |      ||    | package S +------------+ package A |   ||
  ||   | .         |      ||    | .         |    ||      | .         |   ||
  ||   | end A;    |      ||    | .         |    ||      | -- vers 2 |   ||
  ||   +-----------+      ||    | end S;    |    ||      | .         |   ||
  ||                      ||    +-----------+    ||      | end A;    |   ||
  ||                      ||                     ||      +-----------+   ||
  ||  C1                  ||                     ||                      ||
  +=======================++======================+                      ||
                          ||                                             ||
                          +===============================================+
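The navigation shown in the diagram can be sketched in Python as follows. Resolving the target by name within the containing composite is only one possible mechanism and is an assumption of this sketch, as are the names C1, C2, WITH_LINKS, and navigate.

```python
from typing import Dict

# Each composite maps a component name to the concrete object it contains.
C1: Dict[str, str] = {"S": "package S", "A": "package A (vers 1)"}
C2: Dict[str, str] = {"S": "package S", "A": "package A (vers 2)"}

# The 'with' relationship is recorded once on the shared component S,
# by target name rather than by a fixed target object.
WITH_LINKS: Dict[str, str] = {"S": "A"}


def navigate(context: Dict[str, str], source: str) -> str:
    """Follow the 'with' link from `source`, resolved inside `context`."""
    if source not in context:
        raise KeyError(f"{source} is not a component of this composite")
    return context[WITH_LINKS[source]]
```

With these definitions, navigate(C1, "S") reaches version 1 of package A and navigate(C2, "S") reaches version 2, although S itself is stored only once.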

4.1E Granularity.

a) The PCIS mechanisms shall support objects representing data which range in size from large (up to the level of granularity of, for example, a DBMS database or the text of a book) to small (down to the granularity of, for example, paragraphs within a document or nodes within a diagram).

The data dealt with in a PSE covers a wide range of structure sizes, from a book or complete specification down to the individual sentence in a requirements document, nodes and arcs within diagrams, and nodes within the abstract syntax tree of a program. Further, all this structure is relevant to the software engineering process and as such should be explicitly modelled within the OMS.

Clearly the OMS can be used to represent data at any level of granularity. The important point is that this be achieved efficiently and economically. If, for example, the PCIS specifies that all objects have a large number of predefined attributes, then representing fine grain data as separate objects might cause unacceptable overhead.

b) The PCIS shall facilitate implementations to exploit common properties of composite objects in order to get good access performance to all their components.

One of the main benefits of introducing facilities to define composite objects is that an implementation can anticipate operations on all of a composite's components at the time its root component is accessed. It may, for instance, lock all components transitively in advance or transfer the data supporting the components into a cache. Facilities such as composite access control lists may be provided to allow for centralized security control (at the level of the root, for instance).
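Transitive locking amounts to computing the set of objects reachable from the root through component links. The following Python sketch shows that computation only; it performs no actual locking, and the function lock_closure and the sample composite are hypothetical.

```python
from typing import Dict, List, Set


def lock_closure(root: str, components: Dict[str, List[str]]) -> Set[str]:
    """Collect the root and everything reachable through component links."""
    locked: Set[str] = set()
    pending = [root]
    while pending:
        current = pending.pop()
        if current in locked:
            continue  # shared components are visited only once
        locked.add(current)
        pending.extend(components.get(current, []))
    return locked


# Hypothetical composite: a program library whose two units share a body.
components = {
    "library": ["unit_1", "unit_2"],
    "unit_1": ["shared_body"],
    "unit_2": ["shared_body"],
}
to_lock = lock_closure("library", components)
```

An implementation could compute such a set once, when the root is first accessed, and use it either to acquire locks or to prefetch component data into a cache.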

c) The PCIS shall present to tools a uniform interface for facilities to define and manage instances of, and relationships between, data of differing granularity, even when different facility implementations are used for different degrees of granularity.

Tools and users should not be aware of the presence or absence of different facilities for differing granularities of data. Administration of data by differing facilities based on granularity may be necessary for access times commensurate with data size. However, the user should not suffer the inconveniences of a non-uniform interface to those facilities, such as reworking tools and end user procedures. Tools should be oblivious to the existence of multiple or differing facilities in the same way as they now are oblivious to NFS hosting of files on a network.

4.1F Data Consistency. The PCIS shall provide mechanisms to ensure the consistency of the data represented in the Object Management System. These mechanisms shall include at least typing, access control, synchronization, transactions, robustness and restoration. These mechanisms should support consistency which ranges, in time-scales, from short term (for example, over an individual operation) to long term (for example, configuration control and checkout/checkin).

The requirement for data consistency must include recognition of the fact that systems do fail unexpectedly, due to hardware and software faults. The PCIS must be implementable on a wide variety of hosts; therefore it is unacceptable to require fault-tolerant or multiply-redundant hardware. The PCIS design must actively support recovery from unexpected failures.

There are several basic approaches to this problem. Modern data base managers usually implement the "transaction", a unit of work that appears, from the viewpoint of the rest of the system and of unexpected faults, to happen "all at once". This is sometimes implemented by a journal system, in which the data base is recorded at a given time and subsequent transactions are recorded in a journal. After a crash, the data base is restored from the recording and the transactions from the journal are re-applied to it.
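The journal approach can be sketched in a few lines of Python. This is an in-memory illustration under simplifying assumptions: a real implementation would write both the recording and the journal to stable storage before applying any change. The class JournaledStore and its method names are hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple

Operation = Tuple[str, Any]      # (key, new value)
Transaction = List[Operation]


class JournaledStore:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.checkpoint_data: Dict[str, Any] = {}
        self.journal: List[Transaction] = []

    def checkpoint(self) -> None:
        """Record the whole store and start a fresh journal."""
        self.checkpoint_data = copy.deepcopy(self.data)
        self.journal = []

    def commit(self, transaction: Transaction) -> None:
        """Journal the complete transaction first, then apply it."""
        self.journal.append(list(transaction))
        for key, value in transaction:
            self.data[key] = value

    def recover(self) -> None:
        """After a crash: restore the recording and re-apply the journal."""
        self.data = copy.deepcopy(self.checkpoint_data)
        for transaction in self.journal:
            for key, value in transaction:
                self.data[key] = value


store = JournaledStore()
store.commit([("spec", "v1")])
store.checkpoint()
store.commit([("spec", "v2"), ("tests", "v2")])

# Simulated crash: the working state reflects only half of the transaction.
store.data = {"spec": "v2"}
store.recover()
```

After recovery the store again holds both updates of the second transaction, so the transaction appears to have happened "all at once".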

Modern operating systems often build a certain amount of redundancy into their data structures and include a "scavenger" program that can scan the structure after a crash and use the redundancy to correct any inconsistencies in it. This is sometimes formalized as a system of "truths" and "hints", in which the truths are handled in such a way that they cannot be made inconsistent by a crash. Hints are used by the operating system in its normal operation, for instance, to speed access, but are always checked against the truths. When a hint and truth conflict, the hint is discarded. A scavenger program can recreate the hint system from the truths.
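The "truths" and "hints" scheme can be sketched as follows in Python. The class Directory is hypothetical: the list of entries plays the role of the truths, and the name-to-position index plays the role of the hints.

```python
from typing import Any, Dict, List, Tuple


class Directory:
    def __init__(self) -> None:
        self.truth: List[Tuple[str, Any]] = []  # authoritative (name, payload)
        self.hint: Dict[str, int] = {}          # name -> position, may be stale

    def add(self, name: str, payload: Any) -> None:
        self.truth.append((name, payload))
        self.hint[name] = len(self.truth) - 1

    def lookup(self, name: str) -> Any:
        index = self.hint.get(name)
        # A hint is used only after it has been checked against the truth.
        if (index is not None and index < len(self.truth)
                and self.truth[index][0] == name):
            return self.truth[index][1]
        self.hint.pop(name, None)  # conflicting hint is discarded
        for position, (entry_name, payload) in enumerate(self.truth):
            if entry_name == name:
                self.hint[name] = position
                return payload
        raise KeyError(name)

    def scavenge(self) -> None:
        """Recreate the whole hint system from the truths."""
        self.hint = {name: i for i, (name, _) in enumerate(self.truth)}


directory = Directory()
directory.add("a", "payload A")
directory.add("b", "payload B")

directory.hint = {"a": 1}            # simulated damage: hint points at "b"
found = directory.lookup("a")        # falls back to the truth, repairs hint
repaired_hint = dict(directory.hint)

directory.hint = {}                  # simulated crash: all hints lost
directory.scavenge()
```

A wrong or missing hint costs only time, never correctness, because the result of every lookup is confirmed against the truths.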

The PCIS has aspects of both data base and operating system, and both of the above approaches are possible implementations. In fact, they are quite similar at a basic level, differing mostly in the terminology used. The PCIS designer is not constrained to adopt either approach except insofar as there are specific requirements for a usable transaction mechanism. However, data consistency is a very important requirement, so much so that it may set the tone for the entire PCIS design.

The consistency mechanisms enumerated here are all covered by other, more specific requirements. The purpose here is to ensure that they work together in a complementary way across a wide range of time-scales.
