Formal System Identifiers

This annex states the requirements for the formal definition of notations used in system identifiers to specify access to the storage objects in which entities are stored. Access is provided by "storage managers" such as file systems, data bases, and main memory managers. Objects may be stored individually, or as part of larger storage objects, called "containers" or "archives", with a defined format for multiple-object storage (e.g., PKZIP, TAR, etc.). Access may involve auxiliary processes, such as coding conversion, record boundary recognition, and other processes required to present storage objects to the SGML parser as entities.

System identifiers

An entity is a virtual storage object. A system identifier parameter of a markup declaration can be used to map an entity onto one or more real storage objects (or portions thereof), and to specify processes to be performed in the course of accessing the object as an entity. The format of a system identifier is normally system-specific (an "informal system identifier"). However, when access to storage is specified in the manner described in this annex, the system identifier is called a "formal system identifier" (FSI).

Storage object specification (SOS)

A formal system identifier consists of one or more "storage object specifications", each of which identifies a storage object. The format of an SOS resembles an element, in that it consists of a tag followed by content. The storage objects are concatenated in the order specified to comprise the storage of the entity.

In the SOS tag, the name appearing as the generic identifier is that of the storage manager (SMName). It is the name of a "storage manager notation" that was identified as such by a declaration. There can also be an attribute specification list, consisting of attributes defined for the storage manager notation. These SOS attributes serve as parameters that govern the access to the storage object.

The content of an SOS is known as the "storage object identifier" (SOI). Its syntax and semantics depend on the individual storage manager.

An SOS tag is recognized by the occurrence of a start-tag open delimiter followed immediately by a declared storage manager name, followed either by an SGML "s" separator or a tag-close delimiter. The concrete syntax of an SOS is that of the prolog in which the SOS appears.

SGML numeric character references are recognized in the attribute value literals and content of an SOS. They can be used to avoid false delimiter recognition.

Informal system identifiers

A system identifier is recognized as an FSI only if it begins with an SOS tag. Informal system identifiers can be used in the same document as FSIs as long as storage manager names are chosen so that informal system identifiers do not appear to begin with an SOS tag.
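
The recognition rule can be illustrated with a minimal Python sketch. It is not part of this annex: the function name `parse_fsi`, the sample identifiers, and the use of "<", ">" and "&#" (the reference concrete syntax delimiters) are assumptions made for illustration. Names are matched case-sensitively for simplicity, and attribute specification lists are returned as raw text without decoding.

```python
import re

def parse_fsi(sysid, declared_names):
    """Split a system identifier into (SMName, attributes, SOI) triples.

    Returns None when the identifier does not begin with an SOS tag,
    that is, when it has to be treated as an informal system identifier.
    """
    if not declared_names:
        return None
    alternatives = "|".join(re.escape(n) for n in sorted(declared_names))
    # "<" + declared name, then either ">" at once, or an "s" separator
    # followed by attribute text whose quoted literals may contain ">".
    tag_re = re.compile(
        r"<(" + alternatives + r")((?:\s(?:[^>\"']|\"[^\"]*\"|'[^']*')*)?)>"
    )
    matches = list(tag_re.finditer(sysid))
    if not matches or matches[0].start() != 0:
        return None

    def decode(text):
        # Numeric character references, e.g. "&#60;" for "<".
        return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

    result = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sysid)
        result.append((m.group(1), m.group(2).strip(), decode(sysid[m.end():end])))
    return result

names = {"url", "literal"}
fsi = "<url>http://example.org/a.sgm<literal>&#60;x<url>http://example.org/b.sgm"
for sos in parse_fsi(fsi, names):
    print(sos)
print(parse_fsi("chapter1.sgm", names))  # informal system identifier
```

In the sample, the numeric character reference `&#60;` lets the literal text contain a "<" without it being mistaken for the start of another SOS tag.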

Auxiliary processes

Several auxiliary processes may be required to convert a newly-created entity into the form in which it will be stored. Conversely, auxiliary processes may be needed to convert the bits of a storage object to the bit combinations seen by an SGML parser.

The SGML language is designed so that SGML documents can be stored in the same form as other text files. This design feature allows access and processing with normal text processing tools, in case SGML-aware tools are unavailable or are deficient in some respect. As a result, after an entity is created or modified, a number of processes can take place as it is stored. First, if the entity is not to be stored in a single storage object, it is divided into as many portions as there are to be storage objects. The storage objects can be in different storage systems (that is, under different storage managers). For each portion, the following steps may be performed:

  1. Record boundaries are converted to the storage system form of line endings for text files (for example, carriage returns, line feeds, or both).
  2. The fixed-width bit combinations seen by the SGML parser may be converted to a variable-width code to save storage space (most likely when multi-byte codes are used).
  3. The stored text may be encrypted.
  4. The stored text may be compressed.
  5. The stored text may be "sealed" by calculating a check number that will no longer be valid if the storage object is modified.
  6. The location that the stored text is to occupy in the storage object is determined (if it is not the entire object). The text can occupy one or more extents in the storage object.

When an entity is accessed, the entity manager invokes the storage manager for each storage object specification. For each storage object, the following steps may be performed:

  1. The extents of the storage object that are occupied by the entity are located and concatenated into a single portion of stored text.
  2. The integrity of the stored text (if sealed) is verified.
  3. The stored text is decompressed (if it was compressed).
  4. The stored text is decrypted (if it was encrypted).
  5. If the code in which the text is stored requires translation and/or conversion to fixed-width bit combinations, the code is normalized.
  6. Record boundaries are recognized and converted to SGML RE and RS characters.

Although the auxiliary processing is described sequentially for clarity, an implementation can perform the processes in parallel and in any order as long as identical results are achieved.
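
The symmetry between the storing and accessing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a definition from this annex: the function names `store` and `access` are hypothetical, UTF-8 stands in for the variable-width code, zlib for the compression method, and a SHA-256 digest for the "seal". Encryption is omitted, the entity is modelled as a list of records, and the whole storage object is a single extent.

```python
import hashlib
import zlib

SEAL_LEN = hashlib.sha256().digest_size  # length of the seal in bytes

def store(records, line_ending="\r\n"):
    """Turn the records of an entity into the bytes of a storage object."""
    text = line_ending.join(records)              # record boundaries -> line endings
    data = text.encode("utf-8")                   # fixed-width -> variable-width code
    data = zlib.compress(data)                    # compression (encryption omitted)
    return hashlib.sha256(data).digest() + data   # seal stored in front of the text

def access(blob, line_ending="\r\n"):
    """Recover the records of the entity from the storage object."""
    seal, data = blob[:SEAL_LEN], blob[SEAL_LEN:]
    if hashlib.sha256(data).digest() != seal:     # verify integrity
        raise ValueError("seal check failed: storage object was modified")
    data = zlib.decompress(data)                  # decompress
    text = data.decode("utf-8")                   # normalize the code
    return text.split(line_ending)                # recognize record boundaries

records = ["<p>caf\u00e9</p>", "<p>second record</p>"]
assert access(store(records)) == records
```

Each step in `access` undoes the corresponding step in `store`, in reverse order, which is the relationship between the two numbered lists above.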

Code normalization

Code normalization processes are "registered" by declaring them as notations. The following ones are defined in this International Standard:

  utf8
    Converts UTF-8 to a fixed-width encoding. Invalid multi-byte sequences are represented by the character 0xFFFD.

  ucs2
    Converts UCS-2 to a fixed-width encoding. The more significant octet of each character always precedes the less significant octet, irrespective of the system's native byte order. The codes 0xFFFE and 0xFEFF are not treated specially in any way.

  unicode
    Converts the Unicode coding system to a fixed-width encoding. The Unicode coding system treats each pair of octets as a character in the system's byte order. If the first character is the byte order mark character (0xFEFF), it will be discarded. If the first character is the byte order mark character byte-swapped, it will be discarded and the remaining characters will be byte-swapped.

  ujis
    Converts from the variable-width (packed) UJIS (EUC) coding scheme to an entity coding system that represents each character in the same way as the EUC complete two-byte format. In the entity coding system, the code of characters in the G0 set (usually the Japanese version of ISO 646) is unchanged; the code of characters in the G1 set (usually JIS X 0208-1990) is ORed with 0x8080; the code of characters in the G2 set (usually half-width katakana from JIS X 0201-1986) is ORed with 0x0080; the code of characters in the G3 set (JIS X 0212-1990) is ORed with 0x8000.

  sjis
    Performs an encoding conversion where the storage coding system is Shift JIS and the entity coding system is the same as with the ujis encoding (except for characters in the G3 set, which are not representable using Shift JIS).

  zero
    Converts bytes to characters by zero-extending each byte.

  same
    The encoding conversion of the storage object in which the system identifier was specified is used.
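
A rough Python sketch of several of these normalizations follows. The function names are hypothetical, each function returns a list of integer codes standing in for the fixed-width bit combinations, and the handling of malformed input (a trailing odd octet is ignored, truncated UJIS sequences are not checked) is an assumption rather than something stated above. The utf8 case relies on Python's "replace" error handler, which substitutes U+FFFD for undecodable input.

```python
import sys

def normalize_zero(data):
    # Each byte becomes one character code, zero-extended.
    return list(data)

def normalize_utf8(data):
    # Invalid sequences come out as 0xFFFD.
    return [ord(ch) for ch in data.decode("utf-8", errors="replace")]

def normalize_ucs2(data):
    # Always most significant octet first; no special byte order mark handling.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data) - 1, 2)]

def normalize_unicode(data, byteorder=sys.byteorder):
    # Octet pairs in the system's byte order, with byte order mark detection.
    codes = [int.from_bytes(data[i:i + 2], byteorder) for i in range(0, len(data) - 1, 2)]
    if codes and codes[0] == 0xFEFF:
        return codes[1:]
    if codes and codes[0] == 0xFFFE:  # byte-swapped mark: swap everything after it
        return [((c & 0xFF) << 8) | (c >> 8) for c in codes[1:]]
    return codes

def normalize_ujis(data):
    # Packed EUC to the complete two-byte form; assumes well-formed input.
    codes, i = [], 0
    while i < len(data):
        b = data[i]
        if b == 0x8E:      # SS2: one G2 byte follows (code ORed with 0x0080)
            codes.append(data[i + 1])
            i += 2
        elif b == 0x8F:    # SS3: two G3 bytes follow (code ORed with 0x8000)
            codes.append((data[i + 1] << 8) | (data[i + 2] & 0x7F))
            i += 3
        elif b >= 0x80:    # G1: two bytes (code ORed with 0x8080)
            codes.append((b << 8) | data[i + 1])
            i += 2
        else:              # G0: unchanged
            codes.append(b)
            i += 1
    return codes

print([hex(c) for c in normalize_ujis(b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1")])
```

The `byteorder` parameter of `normalize_unicode` defaults to the native order of the running system, so the same storage object can normalize differently on different machines unless it starts with a byte order mark.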

Encryption, compression, and sealing

The methods used for encryption, compression, and sealing are "registered" by declaring them as notations.

Containers

A container is an entity whose storage is partitioned so that the data of several other entities ("contained objects") can be kept in it. The locations of the contained objects are specified by their entity declarations. The entity declarations serve as entries in a "table of contents" or directory.

The name "sbento" comes from the Japanese word "bento": "A box or basket with multiple compartments, containing a collection of disparate entities arranged in an esthetically pleasing manner." It is an acronym for "Standard Bento Entity for Natural Transport of Objects".

Container entities provide a storage organization that applications may take advantage of to avoid redundant descriptor information. Containers may also facilitate interleaving and other techniques that optimize access to multimedia data.

HyTime does not prohibit overlapping of contained objects; it is for the application to determine whether overlapping is valid. Overlapping can be a useful technique, for example, for identifying the color table in a graphics entity. The color table would be declared as a separate entity, but its offset and size would position it within the storage occupied by the graphics entity.

The external identifier of the entity declaration of a contained object must be identical to that of its container entity. Container entities can be nested by specifying the container attribute (containr) on the subordinate (inner) container entity.

Identification facilities

A document indicates the presence of formal system identifiers by using the APPINFO parameter of the SGML declaration and the other facilities described in this sub-clause.

APPINFO parameter of SGML declaration

Potential use of one or more storage managers defined in accordance with these requirements is indicated by specifying the keyword "FSISM" as a sub-parameter of the APPINFO parameter of the SGML declaration. The keyword indicates the potential presence in one or more DTDs of a formal system identifier storage managers (FSISM) declaration that identifies the "storage object specification notations".

The format of the sub-parameter is: [template missing from this copy]

The sub-parameter can also specify the name of the FSISM declarations in the document's DTDs if it is other than "FSISM". The format is: [template missing from this copy], where "FSIUsed" is replaced by the declaration name.

Formal system identifier storage managers declaration

An FSISM declaration identifies one or more storage manager notations used in system identifiers. System identifiers in which such notations occur are known as "formal system identifiers". An FSISM declaration should precede the storage manager notation declarations pertaining to the storage managers that it identifies. There can be more than one FSISM declaration in a DTD.

Syntactically, the FSISM declaration is a processing instruction (PI), not an SGML markup declaration. In the template below, it is shown in the reference concrete syntax. In use, the SMName-list parameter must be replaced by one or more storage manager names (SMName) declared as notation names, separated by SGML ts separators (white space). It is an RAE if the DTD or meta-DTD does not declare a notation for each SMName specified.

The declaration name is the initial character string, up to the first ts separator. The name is always "FSISM" in meta-DTDs. It should also be "FSISM" in DTDs, but provision is made for changing it in the APPINFO parameter of the SGML declaration, if necessary, to avoid the (admittedly unlikely) possibility of conflicts when retro-fitting an architecture to a document that already has PIs that begin with "FSISM ".
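
As an illustration only, the following Python sketch applies the rules above to the text of a processing instruction with its delimiters already removed. The function name `parse_fsism` and the sample storage manager names are assumptions; the sketch treats any run of white space as a ts separator.

```python
def parse_fsism(pi_text, declared_notations, declaration_name="FSISM"):
    """Return (SMNames, undeclared SMNames), or None for an unrelated PI."""
    fields = pi_text.split()
    # The declaration name is everything up to the first separator.
    if not fields or fields[0] != declaration_name:
        return None
    sm_names = fields[1:]
    # Every SMName must also be declared as a notation name.
    undeclared = [name for name in sm_names if name not in declared_notations]
    return sm_names, undeclared

notations = {"url", "literal", "fd"}
print(parse_fsism("FSISM url literal thisone", notations))
print(parse_fsism("stylesheet draft", notations))
```

A non-empty second list corresponds to the error condition described above, where a storage manager name has no matching notation declaration. Passing a different `declaration_name` models the case where the name has been changed through the APPINFO parameter.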

Storage manager support declarations

Each storage manager name (SMName) specified in an FSISM declaration must be declared as a notation name in a notation declaration. The declaration identifies the storage manager definition document, which defines the syntax and semantics of the storage object specifications for that storage manager. Associated with the notation declaration can be an attribute definition list declaration for "storage manager support attributes". It is not necessarily an error if the definition document cannot be accessed, as an implementation might not require access to it. The primary purpose of the declarations is to identify the storage operators that are used and to declare any support attributes that they require.

Storage manager notation declaration

A storage manager is a program (or a combination of programs or a portion of a program) that manages the storage of real physical objects (as opposed to an entity manager, which manages the storage of virtual objects). For a given system environment, there is likely to be a well-defined set of available storage managers. The declarations for these could be maintained in a public entity and made available to all users.

The content of an SOS (that is, the SOI) conforms to the rules of the storage manager. Where those rules allow a relative filename, it is interpreted relative to the file in which the system identifier is specified.

Declarations for an incomplete list of well-known storage managers are provided below. Because they are well-known, this International Standard is referenced as the definition document. Alternatively, declarations referencing the actual definition documents could be used.

The list also includes declarations for special storage managers actually defined in this International Standard. They are:

  fd
    The SOI is a number. The storage object specification locates the storage object that is created when the system reads from the file descriptor with that number. For example, in Unix and DOS systems, fd:0 will read the storage object from standard input.

  url
    The SOI is a Uniform Resource Locator, as used in the Internet's World Wide Web.

  literal
    The SOI is treated as the literal text of a storage object. SGML named character references are recognized in the literal text. Literal text is used chiefly as a connector when concatenating other storage objects. The named character references can be used to insert record boundaries between the concatenated objects.

  thisone
    The SOI must be empty. The storage object is that in which the formal system identifier occurs. This one can be used in containers to make the system identifiers portable.
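
How these four storage managers might be dispatched, and their storage objects concatenated, can be sketched in Python. Everything named here is an assumption for illustration: the function names, the `current_object` parameter that stands for the storage object containing the FSI, the UTF-8 encoding of literal text, and the mapping of the named character references to the reference concrete syntax function characters (RE = 13, RS = 10, SPACE = 32, TAB = 9).

```python
import os
import re
import urllib.request

FUNCTION_CHARS = {"RE": "\r", "RS": "\n", "SPACE": " ", "TAB": "\t"}

def read_literal(soi):
    # Named character references such as "&#RE;" insert record boundaries.
    text = re.sub(r"&#(RE|RS|SPACE|TAB);", lambda m: FUNCTION_CHARS[m.group(1)], soi)
    return text.encode("utf-8")

def read_fd(soi):
    # The SOI is a file descriptor number; read until end of input.
    fd, chunks = int(soi), []
    while True:
        chunk = os.read(fd, 65536)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def read_url(soi):
    with urllib.request.urlopen(soi) as response:
        return response.read()

def resolve(sos_list, current_object=b""):
    """Concatenate the storage objects named by (SMName, SOI) pairs, in order."""
    parts = []
    for sm_name, soi in sos_list:
        if sm_name == "thisone":
            if soi:
                raise ValueError("thisone requires an empty SOI")
            parts.append(current_object)
        elif sm_name == "literal":
            parts.append(read_literal(soi))
        elif sm_name == "fd":
            parts.append(read_fd(soi))
        elif sm_name == "url":
            parts.append(read_url(soi))
        else:
            raise ValueError("unsupported storage manager: " + sm_name)
    return b"".join(parts)

print(resolve([("thisone", ""), ("literal", "&#RE;"), ("literal", "tail")], b"head"))
```

The sample call shows the connector role of literal text: a record end is placed between two other storage objects as they are concatenated.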

Storage manager support attributes

The designer of a storage object specification can optionally provide an attribute definition list declaration. The attributes are used to specify parameters to the storage access, in addition to the storage object identifier.

The template for such a declaration is shown below. In use, SMName is replaced by the actual notation name for the storage manager. The template includes all of the storage manager support attributes defined in this International Standard. In practice, an actual declaration might include some or none of them, possibly with other attributes unique to a particular storage manager.

The attribute container (containr) identifies the object as a contained object of a container entity. The SOI locates the object within the container. If the container is an sbento entity, the SOI must be empty or the same as an SOI for the container. The "extents" attribute then provides the necessary table of contents information.

The attribute occupied extents (extents) specifies the extents of the storage object occupied by the entity as a HyTime dimlist. The number of bits per quantum is determined by the "extquant" attribute. Multiple dimension specifications can be specified if the entity is segmented and distributed in several locations within the storage object. For example, this technique can be used to interleave the text of entities that are accessed concurrently.

The attribute extent quantum (extquant) specifies the unit of storage in which the partitioning of the container is expressed. For example, the partitioning unit would be "8" if an 8-bit bit combination were used and the desired granularity for specifying offset and size of the contained objects was the bit combination. The unit would be "16" if either a 16-bit bit combination were used and bit combination granularity was wanted, or if an 8-bit bit combination were used but the granularity for offset and size was to be pairs of bit combinations.
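
The interplay of "extents" and "extquant" can be shown with a short Python sketch. It rests on assumptions that are not stated above: each extent is given as a pair of a 1-based first quantum and a count of quanta, the quantum is a whole number of 8-bit bytes, and the function name `extract_extents` is hypothetical.

```python
def extract_extents(storage, extents, extquant=8):
    """Concatenate the extents of a storage object occupied by one entity.

    extents  -- (first quantum, number of quanta) pairs, first quantum = 1
    extquant -- number of bits per quantum
    """
    if extquant <= 0 or extquant % 8 != 0:
        raise ValueError("this sketch only handles whole-byte quanta")
    quantum_bytes = extquant // 8
    parts = []
    for first, count in extents:
        start = (first - 1) * quantum_bytes
        parts.append(storage[start:start + count * quantum_bytes])
    return b"".join(parts)

# Two entities interleaved in one container, four bytes at a time.
container = b"AAAAbbbbCCCCdddd"
print(extract_extents(container, [(1, 4), (9, 4)], extquant=8))    # b'AAAACCCC'
print(extract_extents(container, [(3, 2), (7, 2)], extquant=16))   # b'bbbbdddd'
```

With a 16-bit quantum the same byte positions are addressed with half as many units, which is the trade-off between granularity and compactness that the choice of quantum controls.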