The notion of
This chapter describes the areas in which these terms are defined and
specifies their meaning. It also proposes other terms for related
concepts and points out some dangers in the careless use or application
of these terms.
The terms described here should be considered technical terms for
users and implementors of the TEI Guidelines and should be used only in
the senses given and with the usages described.
A document is
The term
A document is in
A TEI-local-processing-format document may be described as requiring
The following terms are synonymous:
A document is in
A TEI-interchange-format document may be described as requiring
The following terms are synonymous:
A document is in TEI packed interchange format with a given
With prior agreement between parties to an exchange, interchange
documents may use character code set switching as defined in ISO 2022,
its national analogues, or successor standards.
A full description of a document in TEI packed interchange format
A document follows
A document follows the
The SGML declaration for TEI interchange documents may differ
from that provided in TEI documentation in these ways:
The following portions of the SGML declaration may not be modified in
TEI interchange documents:
The SGML declaration for TEI-local-format documents may be modified
without restriction. Some recommendations for usage are made in
document TEI P1, but these recommendations are not normative.
A TEI-conformant document (whether for local processing or for
interchange) may make any change to the TEI-supplied document type
declarations which is allowed by SGML and the controlling SGML
declaration. All such changes should be effected within the SGML DTD
subset. For further discussion, see chapter
The following must remain true of the DTD after modification:
A TEI-conformant document may be said to require
It is expected that the notion of DTD extension will be particularly
useful in describing the classes of documents accepted or
validated by software.
This section is included for illustrative purposes only; it does
not restrict the processing of TEI or other documents. It simply
distinguishes a number of typical ways in which a project may choose
to apply the TEI Guidelines to different kinds of processing.
First, data might be captured by keyboarding into a locally defined
data capture format, or by scanning into a locally defined scanner-file
format. From these initial forms, transducers might convert the files
into a standard local storage format.
The local storage format might be the input format of some
application program used frequently by the project. In this case,
transducers might be necessary to prepare data for processing by other
applications. Alternatively, the local storage format might be
independent of the formats used by application programs; transducers
would be needed to prepare data for any processing. Such an
independent format is useful if the local storage format needs to
contain more information than any single application can conveniently
handle.
The local storage format might be SGML-conformant without being
TEI-conformant, e.g. because it uses local DTDs instead of the standard
TEI DTDs, or because it uses a TEI local processing format. Local
software may be used to validate a TEI
local-processing format, to transduce documents into the input formats
needed by applications, and when appropriate to transform documents
into the TEI interchange format for exchange with other sites.
Finally, the TEI interchange format may be used as a local storage
format. This is unlikely to be a common practice, since most sites
interested in TEI conformance are expected eventually to acquire
SGML-conformant software which allows for a more compact local
storage than does the interchange format. In the
absence of SGML software, however, some projects may find the TEI
interchange format (or perhaps a restrictive variant of it) useful,
because such a format can be relatively easy to parse with ad hoc
software.
Whether the local storage format is strictly TEI-conformant or not,
it may follow TEI-recommended practice in its selection of textual
features to be marked up, in its tag names, in its documentation
practices, etc.
Over the course of the project, analysis and processing may produce
interim results, which may be incorporated into the locally stored
copy of the text for use in later
processing. This process of enrichment can be carried out either by
manual editing of the documents using conventional text editors, or by
application programs.
When a document is to be exchanged with another site using the TEI
interchange format, it must first be transduced from the local storage
format to TEI interchange form. If local documents are already
TEI-conformant, this requires either no processing at all, or a
relatively simple normalization which can be handled readily by the
normalization facilities of most SGML parsers. If the local storage
format is not SGML-conformant, some transducer must be used to transform
it into the TEI interchange format.
The TEI-interchange-format document must then be packed for shipping
into the TEI packed interchange format, using a packing program. This
program will gather the constituent parts (files) of a document into a
single file, and ensure that the file contains no characters whose safe
passage to the recipient of the data is endangered by the transmission
path. If the ultimate recipient of the document is unknown, the set of
safe characters is very small. The specific
When a document is received from another site using the TEI packed
interchange format, it must first be unpacked into a TEI
interchange-format document in the local character set. It may then be
necessary to
The notions of TEI interchange format and TEI packed interchange
format are central to the exchange of documents using the TEI
Guidelines, whether the local storage format is TEI-conformant or not.
The TEI interchange format and the TEI local-processing format may each
be used as a local storage format, though the local storage format
might well differ from either of these without materially affecting the
use of TEI formats for interchange. Because the TEI interchange format
is less flexible than the local-processing format, sites using
SGML-conformant software are likely to use the latter, while sites
without such software may prefer the former.
The notion of TEI recommended practice, it is hoped, will be
relevant to decisions about what textual features should be recorded
during data capture and will thus affect data-capture formats and the
transducers which render captured files into the local storage format.
The TEI abstract structure may be useful in developing local
non-SGML markup schemes for data capture or for processing with ad hoc
application programs. It is strongly recommended that such
development follow the TEI recommendations in addition to the TEI
abstract structure.
Neither the character sets used for local processing nor those used
for transmission of interchange documents are restricted by the
definition of TEI conformance. For local processing, users will
typically use the system character set of their local system or some
modification thereof. For exchange with known partners, users should
choose any convenient character set; typically the most convenient is
the set of all characters which:
For blind exchange with unknown partners a conservative choice of
transmission set is needed to ensure that characters arrive
correctly. How conservative the choice need be depends on the medium
of transmission. The
In transmission by disk or tape, however, no silent translation is
likely to occur, and so larger sets may be successfully used in blind
interchange. The primary danger is a failure of software in the
receiving machine to process the characters correctly; at this time
(1991), ASCII or 94-character U.S. EBCDIC appear to represent the
largest safe choices; other national character sets may of course be
used if good internal documentation is also provided.
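As an illustration only, a check of a document against a conservative
transmission character set might be sketched as follows. The sketch
assumes that the safe set consists of the 94 ASCII graphic characters
plus the space and the line break; that choice, and the names SAFE_SET
and unsafe_characters, are assumptions made for this example and are
not defined by the TEI. The set is treated as a set of characters, not
of binary codes.

```python
import string

# Assumption for this sketch: the 94 ASCII graphic characters
# (letters, digits, punctuation) are taken to be safe.
ASCII_GRAPHIC = set(string.ascii_letters + string.digits + string.punctuation)

# Assumption: space and line break are also treated as safe.
SAFE_SET = ASCII_GRAPHIC | {" ", "\n"}


def unsafe_characters(text, safe=SAFE_SET):
    """Return (line, column, character) for each character outside the safe set.

    Line and column numbers start at 1.
    """
    problems = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch not in safe:
                problems.append((line_no, col_no, ch))
    return problems


if __name__ == "__main__":
    sample = "A line of plain text\nna\u00efve reading"
    for line_no, col_no, ch in unsafe_characters(sample):
        print(f"line {line_no}, column {col_no}: {ch!r} is outside the safe set")
```

A site exchanging data with a known partner could pass a larger set as
the second argument instead of the conservative default.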
Note that the transmission character set does not associate specific
binary encodings with the characters in the set. In the technical
senses, it is a character set, not a
For further discussion of the topics addressed in this section,
reference should be made to chapter
The utility of various SGML constructs is discussed in section
2.2 of document TEI P1 version 1. The restrictions on SGML declarations
and SGML usage in TEI interchange documents discussed above under
No restrictions are made on SGML usage in the local processing
format because such usage is best determined locally and has no impact
on interchange.
The document type declaration provided by the TEI is intended to
cover as wide a variety of document types and processing needs as
proved feasible. It is impossible, however, for any finite list of
text elements to cover every need of textual research and processing.
As a result, extension of the TEI DTD has no effect
on strict TEI conformance, as long as certain restrictions are
observed; these have the effect of ensuring that later users of a file
can easily see what changes have been made to the DTDs and what the new
tags are intended to mean.
The requirement that all new or modified tags be documented,
however, is formally verifiable only to a limited extent. It is
possible for a program to verify that for every tag introduced in a DTD
modification, a corresponding record exists in a Tag Set Declaration.
It is impossible, however, to verify using formal means that the entry
in the tag set declaration makes sense. Purely formal conformance
measures, therefore, must be supplemented with human inspection of the
documentation.
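The formal part of this check might be sketched as follows. The sketch
is illustrative only: it assumes that the DTD modifications are
available as a string of SGML element declarations and that the Tag Set
Declaration has already been reduced to a plain set of documented tag
names. That representation, the function names, and the sample element
names are hypothetical. Names are compared without regard to case,
which assumes a concrete syntax in which general names are not
case-sensitive. The sketch does not skip declarations that appear
inside SGML comments.

```python
import re

# Matches "<!ELEMENT name" or "<!ELEMENT (name1 | name2)".
# Group 1 holds the contents of a name group, group 2 a single name.
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?:\(([^)]*)\)|([^\s>(]+))", re.IGNORECASE
)


def declared_elements(dtd_subset):
    """Return the lower-cased element names declared in a DTD fragment."""
    names = set()
    for match in ELEMENT_DECL.finditer(dtd_subset):
        name_group, single_name = match.group(1), match.group(2)
        if name_group is not None:
            # A name group lists several names separated by connectors.
            parts = re.split(r"[|,&\s]+", name_group)
            names.update(p.lower() for p in parts if p)
        else:
            names.add(single_name.lower())
    return names


def undocumented_tags(dtd_subset, documented):
    """Return, sorted, the declared tags with no documentation record."""
    documented_lower = {name.lower() for name in documented}
    return sorted(declared_elements(dtd_subset) - documented_lower)


if __name__ == "__main__":
    # Hypothetical local extensions, for illustration only.
    subset = """
    <!ELEMENT stanza - - (line+)>
    <!ELEMENT (refrain | envoi) - O (#PCDATA)>
    """
    print(undocumented_tags(subset, {"stanza", "envoi"}))
```

A check of this kind establishes only that a record exists for each
new tag; whether the record is meaningful still has to be judged by a
human reader.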
The concept of DTD extension is introduced to allow the concise
description of software which is designed to handle documents encoded
using the published DTDs but which is not prepared to deal with tags
not included there.
All sections of the TEI DTD are subject to modification by the
user, except that a documentary header must be provided and
distinguished from the text itself, and that documentary header must
include tagged elements identifying the document encoded and those
responsible for the encoding. This ensures that all TEI-conformant
documents will have at least this bare minimum of accompanying
documentation.
The basic design principles of the TEI require the notion of
At the same time, the TEI is charged with formulating advice to
those engaged in the creation of new electronic texts and is required
to distinguish what is actively recommended for general use from what
is merely optional, provided for use by those engaged in a particular
sort of work.
The notion of
In exchanging texts for use by others, the goal of an interchange
format is to ensure that the information encoded in an electronic
version of a text can be correctly understood and processed by the
recipient as well as by the originator of the text. To assure the
achievement of this goal, the definition offered here of TEI
conformance restricts markup in TEI-conformant documents to SGML markup
and to properly declared non-SGML notations. The latter are explicitly
recommended for the encoding of tables,
figures, etc. and so cannot reasonably be excluded. Since they do
place a burden on the recipient for proper processing, the use of any
such non-SGML notation is defined to fall within the class of DTD
extensions.
Because of the escape clause for graphics, etc., it is in principle
possible to create a TEI-conformant document by embedding a document
using any arbitrary markup into a driver file containing a TEI header
and a declaration for the appropriate markup as a non-SGML notation.
Though it falls within the letter, such a practice falls outside the
spirit of TEI-conformant document interchange.
Any pre-transmission processing required to convert a document to meet
the above requirements for conformance to the TEI interchange format is
called
Without requiring DTD extensions, therefore, any TEI document may:
For local-processing purposes, a TEI document may also, without
requiring DTD extension:
Note that TEI interchange documents may
The Preparation of Text Encoding
Guidelines.