.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1
.* Document proper begins.
.sr docdate '10 November 1991'
Relevance of ODA (ISO 8613) to the TEI
<author>David G. Durand
<address>
<aline>Department of Computer Science
<aline>Boston University
</address>
<docnum>TEI &docfile.
<date>&docdate.
</titlep>
<!>
</frontm>
<!>
<body>
<h1>Introduction
<p>This document fulfills part of the TEI's mission: an examination of the relevance of the ISO/CCITT Office Document Architecture to the Text Encoding Initiative. As SGML<fn>ISO standards 8879, 8879-1 (Annex) Information Processing --- Text and office systems --- Standard Generalized Markup Language (SGML)</fn> and ODA<fn>ISO standards 8613-1, 8613-2, 8613-4, 8613-5, 8613-6, 8613-7, 8613-8 Information processing --- Text and office systems --- Office Document Architecture (ODA) and interchange format.</fn> are the two most important international standards for text representation, both should be examined for their suitability to the purposes of the TEI. This paper investigates the strengths and weaknesses of both SGML and ODA, with special reference to the problems that the TEI faces. In preparing this report I have been greatly helped by an ISO working paper<fn><q>Comparison between the main ODA and SGML Objectives and Proposal for Future Work</q> by U. Flasche, TU Berlin, A. Scheller, and HMI Berlin. Extracted from <q>ISOTEXT --- and ODA/SGML WYSIWYG Editor/Formatter</q> by S. Schindler, U. Flasche, C. Bormann, TU Berlin / TELES, A. Scheller, and HMI Berlin. Submitted for publication in <q>Informatik-Spektrum</q>, Springer Verlag.</fn> comparing ODA and SGML.
<p>The main need of the TEI is given in its name. The primary goal is the <ital>encoding</ital> or description of any features that might be objects of literary, linguistic, or lexicographic research.
While some new documents will doubtless be authored using the TEI formats, the primary purpose of TEI encodings is descriptive --- the fundamental model is of a scholar or other specialist preparing a machine-readable edition of a pre-existing document of some sort. As preliminary work on the TEI has shown, the range of documents to be represented is extremely broad, from incunabula and modern editions to manuscripts, papyrus fragments, and dictionaries. Thus the TEI really has three goals: first, to define nomenclature and techniques for describing many existing kinds of texts; second, to demonstrate by example how text encoding can be done in a principled and portable way; and finally, to select and adequately describe techniques that can be used to conform to the second goal while extending the guidelines where they may be insufficient to meet specialized needs.
<h1>SGML, ODA and the TEI
<p>Probably the most significant characteristic of SGML is that it is a metalanguage for describing texts as well as a method for marking their structure. Indeed, SGML does not have a defined semantics in the manner of many text description languages; instead, it provides a flexible language for structuring text as a labelled tree (or several concurrent labelled trees if the CONCUR option is used). It does not specify what interpretation is to be placed on the nodes of the document. This no-semantics semantics may seem needlessly weak, but in fact it provides the great power of SGML: by freeing the text from dependence on a particular semantics (a definition of how the text should be processed), it allows (or at least does not prevent) the same text to be used for a multitude of different purposes with ease.
<p>This ability to describe texts in an abstract way is the primary need of the TEI, since so many different disciplines and features may be involved in the coding of any text.
However, some of the kinds of information that may need to be coded about texts, such as page layout, line breaks, font or face changes, and the like, have been the traditional province of text-formatting languages.
<p>ODA can also describe a document as a hierarchically organized structure, but it additionally has extensive facilities for the representation of document formats, from page layout and fonts to hyphenation rules and kerning. Thus ODA seems to have the capacity to describe at least some things that SGML cannot (at least without additional work).
<p>It is worth briefly noting the irrelevance of a frequently mentioned fact about ODA and SGML. The reference syntax for ODA is a complex binary data stream designed to be integrated into an OSI network architecture. The ODA standard also defines a representation for ODA binary data streams (ODL) that uses SGML syntax to represent ODA constructs. This is sometimes taken as showing one of several things: that ODA is <q>really</q> a subset of SGML; that SGML is <q>really</q> useless without such an encoding; or that an ODA document coded in SGML using this format would be useful to a typical SGML application designed without ODA in mind. None of these conclusions properly represents the situation. While it could be useful in some interchange contexts to be able to create a non-binary ODA stream, and while it might be useful for an ODA-knowledgeable SGML application to be able to read ODA files without having two input languages, the encoding does not operate at a high enough level to allow easy interoperation of the two standards. This is not surprising given SGML's emphasis on generality and multi-usage texts, and ODA's emphasis on authoring and transmitting formatted (but potentially reformattable) office documents.
<h1>ODA and SGML Document Structure Description
<p>ODA provides the ability to represent a document as two separate hierarchies --- a layout hierarchy and a logical hierarchy.
The logical hierarchy corresponds to the level of description provided by SGML. At most one logical and one layout hierarchy are permitted in an ODA document. Facilities exist for defining logical hierarchies by creating <q>generic documents</q>, which specify names and attributes to be used by a document's logical objects. Generic documents (in those portions relating to logical structure) can specify sets of relationships similar to those specified by SGML DTDs, though some constraints expressible in SGML DTDs may not be expressible in ODA. The standard ODA attributes have flexible default-value and data type semantics, but do not correspond to SGML attributes, as they form a fixed set with required semantics. User-defined logical attributes all reside as the values of a special <q>bindings attribute</q>, which allows arbitrary attribute values expressed as character strings. Thus the user-definable attributes of ODA correspond fairly closely in power to the SGML attribute features.
<p>SGML does not have any layout semantics, so any attempt to describe the layout of a document requires the creation of specific tags for the layout features to be indicated. While there are no obstacles to this process, there is no aid in the form of predefined structures or functions oriented to layout description. ODA, on the other hand, possesses a rich vocabulary of document format structures, so many that a detailed description and evaluation of them would exceed the bounds of this paper. However, detailed consideration of the full set of layout features is not necessary in order to evaluate their usefulness to the TEI. The most important aspect of the layout specifications is the extent to which they can be used as <ital>descriptive tools</ital> for describing the layout of already existing texts in the TEI's likely domain of application.
<p>On these grounds, the ODA format descriptions, extensive as they are, fail to meet the full needs of the TEI.
The fundamental orientation of ODA is the interchange of texts that are to be authored in an ODA-compliant environment. In essence, ODA describes a text-formatting language, capable of producing reasonably reproducible graphic effects of the type used in contemporary office documents. Indeed, while a wide range of formatting specifications is expressible in ODA, there is no way to extend the standard, even to annotate a layout object as possessing some non-standard feature. A great deal of the ODA formatting specifications (and these constitute the bulk of the standard) is devoted to descriptions of how a document processor is required to handle particular formatting features, both in the normal case and in exceptional conditions that might arise in formatting some document. For at least some documents, such as manuscripts or early printed works, the ODA formatting descriptions are inadequate.
<h1>Summary
<p>ODA, while it seems attractive on the surface, especially given its inclusion of both format and logical structures in the same description language, falls short of meeting the TEI's needs in several respects:
<ul>
<li>restriction of the logical structure to one hierarchy;
<li>binary file format that requires special software to manipulate;
<li>formatting abilities oriented to document processing rather than document description;
<li>lack of formatting extensibility.
</ul>
<p>Thus it is SGML's lack of commitment to any particular semantics that makes it so useful for the TEI's purposes. While ODA has comparable descriptive capabilities, its emphasis on document layout and its binary file format make it more complex than SGML for the relevant features. Its limitation to a single hierarchical view of a document is also a strong limitation. Indeed, even within the SGML multiple-hierarchy model, there are descriptive difficulties that argue for less strongly hierarchical descriptions in some cases.
</body>
<!>
</gdoc>