.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1 .* Document proper begins. .sr docdate '15 August 1991' Aggregates & Alignment <author>Steven DeRose <docnum>TEI &docfile. <date>&docdate. </titlep> <!> </frontm> <!> <body> <h1>Aggregates of references (draft) <p>Given a mechanism for referring to a location such as an element, point, or span (here called an "atom"), it is necessary to provide a mechanism for referring to aggregates of such locations. Aggregation is needed at least for these purposes: <ul> <li>Referring to a single conceptual location that happens not to be contiguous. <li>Referring to a set of locations, all of which are conceptually connected. <li>Creating collections of links, which reside apart from any of their ends. </ul> <p>These needs interact; for example, one may wish to create a link between two locations, each of which is itself discontiguous. Two main structures were proposed at the Providence meeting to address these needs (the examples below show atomic location specifiers as attributes, but the argument is the same however they are expressed). <p>First, one could have an aggregate-location element, which collects any number of atom references: <xmp> <![ CDATA [ <aggregate> <atom loc="..."> <atom loc="..."> ... </aggregate> ]]> </xmp> <p>Aggregates would then be built into links as needed. Such links could either be embedded in the text at their origin, or be collected into a separate area to form an external web. <xmp> <![ CDATA [ <link> <aggregate> <atom loc="..."> </aggregate> <aggregate> <atom loc="..."> </aggregate> ... </link> ]]> </xmp> <p>The snag with this method is that it is unwieldy for cases where a link is marked up at one of its own ends.
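<p>The aggregate model can be sketched as a small data structure. The following Python is illustrative only: the classes Atom, Aggregate, and Link, the function to_markup, and the location strings "ID=p7" and "ID=c12" are hypothetical, and location pointers are treated as opaque strings. It builds a link whose first end is simply the element in which the link would be embedded, and serializes it with the element names used above, so that even this local end has to be written out as a full aggregate.
<xmp> <![ CDATA [
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Atom:
    """One atomic location (element, point, or span).

    loc is a hypothetical, opaque location-pointer string."""
    loc: str


@dataclass
class Aggregate:
    """A possibly discontiguous location made up of any number of atoms."""
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Link:
    """A link in which every end, even a local one, is an aggregate."""
    ends: List[Aggregate]

    def __post_init__(self) -> None:
        # Mirrors a content model that requires at least two aggregates.
        if len(self.ends) < 2:
            raise ValueError("a link needs at least two ends")


def to_markup(link: Link) -> str:
    """Serialize a Link using the link/aggregate/atom element names."""
    parts = ["<link>"]
    for end in link.ends:
        parts.append("<aggregate>")
        parts.extend(f'<atom loc="{atom.loc}">' for atom in end.atoms)
        parts.append("</aggregate>")
    parts.append("</link>")
    return " ".join(parts)


# Hypothetical usage: link the current element (ID=p7) to a chapter (ID=c12).
here = Aggregate([Atom("ID=p7")])
there = Aggregate([Atom("ID=c12")])
print(to_markup(Link([here, there])))
]]> </xmp>
<p>Run as shown, it prints the full link markup, in which the local end alone accounts for three of the eight tags.
<p>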
It seems excessively verbose to code an entire link where one end is the current location; a more natural encoding would leave the local end implied, and specify only the remote location: <xmp> <![ CDATA [ <link loc="..."> ]]> </xmp> <p>This could be accomplished by allowing atoms and aggregates to occur anywhere, together or separately; links would then be a special item used just for constructing out-of-line links, and hence would require at least two aggregates. The DTD would then look something like this: <xmp> <![ CDATA [ <!ELEMENT tei1 - - (...) +(aggregate|atom) > <!ELEMENT link - - (aggregate,aggregate+) > <!ELEMENT aggregate - - (atom*) > <!ELEMENT atom - - EMPTY > <!ATTLIST atom location-stuff... > ]]> </xmp> <p>An alternative is to do the same thing, but provide a special name or names for those references which lead out from a location, as opposed to those within a web. In effect, a link out is a special type of aggregate. This is similar to HyTime's distinction between c-links and a-links. Thus, <xmp> <![ CDATA [ <!ELEMENT tei1 - - (...) +(alink|clink) > <!ELEMENT alink - - (aggregate,aggregate+) > <!ELEMENT clink - - (aggregate) > <!ELEMENT aggregate - - (atom*) > <!ELEMENT atom - - EMPTY > <!ATTLIST atom location-stuff... > ]]> </xmp> <h1>Directionality <p>In some hypertext systems all links are bidirectional, and in others they are not; hence we ought to retain the capability of expressing the distinction. The term "anchor" has been avoided above, since it can refer either to markup which delimits a link target or to markup which delimits a link origin. <p>In an SGML framework, a link can be of several structural kinds, according to whether its origin and destination are elements, point locations, or aggregates. Also, links can be either one-directional or bidirectional. The current guidelines provide means of identifying any atomic location. <h2>2.1.
Note on attaching IDs when the DTD does not permit them <p>The most robust way to point to a location is via an SGML ID; although the TEI DTDs permit ID attributes on all elements, many other DTDs do not. For some users it may be useful to add to the PATH and TPATH syntax a means of stepping upward in the document tree, in order to identify an element that contains an identified element. Thus, a user whose documents do not allow CHAPTER to have an ID could point to a CHAPTER by: <ol> <li>Adding a TARGET element to the DTD, which can occur anywhere (as an inclusion exception on the document element, so that the rest of the DTD need not change), and which has a declared content of EMPTY: <xmp> <![ CDATA [ <!ELEMENT target - - EMPTY > <!ATTLIST target id ID #REQUIRED > ]]> </xmp> <li>Inserting a TARGET anywhere within the chapter(s) or other element(s) to be referenced: <xmp> <![ CDATA [ <CHAPTER><TARGET ID=c12>...</CHAPTER> ]]> </xmp> <li>Pointing to the ID, and stepping upward to its parent via a new syntax such as this, which finds the TARGET and then steps up to its containing element of type CHAPTER (direct or indirect containment could be permitted): <xmp> <![ CDATA [ <XREF LOC="ID=c12/PARENT=CHAPTER"> ]]> </xmp> </ol> <h2>2.2. Bidirectional links <p>For bidirectional links between elements, the easiest solution is to put a link, of the sort described above, at each end. Since these links, like all TEI elements, can have IDs, they can use the simplest and most reliable location pointer: each can point to the other's ID. <p>The next easiest approach, and the only one possible when the documents involved cannot be modified, is to collect the links outside the documents and group them into pairs (or, conceivably, n-ary groups) within webs or similar constructs (see below). This approach can serve as a general mechanism wherever the documents should not or cannot be modified. <h2>2.3.
One-directional from a point to anything <p>For a one-directional link from a point location to anything, the link can most easily be represented by a link element inserted at the origin, which may refer to the target by any of the location-pointer mechanisms dealt with elsewhere, including aggregation. <h2>2.4. One-directional from an ordinary (non-link) element to anything <p>For a one-directional link from an element per se, the difficulty is that the element (such as "paragraph") probably makes no provision, in its attribute list or content model, for the additional information needed to specify a link. And the only global override, the inclusion exception, permits only the addition of subelements, which by definition do not share the scope of their parent. <p>Because current TEI location pointers take up only a single attribute (except for pointers to aggregates), it would be possible to make that attribute universal, that is, available on every element. <p>If it is sufficient for the link origin to <emph>almost</emph> share the scope of the conceptual origin element, then a normal link element could be inserted just within the bounds of the origin element: <xmp> <![ CDATA [ <P><link loc="...">Text of paragraph.</link></P> ]]> </xmp> <p>However, the declared content of "link" would then have to be ANY, which weakens the SGML parser's ability to validate anything within the scope of the link origin. <p>A better solution is needed for this case. <h2>2.5. One-directional from a non-element location to anything <p>For a one-directional link from a non-element location, there is no natural way in SGML to place the link at the origin. If the origin document can be modified, in many cases a link element can be inserted which exactly subtends the desired scope, solving the problem cleanly. However, if the origin document's tagging cannot be modified, or the scope would cross-cut other elements, the easiest solution is to put the link out-of-line, in a separate "links" section or a web.
For example, consider a link from the discontiguous location consisting of the three lama-words: <xmp> <![ CDATA [ <line>The one-L <i>lama</i>, he's a priest,</line> <line>The two-L <i>llama</i>, he's a beast.</line> <line>And I will bet a silk pajama,</line> <line>There isn't any 3-L <i>lllama</i></line> ]]> </xmp> <p>For this to be possible, either the link must be specified elsewhere, or three markup items must be coded in-line and co-indexed so that a parser can determine that they belong together. That is, either the link is placed somewhere else, such as in a web: <xmp> <![ CDATA [ <web>... <link> <aggregate> <atom loc="the other end"> </aggregate> <aggregate> <atom loc='...lama'> <atom loc='...llama'> <atom loc='...lllama'> </aggregate> </link> </web> ]]> </xmp> <p>or the ends are co-indexed (this seems undesirable): <xmp> <![ CDATA [ <line>The one-L <link loc='...' co-ends="ID1 ID2"><i>lama</i>,</> he's a priest,</line> <line>The two-L <co-end ID=ID1><i>llama</i>,</> he's a beast.</line> <line>And I will bet a silk pajama,</line> <line>There isn't any 3-L <co-end ID=ID2><i>lllama</i></></line> ]]> </xmp> <h2>2.6. One-directional to an element <p>Ideally, the target already has an ID attribute, or can be assigned one (this is always possible in the TEI DTD). If it is necessary to point to an element without an ID, the location-pointer methods already in P1 are effective. <p>One possible addition to the syntax of the PATH and/or TPATH address specifiers is a way to step <emph>upward</emph> in the tree. Thus, one could insert a target element (say "TARGET") to hold an ID, and refer not only to the TARGET element but also to its containing element. This would permit attaching links to modifiable documents without compromising their DTDs. <h2>2.7. One-directional to a point <p>Both TEI and HyTime provide options for pointing to byte and token offsets.
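<p>The fragility of absolute offsets is easy to demonstrate. The following Python sketch uses three lines adapted from the verse quoted above; the helper names byte_offset and resolve are hypothetical. It records the byte offset of "llama" in a copy of the text whose lines end in a single character, then resolves that same offset against a copy whose lines end in two characters, as can happen when a file moves between systems.
<xmp> <![ CDATA [
def byte_offset(data: bytes, target: bytes) -> int:
    """Return the byte offset of the first occurrence of target in data."""
    offset = data.find(target)
    if offset < 0:
        raise ValueError("target not found")
    return offset


def resolve(data: bytes, offset: int, length: int) -> bytes:
    """Resolve an absolute byte-offset pointer.

    Slicing never raises, so a stale pointer quietly yields the wrong bytes."""
    return data[offset:offset + length]


unix = b"The one-L lama,\nThe two-L llama,\nThere isn't any 3-L lllama\n"
dos = unix.replace(b"\n", b"\r\n")  # same text, two-character line ends

offset = byte_offset(unix, b"llama")  # 26
print(resolve(unix, offset, 5))  # b'llama'
print(resolve(dos, offset, 5))   # b' llam', and no error is raised
]]> </xmp>
<p>Nothing in the second lookup signals a problem; the pointer simply lands one byte early, because the first line grew by one byte.
<p>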
Such offsets are problematic because they are easily invalidated, as described below, and because the invalidation often cannot be detected (a byte offset still points to <emph>something</emph> unless the document shrinks below that offset, so the fact that it now points to the <emph>wrong</emph> place is hard to detect). Among the things that can invalidate such offsets are: <ul> <li>Any editing of the file, however minor. <li>Transferring a file between systems that differ in some respect, such as whether line ends are marked by one character or by two. <li>Converting to a different character set. <li>Modifying an entity definition (SDATA entities are perhaps the most dangerous case, since their replacement text is by definition system-specific). <li>Normalizing a document that uses optional SGML features, especially DATATAG, wherein data content characters can also serve as markup. </ul> <p>Users may not perceive some of these actions as "changing" a file at all, and so may not anticipate that links will break. Indeed, some of these changes may happen automatically, without the user's knowledge. One change to the P1 guidelines that would make offsets less dangerous would be to forbid them from extending across element boundaries. That is, a TEI location specifier would be required to point to an element, and only then could character offsets within it be applied (arbitrary ranges could still be expressed by specifying two endpoints). <h1>Collections of links <p>Under any of these models, a web is merely a collection of links (or alinks). To be useful, a web should let applications organize the links it contains according to properties such as those discussed below. <xmp> <![ CDATA [ <!ELEMENT web - - (link*) > ]]> </xmp> <p>A path is almost the same thing, except that it is a collection of location references (essentially clinks) in a particular order.
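<p>The contrast between a web and a path can be sketched in Python. Everything here is hypothetical except the property names TYPE and EDITOR, which anticipate the attribute list proposed in the next section: the classes Link and Step, the function select, the "relation" key, and all location strings and values are invented for illustration. A web is treated as an unordered collection searched by link properties; a path is an ordered list whose steps carry their own properties.
<xmp> <![ CDATA [
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    """A link with free-form, labelled properties such as TYPE or EDITOR."""
    ends: List[str]
    props: Dict[str, str] = field(default_factory=dict)


@dataclass
class Step:
    """One location reference in a path; props describe its relation to the next."""
    loc: str
    props: Dict[str, str] = field(default_factory=dict)


def select(web: List[Link], **criteria: str) -> List[Link]:
    """Return the links in a web whose properties match every criterion."""
    return [link for link in web
            if all(link.props.get(key) == value for key, value in criteria.items())]


# A web: order carries no meaning, so links are found by their properties.
web = [
    Link(["ID=n1", "ID=p3"], {"TYPE": "gloss", "EDITOR": "ed1"}),
    Link(["ID=n2", "ID=p9"], {"TYPE": "source", "EDITOR": "ed1"}),
    Link(["ID=n3", "ID=p3"], {"TYPE": "gloss", "EDITOR": "ed2"}),
]

# A path: order is significant, and each step labels its relation to the next.
tour = [
    Step("ID=c1", {"relation": "elaborated-by"}),
    Step("ID=c4", {"relation": "contrasted-with"}),
    Step("ID=c2"),
]

print([link.ends[0] for link in select(web, TYPE="gloss")])  # ['ID=n1', 'ID=n3']
print([step.loc for step in tour])  # ['ID=c1', 'ID=c4', 'ID=c2']
]]> </xmp>
<p>Because properties are plain labelled strings, select imposes no fixed vocabulary on them.
<p>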
For paths, certain properties may belong to the individual location references, encoding the relation of each to the next. Thus we should permit properties to be associated with location references, not just with links in their entirety. <h1>Properties of links <p>The use of webs, or collections of links, immediately raises the question of how to sort, search, filter, or otherwise distinguish among individual links. The natural way to do this is to code instance-specific information in attributes of the link elements. In keeping with TEI philosophy, it is probably best not to constrain the range of information permitted, but to provide ways to label it. Thus the following attributes, at least, should be available: <gl> <gt>DIRECTION <gd>whether the link is inbound, outbound, or both. <gt>TYPE <gd>a descriptor for the link's rhetorical purpose. <gt>EDITOR <gd>who made the link. <gt>DATE <gd>when the link was made. <gt>RENDITION <gd>rendition properties; those appropriate to links differ from those for other elements, but should still be available. For example: <sl> <li>Highlighting (color, font, reverse video, etc.) <li>Icon type and placement (before, after, in margin, etc.) <li>Cursor change (i.e., when the user places the cursor over the item) <li>Display method (Replacement, NewWindow, Pop-up, etc.) </sl> </gl> <h1>Synchronization of elements <p>For certain applications it is desirable to present several pieces of information in a co-ordinated fashion. Such alignment may be in either time or space, for example: <ul> <li>Elements in two different documents may need to be displayed at the same time (such as an original and one or more translations). The correspondence across documents would most often be determined on the basis of a reference scheme encoded in attributes. <li>Elements in the same or different documents may need to be interleaved in a display, as when representing grammatical annotations in interlinear form, or a critical apparatus inline with its arcane punctuation.
<li>For temporally extended media, it is desirable to co-ordinate a sequence of presentational events. </ul> <p>Further interaction with the committee dealing with alignment maps is desired here. If possible, a single unified mechanism should be developed for this class of problems. The current alignment map mechanism is described on pp. 138-143 and 162-165 of P1. The first step in such unification is to define al.ptr, al.list, and al.range to be identical to location pointers to elements, aggregates, and spans (respectively). This requires only minimal changes to the DTDs, all of them simplifications, and would provide greater flexibility than the current al.range does. <p>This reduction makes an alignment map structurally equivalent to a web. However, the display semantics are different, and indeed the appropriate display of the objects referenced by an alignment map may differ from one map to another. We would therefore recommend that the alignment map have a type attribute, by which the creator can specify the intended meaning of the alignment. Plausible values would include "temporally simultaneous", "temporally sequential", "interlinear", and so on, according to the meaning to be imputed to the particular map. </body> <!> </gdoc>