.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1 .* Document proper begins. .sr docdate '25 November 1991' Alignment mechanisms <author>D. Terence Langendoen <address> <aline>Department of Linguistics <aline>University of Arizona <aline>Tucson, AZ 85721 USA <aline>langendt@arizona.edu </address> <docnum>TEI &docfile. <date>&docdate. </titlep> <!> <preface>Request for comments <p>Comments are welcome, but if you have any, send them quickly, to the sender, and, if you think they are important enough, to the file server or the TEI editors. </preface> </frontm> <!> <body> <h1 n=1>Background <h2>1.1 Explicit alignment in TEI P1 <p>The mechanism for explicit alignment of different parts of text with one another is described in detail in <cit>TEI P1, section 6.2.5, pp. 142-144</cit>. It consists of an <tag>alignment</tag>, made up of pointers (in the form of <tag>al.ptr</tag> and other elements), which point from the <tag>alignment</tag> into the <tag>text</tag>. However, it is possible also to have pointers from the <tag>text</tag> into the <tag>alignment</tag>, if <tag>al.ptr</tag> and the other elements in the <tag>alignment</tag> are permitted to have their own <att>id</att>s. This mechanism can be used to align not only the different linguistic analyses of text portions (the only application described in that section), but also parallel texts. <h2>1.2 Proposal in TEI AI2 W1 for <timeline> <p>The Spoken Text Workgroup chaired by Stig Johansson proposed that encodings of spoken texts be accompanied by a <tag>timeline</tag>, consisting of pointers from the <tag>text</tag> into the <tag>timeline</tag>. One purpose for the <tag>timeline</tag> is to align the elements of spoken text according to when they were spoken, so that, for example, temporal overlapping of speakers can be represented. 
<h1 n=2>Discussion of alignment mechanisms at Myrdal meeting <p>At the meeting of Working Group chairs and other TEI representatives recently concluded at Myrdal, Norway, the similarity of the <tag>alignment</tag> and <tag>timeline</tag> mechanisms was noted, and I was charged with the task of reconciling the two mechanisms for <cit>TEI P2</cit>. <fn>See <cit>Norway meeting report, posted on TEI-L, 24 November 1991, item 7</cit>.</fn> <p>In the next section, <hdref refid=d3>, I outline my solution to this problem. As this is being written, my research assistant Steven Zepp is writing the formal specifications and validating them. Once they are prepared and validated, they will be circulated. <h1 id=d3 n=3>Outline of proposal for TEI P2 <h2>3.1. Justification for distinct tagsets <p>The two mechanisms are sufficiently different that they require different tagsets. An <tag>alignment</tag> as described in <cit>TEI P1</cit> consists of one or more <tag>al.map</tag>s, which in turn must consist of at least two <tag>al.ptr</tag>s or other pointing elements (<tag>al.list</tag> and <tag>al.range</tag>). The two or more pointers that are grouped together within an <tag>al.map</tag> may be said to <emph>correspond</emph> to one another. On the other hand, a <tag>timeline</tag> contains a set of <tag>point</tag>s, which as described in <cit>TEI AI2 W1</cit> are unstructured, but which can be thought of as consisting of zero or more pointing elements, which are aligned with those points (and perhaps understood as synchronous with those points). The <tag>timeline</tag> moreover requires attributes which the <tag>alignment</tag> does not; similarly the <tag>point</tag>s require attributes which the <tag>al.map</tag>s do not. <h2>3.2. Restructuring of the tagset for analytic and textual correspondence <p>First, I suggest that certain tags in the tagset for textual and analytic correspondence (what <cit>TEI P1</cit> called <cited>explicit alignment</cited>) be renamed as follows. 
<ol> <li><tag>alignment</tag> becomes <tag>corresp.grp</tag>; <li><tag>al.map</tag> becomes <tag>corresp</tag>; <li><tag>al.ptr</tag> becomes <tag>xref</tag>;<fn>On the use of <tag>xref</tag> as a general purpose pointer, see <cit>Norway meeting report, posted on TEI-L, 24 November 1991</cit>. I assume here that <tag>xref</tag> is characterized as in <cit>TEI P1</cit>. However, its definition may be expected to change in ways which are not material here.</fn> <li><tag>al.list</tag> becomes <tag>xref.grp</tag>, and should consist of two or more <tag>xref</tag>s, not one or more <tag>al.ptr</tag>s, as in <cit>TEI P1</cit>. </ol> The names <cited>corresp.grp</cited> and <cited>corresp</cited> more accurately reflect the intended semantics of their respective elements than do <cited>alignment</cited> and <cited>al.map</cited>. <p>Second, I propose to eliminate <tag>al.range</tag>, as its function can be subsumed under <tag>xref</tag>, in virtue of the <att>target.end</att> attribute on <tag>xref</tag>. <h2>3.3. Restructuring of the tagset for synchronizing text <p>First, I suggest that certain tags in the tagset proposed in <cit>TEI AI2 W1, Spoken Texts</cit> for synchronizing text be renamed as follows. <ol> <li><tag>timeline</tag> becomes <tag>align</tag>; <li><tag>point</tag> becomes <tag>loc</tag>. </ol> The reason for giving <tag>timeline</tag> a more neutral name is that it can be used not only for alignment with time but also with any one-dimensional structure associated with a text, such as lineation and word position. The reason for giving <tag>point</tag> a new name is to dissociate it from the notion of a <term>pointer</term>; the name <cited>loc</cited> indifferently represents <gloss>locus</gloss> or <gloss>location</gloss>. 
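<p>A minimal sketch of such a non-temporal use follows, in which each <tag>loc</tag> stands for the beginning of a printed line. The sample sentence, its lineation, and all <att>id</att> values are invented for the illustration, and no attributes are shown on <tag>align</tag>, since those are only proposed in what follows; the sketch should not be read as a formal specification.
<xmp>
<![ CDATA [
<text>
<p><xref id=y1 target=n1>It was a dark and stormy <xref id=y2 target=n2>night, and the rain fell in <xref id=y3 target=n3>torrents.</p>
</text>
<align>
<loc id=n1> <xref target=y1> </loc>
<loc id=n2> <xref target=y2> </loc>
<loc id=n3> <xref target=y3> </loc>
</align>
]]>
</xmp>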
<p>Second, I suggest that <tag>align</tag> should have certain attributes which specify whether it is a temporal or spatial (textual) alignment; what the origin is, if any; whether the <tag>loc</tag>s are understood to be a fixed distance apart; how the distance between <tag>loc</tag>s is measured; whether the distance to a particular <tag>loc</tag> is being measured from the origin or from the immediately preceding <tag>loc</tag>; etc. Similarly, <tag>loc</tag>s should have attributes which indicate their value in the dimension represented by <tag>align</tag> (at minimum, the <tag>loc</tag> identified as the origin should be so specified); their distance from the previous <tag>loc</tag> or the origin; etc. <p>Third, I suggest that the content model for an <tag>align</tag> be one or more <tag>loc</tag>s, like that of a <tag>corresp.grp</tag>, which consists of one or more <tag>corresp</tag>s. However, a <tag>loc</tag> should consist of zero or more <tag>xref</tag>s, in contrast to a <tag>corresp</tag>, which consists of two or more <tag>xref</tag>s or <tag>xref.grp</tag>s. The content model for <tag>loc</tag> need not include <tag>xref.grp</tag>s. <p>Finally, note that both <tag>corresp.grp</tag> and <tag>align</tag> permit pointing from the text to the map and from the map to the text. This bidirectionality is shown in the following example of the use of <tag>align</tag>. <h3 id=d331>3.3.1 Example of the use of a temporal <align> <p>The following example is adapted from <cit>TEI AI2 W1, section 8.5, Speaker overlap</cit>. 
<xmp>
<![ CDATA [
<text>
<u who=A><xref id=x1 target=p1>this <xref id=x2 target=p2>is <xref id=x3 target=p3>my <xref id=x4 target=p4>turn<xref id=x5 target=p5></u>
<u who=B><xref id=x6 target=p2>balderdash<xref id=x7 target=p4></u>
<u who=C><xref id=x8 target=p3>no <xref id=x9 target=p4>it's mine<xref id=x10 target=p5></u>
<kinesic who=B id=k1 start=p4 end=p5 desc="waves arms">
</text>
<align origin=p1 interval=1 measured.from=previous>
<loc id=p1 value=0> <xref target=x1> </loc>
<loc id=p2> <xref target=x2> <xref target=x6> </loc>
<loc id=p3> <xref target=x3> <xref target=x8> </loc>
<loc id=p4> <xref target=x4> <xref target=x7> <xref target=x9> <xref target=k1> </loc>
<loc id=p5> <xref target=x5> <xref target=x10> <xref target=k1> </loc>
</align>
]]>
</xmp>
<h1 n=4>Possible extensions of <align> to multidimensional alignment
<p>Each <tag>align</tag> represents a one-dimensional structure (i.e., sequence) of <tag>loc</tag>s. It is easy to see how this concept can be extended to two-, three- and even higher-dimensional structures, to represent, for example, the alignment of text on a page (two-dimensional structure) or a book (three-dimensional structure in which the third dimension is the page number).
</body>
</gdoc>