Text Encoding Initiative
4. Encoding the Body
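As indicated above, a simple TEI document at the textual level consists of the following elements: <front> (any prefatory matter), <body> (the body of the text itself), and <back> (any appendices or other back matter).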
Elements specific to front and back matter are described below in section 19. Front and Back Matter. In this section we discuss the elements making up the body of a text.
The body of a prose text may be just a series of paragraphs, or these paragraphs may be grouped together into chapters, sections, subsections, etc. In the former case, each paragraph is tagged using the <p> tag. In the latter case, the <body> may be divided either into a series of <div1> elements, or into a series of <div> elements, either of which may be further subdivided, as discussed below:
When structural subdivisions smaller than a <div1> are necessary, a <div1> may be divided into <div2> elements, a <div2> into smaller <div3> elements, etc., down to the level of <div7>. If more than seven levels of structural division are present, one must either modify the TEI tag set to accept <div8>, etc., or else use the unnumbered <div> element: a <div> may be subdivided by smaller <div> elements, without limit to the depth of nesting.
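The nesting rule for numbered divisions can be checked mechanically. The following is a minimal sketch in Python, assuming the text is available as well-formed XML; the function name `nesting_errors` and the sample text are illustrative only and are not part of the TEI scheme.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a book containing a chapter containing a part.
SAMPLE = """
<body>
  <div1 type="book">
    <div2 type="chapter">
      <div3 type="part"><p>...</p></div3>
    </div2>
  </div1>
</body>
"""

def nesting_errors(body):
    """Report numbered divisions that are not exactly one level below their parent division."""
    errors = []

    def visit(parent, parent_level):
        for child in parent:
            match = re.fullmatch(r"div([1-7])", child.tag)
            if match:
                level = int(match.group(1))
                if level != parent_level + 1:
                    errors.append(f"<{child.tag}> found inside level {parent_level}")
                visit(child, level)
            else:
                # Not a numbered division: keep the current level and look deeper.
                visit(child, parent_level)

    visit(body, 0)
    return errors

print(nesting_errors(ET.fromstring(SAMPLE)))
```

Unnumbered <div> elements are deliberately ignored by this sketch, since they may nest to any depth.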
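All these division elements take the following three attributes: type (the conventional name for this category of text division), id (a unique identifier for the division), and n (a mnemonic short name or number for the division).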
The attributes id and n, indeed, are so widely useful that they are allowed on any element in any TEI DTD: they are global attributes. Other global attributes defined in the TEI Lite scheme are discussed in section 8.3. Linking Attributes.
The value of every id attribute must be unique within a document. One simple way of ensuring that this is so is to make it reflect the hierarchic structure of the document. For example, Smith's Wealth of Nations as first published consists of five books, each of which is divided into chapters, while some chapters are further subdivided into parts. We might define id values for this structure as follows:

    <div1 id="WN1" n="I" type="book">
      <div2 id="WN101" n="I.1" type="chapter"> ... </div2>
      <div2 id="WN102" n="I.2" type="chapter"> ... </div2>
      ...
      <div2 id="WN110" n="I.10" type="chapter">
        <div3 id="WN1101" n="I.10.1" type="part"> ... </div3>
        <div3 id="WN1102" n="I.10.2" type="part"> ... </div3>
      </div2>
      ...
    </div1>
    <div1 id="WN2" n="II" type="book"> .... </div1>
    ...

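The uniqueness requirement on id values is easy to test automatically. Here is a minimal sketch in Python, assuming a well-formed XML version of the text; `duplicate_ids` and the sample document (which deliberately repeats one id) are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample in which the id "WN101" has been used twice by mistake.
SAMPLE = """
<body>
  <div1 id="WN1" n="I" type="book">
    <div2 id="WN101" n="I.1" type="chapter"><p>...</p></div2>
    <div2 id="WN101" n="I.2" type="chapter"><p>...</p></div2>
  </div1>
</body>
"""

def duplicate_ids(root):
    """Return, in sorted order, every id value carried by more than one element."""
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, count in counts.items() if count > 1)

print(duplicate_ids(ET.fromstring(SAMPLE)))
```

A validating parser reports the same problem when id is declared as an ID attribute in the DTD; the sketch is only useful where no validation is being done.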
A different numbering scheme may be used for id and n attributes: this is often useful where a canonical reference scheme is used which does not tally with the structure of the work. For example, in a novel divided into books each containing chapters, where the chapters are numbered sequentially through the whole work, rather than within each book, one might use a scheme such as the following:

    <div1 id="TS01" n="1" type="Volume">
      <div2 id="TS011" n="1" type="Chapter"> ... </div2>
      <div2 id="TS012" n="2"> ...</div2>
    </div1>
    <div1 id="TS02" n="2" type="Volume">
      <div2 id="TS021" n="3" type="Chapter"> ...</div2>
      <div2 id="TS022" n="4"> ...</div2>
    </div1>

Here the work has two volumes, each containing two chapters. The chapters are numbered conventionally 1 to 4, but the id values specified allow them to be regarded additionally as if they were numbered 1.1, 1.2, 2.1, 2.2.
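To make the two parallel schemes concrete, the following Python sketch maps each chapter's canonical n value to the structural reference implied by its id. It assumes the id layout read off from this example (the prefix "TS", two digits for the volume, then the chapter number within the volume); that layout, the sample text, and the function names are assumptions for illustration, not rules of the TEI scheme.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the two-volume example; every chapter is given a type here.
NOVEL = """
<body>
  <div1 id="TS01" n="1" type="Volume">
    <div2 id="TS011" n="1" type="Chapter"><p>...</p></div2>
    <div2 id="TS012" n="2" type="Chapter"><p>...</p></div2>
  </div1>
  <div1 id="TS02" n="2" type="Volume">
    <div2 id="TS021" n="3" type="Chapter"><p>...</p></div2>
    <div2 id="TS022" n="4" type="Chapter"><p>...</p></div2>
  </div1>
</body>
"""

def structural_ref(div_id, prefix="TS"):
    """Turn an id such as 'TS021' into the structural reference '2.1'."""
    digits = div_id[len(prefix):]
    volume, chapter = int(digits[:2]), int(digits[2:])
    return f"{volume}.{chapter}"

def reference_table(root):
    """Map each chapter's canonical number (n) to its structural reference (from id)."""
    return {d.get("n"): structural_ref(d.get("id")) for d in root.iter("div2")}

print(reference_table(ET.fromstring(NOVEL)))
# prints {'1': '1.1', '2': '1.2', '3': '2.1', '4': '2.2'}
```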
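Every <div>, <div1>, <div2>, etc., may have a title or heading at its start, and (less commonly) a closing such as ‘End of Chapter 1’. The following elements may be used to transcribe them: <head> for a heading at the start of a division, and <trailer> for a closing title or footer at its end.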
Some other elements which may be necessary at the beginning or ending of text divisions are discussed below in section 19.1.2. Prefatory Matter.
Whether or not headings and trailers are included in a transcription is a matter for the individual transcriber to decide. Where a heading is completely regular (for example ‘Chapter 1’) or has been given as an attribute value (e.g. <div1 type="Chapter" n="1">), it may be omitted; where it contains otherwise unrecoverable text it should always be included. For example, the start of Hardy's Under the Greenwood Tree might be encoded as follows:

    <div1 id="UGT1" n="Winter" type="Part">
      <div2 id="UGT11" n="1" type="Chapter">
        <head>Mellstock-Lane</head>
        <p>To dwellers in a wood almost every species of tree ...

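When a completely regular heading has been omitted from the transcription, software can regenerate it from the attribute values. The Python sketch below illustrates one way of doing so, assuming well-formed XML input; the function name `heading_for` and the fallback format "type n" are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def heading_for(div):
    """Return the transcribed <head> if there is one, else rebuild a regular heading."""
    head = div.find("head")
    if head is not None:
        return "".join(head.itertext()).strip()
    kind, number = div.get("type"), div.get("n")
    # Only rebuild a heading when both attributes were supplied.
    return f"{kind} {number}" if kind and number else None

regular = ET.fromstring('<div1 type="Chapter" n="1"><p>...</p></div1>')
titled = ET.fromstring(
    '<div2 id="UGT11" n="1" type="Chapter"><head>Mellstock-Lane</head><p>...</p></div2>'
)
print(heading_for(regular))  # prints Chapter 1
print(heading_for(titled))   # prints Mellstock-Lane
```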
As noted above, the paragraphs making up a textual division should be tagged with the <p> tag. For example:

    <body>
      <p>I fully appreciate Gen. Pope's splendid achievements with their
      invaluable results; but you must know that Major Generalships in the
      Regular Army, are not as plenty as blackberries.</p>
    </body>

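A number of different tags are provided for the encoding of the structural components of verse and performance texts (drama, film, etc.), among them <l> for a verse line, <lg> for a group of lines such as a stanza, <sp> for a speech, <speaker> for the label naming the speaker, and <stage> for a stage direction.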
Here, for example, is the start of a poetic text in which verse lines and stanzas are tagged:

    <lg n="I">
      <l>I Sing the progresse of a deathlesse soule,</l>
      <l>Whom Fate, which God made, but doth not controule,</l>
      <l>Plac'd in most shapes; all times before the law</l>
      <l>Yoak'd us, and when, and since, in this I sing.</l>
      <l>And the great world to his aged evening;</l>
      <l>From infant morne, through manly noone I draw.</l>
      <l>What the gold Chaldee, or silver Persian saw,</l>
      <l>Greeke brass, or Roman iron, is in this one;</l>
      <l>A worke t'out weare Seths pillars, bricke and stone,</l>
      <l>And (holy writs excepted) made to yeeld to none,</l>
    </lg>

Note that the <l> element marks verse lines, not typographic lines: the original lineation of the first few lines above has not therefore been made explicit by this encoding, and may be lost. The <lb> element described in section 5. Page and Line Numbers may be used to mark typographic lines if so desired.
Sometimes, particularly in dramatic texts, verse lines are split between speakers. The easiest way of encoding this is to use the part attribute to indicate that the lines so fragmented are incomplete, as in this example:

    <div1 type="Act" n="I"><head>ACT I</head>
    <div2 type="Scene" n="1"><head>SCENE I</head>
    <stage rend="italic">Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
    <sp><speaker>Barn</speaker><l part="Y">Who's there?</l></sp>
    <sp><speaker>Fran</speaker><l>Nay, answer me. Stand and unfold yourself.</l></sp>
    <sp><speaker>Barn</speaker><l part="I">Long live the King!</l></sp>
    <sp><speaker>Fran</speaker><l part="M">Barnardo?</l></sp>
    <sp><speaker>Barn</speaker><l part="F">He.</l></sp>
    <sp><speaker>Fran</speaker><l>You come most carefully upon your hour.</l></sp>

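Software processing such a text can use the part attribute to reassemble the fragments into whole verse lines. The Python sketch below assumes well-formed XML and the values I (initial), M (medial), and F (final) for the pieces of a split line, with Y marking a line that is incomplete without stating its position; the function name `complete_lines` and the rule of joining fragments with a single space are illustrative choices only.

```python
import xml.etree.ElementTree as ET

# Four speeches from the scene above; the first three share one verse line.
SCENE = """
<div2 type="Scene" n="1">
  <sp><speaker>Barn</speaker><l part="I">Long live the King!</l></sp>
  <sp><speaker>Fran</speaker><l part="M">Barnardo?</l></sp>
  <sp><speaker>Barn</speaker><l part="F">He.</l></sp>
  <sp><speaker>Fran</speaker><l>You come most carefully upon your hour.</l></sp>
</div2>
"""

def complete_lines(root):
    """Join I/M/F fragments into whole lines; pass other lines through unchanged."""
    lines, pending = [], []
    for line in root.iter("l"):
        text = "".join(line.itertext()).strip()
        part = (line.get("part") or "N").upper()
        if part in ("I", "M"):
            pending.append(text)           # the line is still open
        elif part == "F":
            pending.append(text)           # the final fragment closes the line
            lines.append(" ".join(pending))
            pending = []
        else:
            lines.append(text)             # N (complete) or Y (position not stated)
    return lines

print(complete_lines(ET.fromstring(SCENE)))
```

For this sample the three fragments are joined into the single line "Long live the King! Barnardo? He.", followed by Francisco's complete line.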
The same mechanism may be applied to stanzas which are divided between two speakers:

    <sp><speaker>First Voice</speaker>
      <lg type="stanza" part="I">
        <l>But why drives on that ship so fast</l>
        <l>Withouten wave or wind?</l>
      </lg>
    </sp>
    <sp><speaker>Second Voice</speaker>
      <lg part="F">
        <l>The air is cut away before.</l>
        <l>And closes from behind.</l>
      </lg>
    </sp>

This example shows how dialogue presented in a prose work as if it were drama should be encoded. It also demonstrates the use of the who attribute to bear a code identifying the speaker of the piece of dialogue concerned:

    <sp who="OPI"><speaker>The Reverend Doctor Opimian</speaker>
      <p>I do not think I have named a single unpresentable fish.</p>
    </sp>
    <sp who="GRM"><speaker>Mr Gryll</speaker>
      <p>Bream, Doctor: there is not much to be said for bream.</p>
    </sp>
    <sp who="OPI"><speaker>The Reverend Doctor Opimian</speaker>
      <p>On the contrary, sir, I think there is much to be said for him.
      In the first place....</p>
      <p>Fish, Miss Gryll -- I could discourse to you on fish by the hour:
      but for the present I will forbear.</p>
    </sp>

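Because the who attribute carries a stable code, speeches can be grouped by speaker even when the printed speaker labels vary. The following Python sketch shows the idea, assuming well-formed XML; the function name `paragraphs_by_speaker` and the abbreviated sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated sample based on the dialogue above.
DIALOGUE = """
<div>
  <sp who="OPI"><speaker>The Reverend Doctor Opimian</speaker>
    <p>I do not think I have named a single unpresentable fish.</p></sp>
  <sp who="GRM"><speaker>Mr Gryll</speaker>
    <p>Bream, Doctor: there is not much to be said for bream.</p></sp>
  <sp who="OPI"><speaker>The Reverend Doctor Opimian</speaker>
    <p>On the contrary, sir, I think there is much to be said for him.</p></sp>
</div>
"""

def paragraphs_by_speaker(root):
    """Collect the text of every <p> in every <sp>, keyed by the who code."""
    spoken = defaultdict(list)
    for sp in root.iter("sp"):
        for p in sp.findall("p"):
            spoken[sp.get("who")].append("".join(p.itertext()).strip())
    return dict(spoken)

for code, paragraphs in paragraphs_by_speaker(ET.fromstring(DIALOGUE)).items():
    print(code, len(paragraphs))
```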