Text Encoding Initiative

4. Encoding the Body


As indicated above, a simple TEI document at the textual level consists of the following elements:

<front>
contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found before the start of a text proper.
<group>
contains a number of unitary texts or groups of texts.
<body>
contains the whole body of a single unitary text, excluding any front or back matter.
<back>
contains any appendixes, etc., following the main part of a text.

Elements specific to front and back matter are described below in section 19. Front and Back Matter. In this section we discuss the elements making up the body of a text.

4.1. Text Division Elements

The body of a prose text may be just a series of paragraphs, or these paragraphs may be grouped together into chapters, sections, subsections, etc. In the former case, each paragraph is tagged using the <p> tag. In the latter case, the <body> may be divided either into a series of <div1> elements, or into a series of <div> elements, either of which may be further subdivided, as discussed below:

<p>
marks paragraphs in prose.
<div>
contains a subdivision of the front, body, or back of a text.
<div1>
contains a first-level subdivision of the front, body, or back of a text (the largest if <div0> is not used, the second largest if it is).

When structural subdivisions smaller than a <div1> are necessary, a <div1> may be divided into <div2> elements, a <div2> into smaller <div3> elements, etc., down to the level of <div7>. If more than seven levels of structural division are present, one must either modify the TEI tag set to accept <div8>, etc., or else use the unnumbered <div> element: a <div> may be subdivided by smaller <div> elements, without limit to the depth of nesting.
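
As a minimal sketch of the unnumbered style, the fragment below nests <div> elements three deep. The type values (the attribute is described below) and the elided content are illustrative assumptions, not taken from any particular text:

<body>
  <div type="Book">
    <div type="Chapter">
      <div type="Section">
        <p> ... </p>
      </div>
    </div>
  </div>
</body>

Because every level uses the same element name, the depth of a division is given by its nesting rather than by its tag.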

All these division elements take the following three attributes:

type
This indicates the conventional name for this category of text division. Its value will typically be ‘Book’, ‘Chapter’, ‘Poem’, etc. Other possible values include ‘Sonnet’, ‘Speech’, ‘Song’, and ‘Group’ (for groups of poems, etc., treated as a single unit). Note that whatever value is supplied for the type attribute of the first <div>, <div1>, <div2>, etc., in a text is assumed to apply to all subsequent elements at the same level within the same <body>. A value must therefore be given for the first division element at each level, and again whenever the value changes.
id
This specifies a unique identifier for the division, which may be used for cross references or other links to it, such as a commentary, as further discussed in section 8. Cross References and Links. It is often useful to provide an id attribute for every major structural unit in a text, and to derive the ID values in some systematic way, for example by appending a section number to a short code for the title of the work in question, as in the examples below.
n
The n attribute specifies a mnemonic short name or number for the division, which can be used to identify it in preference to the value given for the id attribute. If a conventional form of reference or abbreviation for the parts of a work already exists (such as the book/chapter/verse pattern of Biblical citations), the n attribute is the place to record it.

The attributes id and n, indeed, are so widely useful that they are allowed on any element in any TEI DTD: they are global attributes. Other global attributes defined in the TEI Lite scheme are discussed in section 8.3. Linking Attributes.
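
The fragment below is a sketch of these conventions, using invented id values (prefix EX) and elided content. The second <div2> omits type and is assumed to be a chapter like the first, while the paragraphs carry id and n simply because those attributes are global:

<div1 type="Book" n="1" id="EX1">
  <div2 type="Chapter" n="1" id="EX101">
    <p id="EX101P1" n="1"> ... </p>
  </div2>
  <div2 n="2" id="EX102">
    <p id="EX102P1" n="1"> ... </p>
  </div2>
</div1>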

The value of every id attribute must be unique within a document. One simple way of ensuring that this is so is to make it reflect the hierarchic structure of the document. For example, Smith's Wealth of Nations as first published consists of five books, each of which is divided into chapters, while some chapters are further subdivided into parts. We might define id values for this structure as follows:

<div1 id="WN1" n="I" type="book">
  <div2 id="WN101" n="I.1" type="chapter">
    ... </div2>
  <div2 id="WN102" n="I.2" type="chapter">
    ... </div2>
  ...
  <div2 id="WN110" n="I.10" type="chapter">
    <div3 id="WN1101" n="I.10.1" type="part">
      ... </div3>
    <div3 id="WN1102" n="I.10.2" type="part">
      ... </div3>
  </div2>
  ...
</div1>
<div1 id="WN2" n="II" type="book">
  ...
</div1>
...

A different numbering scheme may be used for id and n attributes: this is often useful where a canonical reference scheme is used which does not tally with the structure of the work. For example, in a novel divided into volumes each containing chapters, where the chapters are numbered sequentially through the whole work rather than within each volume, one might use a scheme such as the following:

<div1 id="TS01" n="1" type="Volume">
  <div2 id="TS011" n="1" type="Chapter">
    ... </div2>
  <div2 id="TS012" n="2">
    ... </div2>
</div1>
<div1 id="TS02" n="2" type="Volume">
  <div2 id="TS021" n="3" type="Chapter">
    ... </div2>
  <div2 id="TS022" n="4">
    ... </div2>
</div1>

Here the work has two volumes, each containing two chapters. The chapters are numbered conventionally 1 to 4, but the id values specified allow them to be regarded additionally as if they were numbered 1.1, 1.2, 2.1, 2.2.

4.2. Headings and Closings

Every <div>, <div1>, <div2>, etc., may have a title or heading at its start, and (less commonly) a closing such as ‘End of Chapter 1’. The following elements may be used to transcribe them:

<head>
contains any heading, for example, the title of a section, or the heading of a list or glossary.
<trailer>
contains a closing title or footer appearing at the end of a division of a text.

Some other elements which may be necessary at the beginning or ending of text divisions are discussed below in section 19.1.2. Prefatory Matter.

Whether or not headings and trailers are included in a transcription is a matter for the individual transcriber to decide. Where a heading is completely regular (for example ‘Chapter 1’) or has been given as an attribute value (e.g. <div1 type="Chapter" n="1">), it may be omitted; where it contains otherwise unrecoverable text it should always be included. For example, the start of Hardy's Under the Greenwood Tree might be encoded as follows:

<div1 id="UGT1" n="Winter" type="Part">
<div2 id="UGT11" n="1" type="Chapter">
<head>Mellstock-Lane</head>
<p>To dwellers in a wood almost every species of tree ...
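
A closing can be transcribed in the same way. In the following sketch the division and its content are placeholders; only the trailer wording, ‘End of Chapter 1’, comes from the discussion above:

<div1 type="Chapter" n="1">
<head>Chapter 1</head>
<p> ... </p>
<trailer>End of Chapter 1</trailer>
</div1>

Under the guidance above, the regular heading here could equally be omitted, since the type and n attributes already record it.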

4.3. Prose, Verse and Drama

As noted above, the paragraphs making up a textual division should be tagged with the <p> tag. For example:

<body>
<p>I fully appreciate Gen. Pope's splendid achievements
with their invaluable results; but you must know that
Major Generalships in the Regular Army, are not as
plenty as blackberries.
</p>
</body>

A number of different tags are provided for the encoding of the structural components of verse and performance texts (drama, film, etc.):

<l>
contains a single, possibly incomplete, line of verse. Attributes include:

part
specifies whether or not the line is metrically complete. Legal values are: Y if the line is metrically incomplete; N if the line is complete, or if no claim is made as to its completeness; I for the initial part of an incomplete line; M for a medial part of an incomplete line; F for the final part of an incomplete line.

<lg>
contains a group of verse lines functioning as a formal unit e.g. a stanza, refrain, verse paragraph, etc.
<sp>
contains an individual speech in a performance text, or a passage presented as such in a prose or verse text. Attributes include:

who
identifies the speaker of the part by supplying an ID.

<speaker>
contains a special form of heading or label, giving the name of one or more speakers in a performance text or fragment.
<stage>
contains any kind of stage direction within a performance text or fragment. Attributes include:

type
indicates the kind of stage direction. Suggested values include entrance, exit, setting, delivery, etc.
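
As a sketch of the type attribute on <stage>, the first direction below reuses the entrance from the Hamlet example later in this section; the second is an invented exit, included only to show another of the suggested values:

<stage type="entrance">Enter Barnardo and Francisco, two Sentinels,
at several doors</stage>
<stage type="exit">Exit Francisco</stage>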

Here, for example, is the start of a poetic text in which verse lines and stanzas are tagged:

<lg n="I">
<l>I Sing the progresse of a
   deathlesse soule,</l>
<l>Whom Fate, which God made,
  but doth not controule,</l>
<l>Plac'd in most shapes; all times
  before the law</l>
<l>Yoak'd us, and when, and since,
  in this I sing.</l>
<l>And the great world to his aged evening;</l>
<l>From infant morne, through manly noone I draw.</l>
<l>What the gold Chaldee, or silver Persian saw,</l>
<l>Greeke brass, or Roman iron, is in this one;</l>
<l>A worke t'out weare Seths pillars, bricke and stone,</l>
<l>And (holy writs excepted) made to yeeld to none,</l>
</lg>

Note that the <l> element marks verse lines, not typographic lines: the original lineation of the first few lines above is therefore not made explicit by this encoding, and may be lost. The <lb> element described in section 5. Page and Line Numbers may be used to mark typographic lines if so desired.
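
As a sketch of how that might look, the fragment below marks the typographic breaks in two lines of the stanza. It assumes that the breaks in the source fall where the example above wraps, and that the XML empty-element form <lb/> is in use:

<l>I Sing the progresse of a
<lb/>deathlesse soule,</l>
<l>Plac'd in most shapes; all times
<lb/>before the law</l>

Each <l> still holds one complete verse line; the <lb/> records only where the printed line ended.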

Sometimes, particularly in dramatic texts, verse lines are split between speakers. The easiest way of encoding this is to use the part attribute to indicate that the lines so fragmented are incomplete, as in this example:

<div1 type="Act" n="I"><head>ACT I</head>
<div2 type="Scene" n="1"><head>SCENE I</head>
<stage rend="italic">
Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
<sp><speaker>Barn</speaker><l part="Y">Who's there?</l></sp>
<sp><speaker>Fran</speaker><l>Nay, answer me. Stand and unfold
  yourself.</l></sp>
<sp><speaker>Barn</speaker><l part="I">Long live the King!</l></sp>
<sp><speaker>Fran</speaker><l part="M">Barnardo?</l></sp>
<sp><speaker>Barn</speaker><l part="F">He.</l></sp>
<sp><speaker>Fran</speaker><l>You come most carefully upon
  your hour.</l></sp>

The same mechanism may be applied to stanzas which are divided between two speakers:

<sp><speaker>First Voice</speaker>
<lg type="stanza" part="I">
<l>But why drives on that ship so fast</l>
<l>Withouten wave or wind?</l>
</lg>
</sp>
<sp><speaker>Second Voice</speaker>
<lg part="F">
<l>The air is cut away before,</l>
<l>And closes from behind.</l>
</lg>
</sp>

The following example shows how dialogue presented in a prose work as if it were drama may be encoded. It also demonstrates the use of the who attribute to supply a code identifying the speaker of each piece of dialogue:

<sp who="OPI"><speaker>The Reverend Doctor Opimian</speaker>
 <p>I do not think I have named a single unpresentable fish.</p>
</sp>
<sp who="GRM"><speaker>Mr Gryll</speaker>
 <p>Bream, Doctor: there is not much to be said for bream.</p>
</sp>
<sp who="OPI"><speaker>The Reverend Doctor Opimian</speaker>
 <p>On the contrary, sir, I think there is much to be said for him.
 In the first place....</p>
 <p>Fish, Miss Gryll -- I could discourse to you on fish by
 the hour:  but for the present I will forbear.</p>
</sp>

Date: (revised October 2004) Author: Lou Burnard (revised SPQR).
Copyright TEI 1995