TEI Lite: a 90% solution
Lou Burnard

TEI Lite: a view of the TEI
what do most people want, most of the time?
realistic for existing texts, e.g. OTA, UVA, HTI...
realistic for document production, e.g. TEI technical documentation
see http://www.tei-c.org/Lite/
Tutorial now available in French, Italian, Russian, Japanese, and Korean

Basic structure(s)
A TEI-conformant document comprises a header followed by a text
the header is essential for:
bibliographic control and identification
resource documentation and processing (see later)

Basic TEI structure...
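A minimal skeleton of this structure. It assumes the XML version of TEI Lite (P4), whose root element is `<TEI.2>`. The titles and descriptions are placeholders:

```xml
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample text: an electronic edition</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a (hypothetical) printed source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The text itself starts here.</p>
    </body>
  </text>
</TEI.2>
```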

Structure of a TEI text
A text may be unitary or composite
a unitary text contains
optional front matter
a body
optional back matter
in a composite text, the body is replaced by a group of texts (or nested groups)

Basic TEI structure...
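A sketch of a composite text, using the same P4-style TEI Lite. A unitary text would have a single `<body>` where this one has `<group>`. The division types are illustrative:

```xml
<text>
  <front>
    <!-- optional front matter -->
    <div type="preface"><p>...</p></div>
  </front>
  <group>
    <text>
      <body><p>First component text</p></body>
    </text>
    <group>
      <!-- groups may nest -->
      <text>
        <body><p>Second component text</p></body>
      </text>
      <text>
        <body><p>Third component text</p></body>
      </text>
    </group>
  </group>
  <back>
    <!-- optional back matter -->
    <div type="appendix"><p>...</p></div>
  </back>
</text>
```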

A text usually has divisions
generic, hierarchic subdivisions
vanilla (<div>) or numbered (<div1>, <div2> ...)
type attribute
associated <head> (from the divtop class) and <trailer> (from the divbot class) elements

for example...
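For instance, a chapter containing a section, first with vanilla `<div>` elements and then with numbered ones. The two styles are not normally mixed within one text, so they are shown as separate fragments; the `type` values are the encoder's own choice:

```xml
<!-- vanilla, recursive divisions -->
<div type="chapter" n="1">
  <head>Chapter One</head>
  <div type="section" n="1.1">
    <head>Beginnings</head>
    <p>...</p>
  </div>
  <trailer>End of the first chapter</trailer>
</div>

<!-- numbered divisions: the number gives the depth -->
<div1 type="chapter" n="1">
  <head>Chapter One</head>
  <div2 type="section" n="1.1">
    <head>Beginnings</head>
    <p>...</p>
  </div2>
</div1>
```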

Use of global attributes
Applicable to all elements
id for unique identification
n for  (non-unique) name or number
rend for rendition (appearance)
lang for language and hence writing-system
Extensible, like other classes
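A single paragraph carrying all four global attributes. The values `P12`, `12`, `indent` and `la` are illustrative. In P4 the `lang` value should match the `id` of a `<language>` element declared in the header:

```xml
<!-- id: unique; n: non-unique label; rend: appearance; lang: language -->
<p id="P12" n="12" rend="indent" lang="la">Arma virumque cano ...</p>
```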

Character Encoding Recommendations
non-normative
extend, using standard entity sets or transliteration
document transliteration scheme with formal Writing System Declaration

Text components in TEI Lite
What are divisions made of?
prose is mostly paragraphs (<p>)
verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
drama is mostly speeches (<sp>) containing <p> or <l> and interspersed with stage directions (<stage>)
These may be mixed, and may also appear directly within undivided texts.

Verse: an example
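An illustrative encoding (not the original slide's example) of the first stanza of Blake's "The Tyger", with a stanza-level line group nested inside a poem-level one:

```xml
<lg type="poem">
  <lg type="stanza" n="1">
    <l>Tyger Tyger, burning bright,</l>
    <l>In the forests of the night;</l>
    <l>What immortal hand or eye,</l>
    <l>Could frame thy fearful symmetry?</l>
  </lg>
  <!-- further stanzas follow as sibling <lg> elements -->
</lg>
```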

Drama: an example
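An illustrative fragment. The exchange is from Hamlet, Act 2, but the stage direction is invented for the example:

```xml
<div type="scene">
  <sp>
    <speaker>Polonius</speaker>
    <p>What do you read, my lord?</p>
  </sp>
  <!-- stage directions may fall between (or inside) speeches -->
  <stage>Hamlet looks up from his book.</stage>
  <sp>
    <speaker>Hamlet</speaker>
    <p>Words, words, words.</p>
  </sp>
</div>
```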

Texts are not just words...
… but probably only people know that
an encoding may claim to capture
just visual salience,
just its assumed causes
both
encoding makes explicit one (or more) sets of interpretations

For example...
    And this Indenture further witnesseth that the said Walter Shandy, merchant, in consideration of the said intended marriage...

"And this Indenture further witnesseth"
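Three ways of recording the phrase. This assumes, for illustration, that it is printed in a black-letter face; the `rend` value `gothic` and the `seg` type `legalFormula` are the encoder's own labels, not fixed TEI values:

```xml
<!-- appearance only -->
<p><hi rend="gothic">And this Indenture further witnesseth</hi> that the said
Walter Shandy, merchant, ...</p>

<!-- interpretation only: a formula of legal discourse -->
<p><seg type="legalFormula">And this Indenture further witnesseth</seg> that the
said Walter Shandy, merchant, ...</p>

<!-- both -->
<p><seg type="legalFormula" rend="gothic">And this Indenture further
witnesseth</seg> that the said Walter Shandy, merchant, ...</p>
```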

Who does the work?
TEI scheme allows for close reading, and the reverse
can tag very detailed features of discourse function
can normalise or simplify (e.g. dates, numbers, names)
… or leave well alone

Phrase level elements
are often by convention typographically distinct
“data-like” (names, numbers, dates, times, addresses)
editorial intervention (corrections, regularizations, additions, omissions ...)
cross references and links (see later)

for example...
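A made-up sentence using some phrase-level elements that print conventionally italicises. The markup records why each phrase is distinct and leaves the rendering to the stylesheet:

```xml
<p>In <title>Tristram Shandy</title>, the word <mentioned>witnesseth</mentioned>
belongs to legal <term>boilerplate</term>; it is <emph>not</emph> a misprint.</p>
```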

Direct speech
Use the who attribute to show speakers
Speeches can be nested in other speeches
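A sketch with invented dialogue. The `who` values (`ann`, `tom`) are hypothetical speaker codes; depending on the DTD version they may need to match identifiers declared elsewhere:

```xml
<p><q who="ann">I asked him outright,</q> said Ann, <q who="ann">and he
answered <q who="tom">Not today</q>, as usual.</q></p>
```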

“Foreign” language phrases
The lang attribute may be attached to any element
Use <foreign> if nothing else is available
Define each language in <langUsage> in header
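A sketch of both approaches, assuming P4-style `id` and `lang` attributes; the language codes are illustrative:

```xml
<!-- in the header, inside <profileDesc> -->
<langUsage>
  <language id="en">English</language>
  <language id="fr">French</language>
</langUsage>

<!-- in the text: lang on an existing element, or <foreign> if none fits -->
<p lang="en">She had just finished <title lang="fr">Candide</title>, and
said only <foreign lang="fr">il faut cultiver notre jardin</foreign>.</p>
```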

Referring strings
The <rs> (referring string) element is used for any kind of name or reference
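A hypothetical example. `type` classifies the reference, and (assuming the P4 attribute set) `reg` can give a normalized form of the name; `<name>` is used where the string is itself a proper name:

```xml
<p><rs type="person" reg="Shandy, Walter">My father</rs> rode to
<rs type="place">the town</rs> to fetch <name type="person">Dr Slop</name>.</p>
```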

Dates, times, numbers
attributes can be used to quantify <date> and <dateRange> expressions
similarly, times <time>, <timeRange> and numbers <num>
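An invented sentence. The `value`, `from` and `to` attributes carry normalized forms (ISO-style dates and times are assumed here), while the content keeps the wording of the source:

```xml
<p>The deed was signed on <date value="1718-03-09">the ninth of March,
1718</date>, at <time value="15:00:00">three in the afternoon</time>, before
<num value="2">two</num> witnesses, and was in force
<dateRange from="1718-03-09" to="1720-03-08">for two years</dateRange>.</p>
```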

Correction and Regularization
<corr> and <sic> for correction (or non-correction)
<reg> and <orig> for regularization (or the reverse)
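A sketch assuming the P4 TEI Lite attribute set, in which each element carries the alternative reading as an attribute so that either form can be recovered. The readings are illustrative:

```xml
<!-- correct the text, keep the original reading -->
<p>I <corr sic="recieve">receive</corr> it.</p>
<!-- keep the error, record the correction -->
<p>I <sic corr="receive">recieve</sic> it.</p>
<!-- regularize the spelling, keep the original -->
<p>this Indenture further <reg orig="witnesseth">witnesses</reg></p>
<!-- keep the original, record the regular form -->
<p>this Indenture further <orig reg="witnesses">witnesseth</orig></p>
```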

Omissions, Deletions, Additions
<gap> omission by transcriber
<del> cancellation in source or by editor
<add> or <supplied> insertion in source or by editor
<unclear> material uncertain because illegible
<damage> physical damage to text carrier
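An invented transcription combining these elements; the attribute values are illustrative:

```xml
<p>Signed at <del>London</del> <add place="supralinear">York</add> on the
<unclear reason="faded">fourth</unclear> day of <damage>Ma</damage><supplied>rch</supplied>,
in the presence of <gap reason="illegible"/>.</p>
```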

The multiple hierarchy problem
XML allows only one hierarchy at a time
Is a document
chapter-paragraph-phrase
gathering-leaf-page
or both?
possible workarounds:
discontinuous segments
links and milestones

Boundary markers
page, column, and line breaks (<pb>, <cb>, <lb>)
generic <milestone>
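A sketch in which line and page boundaries are recorded as empty elements inside the logical structure, each placed at the start of the new line or page. The page number and milestone unit are illustrative:

```xml
<p>And this Indenture further witnesseth that the said Walter Shandy,
<lb/>merchant, in consideration of the said intended
<pb n="24"/>marriage ...</p>
<milestone unit="section" n="2"/>
```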

Some chunks are also phrases
<list> lists of all kinds
<note> notes (authorial or editorial)
<figure> pictures or figures
<formula> formulae
<table> tables
<bibl> bibliographic descriptions

Lists
use <list> for lists of any kind (use type attribute to distinguish)
use <label> in two-column lists as alternative to n attribute
may be nested as necessary

for example...
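Two illustrative lists: a nested ordered list, and a two-column glossary list using `<label>`:

```xml
<list type="ordered">
  <item>Collect the sources.</item>
  <item>Transcribe them:
    <list type="bulleted">
      <item>prose</item>
      <item>verse</item>
    </list>
  </item>
</list>

<list type="gloss">
  <label>TEI</label> <item>Text Encoding Initiative</item>
  <label>OTA</label> <item>Oxford Text Archive</item>
</list>
```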

Figures and graphics
The presence of a graphic is indicated by the <figure> element
The title of the graphic is tagged as a <head>
A description of the graphic may be supplied (as a <figDesc>) for use by software unable to render the graphic
The graphic itself is specified as an external entity

for example...
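A sketch assuming the P4 convention in which `<figure>` names its image through an `entity` attribute. The file name, entity name and notation declaration are hypothetical, and a given TEI Lite DTD may already declare common graphic notations, in which case the `NOTATION` line is dropped:

```xml
<!-- in the DOCTYPE internal subset -->
<!NOTATION PNG SYSTEM "image/png">
<!ENTITY fig1 SYSTEM "fig1.png" NDATA PNG>

<!-- in the text -->
<figure entity="fig1">
  <head>The title of the graphic</head>
  <figDesc>A short prose description, for software that cannot display the image.</figDesc>
</figure>
```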

Tables
a <table> element contains <row>s of <cell>s
spanning is indicated by rows and cols attributes
role attribute indicates whether row or column holds data or a label
embedded tables are permitted

for example...
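An illustrative 3 × 3 table with a cell spanning two rows and another spanning two columns; the figures are invented:

```xml
<table rows="3" cols="3">
  <row role="label">
    <cell rows="2">Element</cell>   <!-- spans two rows -->
    <cell cols="2">Occurrences</cell>  <!-- spans two columns -->
  </row>
  <row role="label">
    <cell>Text A</cell>
    <cell>Text B</cell>
  </row>
  <row>
    <cell role="label">p</cell>
    <cell role="data">120</cell>
    <cell role="data">95</cell>
  </row>
</table>
```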

Bibliography
Use simple <bibl> with optional subcomponents:
<respStmt> (for any kind of responsibility) or <author>, <editor>, etc.
<title> with optional level attribute
<imprint> groups publication details
<biblScope> adds page references etc.
Use <listBibl> for list of references

for example...
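A sketch of one entry in a list of references, describing the TEI P3 Guidelines. Punctuation sits outside `<imprint>`, which groups only its component elements:

```xml
<listBibl>
  <bibl>
    <editor>C. M. Sperberg-McQueen</editor> and <editor>Lou Burnard</editor> (eds.),
    <title level="m">Guidelines for Electronic Text Encoding and Interchange (TEI P3)</title>.
    <imprint>
      <pubPlace>Chicago and Oxford</pubPlace>
      <publisher>Text Encoding Initiative</publisher>
      <date>1994</date>
    </imprint>.
  </bibl>
</listBibl>
```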

Notes
Use <note> for notes of any kind (editorial or authorial)
if in-line, use place attribute to specify location
if out of line, either
use target attribute to specify attachment point
or mark attachment point as a <ref>

for example...
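Two sketches, reusing the slide's Blenkinsop: an in-line footnote positioned with `place`, and an out-of-line note attached by `target` to an `<anchor>`. The ids and note text are illustrative:

```xml
<!-- in-line: the note sits where it applies -->
<p>As Blenkinsop<note place="foot">Private communication.</note> remarks ...</p>

<!-- out of line: the note lives elsewhere and points at its attachment point -->
<p>As Blenkinsop<anchor id="A32"/> remarks ...</p>
<note id="N32" place="end" target="A32">Private communication.</note>
```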

Out of line bibliographic notes
<p>As Blenkinsop <ref target="N32">32</ref> remarks …</p>
<!--  or equivalently  -->
<p>As Blenkinsop <ptr target="N32"/> remarks …</p>

Links and pointers
cross-referencing
association of text and annotation
association of image and text or audio and transcript
alignment of text and translation...

Terminology
A pointer points from here (where it is) to there (somewhere else)
A ref does the same, but has some content
A link points to two or more places and asserts some (linking) relation between them. Its own location is not significant
An anchor exists only to be pointed at

Cross References
Use <ptr> (empty element) or <ref> (with content)
use target to specify an identifier (ID value)
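A sketch with an illustrative id: the division carries the identifier, and both the `<ref>` (with content) and the `<ptr>` (empty) point at it:

```xml
<div type="section" id="sec2">
  <head>Methods</head>
  <p>...</p>
</div>

<p>See <ref target="sec2">the section on methods</ref>, or simply <ptr target="sec2"/>.</p>
```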

HTML-style pointers in TEI
a URL identifies an external entity
it is pointed to by an xref or xptr

TEI X-pointers
Allow you to point outside the current document
“location ladder” technique
15 different location methods

TEI X-pointers
target specified by location ladder (within external entity named by doc attribute)
The most reliable location methods are tree based
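A hedged sketch of a tree-based location ladder. It assumes that `shandy` has been declared as an external entity for the target document, and that the target contains an element with id `ch15`; the exact keyword syntax should be checked against the P4 Guidelines' chapter on extended pointers before relying on it:

```xml
<p>Compare the text of
<!-- start at the element with id ch15, then step to its second P child -->
<xref doc="shandy" from="ID (ch15) CHILD (2 P)">the marriage settlement</xref>,
or simply <xptr doc="shandy" from="ID (ch15)"/>.</p>
```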

... and links
freestanding links can associate anything that has an ID, including x-pointers
can also be grouped and typed


A three way alignment
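A possible encoding, assuming three parallel versions of a passage (hypothetically English, French and German) whose sentences carry ids. Each `<link>` asserts that its targets correspond; where the `<linkGrp>` is placed is illustrative:

```xml
<div type="alignment">
  <p lang="en"><s id="en1">...</s> <s id="en2">...</s></p>
  <p lang="fr"><s id="fr1">...</s> <s id="fr2">...</s></p>
  <p lang="de"><s id="de1">...</s> <s id="de2">...</s></p>
  <linkGrp type="alignment">
    <link targets="en1 fr1 de1"/>
    <link targets="en2 fr2 de2"/>
  </linkGrp>
</div>
```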

Front and back matter
contain generic divisions and <titlePage>
which contains an arbitrary mixture of <titlePart>s and
stated author: <docAuthor> and <byline>
original title and edition: <docTitle>, <docEdition>
original imprint and date: <docImprint>, <docDate>

Example titlepage elements
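A sketch of a title page for an invented book; the title, author, edition and imprint are all hypothetical:

```xml
<front>
  <titlePage>
    <docTitle>
      <titlePart type="main">A Treatise upon Nothing in Particular</titlePart>
      <titlePart type="sub">in Two Volumes</titlePart>
    </docTitle>
    <byline>By <docAuthor>A. N. Other</docAuthor></byline>
    <docEdition>The Second Edition</docEdition>
    <docImprint>London: printed for the author</docImprint>
    <docDate>1760</docDate>
  </titlePage>
</front>
```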

Not covered here...
specialised front and back matter
analytic tagging
segmentation
interpretations
the header
tags for documentation

Summary
How TEI Lite handles…
Structural divisions
Rendition vs. interpretation
Phrases, chunks, and chunky phrases
Pointers and links
Any DTD dealing with ordinary text will need a similar range of features