TEI Lite: a 90% solution
TEI Lite: a view of the TEI
|
|
|
|
what do most people want, most of the
time? |
|
realistic for existing texts, e.g. OTA,
UVA, HTI... |
|
realistic for document production, e.g. TEI technical documentation |
|
see http://www.tei-c.org/Lite/ |
|
Tutorial now available in French,
Italian, Russian, Japanese, and Korean |
Basic structure(s)
|
|
|
|
A TEI-conformant document comprises a header
followed by a text |
|
the header is essential for: |
|
bibliographic control and
identification |
|
resource documentation and processing
(see later) |
Basic TEI structure...
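A minimal sketch of that header-plus-text skeleton, assuming the P3/P4-era TEI Lite tag set in which the root element is <TEI.2>. The title and the wording inside the header are placeholders, not taken from any real edition.

```xml
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- placeholder title -->
        <title>A sample text: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished demonstration file.</p>
      </publicationStmt>
      <sourceDesc>
        <p>No source: created in electronic form.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The text itself goes here.</p>
    </body>
  </text>
</TEI.2>
```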
Structure of a TEI text
|
|
|
|
A text may be unitary or composite |
|
a unitary text contains |
|
front matter |
|
back matter |
|
a body |
|
in a composite text, the body is a group
of texts (or nested groups) |
Basic TEI structure...
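The composite case might be sketched as follows. The division types and the paragraph contents are illustrative assumptions; only the nesting of <text>, <front>, <group>, <body> and <back> is the point.

```xml
<text>
  <front>
    <!-- front matter for the collection as a whole -->
    <div type="preface"><p>Preface to the collection.</p></div>
  </front>
  <group>
    <text>
      <!-- a unitary text: optional front, body, optional back -->
      <front><div type="dedication"><p>Dedication of the first text.</p></div></front>
      <body><p>Body of the first text.</p></body>
    </text>
    <text>
      <body><p>Body of the second text.</p></body>
      <back><div type="appendix"><p>Appendix to the second text.</p></div></back>
    </text>
  </group>
</text>
```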
A text usually has divisions
|
|
|
generic, hierarchic subdivisions |
|
vanilla or numbered |
|
type attribute |
|
associated head and trailer elements
from the divtop class |
for example...
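A sketch of numbered divisions, assuming a book-and-chapter structure; the type values, identifiers and text are made up for illustration. The un-numbered ("vanilla") style would use nested <div> elements in place of <div1> and <div2>.

```xml
<body>
  <div1 type="book" n="I" id="b1">
    <head>Book I</head>
    <div2 type="chapter" n="1" id="b1c1">
      <head>Chapter 1</head>
      <p>First paragraph of the chapter.</p>
      <!-- a trailer closes a division, as a head opens it -->
      <trailer>End of Chapter 1</trailer>
    </div2>
  </div1>
</body>
```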
Use of global attributes
|
|
|
|
Applicable to all elements |
|
id for unique identification |
|
n for
(non-unique) name or number |
|
rend for rendition (appearance) |
|
lang for language and hence
writing-system |
|
Extensible, like other classes |
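All four global attributes on a single element might look like this. The attribute values are illustrative, and the lang value is assumed to match a language declared in the header.

```xml
<!-- id:   unique within the document
     n:    a label or number, which need not be unique
     rend: how the passage appears in the source
     lang: the language of the content (assumed to be declared in the header) -->
<p id="p17" n="17" rend="indent" lang="fr">Longtemps, je me suis couché de bonne heure.</p>
```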
Character Encoding
Recommendations
|
|
|
non-normative |
|
extend, using standard entity sets or
transliteration |
|
document transliteration scheme with
formal Writing System Declaration |
Text components in TEI Lite
|
|
|
|
|
What are divisions made of? |
|
prose is mostly paragraphs (<p>) |
|
verse is mostly lines (<l>),
sometimes in hierarchic groups (<lg>) |
|
drama is mostly speeches (<sp>)
containing <p> or <l> and interspersed with stage directions
(<stage>) |
|
These may be mixed, and may also appear
directly within undivided texts. |
Verse: an example
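A sketch of one stanza encoded as a line group, using the opening of Blake's "The Tyger"; the type and n values are an assumption about how one might label it.

```xml
<lg type="stanza" n="1">
  <l>Tyger Tyger, burning bright,</l>
  <l>In the forests of the night;</l>
  <l>What immortal hand or eye,</l>
  <l>Could frame thy fearful symmetry?</l>
</lg>
```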
Drama: an example
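A sketch of the opening of Hamlet as speeches and stage directions. The wording of the stage directions follows common modern editions, and the type values on <stage> are illustrative.

```xml
<div1 type="act" n="1">
  <div2 type="scene" n="1">
    <stage type="setting">Elsinore. A platform before the castle.</stage>
    <stage type="entrance">Francisco at his post. Enter to him Bernardo.</stage>
    <sp>
      <speaker>Bernardo</speaker>
      <l>Who's there?</l>
    </sp>
    <sp>
      <speaker>Francisco</speaker>
      <l>Nay, answer me: stand, and unfold yourself.</l>
    </sp>
  </div2>
</div1>
```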
Texts are not just words...
|
|
|
|
… but probably only people know that |
|
an encoding may claim to capture |
|
just visual salience, |
|
just its assumed causes |
|
both |
|
encoding makes explicit one (or more)
sets of interpretations |
For example...
|
|
|
And this Indenture further witnesseth
that the said Walter Shandy, merchant, in consideration of the said
intended marriage... |
"And this Indenture
further witnesseth"
|
|
|
And this Indenture further witnesseth
that the said Walter Shandy, merchant, in consideration of the said
intended marriage... |
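Two possible encodings of that passage, as a sketch. It assumes the source prints the opening words in black letter and the name in italic; the rend values are free-text descriptions chosen for illustration.

```xml
<!-- visual salience only: record what the page looks like -->
<p><hi rend="gothic">And this Indenture further witnesseth</hi>
that the said <hi rend="italic">Walter Shandy</hi>, merchant,
in consideration of the said intended marriage ...</p>

<!-- appearance and assumed cause: the italic is there because it is a name;
     the black letter is left as appearance only -->
<p><hi rend="gothic">And this Indenture further witnesseth</hi>
that the said <name type="person" rend="italic">Walter Shandy</name>, merchant,
in consideration of the said intended marriage ...</p>
```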
Who does the work?
|
|
|
TEI scheme allows for close reading --
and the reverse |
|
can tag very detailed features of
discourse function |
|
can normalise or simplify (e.g. dates,
numbers, names) |
|
… or leave well alone |
Phrase level elements
|
|
|
are often by convention typographically
distinct |
|
“data-like” (names, numbers, dates,
times, addresses) |
|
editorial intervention (corrections,
regularizations, additions, omissions ...) |
|
cross references and links (see later) |
|
|
for example...
Direct speech
|
|
|
Use the who attribute to show speakers |
|
Speeches can be nested in other
speeches |
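A sketch of nested direct speech, with wording adapted from Conan Doyle's "The Red-Headed League". The who values are simple labels chosen for illustration.

```xml
<q who="Wilson">Spaulding, he came down into the office just this day
eight weeks, with this very paper in his hand, and he says:
  <q who="Spaulding">I wish to the Lord, Mr. Wilson, that I was a
  red-headed man.</q>
</q>
```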
“Foreign” language phrases
|
|
|
|
The lang attribute may be attached to
any element |
|
Use <foreign> if nothing else is
available |
|
Define each language in
<langUsage> in header |
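A sketch of the two halves working together. The identifiers "en" and "la" and the sample sentences are assumptions made for illustration.

```xml
<!-- in the header: declare each language once -->
<profileDesc>
  <langUsage>
    <language id="en">English</language>
    <language id="la">Latin</language>
  </langUsage>
</profileDesc>

<!-- in the text: lang goes on whatever element is already there ... -->
<p lang="en">The writ of <term lang="la">habeas corpus</term> was granted.</p>
<!-- ... and <foreign> is used only when no other element applies -->
<p lang="en">He took it as an <foreign lang="la">obiter dictum</foreign>, nothing more.</p>
```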
Referring strings
|
|
|
The <rs> (referring string)
element is used for any kind of name or reference |
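For instance, using the opening exchange of Pride and Prejudice; the type values are illustrative.

```xml
<p><rs type="person">My dear Mr. Bennet</rs>, said his lady to him one day,
have you heard that <rs type="place">Netherfield Park</rs> is let at last?</p>
```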
Dates, times, numbers
|
|
|
attributes can be used to quantify
<date> and <dateRange> expressions |
|
similarly, times <time>,
<timeRange> and numbers <num> |
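A sketch of normalized values carried on attributes while the source wording is kept as content. The sentences and values are invented for illustration.

```xml
<p>Signed on <date value="1977-06-12">the twelfth of June, 1977</date>
at <time value="20:00">eight o'clock in the evening</time>,
before <num type="cardinal" value="21">twenty-one</num> witnesses.</p>

<p>The siege lasted <dateRange from="1857-03-15" to="1857-04-07">from
mid-March until the first week of April</dateRange>.</p>
```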
Correction and
Regularization
|
|
|
<corr> and <sic> for correction (or non-correction) |
|
<reg> and <orig> for
regularization (or the reverse) |
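Each pair offers two views of the same reading, depending on which form the encoder wants in the running text. A sketch with invented examples:

```xml
<!-- source form in the text, correction on an attribute -->
<p>I did not <sic corr="receive">recieve</sic> your letter.</p>
<!-- corrected form in the text, source form on an attribute -->
<p>I did not <corr sic="recieve">receive</corr> your letter.</p>

<!-- regularized form in the text, original on an attribute -->
<p>He came <reg orig="vnto">unto</reg> them.</p>
<!-- original form in the text, regularization on an attribute -->
<p>He came <orig reg="unto">vnto</orig> them.</p>
```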
Omissions, Deletions,
Additions
|
|
|
|
<gap> omission by transcriber |
|
<del> cancellation in source or
by editor |
|
<add> or <supplied> insertion in source or by
editor |
|
<unclear> material uncertain
because illegible |
|
<damage> physical damage to text
carrier |
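A sketch combining several of these in one invented sentence. The attribute values are illustrative descriptions, not a fixed vocabulary.

```xml
<p>I am <del rend="overstrike">very</del>
<add place="supralinear">most</add> grateful for your letter of
<unclear reason="faded">the 4th</unclear>, which reached me
<gap reason="illegible" extent="2 words"/> last week.</p>
```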
The multiple hierarchy
problem
|
|
|
|
XML allows only one hierarchy at a time |
|
Is a document |
|
chapter-paragraph-phrase |
|
gathering-page-leaf |
|
or both? |
|
discontinuous segments |
|
links and milestones |
Boundary markers
|
|
|
page, column, and line breaks (<pb>,
<cb>, <lb>) |
|
generic <milestone> |
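Because these are empty elements, they can mark a second hierarchy without disturbing the first. A sketch with invented text and numbering:

```xml
<p>The first line of the printed page ends here<lb/>
and the second line runs on to the foot of the page,<pb n="12"/>
where a new page begins. <milestone unit="section" n="2"/>A new
section of the source starts in mid-paragraph.</p>
```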
Some chunks are also phrases
|
|
|
<list> lists of all kinds |
|
<note> notes (authorial or
editorial) |
|
<figure> pictures or figures |
|
<formula> formulae |
|
<table> tables |
|
<bibl> bibliographic descriptions |
Lists
|
|
|
use <list> for lists of any kind
(use type attribute to distinguish) |
|
use <label> in two-column lists
as alternative to n attribute |
|
may be nested as necessary |
for example...
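A sketch of a two-column (gloss) list with a nested list inside one item; the type values are illustrative and the content reuses points made earlier in these slides.

```xml
<list type="gloss">
  <head>Global attributes</head>
  <label>id</label>   <item>unique identification</item>
  <label>n</label>    <item>(non-unique) name or number</item>
  <label>rend</label> <item>rendition (appearance)</item>
  <label>lang</label>
  <item>language, and hence
    <list type="simple">
      <item>writing system</item>
    </list>
  </item>
</list>
```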
Figures and graphics
|
|
|
|
The presence of a graphic is indicated
by the <figure> element |
|
The title of the graphic is tagged as a
<head> |
|
A description of the graphic may be
supplied (as a <figDesc>) for use by software unable to render the
graphic |
|
The graphic itself is specified as an
external entity |
for example...
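A sketch of a figure that refers to an external entity. The DTD file name, the entity name, the notation declaration, the image file and the caption text are all hypothetical; the notation may already be declared by the DTD in use.

```xml
<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd" [
  <!-- hypothetical notation and entity declarations -->
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY titlefig SYSTEM "titlefig.png" NDATA png>
]>
<!-- ... later, somewhere in the body of the text ... -->
<figure entity="titlefig">
  <head>The title page of the first edition</head>
  <figDesc>A printed title page with an engraved ornament below the title.</figDesc>
</figure>
```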
Tables
|
|
|
a <table> element contains <row>s
of <cell>s |
|
spanning is indicated by rows and cols
attributes |
|
role attribute indicates whether row or
column holds data or a label |
|
embedded tables are permitted |
for example...
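A sketch of a small table with a spanning label row and label cells, reusing the boundary markers from an earlier slide as content.

```xml
<table rows="4" cols="2">
  <!-- a label row whose single cell spans both columns -->
  <row role="label"><cell cols="2">Boundary markers</cell></row>
  <row><cell role="label">pb</cell><cell>page break</cell></row>
  <row><cell role="label">cb</cell><cell>column break</cell></row>
  <row><cell role="label">lb</cell><cell>line break</cell></row>
</table>
```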
Bibliography
|
|
|
|
Use simple <bibl> with optional
subcomponents: |
|
<respStmt> (for any kind of
responsibility) or <author>, <editor>, etc. |
|
<title> with optional level
attribute |
|
<imprint> groups publication
details |
|
<biblScope> adds page references
etc. |
|
Use <listBibl> for list of
references |
for example...
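A sketch of a one-entry reference list, citing the TEI P3 Guidelines. The id value and the choice of level="m" (monograph) are illustrative.

```xml
<listBibl>
  <bibl id="P3">
    <editor>C. M. Sperberg-McQueen</editor> and
    <editor>Lou Burnard</editor>, eds.
    <title level="m">Guidelines for Electronic Text Encoding and Interchange</title>.
    <imprint>
      <pubPlace>Chicago and Oxford</pubPlace>
      <publisher>Text Encoding Initiative</publisher>
      <date>1994</date>
    </imprint>
    <biblScope type="chapter">chapter 6</biblScope>
  </bibl>
</listBibl>
```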
Notes
|
|
|
|
Use <note> for notes of any kind
(editorial or authorial) |
|
if in-line, use place attribute to
specify location |
|
if out of line, either |
|
use target attribute to specify
attachment point |
|
or mark attachment point as a
<ref> |
for example...
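A sketch of the in-line and out-of-line styles. The identifiers, the type and place values, and the note texts are invented for illustration.

```xml
<!-- in-line: the note sits at its point of attachment -->
<p>... in consideration of the said intended marriage<note place="foot"
n="1" type="editorial">The marriage settlement is quoted at length.</note> ...</p>

<!-- out of line: the note is stored elsewhere and points back -->
<p id="p12">A paragraph that needs annotating.</p>
<note id="N1" place="end" target="p12">An end-note attached to the
paragraph above.</note>
```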
Out of line bibliographic
notes
|
|
|
<p>As Blenkinsop <ref target="N32">
32</ref> remarks … |
|
<!-- or equivalently --> |
|
<p>As Blenkinsop <ptr target="N32"/>
remarks … |
Links and pointers
|
|
|
cross-referencing |
|
association of text and annotation |
|
association of image and text or audio
and transcript |
|
alignment of text and translation... |
Terminology
|
|
|
|
A pointer points from here (where it
is) to there (somewhere else) |
|
A ref does the same, but has some
content |
|
A link points to two or more places and
asserts some (linking) relation between them. Its own location is not
significant |
|
An anchor exists only to be pointed at |
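The four terms side by side, as a sketch; all identifiers and text are made up.

```xml
<!-- pointer: empty, points from here to there -->
<p id="p1">For the argument see <ptr target="sec2"/>.</p>
<!-- ref: the same, but with content of its own -->
<p id="p2">For the argument see <ref target="sec2">the next section</ref>.</p>

<div id="sec2">
  <!-- anchor: marks a spot that has no element of its own -->
  <p>The argument begins <anchor id="a1"/>halfway through this sentence.</p>
</div>

<!-- link: relates two places; where the link itself sits does not matter -->
<link targets="p1 a1"/>
```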
Cross References
|
|
|
Use <ptr> (empty element) or
<ref> (with content) |
|
use target to specify an identifier (ID
value) |
HTML-style pointers in TEI
|
|
|
a URL identifies an external entity |
|
it is pointed to by an <xref> or <xptr> |
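A sketch of how this might look. The entity name "teihome", the DTD file name and the notation name are hypothetical, and the notation is assumed to be declared elsewhere.

```xml
<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd" [
  <!-- hypothetical: the URL is declared as an external entity -->
  <!ENTITY teihome SYSTEM "http://www.tei-c.org/" NDATA html>
]>
<!-- ... later, in the text: doc names the entity, not the URL itself ... -->
<p>See the <xref doc="teihome">TEI home page</xref>.</p>
<p>See also <xptr doc="teihome"/>.</p>
```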
TEI X-pointers
|
|
|
Allow you to point outside the current
document |
|
“location ladder” technique |
|
15 different location methods |
TEI X-pointers
|
|
|
|
target
specified by location ladder (within external entity named by doc
attribute) |
|
The most reliable location methods are
tree based |
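A sketch of a tree-based location ladder, assuming the extended pointer syntax of the TEI Guidelines in which each step narrows the location reached by the previous one. The entity name "p3ch6" and the identifier "SA" are hypothetical.

```xml
<!-- in the external document p3ch6: find the element with ID "SA",
     then take the second p among its children -->
<xptr doc="p3ch6" from="ID (SA) CHILD (2 p)"/>

<!-- the same target, with content to display -->
<xref doc="p3ch6" from="ID (SA) CHILD (2 p)">the relevant paragraph</xref>
```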
... and links
|
|
|
freestanding links can associate
anything that has an ID, including x-pointers |
|
can also be grouped and typed |
A three way alignment
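A sketch of a three-way alignment using a free-standing link inside a typed group. The segment identifiers and the type value are assumptions, and the lang values are assumed to be declared in the header.

```xml
<p lang="en"><seg id="en1">Know thyself.</seg></p>
<p lang="fr"><seg id="fr1">Connais-toi toi-même.</seg></p>
<p lang="de"><seg id="de1">Erkenne dich selbst.</seg></p>

<!-- one link names all three segments; the group gives the relation a type -->
<linkGrp type="translation">
  <link targets="en1 fr1 de1"/>
</linkGrp>
```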
Front and back matter
|
|
|
|
contain generic divisions and
<titlePage> |
|
which contains arbitrary mixture of
<titlePart>s and |
|
stated author: <docAuthor> and
<byline> |
|
original title and edition:
<docTitle>, <docEdition>, |
|
original imprint and date:
<docImprint> <docDate> |
Example titlepage elements
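A sketch based on the title page of the 1813 first edition of Pride and Prejudice; the type values on <titlePart> are illustrative.

```xml
<front>
  <titlePage>
    <docTitle>
      <titlePart type="main">Pride and Prejudice</titlePart>
      <titlePart type="sub">A Novel</titlePart>
    </docTitle>
    <titlePart type="desc">In Three Volumes</titlePart>
    <byline>By <docAuthor>the Author of Sense and Sensibility</docAuthor></byline>
    <docImprint>London: Printed for T. Egerton, Military Library,
      Whitehall, <docDate>1813</docDate></docImprint>
  </titlePage>
</front>
```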
Not covered here...
|
|
|
|
specialised front and back matter |
|
analytic tagging |
|
segmentation |
|
interpretations |
|
the header |
|
tags for documentation |
Summary
|
|
|
|
How TEI Lite handles… |
|
Structural divisions |
|
Rendition vs. interpretation |
|
Phrases, chunks, and chunky phrases |
|
Pointers and links |
|
Any DTD dealing with ordinary text will
need a similar range |