Metadata Matters
Using TEI Headers to Document a Text

Metadata requirements: the scope
Identification of the object
Documentation of its structure and organization
Statement of rights (reproduction, ownership etc.)
Statement of intended usage
Documentation of interpretive scheme/s applied
Brief characterization for search engines

The TEI header
Based on AACR2 practice, the header contains:
a mandatory file description
optional encoding, profile, and revision descriptions
The TEI envisages only two “levels” of header
The corpus header
The text header
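
As a rough illustration of the four-part structure, the sketch below checks which components a header contains and whether the mandatory one is missing. The element names follow the TEI Guidelines; the helper name check_components and the sample header are assumptions made for this example, and the empty <fileDesc/> would not itself be valid.

```python
import xml.etree.ElementTree as ET

# The four top-level header components named on this slide.
COMPONENTS = ["fileDesc", "encodingDesc", "profileDesc", "revisionDesc"]
MANDATORY = {"fileDesc"}

def check_components(header):
    """Return (components present, mandatory components missing)."""
    present = [child.tag for child in header if child.tag in COMPONENTS]
    missing = sorted(MANDATORY - set(present))
    return present, missing

# Hypothetical skeleton: real headers need content inside <fileDesc>.
HEADER = """
<teiHeader type="text">
  <fileDesc/>
  <revisionDesc/>
</teiHeader>
"""

header = ET.fromstring(HEADER)
present, missing = check_components(header)
print(present, missing)
```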

TEI Header structure

For example

The file description
Mandatory
Supplies full description of the electronic file itself, and its source/s
Must specify at least a title, a publication statement, and a source
Use of authority control is advisable but not required
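
The minimum stated above (a title, a publication statement, and a source) can be checked mechanically. What follows is a minimal sketch, not a substitute for validation against the TEI DTD; the function name missing_parts and both sample fragments are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The three statements a file description must contain at minimum.
REQUIRED = ["titleStmt", "publicationStmt", "sourceDesc"]

def missing_parts(file_desc):
    """List the required parts that a <fileDesc> element lacks."""
    problems = [name for name in REQUIRED if file_desc.find(name) is None]
    # A title statement without a <title> is also incomplete.
    if "titleStmt" not in problems and file_desc.find("titleStmt/title") is None:
        problems.append("titleStmt/title")
    return problems

# Hypothetical examples, not taken from any real header.
GOOD = """<fileDesc>
  <titleStmt><title>A hypothetical text: an electronic edition</title></titleStmt>
  <publicationStmt><p>Unpublished demonstration file.</p></publicationStmt>
  <sourceDesc><p>Original</p></sourceDesc>
</fileDesc>"""
BAD = "<fileDesc><titleStmt/></fileDesc>"

print(missing_parts(ET.fromstring(GOOD)))
print(missing_parts(ET.fromstring(BAD)))
```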

The File Description

The title statement
mandatory <title> [MARC 245]
identifies the electronic file, not its source
optionally followed by statements of responsibility, as appropriate, using <author>, <editor>, <respStmt> [MARC 720], <sponsor>, <funder>, <principal> [MARC 536]
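
To make the shape of a title statement concrete, here is a small sketch that reads the title and the statements of responsibility from one. The sample content (names, funder) is entirely hypothetical, and read_title_stmt is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

# Elements that may follow <title> as statements of responsibility.
ROLES = ("author", "editor", "respStmt", "sponsor", "funder", "principal")

# Hypothetical title statement; every name in it is made up.
TITLE_STMT = """<titleStmt>
  <title>A hypothetical pamphlet: an electronic edition</title>
  <author>Anon.</author>
  <respStmt>
    <resp>Transcribed by</resp>
    <name>A. N. Encoder</name>
  </respStmt>
  <funder>An Imaginary Funding Council</funder>
</titleStmt>"""

def flat_text(el):
    """Collect all text inside an element, with whitespace normalised."""
    return " ".join(" ".join(el.itertext()).split())

def read_title_stmt(stmt):
    """Return the title and a list of (element name, text) pairs."""
    title = flat_text(stmt.find("title"))
    responsible = [(c.tag, flat_text(c)) for c in stmt if c.tag in ROLES]
    return title, responsible

title, responsible = read_title_stmt(ET.fromstring(TITLE_STMT))
print(title)
print(responsible)
```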

A real title statement

The publication statement

Edition and extent statements
As elsewhere, distinguishing a new “edition” from just another version may be difficult…
Extent should be expressed in some platform-independent way, e.g. as a word count or in kilobytes.
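
One platform-independent measure is a simple word count. The sketch below counts whitespace-separated tokens in a text and wraps the result in an <extent> element; the tokenisation rule and the sample text are assumptions, and a real project would need to define what it counts as a word.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-paragraph text used only to exercise the functions.
SAMPLE = ("<text><body><p>This is a short sample.</p>"
          "<p>It has two paragraphs only.</p></body></text>")

def word_count(text_el):
    """Count whitespace-separated tokens in all text below an element."""
    return len(" ".join(text_el.itertext()).split())

def extent_element(n_words):
    """Build an <extent> element stating the size in words."""
    extent = ET.Element("extent")
    extent.text = f"{n_words} words"
    return extent

n = word_count(ET.fromstring(SAMPLE))
extent_xml = ET.tostring(extent_element(n), encoding="unicode")
print(extent_xml)
```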

Sample publication statement

The source description
may contain common TEI bibliographic elements: <bibl>, <biblStruct>
or a nested file description: <biblFull>
or a list: <listBibl>
or a prose description
or specialised elements for transcribed speech
or (for the “born-digital” document) simply the text “Original”
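
The alternatives above can be told apart by looking at the children of <sourceDesc>. This is a sketch only: the category labels it returns are invented, the speech elements are assumed to be <scriptStmt> and <recordingStmt>, and the born-digital case is assumed to be encoded as a single paragraph reading "Original".

```python
import xml.etree.ElementTree as ET

BIBLIOGRAPHIC = {"bibl", "biblStruct", "biblFull", "listBibl"}
SPEECH = {"scriptStmt", "recordingStmt"}  # assumed speech-specific elements

def describe_source(source_desc):
    """Classify a <sourceDesc> by the kind of content it holds."""
    tags = {child.tag for child in source_desc}
    if tags & SPEECH:
        return "transcribed speech"
    if tags & BIBLIOGRAPHIC:
        return "bibliographic"
    words = " ".join(source_desc.itertext()).split()
    if tags == {"p"} and words == ["Original"]:
        return "born-digital"
    return "prose"

# Hypothetical source descriptions.
PRINTED = "<sourceDesc><bibl>A made-up printed source, 1900.</bibl></sourceDesc>"
BORN_DIGITAL = "<sourceDesc><p>Original</p></sourceDesc>"
PROSE = "<sourceDesc><p>Keyed from a typescript held privately.</p></sourceDesc>"

for sample in (PRINTED, BORN_DIGITAL, PROSE):
    print(describe_source(ET.fromstring(sample)))
```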

Sample source description

Typical BNC source statements

The Encoding Description
documents what was done in creating the electronic form of a text

Editorial Declarations
document policies for some key decisions:
<correction>, <normalization>, <quotation>, <hyphenation>, <segmentation>, <interpretation>
declarable (the policy in force can vary from one text to another)
attributes are suggested for some key aspects; otherwise each element can simply contain a prose description
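
The mix of attributes and prose can be seen in a small sketch that reads an <editorialDecl> and reports, for each policy element, any attributes set and the prose given. The sample declaration is hypothetical; the attribute names method and eol are recalled from the Guidelines and should be checked against the version in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical editorial declaration mixing attributes and prose.
EDITORIAL = """<editorialDecl>
  <correction method="silent">
    <p>Obvious printing errors were corrected without comment.</p>
  </correction>
  <hyphenation eol="none">
    <p>End-of-line hyphens were removed.</p>
  </hyphenation>
  <quotation>
    <p>Quotation marks were retained as data.</p>
  </quotation>
</editorialDecl>"""

def policies(decl):
    """Map each policy element to (its attributes, its prose description)."""
    result = {}
    for child in decl:
        prose = " ".join(" ".join(child.itertext()).split())
        result[child.tag] = (dict(child.attrib), prose)
    return result

found = policies(ET.fromstring(EDITORIAL))
for name, (attrs, prose) in found.items():
    print(name, attrs, prose)
```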

Example encoding description

The Profile Description

Sample profile description

Language
global lang attribute
references a <language> element
defines both natural language and writing system
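
The link between the lang attribute and the <language> element can be followed in code. The sketch assumes P4-style markup, in which lang holds the id of a <language> element; the sample document is hypothetical and omits the mandatory file description for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical P4-style document: lang values point at <language id="...">.
DOC = """<TEI.2>
  <teiHeader>
    <profileDesc>
      <langUsage>
        <language id="eng">English, in Latin script</language>
        <language id="grc">Ancient Greek, in Greek script</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text lang="eng">
    <body>
      <p>An English paragraph.</p>
      <p lang="grc">A paragraph standing in for Greek text.</p>
    </body>
  </text>
</TEI.2>"""

def declared_languages(root):
    """Map each declared language id to its description."""
    return {el.get("id"): el.text.strip() for el in root.iter("language")}

def resolve_languages(root):
    """List (element, lang value, declared description) for the text."""
    declared = declared_languages(root)
    resolved = []
    for el in root.find("text").iter():
        code = el.get("lang")
        if code is not None:
            resolved.append((el.tag, code, declared.get(code, "UNDECLARED")))
    return resolved

resolved = resolve_languages(ET.fromstring(DOC))
print(resolved)
```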

BNC Participant Description

Setting Descriptions

Text classification
Texts may be classified by one or more of:
topic-descriptive keywords: <keywords>
(public) classification codes: <classCode>
(private) classification or typology: <catRef>
Classification schemes may be
explicitly defined in the <classDecl> element
implied by reference in the scheme attribute
Again, authority control is not mandatory (but recommended!)
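
For the private case, a <catRef> points at categories declared in <classDecl>. The sketch below resolves such references to their descriptions. The taxonomy, its ids, and the helper names are all hypothetical; targets are treated as space-separated ids, with a leading "#" tolerated.

```python
import xml.etree.ElementTree as ET

# Hypothetical header with a private taxonomy and one reference into it.
HEADER = """<teiHeader>
  <encodingDesc>
    <classDecl>
      <taxonomy id="genre">
        <category id="genre.news"><catDesc>Newspaper reportage</catDesc></category>
        <category id="genre.fict"><catDesc>Prose fiction</catDesc></category>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <catRef target="genre.fict"/>
    </textClass>
  </profileDesc>
</teiHeader>"""

def categories(header):
    """Map each declared category id to its description."""
    return {c.get("id"): c.findtext("catDesc") for c in header.iter("category")}

def classify(header):
    """Resolve every <catRef> target to a category description."""
    known = categories(header)
    labels = []
    for ref in header.iter("catRef"):
        for target in ref.get("target", "").split():
            labels.append(known.get(target.lstrip("#"), "UNDECLARED"))
    return labels

labels = classify(ET.fromstring(HEADER))
print(labels)
```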

Using a classification-1

Using a classification-2

The revision description

Sample revision description

Issues in using the Header
originally intended for non-specialist use
application profiles for particular uses
independent headers
up/down translation

Crosswalks
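
A crosswalk is essentially a table pairing locations in the header with fields in another metadata scheme. The sketch below maps a few file description elements to Dublin Core element names. The pairings are one plausible choice made for illustration, not an agreed standard, and the sample header is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical file description.
FILE_DESC = """<fileDesc>
  <titleStmt>
    <title>A hypothetical text: an electronic edition</title>
    <author>Anon.</author>
  </titleStmt>
  <publicationStmt>
    <publisher>An Imaginary Text Archive</publisher>
    <date>2001</date>
    <availability><p>Freely available for non-commercial use.</p></availability>
  </publicationStmt>
  <sourceDesc><p>Original</p></sourceDesc>
</fileDesc>"""

# Assumed pairings of header paths with Dublin Core element names.
CROSSWALK = [
    ("titleStmt/title", "title"),
    ("titleStmt/author", "creator"),
    ("publicationStmt/publisher", "publisher"),
    ("publicationStmt/date", "date"),
    ("publicationStmt/availability", "rights"),
]

def to_dublin_core(file_desc):
    """Build a dict of Dublin Core fields from a <fileDesc> element."""
    record = {}
    for path, dc_name in CROSSWALK:
        for el in file_desc.findall(path):
            text = " ".join("".join(el.itertext()).split())
            if text:
                record.setdefault(dc_name, []).append(text)
    return record

record = to_dublin_core(ET.fromstring(FILE_DESC))
for field, values in record.items():
    print(field, values)
```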

Harmonization of practice
TEI Header is widely deployed, but in varying dialects
Several attempts have been made to define standard “codes of practice” for digital libraries using it (e.g. by LC, AHDS)
It is likely to remain the best “source of information” for cataloguing use, whether or not it is also the best means of expression.

Creating and managing headers
As an independent XML document
Within a database
special case of common problem
OTA experience
all data in XML headers
harvestable by Z39.50 client
OLAC compatibility is a SMOP (a “small matter of programming”)