Text Encoding Initiative

20. The Electronic Title Page


Every TEI text has a header which provides information analogous to that provided by the title page of a printed text. The header is introduced by the element <teiHeader> and has four major parts:

<fileDesc>
contains a full bibliographic description of an electronic file.
<encodingDesc>
documents the relationship between an electronic text and the source or sources from which it was derived.
<profileDesc>
provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
<revisionDesc>
summarizes the revision history for a file.

A corpus or collection of texts, which share many characteristics, may have one header for the corpus and individual headers for each component of the corpus. In this case the type attribute indicates the type of header.

     <teiHeader type="corpus">
introduces the header for corpus-level information.
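
As a rough sketch of how a corpus header and the headers of its component texts might sit together, consider the skeleton below. It assumes the root elements <teiCorpus.2> and <TEI.2> and the value text for the type attribute on an individual header, none of which is stated in this section; the titles and prose are invented for illustration.

```xml
<teiCorpus.2>
     <!-- corpus-level header: information shared by all component texts -->
     <teiHeader type="corpus">
          <fileDesc>
               <titleStmt><title>A hypothetical corpus of short stories</title></titleStmt>
               <publicationStmt><p>Unpublished; for illustration only.</p></publicationStmt>
               <sourceDesc><p>Sources are described in the header of each text.</p></sourceDesc>
          </fileDesc>
     </teiHeader>
     <TEI.2>
          <!-- text-level header: information specific to this component -->
          <teiHeader type="text">
               <fileDesc>
                    <titleStmt><title>First hypothetical story: electronic edition</title></titleStmt>
                    <publicationStmt><p>Unpublished; for illustration only.</p></publicationStmt>
                    <sourceDesc><p>Transcribed from a printed copy.</p></sourceDesc>
               </fileDesc>
          </teiHeader>
          <text><body><p>Text of the first story.</p></body></text>
     </TEI.2>
</teiCorpus.2>
```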

Some of the header elements contain running prose consisting of one or more <p> elements. Others group together more specialized elements, as described in the sections which follow.

20.1. The File Description

The <fileDesc> element is mandatory. It contains a full bibliographic description of the file with the following elements:

<titleStmt>
groups information about the title of a work and those responsible for its intellectual content.
<editionStmt>
groups information relating to one edition of a text.
<extent>
describes the approximate size of the electronic text as stored on some carrier medium, specified in any convenient units.
<publicationStmt>
groups information concerning the publication or distribution of an electronic or other text.
<seriesStmt>
groups information about the series, if any, to which a publication belongs.
<notesStmt>
collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
<sourceDesc>
supplies a bibliographic description of the copy text(s) from which an electronic text was derived or generated.

A minimal header has the following structure:

<teiHeader>
     <fileDesc>
          <titleStmt> ... </titleStmt>
          <publicationStmt> ... </publicationStmt>
          <sourceDesc> ... </sourceDesc>
     </fileDesc>
</teiHeader>
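
The skeleton above can be filled in using nothing but prose. The following sketch assumes that each statement may consist simply of one or more <p> elements, as this chapter states for the publication statement and source description; the wording of the paragraphs is invented.

```xml
<teiHeader>
     <fileDesc>
          <titleStmt>
               <title>Two stories by Edgar Allan Poe: a machine readable
                    transcription</title>
          </titleStmt>
          <publicationStmt>
               <p>Unpublished working file, not for distribution.</p>
          </publicationStmt>
          <sourceDesc>
               <p>Transcribed from a printed edition.</p>
          </sourceDesc>
     </fileDesc>
</teiHeader>
```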

20.1.1. The Title Statement

The following elements can be used in the <titleStmt>:

<title>
contains the title of a work, whether article, book, journal, or series, including any alternative titles or subtitles.
<author>
in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item.
<sponsor>
specifies the name of a sponsoring organization or institution.
<funder>
specifies the name of an individual, institution, or organization responsible for the funding of a project or text.
<principal>
supplies the name of the principal researcher responsible for the creation of an electronic text.
<respStmt>
supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc., do not suffice or do not apply.

It is recommended that the title should distinguish the computer file from the source text, for example:

[title of source]: a machine readable transcription
[title of source]: electronic edition
A machine readable version of: [title of source]

The <respStmt> element contains the following subcomponents:

<resp>
contains a phrase describing the nature of a person's intellectual responsibility.
<name>
contains a proper noun or noun phrase.

Example:
<titleStmt>
     <title>Two stories by Edgar Allan Poe: a machine readable
               transcription</title>
     <author>Poe, Edgar Allan (1809-1849)</author>
     <respStmt><resp>compiled by</resp>
     <name>James D. Benson</name></respStmt>
</titleStmt>
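
The remaining elements of the title statement might be combined as in the following sketch. The project, institution, and personal names are all hypothetical and serve only to show where each element fits.

```xml
<titleStmt>
     <title>A hypothetical diary: electronic edition</title>
     <author>Author, A. N.</author>
     <!-- the institution hosting the work -->
     <sponsor>Hypothetical University Library</sponsor>
     <!-- the body paying for the work -->
     <funder>Hypothetical Research Foundation</funder>
     <!-- the lead researcher -->
     <principal>B. Researcher</principal>
     <respStmt><resp>transcribed by</resp>
     <name>C. Transcriber</name></respStmt>
</titleStmt>
```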

20.1.2. The Edition Statement

The <editionStmt> groups information relating to one edition of a text (where edition is used as elsewhere in bibliography), and may include the following elements:

<edition>
describes the particularities of one edition of a text.
<respStmt>
supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc., do not suffice or do not apply.

Example:

<editionStmt>
     <edition n="U2">Third draft, substantially revised
     <date>1987</date>
     </edition>
</editionStmt>

Determining exactly what constitutes a new edition of an electronic text is left to the encoder.

20.1.3. The Extent Statement

The <extent> element describes the approximate size of a file.

Example:

<extent>4532 bytes</extent>

20.1.4. The Publication Statement

The <publicationStmt> is mandatory. It may contain a simple prose description or groups of the elements described below:

<publisher>
provides the name of the organization responsible for the publication or distribution of a bibliographic item.
<distributor>
supplies the name of a person or other agency responsible for the distribution of a text.
<authority>
supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.

At least one of these three elements must be present, unless the entire publication statement is in prose. The following elements may occur within them:

<pubPlace>
contains the name of the place where a bibliographic item was published.
<address>
contains a postal or other address, for example of a publisher, an organization, or an individual.
<idno>
supplies any standard or non-standard number used to identify a bibliographic item. Attributes include:

type
categorizes the number, for example as an ISBN or other standard series.

<availability>
supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc. Attributes include:

status
supplies a code identifying the current availability of the text. Sample values include restricted, unknown, and free.

<date>
contains a date in any format.

Example:

<publicationStmt>
     <publisher>Oxford University Press</publisher>
     <pubPlace>Oxford</pubPlace> <date>1989</date>
     <idno type="ISBN"> 0-19-254705-5</idno>
     <availability>Copyright 1989, Oxford University
          Press</availability>
</publicationStmt>
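
A file made available by a distributor rather than a publisher might be described as in the sketch below. The distributor, address, and identifier are invented; the <addrLine> element and the use of <p> inside <availability> are assumptions not described in this section.

```xml
<publicationStmt>
     <distributor>Hypothetical Text Archive</distributor>
     <address>
          <addrLine>1 Example Street</addrLine>
          <addrLine>Exampletown</addrLine>
     </address>
     <!-- a non-standard identifier assigned by the archive -->
     <idno type="archive">HTA-0001</idno>
     <availability status="restricted">
          <p>Available for non-commercial research use only.</p>
     </availability>
     <date>1995</date>
</publicationStmt>
```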

20.1.5. Series and Notes Statements

The <seriesStmt> groups information about the series, if any, to which a publication belongs. It may contain <title>, <idno>, or <respStmt> elements.

The <notesStmt>, if used, contains one or more <note> elements which contain a note or annotation. Some information found in the notes area in conventional bibliography has been assigned specific elements in the TEI scheme.
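
As an illustration only, a series statement and a notes statement might look like the following fragment. The series title, editor, and volume number are hypothetical, as is the value vol for the type attribute.

```xml
<seriesStmt>
     <title>Hypothetical Electronic Texts Series</title>
     <respStmt><resp>series editor</resp>
     <name>A. N. Editor</name></respStmt>
     <idno type="vol">7</idno>
</seriesStmt>
<notesStmt>
     <note>Keyed from a photocopy; the final page is missing.</note>
</notesStmt>
```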

20.1.6. The Source Description

The <sourceDesc> is a mandatory element which records details of the source or sources from which the computer file is derived. It may contain simple prose or a bibliographic citation, using one or more of the following elements:

<bibl>
contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
<biblFull>
contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
<listBibl>
contains a list of bibliographic citations of any kind.

Examples:

<sourceDesc>
     <bibl>The first folio of Shakespeare, prepared by Charlton
          Hinman (The Norton Facsimile, 1968)</bibl>
</sourceDesc>

<sourceDesc>
     <scriptStmt id="CNN12">
     <bibl><author>CNN Network News</author>
          <title>News headlines</title>
          <date>12 Jun 1989</date>
     </bibl>
     </scriptStmt>
</sourceDesc>
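
Where a file is derived from more than one source, the citations can be gathered in a <listBibl>. The sketch below uses invented authors, titles, and dates purely to show the shape.

```xml
<sourceDesc>
     <listBibl>
          <bibl><author>Author, A. N.</author>
               <title>First hypothetical source</title>
               <date>1901</date>
          </bibl>
          <bibl><author>Writer, B.</author>
               <title>Second hypothetical source</title>
               <date>1902</date>
          </bibl>
     </listBibl>
</sourceDesc>
```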

20.2. The Encoding Description

The <encodingDesc> element specifies the methods and editorial principles which governed the transcription of the text. Its use is highly recommended. It may be a prose description or may contain elements from the following list:

<projectDesc>
describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
<samplingDecl>
contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
<editorialDecl>
provides details of editorial principles and practices applied during the encoding of a text.
<tagsDecl>
provides detailed information about the tagging applied to an SGML document.
<refsDecl>
specifies how canonical references are constructed for this text.
<classDecl>
contains one or more taxonomies defining any classificatory codes used elsewhere in the text.

20.2.1. Project and Sampling Descriptions

Examples of <projectDesc> and <samplingDecl>:

<encodingDesc>
     <projectDesc>Texts collected for use in the Claremont
          Shakespeare Clinic, June 1990.
     </projectDesc>
</encodingDesc>
<encodingDesc>
     <samplingDecl>Samples of 2000 words taken from the beginning
          of the text
     </samplingDecl>
</encodingDesc>

20.2.2. Editorial Declarations

The <editorialDecl> contains a prose description of the practices used when encoding the text. Typically this description should cover such topics as the following, each of which may conveniently be given as a separate paragraph.

correction
how and under what circumstances corrections have been made in the text.
normalization
the extent to which the original source has been regularized or normalized.
quotation
what has been done with quotation marks in the original -- have they been retained or replaced by entity references, are opening and closing quotes distinguished, etc.
hyphenation
what has been done with hyphens (especially end-of-line hyphens) in the original -- have they been retained, replaced by entity references, etc.
segmentation
how the text has been segmented, for example into sentences, tone-units, graphemic strata, etc.
interpretation
what analytic or interpretive information has been added to the text.

Example:

<editorialDecl>
          <p>The part of speech analysis applied throughout
               section 4 was added by hand and has not been
               checked.</p>
          <p>Errors in transcription controlled by using the
               WordPerfect spelling checker.</p>
          <p>All words converted to Modern American spelling
               using Webster's 9th Collegiate dictionary.</p>
          <p>All quotation marks converted to entity
               references &odq; and &cdq;.</p>
</editorialDecl>

20.2.3. Tagging, Reference, and Classification Declarations

The <tagsDecl> element is used to provide detailed information about the SGML tags actually appearing within a text. It may contain a simple list of the elements used, with a count for each, and may also document the different ways in which elements are rendered in the source text. The following special purpose elements are used:

<rendition>
supplies information about the intended rendition of one or more elements.
<tagUsage>
supplies information about the usage of a specific element within the outermost <text> of a TEI conformant document. Attributes include:

gi
the name (generic identifier) of the element indicated by the tag.
occurs
specifies the number of occurrences of this element within the text.
ident
specifies the number of occurrences of this element within the text which bear a distinct value for the global id attribute.
render
specifies the identifier of a <rendition> element which defines how this element is to be rendered.

For example:

<tagsDecl>
 <tagUsage gi="text" occurs="1"/>
 <tagUsage gi="body" occurs="1"/>
 <tagUsage gi="p" occurs="12"/>
 <tagUsage gi="hi" occurs="6"/>
</tagsDecl>

This (imaginary) tags declaration would be appropriate for a text containing twelve paragraphs in its body, within which six <hi> elements have been marked. Note that if the <tagsDecl> element is used, it must contain a <tagUsage> element for every element tagged in the associated text element.
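
The ident and render attributes might be used together with a <rendition> element along the following lines. This is a sketch only: the identifier ital, the wording of the rendition, and the count given for ident are all invented, and it is assumed that <rendition> elements precede the <tagUsage> elements.

```xml
<tagsDecl>
     <!-- describes one way in which elements appear in the source -->
     <rendition id="ital">Printed in italic type in the source.</rendition>
     <tagUsage gi="text" occurs="1"/>
     <tagUsage gi="body" occurs="1"/>
     <!-- three of the twelve paragraphs carry an id value -->
     <tagUsage gi="p" occurs="12" ident="3"/>
     <!-- render points at the rendition defined above -->
     <tagUsage gi="hi" occurs="6" render="ital"/>
</tagsDecl>
```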

The <refsDecl> element is used to document the way in which any standard referencing scheme built into the encoding works. In its simplest form, it consists of prose description.

Example:

<refsDecl>
     <p>The N attribute on each DIV1 and DIV2 contains the
     canonical reference for each such division in the form
     XX.yyy where XX is the book number in roman numerals and
     yyy is the section number in arabic numerals.</p>
</refsDecl>

The <classDecl> element groups together definitions or sources for any descriptive classification schemes used by other parts of the header. At least one such scheme must be provided, encoded using the following elements:

<taxonomy>
defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
<bibl>
contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
<category>
contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy.
<catDesc>
describes some category within a taxonomy or text typology, in the form of a brief prose description.

In the simplest case, the taxonomy may be defined by a bibliographic reference, as in the following example:

<classDecl>
     <taxonomy id="LCSH">
          <bibl>Library of Congress Subject Headings
          </bibl>
     </taxonomy>
</classDecl>

Alternatively, or in addition, the encoder may define a special purpose classification scheme, as in the following example:

<taxonomy id="B">
   <bibl>Brown Corpus</bibl>
   <category id="B.A"><catDesc>Press Reportage</catDesc>
      <category id="B.A1"><catDesc>Daily</catDesc></category>
      <category id="B.A2"><catDesc>Sunday</catDesc></category>
      <category id="B.A3"><catDesc>National</catDesc></category>
      <category id="B.A4"><catDesc>Provincial</catDesc></category>
      <category id="B.A5"><catDesc>Political</catDesc></category>
      <category id="B.A6"><catDesc>Sports</catDesc></category>
     ...
   </category>
   <category id="B.D"><catDesc>Religion</catDesc>
      <category id="B.D1"><catDesc>Books</catDesc></category>
      <category id="B.D2"><catDesc>Periodicals and tracts</catDesc></category>
   </category>
  ...
</taxonomy>

Linkage between a particular text and a category within such a taxonomy is made by means of the <catRef> element within the <textClass> element, as further described below.

20.3. The Profile Description

The <profileDesc> element enables information characterizing various descriptive aspects of a text to be recorded within a single framework. It has three optional components:

<creation>
contains information about the creation of a text.
<langUsage>
describes the languages, sublanguages, registers, dialects, etc., represented within a text.
<textClass>
groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.

Example:

<creation>
     <date value="1992-08">August 1992</date>
     <name type="place">Taos, New Mexico</name>
</creation>
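
The languages of a text might be recorded alongside its creation details as in the sketch below. The <language> element and its id attribute are assumptions, since this section does not describe the content of <langUsage>; the language codes and descriptions are illustrative.

```xml
<profileDesc>
     <creation>
          <date value="1992-08">August 1992</date>
          <name type="place">Taos, New Mexico</name>
     </creation>
     <langUsage>
          <!-- one language element per language found in the text -->
          <language id="en">English</language>
          <language id="es">Spanish, in quoted speech only</language>
     </langUsage>
</profileDesc>
```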

The <textClass> element classifies a text by reference to the system or systems defined by the <classDecl> element, and contains one or more of the following elements:

<keywords>
contains a list of keywords or phrases identifying the topic or nature of a text. Attributes include:

scheme
identifies the controlled vocabulary within which the set of keywords concerned is defined.

<classCode>
contains the classification code used for this text in some standard classification system. Attributes include:

scheme
identifies the classification system or taxonomy in use.

<catRef>
specifies one or more defined categories within some taxonomy or text typology. Attributes include:

target
identifies the categories concerned.

In the following example, the scheme attribute on the <keywords> element links the keywords to the classification system defined by the <taxonomy> element which bears the identifier LCSH.

<textClass>
     <keywords scheme="LCSH">
          <list>
          <item>English literature -- History and criticism --
               Data processing.</item>
          <item>English literature -- History and criticism --
               Theory etc.</item>
          <item>English language -- Style -- Data
               processing.</item>
          </list>
     </keywords>
</textClass>
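
A text can also be assigned to categories of an explicitly defined taxonomy by means of <catRef>. The sketch below assumes that the target attribute may list several category identifiers separated by spaces, and it reuses the identifiers B.A1 and B.A5 from the Brown Corpus taxonomy shown earlier; the classification itself is invented.

```xml
<textClass>
     <!-- a political report taken from a daily newspaper -->
     <catRef target="B.A1 B.A5"/>
</textClass>
```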

20.4. The Revision Description

The <revisionDesc> element provides a change log in which each change made to a text may be recorded. The log may be recorded as a sequence of <change> elements, each of which contains the following elements:

<date>
contains a date in any format.
<respStmt>
supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc., do not suffice or do not apply.
<item>
contains one component of a list.

Example:

<revisionDesc>
     <change><date>6/3/91:</date>
          <respStmt><name>EMB</name><resp>ed.</resp></respStmt>
          <item>File format updated</item></change>
     <change><date>5/25/90:</date>
          <respStmt><name>EMB</name><resp>ed.</resp></respStmt>
          <item>Stuart's corrections entered</item></change>
</revisionDesc>




Date: (revised October 2004) Author: Lou Burnard (revised SPQR).
Copyright TEI 1995