Anyone for pizza?

Designing a TEI document type declaration

The T E what?

Originally, a research project within the humanities
  Sponsored by ALLC, ACH, ACL
  Funded 1990-1994 by US NEH, EU LE Programme et al
Major influences
  digital libraries and text collections
  language corpora
  scholarly datasets
Now an international membership consortium, incorporated Jan 2001

Current TEI activity

Preliminary XML version of DTD now available
Text update in progress
Workgroups under consideration
  character set issues
  manuscript description
  modelling
  lexica and termbanks
  physical description
  resource discovery
Membership will vote by end 2001

Who uses TEI?

digital librarians and archivists
  HTI, UVA, CETH, OTA...
Language Engineering projects
  EAGLES, BNC, MULTEXT, ECI, Silfide
academic researchers
  Women Writers Project, CURIA Project, VWWP, Orlando, Model Editions Partnership, Canterbury Tales Project, Bodleian Library...
http://www.hcu.ox.ac.uk/TEI/Applications/

Goals of the TEI

better interchange and integration of scholarly data
support for all texts, in all languages, from all periods
guidance for the perplexed: what to encode
assistance for the specialist: how to encode any information of interest
Hence a loose framework into which unpredictable extensions can be fitted

Legacy of the TEI

a way of looking at what “text” really is
a codification of current scholarly practice
(crucially) a set of shared assumptions and priorities about the digital agenda:
  focus on content and function (rather than presentation)
  generic solutions (rather than application-specific ones)

TEI Deliverables

A set of recommendations for text encoding,
  covering both generic text structures and some highly specific areas based on (but not limited by) existing practice
A very large collection of element definitions
  combined into a very loose document type declaration
A mechanism for creating multiple views (DTDs) of the foregoing
One such view and associated tutorial: TEI Lite; others exist (e.g. CES, BNC)

The TEI modus operandi...

identify significant particularities
independent of notation or realization
avoid controversy, over-delicacy, inadequacy
seek generalizable solutions, acceptable to a consensus

... and some consequences

focus on content, not presentation
descriptive, not prescriptive
Occam's razor
modular, extensible dtd

Designing a dtd for the TEI

How can a single markup scheme handle a large variety of requirements?
  all texts are alike
  every text is different
Learn from the database designers
  one construct, many views
  each view a selection from the whole

How many dtds?

How many dtds might the TEI require?
  one (the Corporate or WKWBFY approach)
  none (the Anarchic or NWEUMP approach)
  as many as it takes (the Mixed Economy or XML approach)
Or a single main dtd with many faces (the British approach)

The TEI solution: modularization

a (very) large number of element and attribute definitions
organized as tagsets (core, base, additional, or auxiliary)
grouped into classes

How to combine Tag Sets…

all tag sets, all the time (the table d'hôte model)
a few pre-selected combinations (the combination plate model)
in completely unconstrained abandon (the smörgåsbord model)
one from column A, two from column B (the Chinese menu model)

The Chicago Pizza Model

<!ENTITY % base "(deepDish | thinCrust | stuffed)" >
<!ENTITY % topping "(pepperoni | mushrooms | sausage | pepper | anchovies | ...)" >
<!ELEMENT pizza - - (%base;, (tomatoSauce & cheese), %topping;) >

To build a view of the TEI dtd, take...

the core tagsets
the base of your choice
the toppings of your choice

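As a sketch of how the three ingredients combine, a document's DTD subset might look like the following. The core tagsets need no declaration because they are always present. The root element TEI.2 and this particular choice of toppings are assumptions made for illustration; tei2.dtd, TEI.prose, TEI.figures, and TEI.transcr are names used elsewhere in these slides.

```xml
<!-- Hypothetical prolog: prose base plus two additional tagsets -->
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!ENTITY % TEI.prose   "INCLUDE">  <!-- the base: choose one -->
  <!ENTITY % TEI.figures "INCLUDE">  <!-- a topping -->
  <!ENTITY % TEI.transcr "INCLUDE">  <!-- another topping -->
]>
```
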
TEI base tagsets

exactly one must be selected
defines basic structural components
currently defined:
  prose, verse, drama
  transcribed speech
  dictionaries
  terminological databases
mixtures of bases require special treatment

TEI additional tagsets

sets of elements for specialized application areas
can be mixed and matched ad lib
currently provided:
  linking and alignment; analysis; feature structures; certainty; physical transcription; textual criticism; names and dates; graphs and trees; figures and tables; language corpora...

For an XML DTD

Just add another declaration to the subset
(This is new in TEI P4)

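The declaration itself is not preserved on this slide. In TEI P4 the XML form of the DTD is, to the best of my knowledge, selected with a parameter entity named TEI.XML, so the subset would presumably look something like this sketch. The root element TEI.2 and the choice of the prose base are illustrative assumptions.

```xml
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!ENTITY % TEI.XML   "INCLUDE">  <!-- assumed P4 switch for the XML DTD -->
  <!ENTITY % TEI.prose "INCLUDE">  <!-- base tagset, chosen as usual -->
]>
```
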
How does this work?

a tagset declaration in the subset enables all declarations within that tagset's marked section in the main TEI dtd
these may include element, attribute, and class definitions

How does this work?

Within the main DTD:
  the declarations making up each tagset are enclosed by an IGNORE marked section
  the declarations for each element are enclosed by an INCLUDE marked section
  this can be over-ridden by your declaration within the DTD subset

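A sketch of the tagset-level switch, assuming the tagset is TEI.prose. The comment stands in for the real declarations, which are not reproduced here. The override relies on a standard rule of SGML and XML: when a parameter entity is declared more than once, the first declaration is binding, and the DTD subset is read before the external DTD.

```xml
<!-- Inside the main DTD (simplified): the tagset is off by default -->
<!ENTITY % TEI.prose "IGNORE">
<![ %TEI.prose; [
  <!-- element, attribute, and class declarations for the prose base -->
]]>
```

If the DTD subset has already declared TEI.prose as "INCLUDE", the default given here has no effect and the section is processed.
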
Customizing the TEI DTD

In DTD subset
  selection of tag sets
  specification of document entities
in TEI.extensions.ent
  renaming of elements
  suppression of elements
  modification of TEI classes
in TEI.extensions.dtd
  definition of new elements

Entity definitions

typically will include entity declarations for embedded graphics etc.
may also invoke special characters etc.

To modify the dtd

Define your modifications in a pair of extension files

In your extension files you can…

rename elements
  <!ENTITY % n.p "para" >
undefine elements
  <!ENTITY % seg "IGNORE">
The pizzaChef gives you a list of all the elements available from your chosen tagsets, and generates extension files for you

You can also

supply additional (or replacement) declarations
supply entirely new elements and embed them in the architecture

An example

In the DTD subset we write:
  <!ENTITY % TEI.prose "INCLUDE">
  <!ENTITY % biblStruct "IGNORE">
In the prose tagset it says:

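The tagset fragment shown at this point is not preserved. A sketch of what an element-level switch of this kind might look like follows; the content model is a placeholder, not the real TEI declaration for biblStruct.

```xml
<!-- Default is INCLUDE, but an earlier "IGNORE" in the subset wins -->
<!ENTITY % biblStruct "INCLUDE">
<![ %biblStruct; [
<!ELEMENT biblStruct (#PCDATA)>  <!-- placeholder content model -->
]]>
```

With the subset as written, %biblStruct; resolves to IGNORE, so the element declaration is skipped and biblStruct disappears from the resulting view.
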
Finally, the pizza is cooked

The carthage program removes
  parameterization in the DTD
  unreferenced or inaccessible elements
The pizzaChef website
  http://www.tei-c.org/pizza.html
command line equivalent:
  http://www.tei-c.org/maketeidtd/

Element Classes

Most TEI elements are assigned to one or more
  model classes, identifying their syntactic properties, or
  attribute classes, identifying their attributes
This provides a (relatively) simple way of
  documenting and understanding the DTD
  parameterizing content models
  facilitating customization
An alternative way of doing architectural forms

Some TEI model classes

divn: structural elements like divisions (<div>, <div1>, <div2>…)
divtop: elements which can appear at the start of a divn element (<head>, <epigraph>, <byline>…)
chunk: paragraph-like elements (<sp>, <p>, <lg>…)
phrase: elements which appear within chunks (<hi>, <foreign>, <date>…)

Implementation of classes

Each model class is defined as a pair of parameter entities
Reference to class members is always indirect

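A minimal sketch of such a pair, using the phrase class. The name x.phrase appears in the Lampeter extension file later in these slides; the name m.phrase, the three members listed, and the content model for p are assumptions chosen for illustration, not the real TEI declarations.

```xml
<!-- Extension entity: empty unless redefined beforehand -->
<!ENTITY % x.phrase "">
<!-- Member list, with the extension entity at the front -->
<!ENTITY % m.phrase "%x.phrase; hi | foreign | date">
<!-- Content models name the class, never its individual members -->
<!ELEMENT p (#PCDATA | %m.phrase;)*>
```

Because the content model refers only to %m.phrase;, changing the class changes every content model that uses it.
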
Class mobility

Each model class is defined as a parameter entity, containing
  a reference to an initially null extension class
  a list of members
To add a new member to a class, we redefine the extension class:

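A sketch of such a redefinition, placed in the extension entity file so that it is read before the main DTD's own empty default. The element name it is taken from the Lampeter example later in these slides.

```xml
<!-- The trailing bar joins the new member to the existing member list -->
<!ENTITY % x.phrase "it | ">
```

The trailing "|" matters: the class entity splices the extension in front of its member list, so without the bar the resulting content model would not be well formed.
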
TEI attribute classes

global: attributes which are available to every element (n, lang, id, TEIform)
linking: attributes for elements which have linking semantics (targType, targOrder, evaluate)

The TEIform attribute

protects applications from the effect of element renaming
  <titre TEIform="title">...</titre>
protects applications from the effect of syntactic sugar
  <abc type="xyz"> can be rewritten as
  <xyz TEIform="abc">

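A sketch of how renaming and TEIform might fit together in the DTD, assuming the n.title entity follows the same naming pattern as the n.p entity shown earlier. The content model is a placeholder.

```xml
<!-- In the extension file: rename title to titre -->
<!ENTITY % n.title "titre">
<!-- In the main DTD: the element is declared under whatever name is current -->
<!ELEMENT %n.title; (#PCDATA)>
<!-- The default value records the canonical TEI name -->
<!ATTLIST %n.title; TEIform CDATA "title">
```

An application that looks at TEIform rather than the element name sees "title" whether or not the document says titre.
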
TEI Auxiliary DTDs

independent dtds for specialized information:
  writing system / character set
  feature system (for feature-structure notation)
  tag set documentation
  independent, free-standing TEI header

What can go wrong?

extensions must use SGML syntax
beware of zombie elements
beware of over-zealous pruning
remember that some TEI rules are not enforced (or enforceable) by the DTD
You have to know what's on the menu before you can choose from it

A case study: the Lampeter corpus

Fairly typical requirements for historical language corpora:
  light presentational tagging
  structural markup for access
  detailed information about source text production
  small number of tags to ease data capture and validation
Implementation
  tagsets: prose base, and tags from four additional sets
  some extensions, many exclusions

The Lampeter corpus DTD subset

<!DOCTYPE teiCorpus.2 SYSTEM "tei2.dtd" [
<!ENTITY % TEI.prose "INCLUDE">
<!ENTITY % TEI.corpus "INCLUDE">
<!ENTITY % TEI.figures "INCLUDE">
<!ENTITY % TEI.transcr "INCLUDE">
<!ENTITY % TEI.extensions.ent SYSTEM "lampext.ent">
<!ENTITY % TEI.extensions.dtd SYSTEM "lampext.dtd">
]>

The Lampeter corpus extensions.ent

<!ENTITY % analytic 'IGNORE' >
<!ENTITY % biblStruct 'IGNORE' >
<!-- hic desunt multa -->
<!ENTITY % supplied 'IGNORE' >
<!ENTITY % x.phrase "it|ro|sc|su|bo|go|">
<!ENTITY % x.biblPart "printer|pubFormat|bookSeller|">
<!ENTITY % x.demographic "socecstatusPat|biogNote|">

The Lampeter corpus extensions.dtd

<!ELEMENT it (%phrase.seq;)>
<!ELEMENT printer (%phrase.seq;)>
<!ATTLIST it %a.global; >
<!-- etc. for all other new elements -->

To finish the job

Document your extensions, using the TEI tagset for tagset documentation
Write a manual using the ODD system to generate your DTD fragments

Why bother?

The TEI is a well-known reference point
Using the TEI enables
  sharing of data and resources
  shared modular software development
  lower learning curve and reduced training costs
The TEI is stable, rigorous, and well-documented
The TEI is also flexible, customizable, and extensible in documented ways
The architectural approach offers the best compromise for practical work.