1. What is the TEI?
1.1 Origins
The TEI originated in the literary and linguistic computing
community. It drew immediate interest from other areas as well:
computational linguistics is in a stage of massive database
development and is concerned with the reusability of data across
theoretical boundaries.
Those affected by the project are:
- researchers (esp. humanists but also computational linguists)
- publishers and industry
- software developers
- data archivists
- funding agencies
Requirements:
- It should specify a common interchange format for
machine-readable texts.
- It should provide a set of recommendations for encoding new
textual materials.
- It should document the major existing encoding schemes, and
investigate the feasibility of developing a metalanguage in
which to describe them.
- It must be a set of guidelines, not a set of rigid
requirements.
- It must be extensible.
- It should be device- and software-independent.
- It should be language-independent.
- It should be application-independent.
We are conscious of a number of tradeoffs:
- in standardizing notation one risks standardizing the thinking
- it is a long way from the classicist in the garret to the
multi-million-dollar machine-translation project. We have to
keep things simple for the poor scholar, yet expressive enough
for the team with programmers to spare.
- we want rigorously defined standards, but they should be
clear and expressive. (Enough rigor will render anything
unreadable.)
1.2 Organization
Sponsorship by ACH, ALLC, and ACL.
Funding from NEH, EEC, and Mellon.
Participation by 15 other organizations.
Steering Committee, Advisory Board, Editors, Working Committees.
3. Why Should Industry Care about the TEI?
Why should you care about this? Well, in the SGML revolution, the
research community are the Jacobins or the Bolsheviks. SGML attempts
to liberate electronic texts from paper output, but it takes a while
to shake your thoughts free of paper. The research community, however,
has never been fixated on ink on paper: texts have always appeared to
researchers as complex, multi-leveled cultural and linguistic objects
that exhibit a lot of regularity but also a tremendous variety of form.
Also, whether it's obvious or not: our problems are your problems,
and your problems are our problems. Most industrial firms do not much
care about the textual criticism of the First Folio, but they do face
serious problems of version control, which take the same form in text
applications. You may not care about literary allusion, but subject
indexing has many of the same problems. You may not care about the
problems of theoretical diversity, but the same problems arise in trying
to mediate among conflicting models in page description languages.