The Text Encoding Initiative Guidelines have been widely adopted by projects and institutions in many countries in Europe, the Americas, and Asia, and are used for encoding texts in dozens of languages. However, the Guidelines are written in English, the examples are largely drawn from English literature, and even the names of the elements are abbreviated English words. We need to make sure that the TEI and its Guidelines are internationalized and localized so that they are accessible in all parts of the world.
The Text Encoding Initiative Guidelines [TEI] have been widely adopted by projects and institutions in many countries in Europe, North America, and Asia, and are used for encoding texts in dozens of languages. For example, the projects listed at http://www.tei-c.org/Applications/ have examples of work involving Chinese, Danish, Dutch, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Latin, Norwegian, Serbian, Spanish, Welsh, and some African languages; but given that the Guidelines are c. 1400 pages of fairly dense technical English, it is possible that only the more dedicated scholars get involved.
It may be useful to distinguish between what we might call ‘traditional’ or documentary approaches to translation, which focus on translating the descriptive prose of the Guidelines as a document, and ‘formal’ approaches which focus instead on translating the individual components (examples, element and attribute names, technical descriptions) in a way that enables these components to be used within the formal structures of the TEI as a technical standard. While the first approach may be very useful, the results are more difficult to maintain over the long term and are also more difficult to produce, since they cannot be accomplished in discrete chunks. The latter approach is the one we propose here, since it is more easily maintainable (only the affected elements need to be updated when changes are made to the Guidelines) and can be more easily undertaken in a distributed fashion by collaborative groups.
<distinct> to consider, where the value of ‘fag’ gives little help.
<líneaDirección>, <ligneAdresse>, <linDireccio> or <AdressZeile> instead of <addrLine>
It should be noted that element name translation by itself is quick and useful, but not necessarily the most effective way to proceed. For example, many of the element names are abbreviated forms of English words (e.g. <respStmt>), which are not easy to translate sensibly. Furthermore, unless the reference descriptions are also translated, the element names by themselves do not give a clear idea of what the element is for. Using <infoResp> instead of <respStmt> is not as helpful as translating the description ‘supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.’
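The idea of writing with translated element names and restoring the canonical English names afterwards can be illustrated with a minimal Python sketch. The lookup table below is hypothetical: it contains only the name pairs mentioned in the text, ignores XML namespaces, and stands in for whatever table a real project would generate from its stored translations.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: translated element name -> canonical TEI name.
# Only the pairs mentioned in the text are listed; a real table would be
# generated from the translations recorded in the TEI source.
CANONICAL = {
    "líneaDirección": "addrLine",
    "ligneAdresse": "addrLine",
    "linDireccio": "addrLine",
    "AdressZeile": "addrLine",
    "infoResp": "respStmt",
}


def canonicalise(xml_text: str) -> str:
    """Rename every translated element to its canonical name."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Names that are not in the table are left untouched.
        element.tag = CANONICAL.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


sample = "<address><ligneAdresse>12 rue de la Paix</ligneAdresse></address>"
print(canonicalise(sample))
```

The sketch renames elements only; it does not translate descriptions, which is why name translation alone tells a reader little about what an element is for.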
<g> (where <g> is a reference to a non-Unicode character)
<glyph> element in the TEI header. In the following example, we define a new character and assign it to a position in the Unicode Private Use Area (PUA); we also provide a standardized form as a fallback:
<gi> element, as in
<charDesc> element allow the user to provide an image file which has a picture of the character. It is also possible to override what appears in the text by using markup like this
<g> element can be used immediately without any lookup.
Where a character is simply a relatively unimportant variant on a Unicode character, the user does not need to define a point in the PUA, but can simply use <charDesc> to describe the variation.
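The lookup from a <g> reference to either a PUA code point or a standardized fallback can be sketched in Python as follows. This is an illustration under stated assumptions, not the TEI schema: the nesting of <glyph> inside <charDesc>, the <mapping> element with its type values, the identifier `zvar`, and the code point U+E304 are all invented for the example.

```python
import unicodedata
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed header fragment: element names, attribute names, the identifier
# "zvar" and the code point U+E304 are illustrative, not taken from the TEI.
HEADER = """
<charDesc>
  <glyph xml:id="zvar">
    <mapping type="PUA">&#xE304;</mapping>
    <mapping type="standardized">z</mapping>
  </glyph>
</charDesc>
"""


def is_private_use(ch: str) -> bool:
    """True if the character lies in a Unicode Private Use Area."""
    return unicodedata.category(ch) == "Co"


def load_glyphs(header_xml: str) -> dict:
    """Map each glyph identifier to its {mapping type: text} table."""
    glyphs = {}
    for glyph in ET.fromstring(header_xml).iter("glyph"):
        glyphs[glyph.get(XML_ID)] = {
            m.get("type"): m.text for m in glyph.iter("mapping")
        }
    return glyphs


def render(paragraph_xml: str, glyphs: dict, use_pua: bool) -> str:
    """Flatten a paragraph, replacing each <g ref="#id"/> by a mapping."""
    root = ET.fromstring(paragraph_xml)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "g":
            mappings = glyphs[child.get("ref").lstrip("#")]
            parts.append(mappings["PUA" if use_pua else "standardized"])
        parts.append(child.tail or "")
    return "".join(parts)


GLYPHS = load_glyphs(HEADER)
SAMPLE = '<p>The scribe writes <g ref="#zvar"/> here.</p>'
print(render(SAMPLE, GLYPHS, use_pua=False))
```

A renderer with a font covering the private code point would call `render` with `use_pua=True`; any other consumer falls back to the standardized form. The sketch handles only flat paragraphs with no nested markup.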
The ODD language allows element names, attribute names, and descriptions to be translated, and preserves the information needed for canonicalisation. The technical documentation elements (<gloss> and <desc>) for TEI elements, attributes, etc. can be specified multiple times, in different languages, distinguished by the standard xml:lang attribute. There is also a container (<equiv>) to specify the relationship of an element, attribute or value to standardised schemes.
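How a processor might choose between several <gloss> or <desc> texts by their xml:lang value can be sketched in Python. The specification fragment below is an assumption made for illustration: the English description is abridged from the one quoted earlier, the French texts are unofficial translations written for this example, and the `pick` helper with its fallback to English is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative fragment only: the French texts are unofficial translations
# written for this sketch, and the English description is abridged.
SPEC = """
<elementSpec ident="respStmt">
  <gloss xml:lang="en">statement of responsibility</gloss>
  <gloss xml:lang="fr">mention de responsabilité</gloss>
  <desc xml:lang="en">supplies a statement of responsibility for someone responsible for the intellectual content of a text</desc>
  <desc xml:lang="fr">indique la responsabilité d'une personne pour le contenu intellectuel d'un texte</desc>
</elementSpec>
"""


def pick(spec: ET.Element, tag: str, lang: str, fallback: str = "en"):
    """Return the text of the child <tag> in `lang`, else in `fallback`."""
    by_lang = {el.get(XML_LANG): el.text for el in spec.findall(tag)}
    return by_lang.get(lang, by_lang.get(fallback))


spec = ET.fromstring(SPEC)
print(pick(spec, "gloss", "fr"))  # French gloss is present
print(pick(spec, "gloss", "de"))  # no German gloss, so the English one is used
```

Falling back to English when a translation is missing is a design choice of the sketch; it lets partially translated specifications remain usable while translation work is still in progress.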
<altIdent> element; so the example above could be rewritten as
<taxonomy> is defined by the containing pattern ‘taxonomy’; it is the pattern name which other elements use, not the element name. If the schema were translated into Chinese, it would look like this:
<altIdent>. The descriptions work in the same way. We can expand the TEI source to add French translations alongside the English originals, and the appropriate text can be passed to the generated schemas or documentation:
<altIdent> information, and puts the text back to canonical form.
<divGen type="toc"/>, will have to provide appropriate translations. The TEI XSL family maintained by Sebastian Rahtz, for example (http://www.tei-c.org/Stylesheets/teic/), can operate in many languages:
ISO Language code | Text |
en | Contents |
de | Inhalt |
ro | Cuprins |
fr | Contenu |
pt | Índice geral |
es | Contenidos |
slv | Vsebina |
sv | Innehåll |
zh-TW | 內容 |
sr | Sadržaj |
ja | 目次 |
pl | Spis treści |
hi | Mula Shabda |
th | เนื้อหา |
nl | Inhoud |
ru | Оглавление |
tr | İçerik |
bg | Съдържание |
el | Περιεχόμενα |
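The kind of per-language lookup that produces these headings can be sketched in Python. This is not how the TEI XSL stylesheets are implemented; the function name, the subset of languages, and the fallback order (full tag, then primary language subtag, then English) are assumptions made for illustration.

```python
# A subset of the table above: language code -> table-of-contents heading.
TOC_HEADINGS = {
    "en": "Contents",
    "de": "Inhalt",
    "fr": "Contenu",
    "es": "Contenidos",
    "nl": "Inhoud",
    "sv": "Innehåll",
}


def toc_heading(lang: str, default: str = "en") -> str:
    """Look up a heading by full tag, then primary subtag, then default."""
    for candidate in (lang, lang.split("-")[0], default):
        if candidate in TOC_HEADINGS:
            return TOC_HEADINGS[candidate]
    raise KeyError(f"no heading for {lang!r} or default {default!r}")


print(toc_heading("de"))     # exact match
print(toc_heading("fr-CA"))  # falls back to the primary subtag "fr"
print(toc_heading("xx"))     # unknown language, falls back to English
```

Keeping the strings in a single table keyed by language code means that adding a language requires no change to the lookup logic.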
<p> elements should not normally be translated, but the second <p> has an explicit override.
For the purposes of the formal translation procedure advocated by this paper, the ITS procedure provides a good framework.
<desc> and <gloss> texts, and a mechanism to allow users to easily take advantage of the work.
The scale of work involved is not impossible to contemplate. The TEI contains
<desc> elements, 106666 characters
<gloss> elements, 32385 characters
The first steps in formalized internationalization of the TEI (as opposed to the translations of the Lite document) were made by Alejandro Bia, to whom many thanks are due. Translation examples in this paper come from Pierre Yves Duchemin (French), Marcus Bingenheimer (Chinese), Arno Mittelbach (German) and Alejandro Bia (Spanish). Veronika Lux and Julia Flanders co-wrote some of the explanations of TEI I18N.
Chinese | Marcus Bingenheimer | Chung-hwa Institute of Buddhist Studies, Taipei |
Dutch | Bert Van Elsacker | - |
French | Laurent Romary | Nancy |
French | Veronika Lux | Nancy |
German | Christian Wittern | Institute for Research in Humanities, Kyoto University |
German | Werner Wegstein | Wuerzburg University |
Hindi | Paul Richards | UGS (The PLM Company), http://www.ugs.com/ |
Hungarian | Király Péter | - |
Italian | Fabio Ciotti | University of Roma |
Japanese | OHYA Kazushi | Tsurumi University, Yokohama |
Norwegian | Øyvind Eide | - |
Polish | Radoslaw Moszczynski | Warsaw University |
Portuguese | Leonor Barroca | Open University |
Romanian | Dan Matei | CIMEC - Institutul de Memorie Culturala, România |
Serbian | dr Cvetana Krstev | - |
Slovenian | Tomaž Erjavec, Matija Ogrin | Dept. of Knowledge Technologies, Jozef Stefan Institute, Slovenia |
Spanish | Manuel Sánchez | Miguel de Cervantes Digital Library |
Swedish | Matt Zimmerman | NYU |
Tibetan | Linda Patrik, Tensin Namdak | www.nitartha.org |