.sr docfile = &sysfnam. ;.sr docversion = quiet;.im teigmlp1
.* Document proper begins.
Text Encoding Initiative <title>A Progress Report, Summer 1990 to Summer 1991
</titlep>
<!>
<toc>
</frontm>
<!>
<body>
<p> This document reports on recent work (1990-91) in the Text Encoding Initiative.
<p> The Text Encoding Initiative is a cooperative undertaking of the international textual research community to formulate and disseminate guidelines for the encoding and interchange of machine-readable texts intended for literary, linguistic, historical, or other textual research. At present, the chaotic diversity of encoding schemes used for such texts makes it difficult to move texts from one software program to another, and researchers who exchange texts lose valuable time deciphering them and converting them into their local encoding schemes. The primary goal of the Text Encoding Initiative is to provide explicit guidelines that define a text format suitable for data interchange and data analysis; the format is to be hardware- and software-independent, rigorous in its definition of textual objects, easy to use, and compatible with existing standards. Representatives from approximately twenty learned societies and professional associations whose members are actively concerned with the encoding of machine-readable literary and linguistic material serve on an Advisory Board to ensure that a wide range of expertise is brought to bear on these problems.
<p> The TEI is sponsored by the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC). It is funded in part by the U.S. National Endowment for the Humanities, an independent federal agency; Directorate General XIII of the Commission of the European Communities; the Andrew W. Mellon Foundation; and the Social Sciences and Humanities Research Council of Canada.
<!>
<h1>Distribution of Guidelines
<p> The <cit>Guidelines for the Encoding and Interchange of Machine-Readable Texts</cit> (<q>Guidelines</q>) <fn> <cit>Guidelines for the Encoding and Interchange of Machine-Readable Texts</cit>, ed. C. M. Sperberg-McQueen and Lou Burnard, ver. 1.1 (Chicago, Oxford: Text Encoding Initiative, 1990), available from the TEI. </fn> are the major product of the Text Encoding Initiative, incorporating the work of scholars from across the world.
<p> The first public draft of the Guidelines---Version 1.0---was made available in July, 1990, pulling together the results of the initial 1988-90 grant period in a volume of 300 pages. The Guidelines include an initial section with tutorial information on the Standard Generalized Markup Language (SGML), followed by more detailed discussions of characters and character sets, bibliographic standards for electronic texts, textual features common to many text types (e.g. paragraphs, highlighting, emphasis, quotations, names), textual features of specific text types (e.g. corpora, dictionaries, drama), the use of mark-up in linguistic analysis and interpretation, and means of <q>extending</q> the Guidelines to other specialties.
<p> Since June, 1990, over 1,000 copies of the Guidelines have been distributed world-wide, and numerous comments have been received on the substance of the volume. These will be incorporated into Version 2, anticipated for January, 1992, which will again be distributed for public comment and then revised for formal publication in the summer of 1992. As many recipients have remarked, the initial version of the Guidelines is a fairly technical introduction to text encoding and the TEI. It is anticipated that Version 2 will be accompanied by much more tutorial information and will be more accessible to scholars in the humanities and social sciences with little background in computing.
<!>
<h1>Steering Committee
<p> The Steering Committee met five times during the past year: July, 1990 (Oxford); September, 1990 (Chicago); December, 1990 (Oxford); March, 1991 (Tempe, Arizona, in conjunction with the annual joint conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing); and June, 1991 (Berkeley, California, in conjunction with the annual conference of the Association for Computational Linguistics). As described below, the creation of work groups within the Text Analysis and Interpretation and Text Representation committees was a major product of these deliberations. Other major topics were planning for the future of the TEI after current funding expires in June, 1992, and initiating contact with Japanese researchers interested in collaborating on further development of the TEI Guidelines. Time was also spent on improving the organization of the project, especially in the areas of committee structure, reporting procedures, administrative support, and finances.
<p> In addition, the Steering Committee has emphasized the importance of developing TEI tutorial materials, the need for software support for the TEI scheme, and the public dissemination of the products of the working committees and work groups.
<!>
<h1>Work Committees/Formation of Work Groups/Progress of Work Groups
<p> The first draft of the Guidelines was based in large part on the guidance of the four working committees set up by the initial TEI planning conference: Text Documentation, Text Representation, Text Analysis and Interpretation, and Metalanguage and Syntax Issues.
<p> The Metalanguage Committee has continued to be active, meeting and producing papers on the adaptation of SGML principles to TEI needs, the availability and development of SGML software, and the conversion into SGML of documents encoded in other formatting or markup languages.
The Text Documentation Committee essentially completed its work with the publication of the Guidelines, but will meet again in November, 1991, to review the bibliographic recommendations emerging from the other committees and their work groups, as described below.
<p> In addition to these larger committees, the work of revising and extending the Guidelines has been carried out since 1990 by a number of smaller work groups operating under the auspices of the Text Representation and Text Analysis and Interpretation Committees. Each of these groups is responsible for examining more specialized aspects of its subject area. Those working under the Text Representation Committee are (with their current heads):
<ul>
<li>TR1 Character Sets (Head: Harry Gaylord, Groningen)
<li>TR2 Text Criticism (Head: Peter Robinson, Oxford)
<li>TR3 Hypermedia (Head: Steven DeRose, Electronic Book Technologies, Providence, R.I.)
<li>TR4 Formulae, Tables, Graphics, etc. (Head: Paul Ellison, Exeter)
<li>TR6 Language corpora (Head: Douglas Biber, Northern Arizona)
<li>TR8 Physical Description of Printed Books (Head: John Barnard, Leeds)
<li>TR9 Physical Description of Manuscripts (Head: Jacqueline Hamesse, Louvain-la-Neuve)
<li>TR10 Verse (Head: David Robey, Manchester)
<li>TR11 Performance (Head: Elli Mylonas, Harvard)
<li>TR12 Literary Prose (Head: Tom Corns, U. of Wales, Bangor)
</ul>
Those working under the Text Analysis and Interpretation Committee are:
<ul>
<li>AI1 Linguistic Description (Head: Terry Langendoen, Arizona)
<li>AI2 Spoken Texts (Head: Stig Johansson, Oslo)
<li>AI3 Literary Studies (Head: Paul Fortier, Manitoba)
<li>AI4 Historical Studies (Head: Daniel Greenstein, Glasgow)
<li>AI5 Machine-readable Dictionaries (Head: Robert Amsler, Mitre Corp.)
<li>AI6 Machine Lexica (Head: Robert Ingria, BBN Technologies)
<li>AI7 Terminological Data (Head: Alan Melby, Brigham Young University)
</ul>
<p> These groups have, for the most part, been quite active in evaluating Version 1 of the Guidelines, analyzing the issues associated with their special disciplines, and either producing tag sets for the types of documents associated with these disciplines or articulating principles on which to base the creation of TEI tags.
<p> Among the fruits of their efforts are these:
<ul>
<li>the Character Set work group has established contact with responsible officials at the International Organization for Standardization (ISO), who will cooperate in the group's work;
<li>the Hypermedia group has established contact with its counterparts at the American National Standards Institute (ANSI);
<li>the Linguistics work group has prepared a detailed set of recommendations for the encoding of morphological and lexical information, including the languages of the EEC and Russian;
<li>the Spoken Texts work group has produced a thorough analysis of the major issues associated with these specialized texts (e.g. oral histories, linguistic and lexicographical analyses, radio transcripts), as well as an initial set of tags for speech and conversation;
<li>the Literary Studies work group has conducted a survey of literary scholars to ascertain their needs in text encoding;
<li>the Historical Studies work group has produced a major volume, <cit>Modelling Historical Data</cit>, describing the theoretical issues faced by historians seeking to standardize the encoding and analysis of historical data, and the specific issues associated with various types of sources; <fn> <cit>Modelling Historical Data: Towards a Standard for Encoding and Exchanging Machine-Readable Texts</cit>, ed. Daniel I. Greenstein (Göttingen: Max-Planck-Institut für Geschichte, 1991), available from the TEI.
</fn>
<li>the Dictionary work group has examined existing electronic encodings of dictionaries and is in the process of producing a set of tags for both monolingual and multilingual dictionaries;
<li>the Terminological Data group has conducted an extensive survey of the data categories used by existing terminological database systems, and has met twice to develop a set of TEI tags for terminological data. It will recommend that these tags be adopted by ISO Technical Committee 37, Subcommittee 3, at its meeting in Vienna, Austria, in November.
</ul>
<p> The work of the groups will make the coverage of the Guidelines somewhat more complete than that of Version 1. It is clear, however, that some areas initially expected to be included in the 1992 version of the Guidelines will not be ready for standardization by June, 1992, and will require further discussion and experiment. For this reason, the TEI Steering Committee now plans for the TEI to continue as an ongoing project and to supplement the Guidelines of June, 1992, with recommendations for further specialized areas as consensus on proper practice in these areas emerges.
<p> In areas where consensus does prove achievable, the results of the work groups will be incorporated into a second version of the TEI Guidelines, to be distributed early in 1992 for public comment. A third version, revised in accordance with comments received on Version 2, will be submitted in May, 1992, to the TEI Advisory Board for endorsement, followed by formal publication.
<!>
<h1>Affiliated Projects
<p> A number of projects affiliated with the TEI have put the Guidelines to the test of practice by applying the draft recommendations to their collections. In early 1991, the Steering Committee named individuals to consult with the affiliated projects and help them apply the TEI scheme to their individual problems.
The Steering Committee set the following goals for the cooperation between the TEI and each affiliated project: the preparation of one short example (10,000 to 20,000 characters) of real text from the project in TEI-conformant form, followed later by the creation of a large example (100,000 characters or more) in TEI-conformant form, together with a brief explanation of how the material was made TEI-conformant. These examples will be made publicly available where possible, and may be included in a collection of examples of TEI usage to be made public with some future revision of the Guidelines. Applying the TEI encoding scheme to data being prepared by the affiliated projects, with the requirement that the TEI encoding capture all the information the projects require, constitutes a large-scale, systematic test of the TEI Guidelines on actual scholarly material. It is gratifying to report that the results so far have been positive: no fatal flaws have been found in the TEI scheme, although the suggestions of the affiliated projects will lead to a number of improvements.
<p> The affiliated projects include:
<ul>
<li>American and French Research on the Treasury of the French Language (ARTFL) (French language and literature, Revolution to 20th Century)
<li>Bar Ilan Corpus of Modern Hebrew (language corpus)
<li>British National Corpus (language corpus)
<li>Brown Women Writers Project (encoding of English language women's writing)
<li>Data Collection Initiative, ACL (language corpus)
<li>Georgetown Center for Text and Technology Hegel Project (encoding Hegel's works)
<li>Institute for Formal and Applied Linguistics, Charles University (Prague), Czech-English translation corpus (bilingual translation corpus for work in Czech/English machine translation)
<li>Leiden Armenian Database (Armenian language and culture)
<li>Milton Project (encoding of Milton's works)
<li>Network of European Corpora (language corpus)
<li>Nietzsche Project/Dartmouth (encoding of Nietzsche's works)
<li>Perseus Project (materials on classical Greek civilization)
<li>Stockholm-Umeå Corpus of Modern Swedish (language corpus)
<li>Brandeis/Thomas Middleton Project (encoding of Middleton's works)
<li>Vassar/CNRS Machine-Readable Dictionary Project (application of AI techniques to work with machine-readable dictionaries)
</ul>
<p> In July, 1991, week-long workshops for representatives of the affiliated projects were held in Oxford, England, and Providence, Rhode Island.
<!>
<h1>Other Publicity and Dissemination
<p> In addition to distributing the Guidelines, the Text Encoding Initiative has actively sought the involvement of scholars from across the world through a series of workshops; articles in humanities and social science magazines, newsletters, and journals; and the TEI-L electronic bulletin board.
<!>
<h2>Workshops
<p> Four workshops have been held since the release of the Guidelines: Chicago, Illinois (25 participants); Tempe, Arizona, in conjunction with the annual joint conference of ACH and ALLC (40 participants); Providence, Rhode Island (30 participants); and Oxford, England (40 participants). In these workshops, persons associated with the TEI working committees, work groups, and affiliated projects learned about the purpose and scope of the TEI, the basics of SGML software and usage, and the basics of the TEI recommendations. They also had the opportunity to gain hands-on experience with SGML software for the Macintosh or the IBM PC.
<!>
<h2>Papers and Presentations
<p> The TEI has also been presented in talks at professional meetings and in papers. A list of the major talks and papers is appended (TEI SCR14).
<!>
<h2>TEI-L Bulletin Board
<p> The TEI maintains a public bulletin board, TEI-L@UICVM (or, from the Internet, tei-l@uicvm.uic.edu), for electronic discussion of the TEI and the TEI markup scheme. Major TEI papers produced by the working committees and work groups are announced on TEI-L. (To subscribe, send electronic mail to LISTSERV@UICVM on Bitnet containing the line <q>subscribe TEI-L Firstname Lastname</q>.)
<!>
<h1>TEI Documents
<p> Much of the TEI's effort since the publication of the first version of the Guidelines has gone into their evaluation and extension by the TEI working committees and work groups. In the course of this work, committee and group members prepare working papers that explain the issues associated with their various disciplines and, in some cases, propose preliminary sets of TEI tags. These documents are distributed either publicly, after being announced on the TEI-L bulletin board, or internally to all TEI work group heads and affiliated projects, to allow integration and cross-fertilization among the various disciplines.
<p> Attached to this report as appendices are lists of documents publicly available from the TEI (TEI A4), and those for internal distribution only (TEI A0). <!> </body> </gdoc>