Below we describe the tasks of the Metalanguage (ML) committee. We assume that SGML will serve as the basis for the interchange format for the TEI project.
The Metalanguage committee will generate a set of guidelines for writing SGML Document Type Definitions (DTDs) and instances of DTDs.
The Metalanguage committee will develop a formal metalanguage for the description of existing encoding schemes. A survey of existing schemes will be undertaken prior to the development of the metalanguage. The choice of schemes to be included in the survey will be determined by the following criteria: 1) the Metalanguage committee judges the scheme to be sufficiently representative of encoding in the domain; 2) sufficient documentation of the scheme can be obtained; and 3) electronic versions of data encoded with the scheme can be obtained.
The SGML guidelines will be generated primarily by the ML committee head, in cooperation with, and with the consensus of, all committee members. The first version of these guidelines will be presented officially to the TEI in June 1989. It is anticipated that three tasks will emerge as necessary to implement the SGML guidelines: 1) the acquisition or development of software environments for generating DTDs and their instances; 2) the formal specification of all TEI DTDs, using the respective environment; and 3) the formal specification of all TEI instances of DTDs, using the respective environment. As the exact descriptions of these tasks become available, members of the ML committee will be assigned to their execution.
A proposal for the metalanguage will be generated primarily by the ML committee head, in cooperation with, and with the consensus of, all committee members. The first version of this proposal will be presented officially to the TEI in September 1989. The primary task that will emerge as a result of the proposal is to test the power and applicability of the metalanguage with respect to currently existing encoding schemes.
To date, we have gathered at Ohio State University sufficient documentation and electronic versions of encoded data for each of the following schemes: (1) Lancaster-Oslo-Bergen Corpora of Modern English, (2) COCOA, (3) Scribe, (4) Dictionary of the Old Spanish Language, (5) Thesaurus Linguae Graecae, and (6) WATCON.
We have considered the following schemes and decided to exclude them from the testing phase, because they appear so similar to the schemes we have chosen that they are unlikely to add information to our analysis: (1) Brown Corpus, (2) General Inquirer, (3) Aramaic Lexicon, and the text-formatters (4) SCRIPT/GML, (5) LaTeX, and (6) troff. After the testing phase, however, all these schemes may eventually be translated to the metalanguage, as the needs of TEI members dictate.
The committee will be polled for advice and consensus on the choice of schemes, and for suggestions of new schemes to add. Once a metalanguage proposal is in place, members of the ML committee will be assigned to encode existing schemes, other than those mentioned above, using the metalanguage.