.* TEI Document No: SCM12
.* Title: Minutes of the Steering Committee Meeting
.* Bernardsville, New Jersey, 17-18 March 1990
.* Drafted: 19 March 1990 (transcribed by VAM)
.* Revised: 26 March 1990 MSM corrected some typos
.* 3 April 1990 MSM from SJ: typos, paraphrase of SJ remarks
.* 24 April 1990 MSM Script problems.
.*
.im gmlpaper ;.* Use GMLPAPER or GMLGUIDE (or -MLA)
.sr docfile = &sysfnam. ;.sr docversion = 2
.im teigml
.* Document proper begins.
<title>Minutes of the Steering Committee Meeting
<title>Bernardsville, 17-18 March 1990
<author>C. M. Sperberg-McQueen
<docnum>TEI &docfile.
<date>&docdate.
<attend>
Present: Donald Walker (DW), chair; Robert Amsler (RA); David Barnard (DB); Lou Burnard (LB); Susan Hockey (SH); Nancy Ide (NI); Stig Johansson (SJ); D. Terence Langendoen (TL); Michael Sperberg-McQueen (MSM); Antonio Zampolli (AZ)
</attend>
</titlep>
</frontm>
<!>
<body>
<h1>1. Agenda
DW proposed the following agenda, which was adopted:
<sl>
<li>1. Reports from committee heads and discussion
<sl>
<li>a. Metalanguage and Syntax (ML)
<li>b. Text Representation (TR)
<li>c. Text Analysis and Interpretation (A&I)
<li>d. Text Documentation (TD)
</sl>
<li>2. Budget report
<li>3. Planning for remainder of first cycle
<li>4. Planning for second cycle: organization, testing, validation
<li>5. Promotion of the guidelines and planning for presentations in Siegen and Pittsburgh.
</sl>
<!>
<h1>2. Metalanguage
DB began by apologizing for delays in the committee's work resulting from his administrative load being higher than expected.
<p>
The committee's contributions to the draft of June 1990 include chapters 3 (SGML Markup), 9 (Extending and Modifying the Guidelines), and sample translations into SGML from other encoding schemes in appendix B.
<ul>
<li>Section 3.1 (Introduction to SGML) is being drafted by LB. Section 3.2 (SGML Declarations) takes its content and some text from document MLW13, drafted by DB.
Section 3.3 will not exist in the form described by EDW6 and EDP2's draft table of contents.
<li>Chapter 9, which DB has drafted and distributed to the committee, deals with the problems of renaming tags and changing the structural definition of a document type. The mechanism agreed upon by the committee is nicknamed the <q>pizza model</q> to distinguish it from a range of other possibilities with their associated culinary metaphors.<fn>Notably the table d'ho^te, the combination plate, and the Chinese menu. -Ed.</fn> This model allows aggregates of tags (<q>crystals</q>) with well-defined internal structure to coexist in a document with a very loosely defined overall structure. Relatively simple mechanisms allow gradations in the degree of freedom permitted in a document's internal structure.
<li>The appendix of examples is underway. In the absence of a defined TEI layout tag set, no great progress is possible, and the half-dozen examples expected may deviate slightly from the final form of the recommended tag set.
</ul>
<p>
In the light of the continuing press of his other duties, DB expressed a desire to step down as committee head at the conclusion of the first cycle. The steering committee expressed its appreciation of DB's work on the project, and DB confirmed his intention to continue as head through the end of the first cycle and his willingness to continue serving as a member of the committee.
<p>
DB identified two open work items: the study and possible specification of a packing and unpacking mechanism similar to that of the SGML Data Interchange Format (SDIF), which might be carried over into the second cycle, and a paper on naming conventions, which should be useful in the preparation of DTDs.
AZ asked what work the committee would need to perform in the second cycle; DB responded that apart from continuing work on translation mechanisms into the TEI scheme, the committee would need to work on formalizing methods for combining the results of other committees' work in useful packages.
<p>
The steering committee discussed the possibly different requirements of data interchange and data capture (or human manipulation of documents using non-SGML-aware programs). It was agreed that a general section on data capture and human readability would be included in the guidelines and that TEI DTDs would include final specifications of the omissibility of tags even though for interchange purposes the SGML feature OMITTAG is to be turned off. Committees would however not be encouraged to develop DTDs specifically for data entry or capture where such DTDs would differ structurally from those used for data interchange. Such special-purpose DTDs are held to be application-specific tools and the responsibility of software developers or individual projects.
<p>
The steering committee agreed that tag names, attribute names, and attribute values would be specified in all languages of the EEC in the version of June 1992, but not necessarily in June 1990. If versions in all languages are not available (it is agreed that they will not be), then a note announcing our intention will be added to the preface of the June 1990 draft.
<p>
The <q>pizza model</q> and some of its applications were discussed.
<!>
<h1>3. Text Representation
<!>
SJ reported that the TR committee has drafts in hand for chapters 4 (Character Sets), 6 (Features Common to Many Text Types), and parts of 7 (Specific Literary Texts and Office Documents). These drafts are being revised, with a deadline of 26 March.
<p>
The formal tag descriptions required for the alphabetical list of tags will take longer.
SJ asked whether formal tag descriptions should be prepared for cases where several alternative methods of achieving a result were presented in the guidelines. MSM asked that tag descriptions be omitted for tags used as casual examples of possible extensions (e.g. some examples in the section on references) but included for tags described as parts of a fully worked out alternative (as in the section on text criticism).
<p>
SJ observed that the formal descriptions should include a reference to a discussion of a tag in context in the earlier chapters. He suggested a relatively fine division of the guidelines into brief sections with a structural hierarchy limited to three levels (so that section numbers like <q>6.2.1</q> are possible but not <q>6.2.2.1</q>).
<p>
SJ observed that
<ol>
<li>the first condition for producing machine-readable texts for scholarly purposes is systematic encoding plus documentation, regardless of how the encoding is done;
<li>achieving uniformity of encoding (e.g. standardization on one set of tag names) represents additional progress, even without using the full machinery of SGML;
<li>full exploitation of SGML represents a distinct level of standardization and may be difficult to achieve, because the formalism can be so forbidding.
</ol>
Many researchers might tag the same features as the TEI recommends (<q>level 1</q>), others might do so with the same tags as the TEI stipulates (<q>level 2</q>), but the amount of knowledge required might prevent many from fully exploiting the formal power of DTDs and other SGML formalisms (<q>level 3</q>). The Steering Committee agreed that usability is crucial and expressed the hope that chapters 3 and 9 on SGML markup and DTD modification could over time be sufficiently elaborated to make DTD extensions a reasonable proposition for serious readers of the guidelines.
It is expected that the draft of summer 1990 may require readers to study SGML independently in order to make full use of the mechanisms of the system.
<p>
SJ noted that, like DB, he has found it difficult to meet the time commitment for chairing a committee. DW stressed the need for continuity in the committees and hoped that some arrangement could be made so as to enable SJ to remain as head of the TR Committee.
<p>
SJ presented and discussed an example text from the LOB corpus recorded using SGML-style tagging and observed that the exercise exposed a number of issues where the TEI's practice is unclear and needs specification. Treatment of record ends and white space around tags were discussed, and the editors suggested that TEI-conformant software be required to treat series of white-space characters like single spaces, with the possible exceptions of double spaces after sentence ends and blank lines. Whether any characters of the data streams should be deleted when descriptive tags were added (e.g. the quotation marks around a tagged quotation) was discussed without consensus being achieved.
<p>
The steering committee commented briefly on some details of the example, and discussed at some length whether descriptive markup of underlying features (rather than the surface rendering of those features) should be recommended. MSM reviewed the range of solutions foreseen in document EDW9 (Points of Style), from requirements which might be circumvented only with proper declarations overriding the default, through recommendations intended to apply in cases where they are achievable without being required where they are impractical, recommendations applicable only under specified conditions, and gratuitous advice, to neutral observations to assist the reader in making an informed choice where the TEI cannot make blanket recommendations. It was agreed that descriptive markup should indeed be recommended as preferable to the alternative, other things being equal.
It was also agreed that chapter 2 on the nature of the guidelines would contain a discussion of usability issues, a clear statement that the recommendations of the text are not to be misconstrued as absolute requirements, and a description of conditions in which one might legitimately choose to ignore certain recommendations.
<!>
<h1>4. Text Analysis and Interpretation
<!>
TL reported that the A&I committee, responsible for part of chapter 7 (dictionaries, lexica, and spoken texts) and chapter 8 (Analysis and Interpretation), has produced a draft of the dictionary section and agreed upon the basic technical content of chapter 8. The section on spoken texts must be deferred. Drafts of the sections on syntax, morphology, and phonology are due 31 March; some materials are due 15 April.
<p>
A single uniform markup scheme has been developed for all linguistic markup, TL reported. This comprises a tag set for feature structures for expressing analyses and a mechanism to align different analyses of a text. Analyses may appear either in-line in the document, segmenting the text and surrounding each segment with its analysis, or in a separate region of the document, in which case the text must be segmented and the segments of the text linked to their analyses using the alignment mechanism. In most cases the second solution will probably be more practical.
<p>
In either case, the linguistic markup will make extensive use of the ID and IDREF mechanism of SGML and relatively little other use of attributes.
<p>
TL described a simple example and demonstrated how the use of entity references could both reduce the size of the analysis and allow commonly agreed upon feature structures (e.g. for lexical and phrasal categories and their grammatical features) to be expressed in the same notation as less commonly accepted theoretical constructs.
<p>
The alignment mechanism automatically handles discontinuous segments.
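<p>
[Ed.] The second arrangement described above (segmented text, with analyses kept in a separate region and linked back by identifier) can be illustrated with a small sketch. This is not the committee's tag set: the names Segment, Analysis, and resolve, the example sentence, and the feature names are all hypothetical, chosen only to show how ID-style references let one analysis point at a discontinuous set of segments.

```python
# Hypothetical sketch: Segment, Analysis, and resolve are illustrative
# assumptions, not names taken from any TEI or A&I committee tag set.
from dataclasses import dataclass, field


@dataclass
class Segment:
    id: str    # plays the role of an SGML ID attribute
    text: str  # the stretch of text this segment covers


@dataclass
class Analysis:
    targets: list  # plays the role of IDREFs pointing at segments
    features: dict = field(default_factory=dict)  # a flat feature structure


def resolve(analysis, segments_by_id):
    """Return the text of the segments an analysis points at, in order."""
    return [segments_by_id[ref].text for ref in analysis.targets]


# Segment the text and give every segment an identifier (s1 .. s5).
words = "He looked the word up".split()
segments = [Segment(f"s{i}", w) for i, w in enumerate(words, start=1)]
segments_by_id = {s.id: s for s in segments}

# Analyses live apart from the text and refer to segments by identifier.
# The phrasal verb "looked ... up" is discontinuous: it skips s3 and s4.
phrasal_verb = Analysis(targets=["s2", "s5"], features={"cat": "V", "type": "phrasal"})
noun_phrase = Analysis(targets=["s3", "s4"], features={"cat": "NP"})

print(resolve(phrasal_verb, segments_by_id))  # ['looked', 'up']
print(resolve(noun_phrase, segments_by_id))   # ['the', 'word']
```

Because the analyses refer to segments only by identifier, a discontinuous unit needs no special treatment in this sketch: it is simply an analysis whose targets are not adjacent.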
<p>
Since feature structures and alignment mappings are not the working data structures of many linguists, the A&I committee has also developed two additional sorts of tags, one for trees and one for multiple transcriptions or analyses with implicit alignment. A third additional set for categorial grammar might also be included.
<p>
A discussion of ambiguity ensued, which elucidated the treatment of ambiguities both true and apparent, both general and local, and which also clarified the treatment of indeterminacy (as in indeterminate PP-attachment). Some aspects of ambiguity (preference marking and garden paths) need further work.
<p>
At its recent meeting in Tucson, TL reported, the A&I committee heard a report from Robert Ingria on computational lexica. Further work on this area is expected for the second cycle; no recommendations will be forthcoming in June 1990.
<p>
SH asked how far along the drafts are. The answer, said TL, is: not very far.
<!>
<h1>5. Text Documentation
<!>
MSM described the state of work in the TD committee in the absence of its chair, Dominik Wujastyk (DWu). General agreement on principles was achieved in Toronto in June 1989, but since then the committee's work has not prospered, owing to MSM's other duties (which led earlier this year to the nomination of DWu as the new head).
<p>
DWu is now drafting a proposal for chapter 5 (Text Documentation) which is to be discussed at the TD committee meeting in Oxford next week. According to discussion between MSM and DWu in January, this draft will be based on the International Standard Bibliographic Description model with seven major bibliographic areas (realized here as tags) to which the TEI would add a (nested) bibliographic description of the source and an area (tag) for TEI declarations. The tags to be included in the TEI declarations will come in part from the TD committee but also largely from TR and A&I.
<p>
AZ inquired where descriptions of systematic omissions, normalization, and corpus sampling methods would go. MSM said they would go into the TEI Declaration section in the forms recommended by the TR. AZ expressed the view that the TEI Declarations section was very important in interchange.
<!>
<h1>6. Planning for End of First Cycle
<!>
SH asked who should receive the guidelines when the draft is made public. NI was assigned to prepare a mailing list by 1 July.
<action>
<who>NI
<act>to prepare mailing list for distribution of draft
<duedate>1 July 1990
</action>
<p>
The SC agreed that the public draft of June 1990 should make clear that substantial changes may still occur within the guidelines. The schedule for completion of the first public draft was discussed, with the following results:
<sc compact=1>
<scheddate>31 March
<schedtodo>Committee texts due
<scheddate>1 May
<schedtodo>First rough draft (draft 0) completed by editors and distributed within project to SC and working committees
<scheddate>15 May
<schedtodo>Major comments due--comments made by this date should be accommodated if at all possible
<scheddate>19-20 May
<schedtodo>SC meeting, Chicago
<scheddate>24 May
<schedtodo>Text is locked
<scheddate>31 May
<schedtodo>Draft (draft 1) sent to Advisory Board and Affiliated Projects with request for comments by 1 July
<scheddate>1 July
<schedtodo>Comments due from Advisory Board and Affiliated Projects
<scheddate>15 July
<schedtodo>Draft made public, mailed to interested parties, NEH & EEC
</sc>
The committee agreed to meet in Chicago to discuss the text of the guidelines immediately before it is locked.
<action>
<who>MSM
<act>to reserve rooms in Chicago
<duedate>a.s.a.p.
</action>
<p>
The status of the various chapters overall was reviewed:
<sl>
<li>1. to be drafted by editors
<li>2. to be drafted by editors
<li>3.1 being drafted by LB
<li>3.2 content set, to be drafted by DB
<li>3.3 content set (descriptions of mechanism as described above)
<li>4.
drafted; being revised by Steven DeRose
<li>5. expected from TD meeting, 23-24 March
<li>6. drafted; being revised by TR (various hands)
<li>7. drafted; being revised by TR and A&I (various)
<li>8. technical content set; being drafted by A&I
<li>9. drafted; being studied by ML; to be revised by DB
<li>10. expected from all committees
<li>11. to be deleted
<li>12. to be drafted by editors
<li>Appendix A. to be drafted by editors or volunteers from SC
<li>Appendix B. to be drafted by ML
<li>Appendix C. (DTDs) to be drafted by editors with assistance from ML
<li>Appendix D. (Examples) to be drafted by working committees
</sl>
<!>
<h1>7. Planning for Second Cycle
<!>
DB listed as likely topics for ML work in the second cycle:
<sl>
<li>1. collect experience with the pizza model, test its strengths
<li>2. develop methods for constructing and refining DTDs
<li>3. develop a DTD construction kit (?)
<li>4. develop further sample transformations from other schemes into TEI
<li>5. possibly develop method of specifying (& building ?) TEI validation software
</sl>
<p>
The SC asked that the ML committee continue to monitor the project's progress with a view to registering TEI types, character sets, and entity sets with ISO when appropriate.
<!>
<h1>8. Budget Report
<!>
MSM reported briefly on the expenditures to date. Travel expenditures to date total approximately $86,000. If no promised matching funds are obtained from NEH, this amounts to a $12,000 deficit; if all promised matching funds are obtained, the project will have a surplus of roughly $25,000.
<p>
The possible uses of this surplus were discussed, with tentative agreement to spend up to approximately $5,000 on editorial travel to allow the editors to confer in person, up to $10,000 on fees for a consultant to help with construction of appendices and testing of the DTDs and examples, approximately $3,000 each on a further meeting of the Steering Committee and of the dictionary encoding group, and up to $1,000 for travel by the committee heads during final revisions. This leaves $3,000 for other costs arising.
<!>
<h1>9. Further Discussion of Second Cycle
<!>
The Steering Committee discussed the organization of the second cycle with the committee heads; the idea of replacing the large committees with several smaller groups whose heads would receive an honorarium but not necessarily negotiated release time with their universities was discussed. The large committees would still be needed to coordinate the work of these subcommittees, but need not meet more than once or twice.
<p>
TL suggested that the A&I committee or its successors would need in the second cycle to address four areas:
<ol>
<li>Linguistic / grammatical work (continuing the work already done)
<li>Speech research, speech transcription, etc.
<li>Discourse analysis, semantics, and pragmatics
<li>Literary and historical analysis (which might in turn need further subdivision)
</ol>
<p>
SJ suggested that the TR committee or its successors would need in the second cycle to address these areas:
<ol>
<li>Development of conventions for more languages and character sets
<li>Further formalization, especially for complex text types, notably for critical editions
<li>Development of encoding schemes for less common text types, e.g. papyri, inscriptions, reference works, technical and scientific documents, and legal texts
<li>Annotation of content in historical source material (e.g. the identification of referents, etc.)
</ol>
<p>
The committee agreed that liaison work with users of the first public draft would require a great deal of time and work. It also seems wise to provide some sort of general SGML training for all working committees in the second cycle, either at committee meetings or in a couple of project-wide workshops to which all committees would be invited.
<!>
</body>
</gdoc>