.* ABN1 - Notes from Advisory Board meeting
.* on Committee Work Plans
.*
.* drafted from notes, 19 April 1989, by CMSMcQ
.* These notes have been reviewed by the steering committee and reflect
.* the feelings of the Advisory Board and the Steering Committee.
.* (Comments in parentheses are my interpretations, and not to be
.* viewed as binding in the same way as the AB and SC comments should
.* be. -CMSMcQ)
.*
Notes from Advisory Board meeting on Committee Work Plans

1 General ------------------------------------------------------------

AB felt the definition of "text" as "extended natural discourse" was
not consistently visible in all documents: historical documents and
bibliographic data are included by TDR1 and TRR1, but are not
necessarily discursive.

The status of collections of elicited sentences (e.g. in field
linguistics) is not clear: are they included or not?

The AB felt the TEI documents should not focus on literary texts
exclusively.  Historical documents, especially, need to be included in
the conception of the project.  (Historical documents may pose a
problem for us; where can the line be drawn between what we include
and what we exclude?  ("*Any* written survival is a text," according
to the AHA representative.)  For the moment, we may have to dodge the
question, or include clearly discursive documents in cycle 1 and
postpone all others until cycle 2.  The historians' group headed by
Manfred Thaller will be working on them from the beginning, but we
don't have to guarantee results now. -CMSMcQ)

The AB was concerned about the mechanisms available for expressing
ambiguity of structure or interpretation in the encoded text, either
when it involves different (conflicting) taggings of a passage or when
it involves different (conflicting) segmentations of a text or
passage.

2 TDR1 - Text Documentation ------------------------------------------

Correct punctuation.

Clarify the concept of "minimal bibliographic identification".

Clarify whether the normalization of texts before or during their
capture will be handled by this committee.  (Note: I believe the role
of this committee can only be to provide declarations for the types of
pre-editing done in the transcription of the text.)

The committee must help users clarify when editing and normalization
(whether before machine-readable capture or after) constitute a new
edition of the text(s).

Add "rights" or "copyright status" to the list of topics.

There was some feeling in the AB that the scope of declarations should
be variable.  [I am not clear on the motive or the exact intent of
this comment, and would like to confer with Stephen Anderson about it.
-CMSMcQ]

3 TRR1 - Text Representation -----------------------------------------

In point i, delete "in fiction".

Delete the first paragraph (introduction) and the discussion of SGML
in the conclusion -- these topics are handled in the general design
documents.

Point e in the list ("hyphenation, including declaration of how
hyphenation is treated") is unclear.  "Treat" during encoding or
"treat" during processing?  There is some worry in the SC that
attempts to specify processing of the data will complicate our task
and sacrifice the generality of the encoding scheme.  (I think this
means this item must be revised to refer clearly to declaration of
what happened to hyphens in the text when the text was transcribed.
-CMSMcQ)

The term "foot-notes" is too specific; all sorts of annotation,
whether footnotes, marginal notes, or endnotes, should be handled.

The languages and text types to be covered by the project overall, and
especially within the first development cycle, need further
specification.  (Either a specification of the languages, or a
specification of a method for specifying the languages.)

The AB was concerned that the passage in EDP1 (page 5) stating

     The goal of the Initiative is to devise and document encoding
     methods appropriate to every language used officially or studied
     extensively with machine assistance in Europe and North America.

did not commit the Initiative clearly enough to handling languages not
commonly studied today but worth preserving (e.g. dying languages).
(It seems to me some argument must be made that handling all languages
commonly studied will ensure that we have a very general and flexible
approach, and some explicit promise should be made that generality of
script encoding will be kept in mind. -CMSMcQ)

If character sets and symbols are to be consistently represented, the
TEI should include or develop some sort of registry function, so that
parallel redevelopment of functionally identical character sets or
symbol names is minimized.  The registry functions performed by ISO
should be investigated -- will the TEI sponsors have standing to
register character sets, etc. with those registries?  (The only ISO
registry I know about is that for character sets, maintained for ISO
by ECMA.  Worth investigating. -CMSMcQ)

The committee should define at an early stage what counts as a
"character" not just in printed books but in manuscripts (with symbols
and abbreviations) and spoken texts.

The AB felt that the focus on "accepted typographic representation"
should not be overstressed.  Apparatus criticus, for example, should
have a notation appropriate to its function -- development of that
notation should not be restricted by the typographic tradition.
[N.B. I believe this comment missed the point: the committee is to
provide tags for text *features* for which typography has
conventions.  But the tags are to express the features, not (just) the
typography, and need not be derived specifically from the typography.
We may need to stress the feature/typography distinction more.
-CMSMcQ]

Revise the sentences

     In devising guidelines for machine-readable texts, it is natural
     to take conventions from printed texts as a starting-point and
     suggest ways of expressing typographical distinctions in
     machine-readable texts.  The committee on text representation
     will handle features for which there are accepted typographical
     conventions.

They are too prone to misinterpretation and can lead readers to assume
we wish to develop typographic markup, not descriptive markup of the
text features represented by the typography.

4 AIR1 - Analysis and Interpretation ---------------------------------

On page 4, add a new item to the list making clear that the committee
will also be active in dictionary encoding.

The AB felt that at least basic requirements for spoken language
should be handled during the first development cycle.

The AB expressed some concern over the claim that ambiguous text (or
text to which two conflicting interpretations apply) must be repeated
with each tagging.  (See also general comments, above.)  [I believe
committees TR and AI must develop a couple of simple, clear examples
of ambiguity and multiple taggings of the same segment, and the ML
committee must consider methods of expressing the multiple
interpretations or ambiguity. -CMSMcQ]  The SC believes this should be
dealt with as early as possible.

5 MLR1 - Metalanguage and Syntax -------------------------------------

The document should say (1) what is to be done and (2) how that work
is organized and who will do it, in that order.  (If nothing else,
this means moving "Mode of Operation" to the end. -CMSMcQ)

The document seems to reflect an inappropriate division of labor
between the committee head and the committee.  The committee is
responsible as a whole for the collaborative discharge of its duties;
it should not be assumed that the committee head will create all
documents and the other committee members serve only as a sounding
board.
This is fair neither to the head nor to the other members of the
committee.  Accordingly, revise the first section, "Committee Mode of
Operation", to reflect a broader distribution of the tasks among
committee members.

A more detailed plan of attack needs to be included for each task, but
especially the second.  The other documents of the TEI say the
metalanguage will be developed and both formal and informal
descriptions of major tagging schemes will be presented at the
conclusion of the first development cycle.  How will these things be
accomplished?

There was some desire for a list of the issues regarding SGML that
need to be resolved by this committee, specifically mentioning the
problem of multiple reference systems and other parallel hierarchies.

The ML committee needs to address the issue of ambiguity and multiple
taggings for the same segment of text.  Can this be handled in SGML?
How?