Minutes of Chicago Meeting, <title>5-6 October 1991 <author>C. M. Sperberg-McQueen <docnum>TEI &docfile. <date>&docdate. </titlep> <!> </frontm> <!> <body> Present: Robin Cover (RC), Robert Kraft (RK), Ian Lancashire (IL), Peter Robinson (PR), Peter Shillingsburg (PS), Michael Sperberg-McQueen (MSM). <h1>Introductions: <p>PR ON, ME, problems of ms. transcription. Central to collation, because what you are collating is what you record. Visio Pauli, De Monarchia. Everyone has own approaches. <p> MSM: MHG, t.c. theory. Maas, Quentin, Greg, Dearing. <p> RK: no theory; dirty hands-on work in patristics. Epistle of Barnabas, Jewish epigrapha. Last 12 years a project to record all variants in LXX. Also some papyrology, mostly Greek, some Latin and Coptic. Lots of interest in how to record the data, also in how to deal with how others have recorded it. <p> RC: t.c. inter alia. For last 10 years, the feeling that practitioners in my field haven't worked very well. Growing consensus that much of the work simply can't be sustained anymore. Premises suspicious and doubtful. Provide encoding to allow critical work which is textual in some sense and in other senses broader. Problems of mixed orthographic strata: require normalization before machine treatment. Retroversions, normalizations important. RK you need to preserve it as evidence for your hypothesis, but you need to normalize it out to see what you are doing. <p> IL: editing Ren dictionaries, literary works, etc. From print and mss. Since 1983; been looking around for an encoding method since 1984. A personal reason for wanting success. Have taught Renaissance bibliography since 1974; edition of two plays in Revels series. Involved with REED; 11 volumes now. Served as member of AI3, literary work group, which evinced strong dissatisfaction with P1. MLA survey went out in June. At Winnipeg meeting, unhappy with t.c. section since seemed unrelated to work I've been doing since 1978. 
<p> P1 does not handle features people are encoding. SGML has been imposed on the field instead of the usage of the field being consulted before finding a model. TEI has failed to capture the group of people interested in editing, --- those interested in interpretation as well. I had not wanted to undermine interest in collation or stemmatics, but t.c. depends on proper realization of text features, including structure. <p> PR I share the suspicion of an imposed model and fear that we can be prevented from doing something important because someone will say `SGML says you cannot do that.' Let us deflect the accusation that a hierarchy is required by creating a tag system which will work at the bottom level. I feel fairly confident that this is feasible. <p> MSM n.b. SGML was adopted tentatively and subject to the requirements of scholarship. The design goals are quite clear: adequacy for scholarship is prior to conformance. <p> PS: general and textual editor of Garland Thackeray. For a long time all the idiot work has been done with computers: collation, apparatus generation, ... Typesetting with TeX. Because designed own package of software, designed own encoding scheme. To some extent, as I come to this scheme, I see a <emph>lot</emph> more than I ever did or wanted to, in a system that may be mnemonic for some people but not for me. A certain amount of resistance results. The second reason I'm interested is a long-time relation with MLA committee on scholarly editions. A general interest in dissemination of scholarly editions; I've watched as a number of editions have moved to computers. Anything which will make it possible for people in this field to share and help each other will be a good idea; I'm not convinced about this one way or another, for lack of information. Third, interested in move toward electronic editions rather than electronic tools for production of paper editions. 
You'll need to be prophet to know exactly what that means, but some of us have had visions. This has driven me back to asking what literary texts are. P1 seems to isolate too many details; better than isolating too few. <p> What we have is an encoding system in anticipation of a non-existent set of software. This is the reverse of the usual practice: everywhere else, the software has come first. If there was a piece of software out there which convinced me, I would use this, but to encode a text in anticipation of unrealized software seems crazy. The one good thing is that P1 seems to be in some sense a spec of what software is going to have to do. If we can identify the right set of features. PR there is a real danger of finding yourself trapped in a ghetto of one sort or another, especially if you work specifically with one piece of software. PS You are not going to find me working in this form until there is a lot more software out there. I'll use my own format. RK I won't either, but if we focus upon the transfer aspect, this becomes a set of features which this tag set wants to see. <h1>Rationale for this section <p> PR Not only why does T.C. require special treatment but why we think this is a useful model. I want some useful interchange format to go from my local form into other local forms. This scheme as a common denominator, to allow various people to interrogate each others' texts without having to know the details of all the original encoding. RK analogy with LDS 'standard' form for genealogical data. I envision what we are talking about in that way. <p> RC: it reduces the load from N**2 to 2N. PS ? MSM (explains). PS it would be very useful if there were a map of fiddly things that <emph>could</emph> be coded and of interest, not only publisher city and date, etc. --- things that almost no one thinking of electronic text actually is thinking of: even if you don't specify how they are to be encoded. We are also talking about standards of accuracy. 
What is necessary to ensure that the transcribed text accurately reflects the copy text? Having identified these things, we can just say 'When you are encoding texts for your projects, consider these as things you might wish to capture for your project.' P1 does contain this, but it's all mingled with specific proposals for how to encode them. Need 1 justification for standard, 2 a set of features possibly worth encoding, 3 a specification of an encoding scheme into which you may wish to translate stuff. PR we need a rationale for this scheme. <h2>2.1. Digression on hierarchy. PR if I were arguing for SGML with many people, I would leave hierarchy out of it entirely. Can we make some general statements about SGML? MSM reference manual. tutorials. IL what is order of topics in tutorials? PS important to bring them in very gently. You emphasized the importance of respecting what scholars do; it is also important to affect favorably what they do: there are a lot of neo-editors who need instruction in our profession. These need to be tutorials not just in TEI but in the field of electronic texts. The tutorial should begin with a justification of standard, and the relationship between a standard scheme and a local scheme. and a list of features that scholars in the field have found important / useful to record, devoid of all threatening notation. Explicit encouragement of accuracy. Third, a statement about comprehensiveness of TEI spec and its relation to smaller set of a project's individual scheme. (Several): need to talk about TEXT before you talk about SGML. MSM, cont'd. Case book. IL some of the examples need to be complete texts. PS: need pictures. IL: pictures of the originals; PR original, a project-specific transcription, a TEI transcription, and possibly a transduction program. The TC work group needs to draft the reference section of t.c., but also to consider drafting a tutorial for t.c. work. 
IL are there common programs in the field which would benefit from translation routines? PS: when should the tutorials be done? by whom? is there funding? RC: can't we put the relevant tutorial information into the tutorials and leave it all out of the reference manual? PR the tutorial is assuming increasing importance as our route to reaching people. [lunch] <!>
<h1>Encoding text critical features
<p>Examples from Wife of Bath. Three tiers: letters of text, possibly using entities; tags surrounding regions of text for things like abbreviations, deletions, illegibility, etc.; annotation.
<xmp>
<![ cdata [
ex<abb type=brev repr=&crossedthorn;>per</abb>iment
ex<del type=underdotting resp=scribe1>peri</del>ment
ex<unreadable type=phys.dam length=4>ment</unreadable>
<rdg type=normalis. resp=PR orig='ex&per;iment'>experiment</rdg>
ex<add type=interl resp=hand2>per</add>iment
experiment <ed.comm>a totally stupid letter</ed.comm>
]]>
</xmp>
Need also lacuna (in binding, successful deletion, lost parchment ...) RK n.b. physical loss of reading differs from successful deletion; RC important to work carefully on the typology --- one can find oneself boxed in if one isn't careful. IL what about 'lr&emacron;' for 'letter'? PR: the abbreviation here is the 'lre' not the 'e-macron' --- so put the ABB tag around the 'lre' not around the 'e'. RK are we supposed to be reinventing the work of other groups? MSM we can modify it, but should not have to reinvent it. Discussion of use of entity references for such abbreviations; IL feels that w with superscript t and yogh with superscript e or t are fundamentally different from 'Dr.' and similar examples of the ABBREV tag. You need to be able to distinguish the 't' over the 'w' from a 'w' followed by a superscript 't' (with and without a dot under the t?) You need to be able to represent this, even though many people will not represent all these distinctions. RC you are diverging consciously from the ABBREV tag, right?
to get the full form into the element content. PR right, but I'm not sure that's essential. RK technically, how you record a text isn't our problem, so your first line with ABB is really out of our competence. PS is our Q what is the best way to do this, or is our Q what is the best way to do this? Is it what we would like transcriptions to do, or how to ensure that our system can allow people to do things differently if they wish? RK should we be reinventing the ABB tag? Isn't that doing text representation? ... PS a hierarchical system of access to <emph>all</emph> of such information would be possible in an electronic text. Is the 'per' example perfectly unambiguous? PR in this context, yes: no doubt. PS shouldn't we be dealing with what this encoding system can handle rather than worrying about whether our colleagues are stupid? IL not the role of the TEI to recommend one style of encoding over another; if one chooses a very detailed style of transcription, the question must be how does this system handle it? ABBREVIATIONS: Possible LIG tag <lig>bl</lig>. Or entity references? Examples: Petty's transcription of Langland. N.B. the superscript 'e' in 'boþe' is brought down to the base line in the transcription. (Tags for what is not there and for what is added ...) MS damage with readings supplied from Skeat: how to tag? ADD? but Skeat did not insert this. mss smudge; hole after writing. Elision. ms exists here and I can't make it out. +- was here, and the parchment gone water damage partial vs. total damage: maybe was here, papyrus torn in any case is here but we can't read it damage to material or damage to writing scraping, cut away, torn away, eaten away material does exist but you can't read it space left in ms intentional / unintentional. deletion / damage / omission or elision physical things, formatting things, editorial interventions +/- I can read this +/- there was something here +/- I don't know there was something here +/- RK: divide and conquer. 
Easy: non-intentional:
  failures of material (torn, cut away, ...) (includes papyrus with front piece torn away but back piece still there)
  worm holes (physical lacuna)
  obliterated (cannot read in whole or part: coffee stain...)
  - accretion
  something stuck on it (an old protective screen)
  binding too tight
  stains, smudges (insect excrement or smears)
  - diminution
  ink washed away
  fading
  accretion of pseudo-text: the insect dropping misread as a comma in a D. H. Lawrence ms., comma into semicolon, stray pencil marks
Intentional: scribal, non-scribal, author/amanuensis, post-scribal damage
  intentional deletions / cancellations / overwriting
  additions
  Intentional infliction of any of the 'unintentional' cases
Deletions on the line: (successful or unsuccessful)
  physically cuts out (unlike accidental physical lacuna)
  left blank (e.g. for rubrication)
  erasure: scraping, rubbing or other
  inked over (style) ie marked for deletion
  overwriting, overtyping
  paste over, paint over / white out
PS (what do we do with)
  mistaken cancellation
  a line which goes too far
  a line which does not go far enough
  ditto for later hand
IL does Center for Scholarly Editions have a handbook? Possible sources: TLG, Duke Documentary Papyri, Belles Lettres, ... Bowers has a very long article on MS notations. Studies in Bibliography, 1975 or 76. A totally unusable system, in PS's opinion, but Bowers did use it. A system for indicating alterations in a MS. PR an article in Text on this topic as well. What auxiliary information is useful? size of the cutout hole, ... how many letters are missing? 'ca. 14 letters' damage dimension, in letters and/or absolute size. Is there any trace of the letters there? What kind? (This is where you get all these special brackets.) Is our distinction between intentional and unintentional change useful? As a heuristic, yes. PS distinguish agency of change, time, method, extent, and degree of success. What about bad writing?
New category: 'it is there but we have problems with it': Problems of interpreting what is present: difficulty of reading the characters difficulty in interpreting abbreviations, ... illegible: I cannot make heads or tails questionable reading: I think I can make it out or I can at least guess RK distinguish not clearly written / not fully written / not conventionally presented or spelled. How do I represent it? And how do I represent its interpretation? What if the author simply misspelled? You emend it, but you need to note it. Normalization... PS Random Cloud argues that a lot of what editors have done to texts is a result of their ignorance of typography. In an era when the apostrophe was not standard for possessives, a final -s might be either possessive or plural. There is a good line which can be read either way; making this line accessible to the modern reader means drawing their attention to this ambiguity. IL 'questionable' suggests uncertainty about the letter forms. 'conjecture' is what I think he may have meant to write. What to do about 'not fully written' --- suspension (initial ...) contraction (initial and final) brevigraph (runic M for madhr) superscription Some have letters, some have symbols. Some have indicators, some have not. Tutorial needs to give several possible ways of handling the classification of abbreviations. RK do we have a principled reason for using entity references in one case and a tagged version in another? (again, but no resolution) MSM do we wish to maintain the IL we have done 5.4: PR's notations are an improvement. SIC should be labeled useful only in few cases. CORR ok. NORM I have problems with. PS we are trying to establish guidelines for one transcription from which several versions can be generated. I use CASE software to do several things; from a diplomatic transcription I can run several programs to produce work paper, page breakup, and clean copy. ...
Under those circumstances, even though I agree that NORM is a despicable thing to do to anything, we might need to keep it anyway. in mss you may get something you can read as a number or as a word. RK can we talk about ambiguous abbreviations? e.g. this can be a number, or something else. A certainty attribute? Possibly, if we can figure out how to define its meaning. <!> <h1>Encoding text critical apparatus: methods (2d day) PR it seems we have no support from anyone for single end-point attachment. RK what assumptions are we making about the form people are using? Is it part of our mission or job to make recommendations? PR double end-point attachment works very well from computational point of view, but looks very unlike what people have been producing before. --- Example: text file: (1) Experience (2) thogh noon ... apparatus file: <app start=1 end=2>Experiment La Sl1</app> PS in editing American / British literature of last three centuries, no one does this. Apparatus never explicitly marks the lemmata; only page and line numbers. The information about what is varied does exist, but only implicitly. PR this is very close to the 'location-referenced method' I proposed. RK why would anyone wish to have apparatus in a file separate from the base text? (A: CD-ROM contains base text, which one wishes to work from.) RK we should not be trying to replicate the form of printed apparatus; we should be looking for the most streamlined system that can allow easy computational manipulation of the data. IL I agree we want encoding to permit software to produce complex editions. But we also must serve the humans who want to produce texts themselves. In 10 years it may be agreed that running machine collations are the only way to produce critical editions; but there will always be people who need to produce editions 'manually' --- without the intervention of automatic collation software. MSM isn't this problem structurally identical to the one we've been discussing? 
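<p>A minimal sketch of how software might resolve double end-point attachment, assuming the base text is held as a sequence of words interleaved with numbered anchors, as in PR's example above. The function names lemma and witness_text and the token-list representation are hypothetical illustrations; nothing of the kind was specified at the meeting.

```python
# Sketch of double end-point attachment: the apparatus lives apart from the
# base text and points at numbered anchors in it. All names are hypothetical.

def lemma(base, start, end):
    """Words of the base text lying between anchor `start` and anchor `end`."""
    i, j = base.index(start), base.index(end)
    return [tok for tok in base[i + 1:j] if isinstance(tok, str)]


def witness_text(base, start, end, reading):
    """Base text with the anchored span replaced by a variant reading."""
    i, j = base.index(start), base.index(end)
    tokens = base[:i] + reading.split() + base[j:]
    # Drop the anchors so that only the words of the witness remain.
    return [tok for tok in tokens if isinstance(tok, str)]


# Integers stand for the anchors (1) and (2); strings are words of the base.
base = [1, "Experience", 2, "thogh", "noon"]

# Counterpart of: <app start=1 end=2>Experiment La Sl1</app>
app = {"start": 1, "end": 2, "reading": "Experiment", "wit": ["La", "Sl1"]}

print(lemma(base, app["start"], app["end"]))
print(witness_text(base, app["start"], app["end"], app["reading"]))
```

When start and end name the same anchor the span is empty, so the same two functions can also express a pure addition at that point.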
PS consider Post-Modern Culture. What the editors have apparently done is to create a text in a word processor, using minimum formatting so it can be transmitted over the net. No hidden control characters, no tabs, etc., but they have tried to make it look pretty. There are end-notes, etc. MSM people like this clearly do not need the TEI: if they are happy without markup, they don't need a markup scheme. RC It is possible to join what we are doing with what people are doing in things like PMC. Though P1 is 'document'-oriented, the strength of the underlying assumptions is that P1 is a set of information-structuring techniques. Your fundamental problem is to build a knowledge base or information base from which one can generate all sorts of different materials. Our job is as information architects. Probably the phrase 'critical apparatus' should not be used in the document: that's a paper concept. PR: example of location reference (separate apparatus):
<xmp>
<![ cdata [
<app loc=L1>
<rdg>experiment</>
<vargroup>La Sl1 Eg2 Ad3 </>
</app>
<app loc=L1>
<lemma>Experience</>
<rdg>Experiment <vargroup>La Sl1 Eg2</>
<rdg>Exeryment <vargroup>Ad2 Ha2 Ha4</>
</app>
]]>
</xmp>
RK can we handle details on the status of individual readings within a group? If a reading is attested by five witnesses, and I want to say 'A is conjectural, B is retroverted, E is dubious' how do I do it? Allow VARGROUP to nest (and rename WIT) --- use it to replace witness.detail. RC what do we do about things like retroversions? Discussion of example: Hebrew text has putative variant whose witnesses include LXX reading; LXX reading relies on Armenian version. One formulation of problem: critical Hebrew text will have a reading whose witness is 'LXX'. editor may leave it at that, or include parenthetical evidence for LXX reading in form of Gk mss The evidence for the LXX may include a reading whose witness is 'Armenian'.
editor may leave it at that or include variants for the Armenian RK plus, minus, substitutions, letter variants (orthographic etc.), paleographic, transposition. I want to be able to search for these. PS all this seems to involve many mss and very shallow treatment of them. Modern editors will tend to have fewer witnesses and treat them in much greater detail. IL true. Example of First Folio, where one may or may not include details of stop-press variants in an apparatus. PS we also need a system to distinguish levels of authorial revision. In modern editions one often develops systems for marking cancellations and insertions. In George Eliot edition early volumes there was no full notation system, since they handled cancellations but not insertions. [lunch] <h1>Return to miscellaneous features Marginalia: Need <ol> <li>to mark the span of text to which the marginalium applies; <li>to specify the content of the marginalium; <li>may need to describe in some detail what is happening in the text (is there an anchor? is there cancellation?) </ol> Should there be a MARGINALIUM tag? MSM argues no: treat function as basic, not location: note and var, not margin... IL can't we say NOTE place=margin? MSM yes. Problem of treating marginal notes in mss. If ms B has a 'Symmachus says X here' in the margin... IL marginalia: transcribing marginal index notes. These serve in my text as a sort of index -- I don't think of them as notes, but as index or toc entries. Running subtitles. PS these will occur primarily in works intended to be shown: published texts or finished mss for presentation. Not in authorial mss or other working documents. PS when I edited Henry Esmond, I used original running titles as shoulder notes. Lots of people have said "that doesn't look like the original edition". One of Kidd's objections to Gabler's edition is that J Joyce apparently engineered that certain passages face each other across the gutter. 
This falls under a larger distinction between the 'linguistic text' and the 'production text' (Jerry McGann) or 'material text' (PS). The meaning of the material text can go far beyond that of the linguistic text. RK I can find 5 uses for non-text-block material: variant or variant-related content pointers (marginal or page headers) IDs (page numbers, line numbers, columns, letters down gutter) (location ...) cross references to other passages (Homer, Bible, ...) symbol designating a kind of text (quotation, paragraphus, ...) (testimonium) IL is it implicit that running titles become similar to notes? I think of them as more like indexes. (MSM claiming that notes are all notes...) PS you are trying to distinguish the linear progression of semiotic signs which make up a text from the ancillary objects which people attach to it. MSM there is a real danger of reifying the characteristics of a specific technology. RK sense of the meeting is that NOTE needs to be re-analyzed. MSM asks for specific proposals. IL MARGINALIA, RUNNING-TITLE. RC skeptical of MARGINALIUM. Masoretic notes sometimes annotation, sometimes variants. IL asks what MSM will do with ruminations on NOTE. MSM asks for specific proposals; IL asks whether this means all the discussions of the last two days have been without import, since no formal motions have been made. MSM most of the discussion is more clearly related to text criticism. RK no, it's all been basic text representation. RC the problem is that P1 does not deal with the codex; codex problems were postponed. Quantitative variations: RK asks whether he will be able to record the length of (say) omissions. PR says perhaps should add an attribute to the RDG tag within the parallel segmentation method. <xmp> <![ cdata [ <anchor id=a1>Experience <anchor id=a2> ... <app begin=a1 end=a2> <rdg>&zero;<wit>A B C <rdg type=om status=... 
length=4/12><wit>X
</app>
This <var><rdg type=om><wit>A <rdg type=sub>The <wit>B </var>
<var><rdg type=om><wit>A <rdg>woman<wit>B </var>
]]>
</xmp>
Additions:
<xmp>
<![ CDATA [
Experience ...
Experiment of good food
<anchor id=a1>Experience<anchor id=a2>
<app start=a1 end=a2><rdg type=sub>Experiment <wit>A</app>
<app start=a2 end=a2><rdg type=add>of good food <wit>A</app>
<var>
]]>
</xmp>
Or (omitting START or chasing tail)
<xmp>
<![ CDATA [
Experience ...
Experiment of good food
<anchor id=a1>Experience
<app start=a1><rdg type=sub>Experiment <wit>A</app>
<app id=a2 start=a2><rdg type=add>of good food <wit>A</app>
<app><rdg type=add>of good food <wit>A</app>
<var>
]]>
</xmp>
Straw vote: RK is for attaching inline apparatus after first word of the variant; IL also. PR for beginning of span. RC, MSM for end of span. Consensus: attach at end of span.
<xmp>
<![ CDATA [
<!ELEMENT var - - (rdg, wit)+ >
<!ELEMENT var - - (vl+) >
<!ELEMENT vl O O (rdg, wit) >
]]>
</xmp>
RC where do we put editor's evaluation of reading? Consensus: on the RDG or WIT tags. RC skeptical: he needs much better structure. PR confidence levels for readings, tag for editorial opinion on origin of reading, direction of variation (derivation from another reading, which may not exist in the text) <!>
<h1>Encoding critical apparatus: elements and their relationships <!>
<h1>Encoding sample existing or putative editions <!>
<h1>Tasks for group: volunteers to draft
First task: all to develop tag set from minutes, working together. For the moment, keep among this group; when we have consensus, we should put this before the discussion group as a whole. Second task: examples (by email). Schedule: minutes by end of week, agreement in smaller group asap, reaction from <!> </body> </gdoc>

.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1 .* Document proper begins. .sr docdate '30 November 1991' Minutes of Chicago Meeting, <title>5-6 October 1991 <author>C. M. Sperberg-McQueen <docnum>TEI &docfile. <date>&docdate. </titlep> <!> </frontm> <!> <body> Present: Robin Cover (RC), Robert Kraft (RK), Ian Lancashire (IL), Peter Robinson (PR), Peter Shillingsburg (PS), Michael Sperberg-McQueen (MSM). <h1>Introductions: <p>PR ON, ME, problems of ms. transcription. Central to collation, because what you are collating is what you record. Visio Pauli, De Monarchia. Everyone has own approaches. <p> MSM: MHG, t.c. theory. Maas, Quentin, Greg, Dearing. <p> RK: no theory; dirty hands-on work in patristics. Epistle of Barnabas, Jewish epigrapha. Last 12 years a project to record all variants in LXX. Also some papyrology, mostly Greek, some Latin and Coptic. Lots of interest in how to record the data, also in how to deal with how others have recorded it. <p> RC: t.c. inter alia. For last 10 years, the feeling that practitioners in my field haven't worked very well. Growing consensus that much of the work simply can't be sustained anymore. Premises suspicious and doubtful. Provide encoding to allow critical work which is textual in some sense and in other senses broader. Problems of mixed orthographic strata: require normalization before machine treatment. Retroversions, normalizations important. RK you need to preserve it as evidence for your hypothesis, but you need to normalize it out to see what you are doing. <p> IL: editing Ren dictionaries, literary works, etc. From print and mss. Since 1983; been looking around for an encoding method since 1984. A personal reason for wanting success. Have taught Renaissance bibliography since 1974; edition of two plays in Revels series. Involved with REED; 11 volumes now. Served as member of AI3, literary work group, which evinced strong dissatisfaction with P1. MLA survey went out in June. 
At Winnipeg meeting, unhappy with t.c. section since seemed unrelated to work I've been doing since 1978. <p> P1 does not handle features people are encoding. SGML has been imposed on the field instead of the usage of the field being consulted before finding a model. TEI has failed to capture the group of people interested in editing, --- those interested in interpretation as well. I had not wanted to undermine interest in collation or stemmatics, but t.c. depends on proper realization of text features, including structure. <p> PR I share the suspicion of an imposed model and fear that we can be prevented from doing something important because someone will say `SGML says you cannot do that.' Let us deflect the accusation that a hierarchy is required by creating a tag system which will work at the bottom level. I feel fairly confident that this is feasible. <p> MSM n.b. SGML was adopted tentatively and subject to the requirements of scholarship. The design goals are quite clear: adequacy for scholarship is prior to conformance. <p> PS: general and textual editor of Garland Thackeray. For a long time all the idiot work has been done with computers: collation, apparatus generation, ... Typesetting with TeX. Because designed own package of software, designed own encoding scheme. To some extent, as I come to this scheme, I see a <emph>lot</emph> more than I ever did or wanted to, in a system that may be mnemonic for some people but not for me. A certain amount of resistance results. The second reason I'm interested is a long-time relation with MLA committee on scholarly editions. A general interest in dissemination of scholarly editions; I've watched as a number of editions have moved to computers. Anything which will make it possible for people in this field to share and help each other will be a good idea; I'm not convinced about this one way or another, for lack of information. 
Third, interested in move toward electronic editions rather than electronic tools for production of paper editions. You'll need to be prophet to know exactly what that means, but some of us have had visions. This has driven me back to asking what literary texts are. P1 seems to isolate too many details; better than isolating too few. <p> What we have is an encoding system in anticipation of a non-existent set of software. This is the reverse of the usual practice: everywhere else, the software has come first. If there was a piece of software out there which convinced me, I would use this, but to encode a text in anticipation of unrealized software seems crazy. The one good thing is that P1 seems to be in some sense a spec of what software is going to have to do. If we can identify the right set of features. PR there is a real danger of finding yourself trapped in a ghetto of one sort or another, especially if you work specifically with one piece of software. PS You are not going to find me working in this form until there is a lot more software out there. I'll use my own format. RK I won't either, but if we focus upon the transfer aspect, this becomes a set of features which this tag set wants to see. <h1>Rationale for this section <p> PR Not only why does T.C. require special treatment but why we think this is a useful model. I want some useful interchange format to go from my local form into other local forms. This scheme as a common denominator, to allow various people to interrogate each others' texts without having to know the details of all the original encoding. RK analogy with LDS 'standard' form for genealogical data. I envision what we are talking about in that way. <p> RC: it reduces the load from N**2 to 2N. PS ? MSM (explains). PS it would be very useful if there were a map of fiddly things that <emph>could</emph> be coded and of interest, not only publisher city and date, etc. 
--- things that almost no one thinking of electronic text actually is thinking of: even if you don't specify how they are to be encoded. We are also talking about standards of accuracy. What is necessary to ensure that the transcribed text accurately reflects the copy text? Having identified these things, we can just say 'When you are encoding texts for your projects, consider these as things you might wish to capture for your project.' P1 does contain this, but it's all mingled with specific proposals for how to encode them. Need 1 justification for standard, 2 a set of features possibly worth encoding, 3 a specification of an encoding scheme into which you may wish to translate stuff. PR we need a rationale for this scheme. <h2>2.1. Digression on hierarchy. PR if I were arguing for SGML with many people, I would leave hierarchy out of it entirely. Can we make some general statements about SGML? MSM reference manual. tutorials. IL what is order of topics in tutorials? PS important to bring them in very gently. You emphasized the importance of respecting what scholars do; it is also important to affect favorably what they do: there are a lot of neo-editors who need instruction in our profession. These need to be tutorials not just in TEI but in the field of electronic texts. The tutorial should begin with a justification of standard, and the relationship between a standard scheme and a local scheme. and a list of features that scholars in the field have found important / useful to record, devoid of all threatening notation. Explicit encouragement of accuracy. Third, a statement about comprehensiveness of TEI spec and its relation to smaller set of a project's individual scheme. (Several): need to talk about TEXT before you talk about SGML. MSM, cont'd. Case book. IL some of the examples need to be complete texts. PS: need pictures. IL: pictures of the originals; PR original, a project-specific transcription, a TEI transcription, and possibly a transduction program. 
The TC work group needs to draft the reference section of t.c., but also to consider drafting a tutorial for t.c. work. IL are there common programs in the field which would benefit from translation routines? PS: when should the tutorials be done? by whom? is there funding? RC: can't we put the relevant tutorial information into the tutorials and leave it all out of the reference manual? PR the tutorial is assuming increasing importance as our route to reaching people. [lunch]
<!>
<h1>Encoding text critical features
<p>Examples from Wife of Bath. Three tiers: letters of text, possibly using entities; tags surrounding regions of text for things like abbreviations, deletions, illegibility, etc.; annotation.
<xmp>
<![ cdata [
ex<abb type=brev repr=&crossedthorn;>per</abb>iment
ex<del type=underdotting resp=scribe1>peri</del>ment
ex<unreadable type=phys.dam length=4>ment</unreadable>
<rdg type=normalis. resp=PR orig='ex&per;iment'>experiment</rdg>
ex<add type=interl resp=hand2>per</add>iment
experiment <ed.comm>a totally stupid letter</ed.comm>
]]>
</xmp>
Need also lacuna (in binding, successful deletion, lost parchment ...). RK n.b. physical loss of reading differs from successful deletion; RC important to work carefully on the typology --- one can find oneself boxed in if one isn't careful. IL what about 'lr&emacron;' for 'letter'? PR: the abbreviation here is the 'lre' not the 'e-macron' --- so put the ABB tag around the 'lre' not around the 'e'. RK are we supposed to be reinventing the work of other groups? MSM we can modify it, but should not have to reinvent it. Discussion of use of entity references for such abbreviations; IL feels that w with superscript t and yogh with superscript e or t are fundamentally different from 'Dr.' and similar examples of the ABBREV tag. You need to be able to distinguish the 't' over the 'w' from a 'w' followed by a superscript 't' (with and without a dot under the t?)
You need to be able to represent this, even though many people will not represent all these distinctions. RC you are diverging consciously from the ABBREV tag, right? to get the full form into the element content. PR right, but I'm not sure that's essential. RK technically, how you record a text isn't our problem, so your first line with ABB is really out of our competence. PS is our Q what is the best way to do this, or is our Q what is the best way to do this? Is it what we would like transcriptions to do, or how to ensure that our system can allow people to do things differently if they wish? RK should we be reinventing the ABB tag? Isn't that doing text representation? ... PS a hierarchical system of access to <emph>all</emph> of such information would be possible in an electronic text. Is the 'per' example perfectly unambiguous? PR in this context, yes: no doubt. PS shouldn't we be dealing with what this encoding system can handle rather than worrying about whether our colleagues are stupid? IL not the role of the TEI to recommend one style of encoding over another; if one chooses a very detailed style of transcription, the question must be how does this system handle it? ABBREVIATIONS: Possible LIG tag <lig>bl</lig>. Or entity references? Examples: Petty's transcription of Langland. N.B. the superscript 'e' in 'boþe' is brought down to the base line in the transcription. (Tags for what is not there and for what is added ...) MS damage with readings supplied from Skeat: how to tag? ADD? but Skeat did not insert this. mss smudge; hole after writing. Elision. ms exists here and I can't make it out. +- was here, and the parchment gone water damage partial vs. total damage: maybe was here, papyrus torn in any case is here but we can't read it damage to material or damage to writing scraping, cut away, torn away, eaten away material does exist but you can't read it space left in ms intentional / unintentional. 
deletion / damage / omission or elision
physical things, formatting things, editorial interventions
+/- I can read this
+/- there was something here
+/- I don't know there was something here
+/-
RK: divide and conquer.
Easy: non-intentional:
  failures of material (torn, cut away, ...) (includes papyrus with front piece torn away but back piece still there)
  worm holes (physical lacuna)
  obliterated (cannot read in whole or part: coffee stain...)
    - accretion
        something stuck on it (an old protective screen)
        binding too tight
        stains, smudges (insect excrement or smears)
    - diminution
        ink washed away
        fading
  accretion of pseudo-text: the insect dropping misread as a comma in a D. H. Lawrence ms., comma into semicolon, stray pencil marks
Intentional: scribal, non-scribal, author/amanuensis, post-scribal damage
  intentional deletions / cancellations / overwriting
  additions
  Intentional infliction of any of the 'unintentional' cases
Deletions on the line: (successful or unsuccessful)
  physically cuts out (unlike accidental physical lacuna)
  left blank (e.g. for rubrication)
  erasure: scraping, rubbing or other
  inked over (style) i.e. marked for deletion
  overwriting, overtyping
  paste over, paint over / white out
PS (what do we do with)
  mistaken cancellation
  a line which goes too far
  a line which does not go far enough
  ditto for later hand
IL does Center for Scholarly Editions have a handbook? Possible sources: TLG, Duke Documentary Papyri, Belles Lettres, ... Bowers has a very long article on MS notations. Studies in Bibliography, 1975 or 76. A totally unusable system, in PS's opinion, but Bowers did use it. A system for indicating alterations in a MS. PR an article in Text on this topic as well. What auxiliary information is useful? size of the cutout hole, ... how many letters are missing? 'ca. 14 letters' damage dimension, in letters and/or absolute size. Is there any trace of the letters there? What kind? (This is where you get all these special brackets.)
Is our distinction between intentional and unintentional change useful? As a heuristic, yes. PS distinguish agency of change, time, method, extent, and degree of success. What about bad writing? New category: 'it is there but we have problems with it': Problems of interpreting what is present: difficulty of reading the characters; difficulty in interpreting abbreviations, ...; illegible: I cannot make heads or tails; questionable reading: I think I can make it out or I can at least guess. RK distinguish not clearly written / not fully written / not conventionally presented or spelled. How do I represent it? And how do I represent its interpretation? What if the author simply misspelled? You emend it, but you need to note it. Normalization... PS Random Cloud argues that a lot of what editors have done to texts is a result of their ignorance of typography. In an era when the apostrophe was not standard for possessives, a final -s might be either possessive or plural. There is a good line which can be read either way; making this line accessible to the modern reader means drawing their attention to this ambiguity. IL 'questionable' suggests uncertainty about the letter forms. 'conjecture' is what I think he may have meant to write. What to do about 'not fully written' --- suspension (initial ...); contraction (initial and final); brevigraph (runic M for madhr); superscription. Some have letters, some have symbols. Some have indicators, some have not. Tutorial needs to give several possible ways of handling the classification of abbreviations. RK do we have a principled reason for using entity references in one case and a tagged version in another? (again, but no resolution) MSM do we wish to maintain the IL we have done 5.4: PR's notations are an improvement. SIC should be labeled useful only in few cases. CORR ok. NORM I have problems with. PS we are trying to establish guidelines for one transcription from which several versions can be generated.
I use CASE software to do several things; from a diplomatic transcription I can run several programs to produce work paper, page breakup, and clean copy. ... Under those circumstances, even though I agree that NORM is a despicable thing to do to anything, we might need to keep it anyway. in mss you may get something you can read as a number or as a word. RK can we talk about ambiguous abbreviations? e.g. this can be a number, or something else. A certainty attribute? Possibly, if we can figure out how to define its meaning. <!> <h1>Encoding text critical apparatus: methods (2d day) PR it seems we have no support from anyone for single end-point attachment. RK what assumptions are we making about the form people are using? Is it part of our mission or job to make recommendations? PR double end-point attachment works very well from computational point of view, but looks very unlike what people have been producing before. --- Example: text file: (1) Experience (2) thogh noon ... apparatus file: <app start=1 end=2>Experiment La Sl1</app> PS in editing American / British literature of last three centuries, no one does this. Apparatus never explicitly marks the lemmata; only page and line numbers. The information about what is varied does exist, but only implicitly. PR this is very close to the 'location-referenced method' I proposed. RK why would anyone wish to have apparatus in a file separate from the base text? (A: CD-ROM contains base text, which one wishes to work from.) RK we should not be trying to replicate the form of printed apparatus; we should be looking for the most streamlined system that can allow easy computational manipulation of the data. IL I agree we want encoding to permit software to produce complex editions. But we also must serve the humans who want to produce texts themselves. 
In 10 years it may be agreed that running machine collations are the only way to produce critical editions; but there will always be people who need to produce editions 'manually' --- without the intervention of automatic collation software. MSM isn't this problem structurally identical to the one we've been discussing? PS consider Post-Modern Culture. What the editors have apparently done is to create a text in a word processor, using minimum formatting so it can be transmitted over the net. No hidden control characters, no tabs, etc., but they have tried to make it look pretty. There are end-notes, etc. MSM people like this clearly do not need the TEI: if they are happy without markup, they don't need a markup scheme. RC It is possible to join what we are doing with what people are doing in things like PMC. Though P1 is 'document'-oriented, the strength of the underlying assumptions is that P1 is information-structuring techniques. Your fundamental problem is to build a knowledge base or information base from which one can generate all sorts of different materials. Our job is as information architects. Probably the phrase 'critical apparatus' should not be used in the document: that's a paper concept. PR: example of location reference (separate apparatus):
<xmp>
<![ cdata [
<app loc=L1>
<rdg>experiment</> <vargroup>La Sl1 Eg2 Ad3 </>
</app>
<app loc=L1>
<lemma>Experience</>
<rdg>Experiment <vargroup>La Sl1 Eg2</>
<rdg>Exeryment <vargroup>Ad2 Ha2 Ha4</>
</app>
]]>
</xmp>
RK can we handle details on the status of individual readings within a group? If a reading is attested by five witnesses, and I want to say 'A is conjectural, B is retroverted, E is dubious' how do I do it? Allow VARGROUP to nest (and rename WIT) --- use it to replace witness.detail. RC what do we do about things like retroversions? Discussion of example: Hebrew text has putative variant whose witnesses include LXX reading; LXX reading relies on Armenian version.
One formulation of problem: critical Hebrew text will have a reading whose witness is 'LXX'. Editor may leave it at that, or include parenthetical evidence for LXX reading in form of Gk mss. The evidence for the LXX may include a reading whose witness is 'Armenian'. Editor may leave it at that or include variants for the Armenian. RK plus, minus, substitutions, letter variants (orthographic etc.), paleographic, transposition. I want to be able to search for these. PS all this seems to involve many mss and very shallow treatment of them. Modern editors will tend to have fewer witnesses and treat them in much greater detail. IL true. Example of First Folio, where one may or may not include details of stop-press variants in an apparatus. PS we also need a system to distinguish levels of authorial revision. In modern editions one often develops systems for marking cancellations and insertions. In the early volumes of the George Eliot edition there was no full notation system, since they handled cancellations but not insertions. [lunch]
<h1>Return to miscellaneous features
Marginalia: Need <ol> <li>to mark the span of text to which the marginalium applies; <li>to specify the content of the marginalium; <li>may need to describe in some detail what is happening in the text (is there an anchor? is there cancellation?) </ol> Should there be a MARGINALIUM tag? MSM argues no: treat function as basic, not location: note and var, not margin... IL can't we say NOTE place=margin? MSM yes. Problem of treating marginal notes in mss. If ms B has a 'Symmachus says X here' in the margin... IL marginalia: transcribing marginal index notes. These serve in my text as a sort of index --- I don't think of them as notes, but as index or toc entries. Running subtitles. PS these will occur primarily in works intended to be shown: published texts or finished mss for presentation. Not in authorial mss or other working documents. PS when I edited Henry Esmond, I used original running titles as shoulder notes.
Lots of people have said "that doesn't look like the original edition". One of Kidd's objections to Gabler's edition is that J Joyce apparently engineered that certain passages face each other across the gutter. This falls under a larger distinction between the 'linguistic text' and the 'production text' (Jerry McGann) or 'material text' (PS). The meaning of the material text can go far beyond that of the linguistic text. RK I can find 5 uses for non-text-block material: (1) variant or variant-related; (2) content pointers (marginal or page headers); (3) IDs (page numbers, line numbers, columns, letters down gutter) (location ...); (4) cross references to other passages (Homer, Bible, ...); (5) symbol designating a kind of text (quotation, paragraphus, ...) (testimonium). IL is it implicit that running titles become similar to notes? I think of them as more like indexes. (MSM claiming that notes are all notes...) PS you are trying to distinguish the linear progression of semiotic signs which make up a text from the ancillary objects which people attach to it. MSM there is a real danger of reifying the characteristics of a specific technology. RK sense of the meeting is that NOTE needs to be re-analyzed. MSM asks for specific proposals. IL MARGINALIA, RUNNING-TITLE. RC skeptical of MARGINALIUM. Masoretic notes sometimes annotation, sometimes variants. IL asks what MSM will do with ruminations on NOTE. MSM asks for specific proposals; IL asks whether this means all the discussions of the last two days have been without import, since no formal motions have been made. MSM most of the discussion is more clearly related to text criticism. RK no, it's all been basic text representation. RC the problem is that P1 does not deal with the codex; codex problems were postponed. Quantitative variations: RK asks whether he will be able to record the length of (say) omissions. PR says perhaps should add an attribute to the RDG tag within the parallel segmentation method.
<xmp>
<![ cdata [
<anchor id=a1>Experience <anchor id=a2> ...
<app begin=a1 end=a2>
<rdg>&zero;<wit>A B C
<rdg type=om status=... length=4/12><wit>X
</app>
This <var><rdg type=om><wit>A <rdg type=sub>The <wit>B </var>
<var><rdg type=om><wit>A <rdg>woman<wit>B </var>
]]>
</xmp>
Additions:
<xmp>
<![ CDATA [
Experience ...
Experiment of good food
<anchor id=a1>Experience<anchor id=a2>
<app start=a1 end=a2><rdg type=sub>Experiment <wit>A</app>
<app start=a2 end=a2><rdg type=add>of good food <wit>A</app>
<var>
]]>
</xmp>
Or (omitting START or chasing tail)
<xmp>
<![ CDATA [
Experience ...
Experiment of good food
<anchor id=a1>Experience
<app start=a1><rdg type=sub>Experiment <wit>A</app>
<app id=a2 start=a2><rdg type=add>of good food <wit>A</app>
<app><rdg type=add>of good food <wit>A</app>
<var>
]]>
</xmp>
Straw vote: RK is for attaching inline apparatus after first word of the variant; IL also. PR for beginning of span. RC, MSM for end of span. Consensus: attach at end of span.
<xmp>
<![ CDATA [
<!ELEMENT var - - (rdg, wit)+ >
<!ELEMENT var - - (vl+) >
<!ELEMENT vl O O (rdg, wit) >
]]>
</xmp>
RC where do we put editor's evaluation of reading? Consensus: on the RDG or WIT tags. RC skeptical: he needs much better structure. PR confidence levels for readings, tag for editorial opinion on origin of reading, direction of variation (derivation from another reading, which may not exist in the text)
<!>
<h1>Encoding critical apparatus: elements and their relationships
<!>
<h1>Encoding sample existing or putative editions
<!>
<h1>Tasks for group: volunteers to draft
First task: all to develop tag set from minutes, working together. For the moment, keep among this group; when we have consensus, we should put this before the discussion group as a whole. Second task: examples (by email). Schedule: minutes by end of week, agreement in smaller group asap, reaction from
<!>
</body>
</gdoc>