TEI MI M 01 (draft): Draft of TEI XML Migration Task Force Meeting Minutes, 2002-10-13/14

Initials Used for People

SB Syd Bauman
AB Alejandro Bia
LB Lou Burnard
JH Jessica Hekman
TR Tobias Rischer
CR Christine Ruotolo
NS Natalia Smith
ST Syun Tutiya
JU John Unsworth
CW Christian Wittern

Meeting took place in the Claridge Hotel, Chicago, IL, USA on Sunday 13 and Monday 14 October. All times listed are local time in Chicago.

Commenced ~13:28 with SB, LB, AB, JH, TR, CR, NS, ST; CW joined ~13:50.

Introductions.

Objectives
Survey
Identify . . .
Minimally invasive vs. canonical
Processing environment
Discussion of problems found in samples
Case Studies
Dividing up Labor for Writing up Reports
- Strategic document: MI W 02 Strategic Considerations in Migrating TEI documents from SGML to XML.
- MI W 03 Practical Guide to Migration of TEI Documents from SGML to XML

Objectives

Review of list of objectives from our charge.

CR Q: are DTDs in scope? Consensus is that they are, but because few people will need help here, low priority. CR: plan to have relatively vague suggestions in recommendation documents.

CR suggests our focus should be on P3->P4. Consensus is that outlining tasks for P3->P4(XML) will include all steps needed for P4 SGML -> P4 XML. Asks if we want to have more of an advocacy role. LB answers yes. SB agrees, but wonders if any advocacy is necessary. LB points out that (disregarding extensions) a P3 document is ipso facto a P4 document.

Brief discussion of why a project wants to move to XML: access to new technologies, new tools (XML); non-support of P3 (P).

We have 2 sections of document already! 1. Scoping; 2. Motivation.

Case studies.

SB asks do we need a test suite? Is it hard to make? JH is concerned we may not be able to think of 'em all.

LB points out difference between test suite and using samples. Thinks we need to ascertain what practices are via survey (#2).

Action 1: LB 2002-10-22 Remind MP to send OTA materials

Action 2: SB 2002-10-22 Send WWP samples

CR asks whether or not we need test suite. SB asks how hard is it to do? CW suggests start with list of differences between XML and SGML.¹

Question comes up as to who we are surveying: SB holds repository reps insufficient to survey.

Summary that we will not use test suite, but rather results of survey of real cases, perhaps augmented with a fabricated test if deemed necessary.

Although software development is not an output of this group, suggestions for areas ripe for new tools or modifications to existing ones are.

Modifications to ED W 76 made.

Action 3: CR 2002-11-01 Write preliminary work-plan and circulate to list

Survey

CR: not too many responses.²

CW explains the recent experience of Character Set WG with its survey.

SB suggests that, as only 50 projects are listed on the TEI website, perhaps a phone survey. Generally disliked, but LB counter-proposes e-mail with caveats of privacy. LB likes e-mail and phone call. Question discussed about whether we just want files or answers to survey questions, too.

So, after the identification stage, a letter posing a very brief survey, culminating in a request for only a small data sample (no DTD or other supporting files should be explicitly requested). Non-respondents to be contacted by phone. Respondents for whom we have questions to be followed up by e-mail. Also a thank-you.

Five stages of survey project:

Identification of projects using TEI (SGML)
Survey letter for collection of samples. Telephone follow-up of non-responders (repository group to help)
Analysis stage: divvy up sample files and check for various features.
Follow-up based on number and nature of samples — e.g., asking for DTDs when needed, getting info on technical, organizational challenges and opportunities

Samples will then be checked against a checklist of issues.

Action 4: TR 2002-11-01 Create MI W 04, the checklist for stage 3 examination of files

Action 5: CR ?? Develop database of contacts

Action 6: SB 2002-10-23 Follow up on JF's survey (of which data went to JU), find out where the data is.

Action 7: LB 2002-11-01 Develop list of projects that use TEI to which we should send survey, get data

Action 8: all 2002-10-26 Send LB any projects you know of

Action 9: CR & SB 2003-01-02 Draft survey letter asking for samples and asking questions

Action 10: LB 2002-11-01 Look for "tei" on HUMBUL; coordinate the great TEI search.

Action 11: SB 2002-10-26 Draft "stand up and identify yourself" letter

Lists suggested so far: XML4LIB, TEI-L, HUMANIST, BIBLIOTECH, DIGLIB, LINGUIST, CORPORA, ANSAX-L.

Action 12: SB & LB 2002-11-15 Get a list of lists from MF and get "stand up and identify yourself" letter posted to all lists (including above).

Identify . . .

Split out technical to expert group, organizational to repository group.

Action 13: CR 2002-10-28 Initiate organizational discussion in repository group.

Discussion of order of objectives in Charge. Decided charge is really unordered, not to worry about it. CR to provide order in work-plan.

Decided to discuss further issues (e.g. XPointer and other P5ish issues) in appendix to output reports.

Adjourned ~17:18.

Minimally invasive vs. canonical

Commenced ~09:15.

CR reviews discussion from list.

Discussion of whitespace. General agreement that we need to try to munge source whitespace so that parsed whitespace matches.

Discussion of character entity references. LB argues that in migration character entity references should be converted to characters or numeric character references. Consensus is to have prose discussing reasons for desiring this conversion (that later XML processes won't be able to handle character entity references), but to recommend it as an option.
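A minimal sketch of the optional conversion discussed above, turning character entity references into numeric character references. The table ISO_ENTITIES and the function name to_numeric_refs are hypothetical names used only for illustration; a real migration would build the table from whichever entity sets the project actually declares, and would also have to cope with SGML references that omit the closing semicolon, which this sketch does not.

```python
import re

# Hypothetical excerpt of an entity-name-to-code-point table; a real migration
# would build the full table from the entity sets the project declares.
ISO_ENTITIES = {
    "eacute": 0x00E9,  # ISOlat1: small e with acute accent
    "thorn": 0x00FE,   # ISOlat1: small thorn
    "mdash": 0x2014,   # ISOpub: em dash
}

# The five entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Only references closed with a semicolon are recognized here.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def to_numeric_refs(text, table=ISO_ENTITIES):
    """Replace known character entity references with hexadecimal NCRs."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in table:
            # Unknown names are kept as-is so that nothing is silently lost.
            return match.group(0)
        return "&#x{:04X};".format(table[name])
    return ENTITY_REF.sub(replace, text)
```

Leaving unknown names alone means a second pass (or a human) can deal with project-specific entities, which fits the consensus that this conversion is an option rather than a requirement.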

Discussion of external entities.

Action 14: SB 2002-10-28 Ask Steve DeRose for his notes of what he did to convert P3 ODD files to P4 ODD files.

Consensus is the same as for the previous two: user option with discussion of why you'd prefer to use XInclude to system entities.
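As an illustration of the XInclude option, the sketch below rewrites references to external (SYSTEM) entities as xi:include elements. The function name entities_to_xinclude and the sample file names are assumptions made for this example; only general entities declared with a double-quoted system identifier are handled, and parameter entities and NDATA entities are deliberately left alone.

```python
import re

# XInclude namespace name.
XI_NS = "http://www.w3.org/2001/XInclude"

# Matches declarations such as <!ENTITY chap1 SYSTEM "chap1.xml">.
# Parameter entities (with %) and NDATA entities do not match this pattern.
SYSTEM_ENTITY = re.compile(r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]*)"\s*>')

GENERAL_REF = re.compile(r"&([\w.-]+);")


def entities_to_xinclude(doctype_subset, body):
    """Replace references to declared SYSTEM entities with xi:include."""
    hrefs = dict(SYSTEM_ENTITY.findall(doctype_subset))

    def replace(match):
        name = match.group(1)
        if name not in hrefs:
            # Character entities and predefined entities are not touched.
            return match.group(0)
        return '<xi:include xmlns:xi="{}" href="{}"/>'.format(XI_NS, hrefs[name])

    return GENERAL_REF.sub(replace, body)
```

One caveat worth stating in the prose discussion: an external entity may hold an arbitrary fragment, whereas an XInclude target parsed as XML must itself be a well-formed document, so some included files may need a wrapper element before this rewrite is safe.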

Discussion on DTDs: yes, we need to keep 'em. XML tools that won't do well-formedness work on files that specify a DOCTYPE declaration are broken, so it's not our problem.

Can address dirty hacks.

Comments: can't have comments inside other declarations; can't have multiple comments inside one comment declaration; empty comment declaration (<!>) not permitted.
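A rough sketch of how two of these comment differences might be handled mechanically: a declaration holding several comments is split into separate XML comments, and the empty declaration is dropped. The function name convert_comment_declarations is hypothetical, and comments embedded in other declarations (for example inside an ELEMENT declaration) are not handled here.

```python
import re

# An SGML comment declaration: "<!", zero or more "--text--" comments
# separated by optional whitespace, then ">". "<!>" is the empty case.
COMMENT_DECL = re.compile(r"<!((?:--.*?--\s*)*)>", re.DOTALL)


def _xml_comment(text):
    """Wrap text as an XML comment, which may not contain '--' or end in '-'."""
    while "--" in text:
        text = text.replace("--", "- -")
    if text.endswith("-"):
        text += " "
    return "<!--" + text + "-->"


def convert_comment_declarations(sgml):
    """Rewrite SGML comment declarations as well-formed XML comments."""
    def replace(match):
        inner = match.group(1).rstrip()
        if not inner:
            return ""  # "<!>" has no XML equivalent, so it is removed
        # Strip the outer delimiters, then split where one comment closes
        # and the next one opens.
        parts = re.split(r"--\s*--", inner[2:-2])
        return "".join(_xml_comment(part) for part in parts)
    return COMMENT_DECL.sub(replace, sgml)
```

Other declarations such as DOCTYPE are not matched, because the pattern requires either a comment or the closing ">" directly after "<!".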

Action 15: JH 2002-10-14 Investigate how comments are processed in SX or other tools

‘strategies’ document will have things like advising migrators to think about issues of, say, XInclude v. external entities. ‘practices’ document will have advice on how to convert to XInclude or how to migrate without converting.

In strategy document we should probably point out that more migrations in the future are likely, but that if you're happy with P4 (which TEI does plan to support), you could just stay there.

Specification of defaulted attributes: we'll recommend not to specify them (and hopefully point out ways to migrate without them) unless you really need them.
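One possible way to migrate without defaulted attributes is to strip them after conversion, when a normalizing tool has written every default into the instance. The sketch below is illustrative only: the DTD_DEFAULTS table and the function name strip_defaulted are hypothetical, the single entry in the table is an assumed value, and so is the assumption that a TEIform attribute defaults to the name of its element. All of these should be checked against the DTD actually in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of (element, attribute) -> DTD default value.
# The entry below is an assumption for illustration, not a checked fact.
DTD_DEFAULTS = {
    ("list", "type"): "simple",
}


def strip_defaulted(root, defaults=DTD_DEFAULTS):
    """Remove attributes whose value merely repeats the assumed DTD default."""
    for elem in root.iter():
        # Copy the items first, since attributes are deleted while looping.
        for attr, value in list(elem.attrib.items()):
            # Assumption: TEIform defaults to the element's own name.
            is_teiform_default = attr == "TEIform" and value == elem.tag
            if is_teiform_default or defaults.get((elem.tag, attr)) == value:
                del elem.attrib[attr]
    return root
```

Attributes carrying any other value are kept, so a project that really needs an explicit value does not lose it.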

Discussion of DTD conversion: we can't help those who did not use extension mechanism, but we should have a paragraph addressing the problems created by not doing so.

Strategic document should discuss the fact that migration may be an opportunity to improve your DTD.

CR: In technical report document we need to address

minimal conversion
easy conversion
conversion that maximizes XML tool usability
conversion that is forward-looking to P5, or at least what we can predict of P5.
in depth discussion of macro issues identified in samples

SDATA entity discussion. SB suggests three categories

characters that are in Unicode
characters that are not in Unicode
- solutions, à la P4 chap 4.2.1: CDATA, PIs, markup (<c>)
- SB suggests we need to better describe the disadvantages of each method in our practices document
others
- ambiguous glyph
- glyph exists in Unicode with different meaning in the document
- temp data capture flags

Processing environment

LB points out difficulty in actually managing all the little pieces of a sample (or real) case. Corollary is that practices document needs to address catalog files.

Things to Consider

instances
DTD extension files
catalog files
style-sheets and other parts of processing environment

Add questions about processing environment to third round survey questions.

SDATA entities to be attacked by a separate individual in practices document.

Discussion of problems found in samples

TR: ??

LB: consultancy may be desirable. General agreement that a workshop on a specific issue, e.g. extension files, would be a good thing.

SB asks about recommending open source v. proprietary software. In resulting discussion LB points out that he'd prefer we say ‘this tool does this’ rather than make a recommendation ‘use this tool’.

LB sees only three strategies for obtaining tools for migration:

in-house development
buy proprietary tools
use open source

Action 16: JH 2002-11-04 Seek out vendors of useful tools, and contact them to find out rudimentary information about their tools.

Case Studies

CR expects repository reps to write up a case study each. Recommendations for tools & strategies should be ready by mid- to late-December to give repository reps a month to work before joint meeting.

Action 17: CR 2002-12-01 Write up a framework of feedback information we want from repository reps, MI W 01, Format for Case Study Feedback

Dividing up Labor for Writing up Reports

Strategic document: MI W 02 Strategic Considerations in Migrating TEI documents from SGML to XML.

Challenges, opportunities, and motivation.
Types or scope of migration (P3->P4 or P4->P4)
Areas of migration (instances, DTD extensions, catalog files, processing environment)
Levels of migration, e.g. minimal surgery approach, get almost to P5 approach, etc.
Appendix: potential impact of future versions of the Guidelines on migration issues.

MI W 03 Practical Guide to Migration of TEI Documents from SGML to XML

DTD conversions
- SDATA (ST)
- Extension files (TR)
Instance conversion: tools. Issues: whitespace & comments, prologue & file structure (e.g. external entities) (JH)
Recommended work-flow (AB)

Section write-ups due 2002-12-02.

Action 18: CR 2002-11-25 Send reminder mail to group to get write-ups done in 1 week

Adjourned ~16:00.

Notes

¹ Available from http://www.w3.org/TR/NOTE-sgml-xml; doesn't seem to be on the CD, though.

² That is, responses to the posting to TEI-L of 2002-10-04 14:12-04, "migrating TEI resources from SGML to XML".