Draft of TEI XML Migration Task Force Meeting Minutes, 2003-06-16/17
Initials Used for People
- SB Syd Bauman
- AB Alejandro Bia
- LB Lou Burnard
- TE Tomaž Erjavec
- JH Jessica Hekman
- TR Tobias Rischer
- CR Christine Ruotolo
- SS Susan Schreibman
- NS Natalia Smith
- JU John Unsworth
- JW John Walsh
- SW Sarah Wells
- FW Frans Wiering
- CW Christian Wittern
The meeting, hosted by the Biblioteca Virtual Miguel de Cervantes, took place at the Universidad de Alicante on June 16th & 17th, 2003. All times are local (+01).
Commenced a bit late, at ~10:50, owing to a bus delay (ostensibly weather-related), with SB, LB, AB, JH, CR, NS, JW, SS, SW, TE.
The Chair and local host worked out a few schedule changes accordingly.
Group extends a warm and heartfelt ‘thank-you’ to AB and the Cervantes project for hosting this meeting.
Version Control
Question of whether or not the group needs its own version control system. We have already suffered at least one failure (two people writing to the same file). LB doesn't think we need one, as communication should only ever be two-way: between author and technical writer (SW), and then between technical writer and editors.
The process editors use to update the TEI depot and website was explained:
- author finishes a section and sends it to the list;
- task force members send comments directly to the list;
- author updates (canonical copy on the TEI website);
- at some point, ownership changes from the author to the technical writer (SW);
- from that point on, comments continue to go to the list, but only the technical writer updates (canonical copy still on the TEI website).
For this meeting, SW will make the changes, and the website will be updated afterwards.
osx
OpenSP 1.5 is the public version that contains some of JH's changes. New changes (since the last meeting) have been checked in by JH, but (at the request of the OpenJade team) to two different branches, making it very difficult even to build from CVS (and there are no tarballs, let alone binaries).
Consensus is that our documents will need to go out before we can be sure that the OpenJade team will have updated osx, so we plan to include migration steps that work around OpenSP 1.5 limitations, but footnote them as unnecessary if a new version of osx is available.
Agreed to put any binaries we have (Cygwin, Mac OS X, or whatever) on a publicly available website. The consensus of the group is that we have done what we reasonably could to get a Windows binary; we don't have access to the programming expertise needed. On the theory that small projects can use the web interface and any larger project will have GNU/Linux or at least Cygwin capability, we're going to all but stop trying.
Survey and Deadlines
Thus, we can establish deadlines by working backwards from that date.
Currently authors have ownership of case studies.
Reports: Macro Issues
CR raises the issue that MIW02 and MIW03 overlap quite a bit. General agreement that there will be some parallelism and redundancy of coverage, but that we continue the policy of a general overview in MIW02 and details in MIW03. Specifically, CR's nice table of osx switches should be moved from MIW02 to MIW03.
- dropping support for P3
- no route from P3 directly to P5
- we don't know details yet but here are things that will most certainly be different
- miw02: CP & eds.
- miw03: none
- miw06: CR & SB
It was agreed that we need a consistent terminology for migrations.
Agreed to remove casual terminology, not to address the reader as ‘you’, and to refer to this task force as ‘we’.
Agreed that readers should see document titles in cross-references; authors are reminded to encode these with an <xref> with url="./miwXX.html" (not <xptr>).
Specific Document Issues
MIW02 Introduction
SS & CR to take a crack at re-writing.
MIW02 Motivations, Opportunities, Challenges
- P3 not supported;
- difficulty in P3 directly to P5;
- availability of tools and related specs.
On the topic of open-source tools, we decided that MIW02 should simply point to the TEI Software page, but that the list of X- standards should be expanded and perhaps explained further.
A re-write of sentences 2-4 of paragraph 1 of the Motivation section was suggested to SS, who will re-write it and post.
After a bit of discussion on the implications of ‘standards’ it was agreed to change ‘standards’ to ‘standards and specifications’.
The point of the Challenges subsection is to admit up front there are costs. SW has some prose for it.
Concern (SB & CR) that we have too many internal references. Consensus was that they're a good thing, but that once we've assembled the documents into a whole we need to review them and see whether there are too many.
MIW02 Areas of Migration
At lunch SS & JH …
Unless someone can usefully fill this in, I'll just delete it.
MIW02 Workflow
Add mention of ‘if old DTD via chef, make new one’, i.e. if the old DTD was generated with the TEI Pizza Chef, generate a new one.
Add a sentence at the end of the DTD section noting that migrating DTD extensions can be hard, whereas many will find migrating instances easy. Include an explicit pointer to the relevant section of MIW03.
Catalog file section to mention that entity conversion is a pain. Decided to put this under processing environment. JH asks about XML catalog syntax. Consensus is that we will provide a pointer to further info. If software is available at the time, we'll mention it.
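Entity conversion is one reason this step is a pain. As a minimal sketch (in Python, chosen here only for illustration), the functions below rewrite named entity references as numeric character references so that an XML parser no longer needs the entity declarations. They assume that the HTML entity table in Python's `html.entities` module, whose names largely derive from the ISO 8879 entity sets, is an acceptable stand-in for a project's real entity sets; `to_char_refs` and `unresolved` are hypothetical helper names.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser predefines; these can stay as named references.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity reference such as &eacute; (letters, digits, '.', '_', '-').
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")


def to_char_refs(text: str) -> str:
    """Replace known named entity references with numeric character references.

    Assumption: the HTML entity table stands in for the project's entity sets.
    Unknown names are left untouched so they can be mapped by hand.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_REF.sub(replace, text)


def unresolved(text: str) -> list[str]:
    """List entity names that still need a mapping after conversion."""
    return sorted({m.group(1) for m in ENTITY_REF.finditer(text)} - XML_PREDEFINED)
```

Names the table does not know (project-specific entities, for example) are reported by `unresolved()` rather than guessed at.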
JH to alter the "by hand" phrase.
If anyone has any clue what this might mean, please let me know. Otherwise, we nuke it.
Processing environment: note that XML tools are more likely to stop at the first error; add more detail about the delivery environment, including indexing tools and web output generation.
SS is concerned over the discrepancy between "is straightforward" (instance section) and TR's list (MIW04). JH to change it to "can be straightforward".
MIW02 General Recommendations
List of editors: there was disagreement among the group, but in the end we decided on a disclaimer along the lines of ‘you need to think about choosing your editor carefully, but we're not talking about particular editors’, probably with a pointer to somewhere that discusses this.
We decided the wording of ‘Target production environment’ was a bit too strong: while projects should consider the production environment first, it may be too much to actually have it running first. [1]
The subsection on Training, once the tools discussion is removed, is pretty short, and therefore will be rolled into the Resources section.
Replace the language of the 2nd paragraph of Migration method with the first paragraph of Resources. Discussion of whether or not production should stop during migration; the language is to be toned down a bit.
Changes to final para of Resources section (SW has details).
Suggestion that the data testing rule of thumb ‘1, 10, 100%’ be used.
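One plausible reading of the ‘1, 10, 100%’ rule of thumb is: test the migration on 1% of the files, then on 10%, then on everything. The sketch below assumes that reading (the minutes do not spell it out) and draws nested, reproducible samples, so files checked at an earlier stage are rechecked at later ones. `staged_samples` is a hypothetical helper.

```python
import random


def staged_samples(files, percents=(1, 10, 100), seed=0):
    """Return nested test batches for a '1, 10, 100%' testing schedule.

    Assumption: each stage contains the previous one, and a fixed seed makes
    the selection repeatable between test runs.
    """
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    stages = []
    for percent in percents:
        # Integer ceiling division, so even a small collection gets at least one file.
        size = min(n, -(-n * percent // 100))
        stages.append(shuffled[:size])
    return stages
```

Taking prefixes of a single shuffle, rather than sampling each stage independently, is what guarantees the stages nest.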
Add item to list in Other recommendations for ‘design and run test procedures’.
MIW02 Special Considerations in Migration
Table of switches moves to MI W 03.
The question arose of whether a comment is a declaration. The answer is that in SGML they are called ‘comment declarations’, while in XML they are simply ‘comments’. [2]
Consensus was that we should recommend use of the -xpreserve-case and -xempty switches in all conversions that use osx.
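A minimal sketch of a command builder that always applies the two recommended switches. It assumes `osx` is on the PATH, writes its XML to standard output, and takes an SGML catalog via `-c`; `osx_command` and the file names are hypothetical, and the command is only built here, not run.

```python
import shlex

# The two switches the task force recommends for every osx conversion.
RECOMMENDED_SWITCHES = ["-xpreserve-case", "-xempty"]


def osx_command(sgml_file, catalog=None, extra_switches=()):
    """Build (but do not run) an osx command line for one SGML file."""
    cmd = ["osx", *RECOMMENDED_SWITCHES, *extra_switches]
    if catalog is not None:
        cmd += ["-c", catalog]
    cmd.append(sgml_file)
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command; the caller would redirect stdout to a .xml file.
    print(shlex.join(osx_command("chapter1.sgml", catalog="catalog")))
```

Keeping the switches in one constant means a project cannot forget them on some files but not others.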
Long discussion of what makes a conversion robust, and then of all the conversion categories. Agreed to eliminate the explicit division into easy, minimally invasive, and robust conversions.
Discussion of ESIS; decided to eliminate the reference to this jargon here.
Discussion of being more generic than mentioning osx specifically.
New Topic: ‘supra-validation’ [3]
JH suggests (and no objections were raised) that a new paragraph or section on the various possible pitfalls of migration (including errors introduced by migration scripts, some of which might not be caught by validation) be added to MIW03, and that the General Recommendations section of MIW02 include a recommendation to design a way to check the data.
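As one hedged example of a check beyond validation, the sketch below compares the character data of a document before and after migration. It assumes both versions are already XML (for instance, the osx output and the post-processed file) and that the migration should change markup only; whitespace runs are collapsed so that pretty-printing alone does not cause a failure. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def text_fingerprint(xml_text: str) -> str:
    """Concatenate all character data, with whitespace runs collapsed."""
    root = ET.fromstring(xml_text)
    return re.sub(r"\s+", " ", "".join(root.itertext())).strip()


def same_text(before: str, after: str) -> bool:
    """Report whether the migration left the text itself unchanged.

    A valid document can still fail this check, e.g. if a script silently
    dropped an element together with its content.
    """
    return text_fingerprint(before) == text_fingerprint(after)
```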
MIW04
MIW03
Comments are for the author and should be removed before we make the files public. If an author wants to send commentary to the rest of the group, they should use <note>.
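For XML files, one low-tech way to strip author comments before publication is a parse/serialize round trip, since Python's default ElementTree parser discards comments while keeping elements such as <note>. This is a sketch under that assumption only: the round trip also drops the DOCTYPE and processing instructions and may rewrite namespace prefixes, so a production script would need more care. `strip_comments` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def strip_comments(xml_text: str) -> str:
    """Drop XML comments by re-serializing the parsed document.

    The default parser ignores comments, so they are simply absent from
    the tree that gets written back out; <note> elements survive.
    """
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")
```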
- Intro (by AB)
- Tools (by JH)
- Workflow (by AB)
- SDATA (by CW)
- DTD (by TR)
SS & CR to re-work the intro to MIW03 to make it more parallel to the intro of MIW02.
Question of XMetal as tool was raised —
Discussion of tei2tei.xsl: we'll be discussing it as an example, not as a full tool.
With respect to osx, the group recommends that JH put in a footnote referring to soon-to-be-available features, which could quickly be changed to a paragraph if the new version is in fact released in time.
Decided either not to mention NoteTab at all or to mention it only in a general sentence in the Workflow section about using editors as a convenient front end.
Long discussion on DTDs: point made (JH) that we don't migrate the TEI DTD; we just need to tell people how to get the new one.
Workflow section to be divided into subsections: Intro, DTDs (with a pointer to the extensions section), catalogs, the bulk on instances, and processing environment.
Instance migration section
- Numeric version of osx to be relegated to footnote;
- change ‘Open SX’ to ‘OpenSP’;
- fix case of program names;
- further notes from JH to SW directly.
1. osx;
2. possibly a stylesheet (e.g. tei2tei.xsl) to a) correct case, b) remove default attributes, and c) pretty-print; or
3. if step 2 is not used, a pretty-printer (e.g. HTML Tidy), if desired.
‘… likely workflow that integrates all of the steps above.’
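Case correction is handed to a stylesheet such as tei2tei.xsl in this workflow. The sketch below shows the same idea in Python rather than XSLT and is not tei2tei.xsl itself: it assumes the conversion folded element names to a single case and restores each name from a table of canonical TEI spellings. The table is a hypothetical excerpt; a real one would cover every element in the target DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a canonical-name table for the target DTD.
CANONICAL_NAMES = ["TEI.2", "teiHeader", "fileDesc", "titleStmt",
                   "publicationStmt", "sourceDesc", "text", "body", "div", "p"]
BY_LOWER = {name.lower(): name for name in CANONICAL_NAMES}


def correct_case(xml_text: str) -> str:
    """Restore mixed-case element names after a case-folding conversion.

    Names missing from the table are left exactly as they are.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = BY_LOWER.get(elem.tag.lower(), elem.tag)
    return ET.tostring(root, encoding="unicode")
```

A case-insensitive lookup is needed because names such as teiHeader cannot be recovered by simply lowercasing.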
At LB's suggestion, the bash script in the File conversion in batch mode section should become more generic (pseudo-code, if you will). We will then have pointers into the tools page, from the case studies at least and possibly from here as well, to the actual, well-commented scripts that people used (including AB's example).
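A generic version of the batch-mode loop, in the spirit of LB's suggestion, might look like the sketch below. The per-file `convert(src, dst)` callable is hypothetical (it could wrap osx plus a stylesheet); the loop mirrors the source directory tree, writes `.xml` outputs, and records failures instead of stopping at the first one. The `*.sgm*` file pattern is likewise an assumption.

```python
from pathlib import Path


def convert_batch(src_dir, dst_dir, convert, pattern="*.sgm*"):
    """Run one conversion over every matching file, continuing past errors.

    Returns (converted, failed) so failures can be reviewed afterwards.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    converted, failed = [], []
    for src in sorted(src_dir.rglob(pattern)):
        if not src.is_file():
            continue
        # Mirror the source layout and switch the extension to .xml.
        dst = (dst_dir / src.relative_to(src_dir)).with_suffix(".xml")
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(src, dst)
            converted.append(src)
        except Exception as err:  # record the failure and move on
            failed.append((src, err))
    return converted, failed
```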
SDATA section
Not a lot to say now. SW to copy-edit and repost to the list.
DTD extension migration
Not much to say now; a very good section, although it outweighs all the others by several pages. Discussion of whether or not we should ditch
[2] In SGML, <!-- comm decl with -- -- two comments -- > is a single ‘comment declaration’ containing two ‘comments’; in XML there is no such distinction, and the above is not allowed. (That is, XML permits only one comment per comment declaration, with no whitespace between the ‘<!’ and the ‘--’ or between the final ‘--’ and the ‘>’, and the whole thing is called a ‘comment’.)
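The difference can be checked with any conforming XML parser. This sketch uses Python's bundled expat parser via ElementTree; the sample strings and the `is_well_formed` helper are illustrative. The SGML-style declaration with two comments is rejected, as is whitespace between the final ‘--’ and the ‘>’, while a single XML comment is accepted.

```python
import xml.etree.ElementTree as ET

SGML_STYLE = "<doc><!-- comm decl with -- -- two comments --></doc>"
XML_STYLE = "<doc><!-- a single comment --></doc>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the bundled XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```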