Notes from TEI Migration Task Force Combined Meeting, 2003-02-07/08
TEI MI M 07
Initials Used for People
- SB Syd Bauman
- AB Alejandro Bia
- LB Lou Burnard
- TE Tomaz Erjavec
- JH Jessica Hekman
- AK Amit Kumar
- CP Chris Powell
- TR Tobias Rischer
- CR Christine Ruotolo
- SS Susan Schreibman
- NS Natalia Smith
- JW John Walsh
- SW Sarah Wells
- FW Frans Wiering
All times are local to MITH (i.e., UTC-05).
Started late (due to weather) at ~14:15 with CR, SS, TE, FW, LB, JH, AB, SW, JW, SB. CP arrived @ ~15:07.
Group extends a gracious thank-you to SS, MITH, and also to Amit Kumar. SS and AK have gone above and beyond to see that this meeting works despite the closing of the University of Maryland due to snow.
- Appendix B: Outline of MI W 02
Discussion of survey project; little done so far.
LB announces that Alan Morrison is looking into using the new TEI projects webpage as a springboard for the survey.
Consensus is to perform a small survey using the list of projects available on the aforementioned TEI applications page, and perhaps to do a larger survey later under a different or extended grant.
Strategic document should have an overview of how the process should be performed — who does what, whether to stop production, how to change workflow, etc.
FW: DTD extension problems; reports a bug in tei2tei.xsl: it leaves the attribute name but not the value for defaulted attributes. SDATA problems: using SDATA entity references for Renaissance musical notation, some of which are not in Unicode.
Discussion of whether to discuss SX or osx — consensus is to discuss both, including difficulties of building osx.
LB reports that the CE WG is working on this [FW's SDATA problem]. LB thinks the only solution is going to be use of the Private Use Area (PUA), and that the WG is going to recommend how to encode such characters.
FW asks how one can use a font to represent a PUA character. No one actually knows.
Discussion of fonts to be incorporated into SDATA section of technical document.
TE asks about deprecation of named entities; big discussion on whether XML requires named entities to be declared or not. Consensus is that we will discuss the disadvantages of using named entities in the SDATA section.
SB suggests that the tools section disclose that JH works on osx.
FW points out that XMetal can convert SGML to XML. (Whether it actually does is to be discussed on the list later.)
We should include in our survey a question or small section asking people about tools they use.
TE: points out a discrepancy in tools (sx v osx); also, he felt there was nowhere to start, so he used the checklist.
The group considers out-of-the-box software for converting TEI Lite (P3) to TEI Lite (P4 XML) something we'd like to be able to recommend.
SB suggests that MI W 04 be rolled into an appendix of MI W 03 and, similar to SS's suggestion, be referred to by the first steps of AB's list … "if you have complicated data, lots of it, anything you don't know [e.g., data that was created before you started working on the project]."
TE: documents don't discuss marked sections! (Marked sections in the document instance, that is; LB suggests using general entity references declared based on the value of TEI.XML.)
TE: SGML declarations are not mentioned anywhere. (Need to mention that you should use your local declaration for SGML processing.)
ACTION: LB to check whether osx reads SGML declaration, in particular whether it acts on NAMECASE GENERAL NO. Answer: 1.5 seems to do it right if you specify the SGML declaration on the commandline as you're supposed to.
The tech document should explain that osx will only preserve case if you specify an SGML declaration that specifies case sensitivity.
LB notes that we should explain the disadvantages of using dirty hacks. We should put some effort into overcoming the obvious reasonable objections to the off-the-shelf tools.
Discussion of XSLT engines. Consensus is to state that we have successfully transformed a document of size X with tei2tei.xsl using [software we use, probably xsltproc].
SB points out that the ‘@@ hack’ is not needed now that osx does not expand entities; CR points out that it is still needed to protect entity references from expansion by the XSLT stylesheet that corrects case.
SB wonders why osx doesn't fix case. After this is explained to JH, she thinks this feature might be added in the future.
LB reminds us (JH in particular) that a fix for the ‘attribute bloat’ problem is in the stack, too.
CR & SB point out that the discussion of the batch script should be more plug-and-play.
TE points out that we do not mention anything about public identifiers; SB adds DOCTYPEs in general. LB points out that this is covered on the sgml2xml page on the site, which could be used as a starting point.
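The XSLT stylesheet that corrects case is not reproduced in these notes. As a rough illustration only, here is a Python (ElementTree) stand-in, assuming osx output in which element and attribute names have been folded to a single case. The name tables and the function fix_case are hypothetical, and the tables are deliberately partial; a real list would be derived from the TEI P4 DTD. Like any XML parser, ElementTree rejects references to undeclared entities, which is the kind of expansion problem the ‘@@ hack’ works around, so the input is assumed to contain none.

```python
import xml.etree.ElementTree as ET

# Hypothetical, partial tables of canonical TEI names. A real list would be
# generated from the TEI P4 DTD rather than typed by hand.
TEI_ELEMENTS = ["TEI.2", "teiHeader", "fileDesc", "titleStmt", "publicationStmt",
                "sourceDesc", "revisionDesc", "text", "body", "div", "p", "persName"]
TEI_ATTRIBUTES = ["id", "lang", "n", "rend", "type", "targType"]

# Map each case-folded name to its canonical spelling.
ELEMENT_CASE = {name.lower(): name for name in TEI_ELEMENTS}
ATTRIBUTE_CASE = {name.lower(): name for name in TEI_ATTRIBUTES}


def fix_case(root):
    """Rename elements and attributes in place to their canonical TEI case.

    Names missing from the tables are left unchanged.
    """
    for elem in root.iter():
        elem.tag = ELEMENT_CASE.get(elem.tag.lower(), elem.tag)
        # Rebuild the attributes so renamed keys replace the folded ones.
        fixed = {ATTRIBUTE_CASE.get(k.lower(), k): v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(fixed)
    return root


if __name__ == "__main__":
    doc = ET.fromstring('<TEI.2><TEIHEADER><FILEDESC/></TEIHEADER>'
                        '<TEXT><BODY><P REND="it">Hello</P></BODY></TEXT></TEI.2>')
    print(ET.tostring(fix_case(doc), encoding="unicode"))
```

A lookup table keyed on the folded name works whether osx folded names to upper or lower case, which is why the keys are built with lower() rather than assuming one direction.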
Catalogs:
At the very least, we'll need some sort of discussion of catalogs.
Commenced 10:30
JH reports on osx updates.
CR raises issues with osx: [?...?]. Input files in EUC (a Unix Japanese double-byte encoding): osx can process them (with a -b switch), but it gave an error message even though it seemed to work. JH points out that ‘this is technically an OpenSP issue, not an osx issue. Which is to say, I will definitely not be able to fix it. I will take care of harassing the OpenSP folks about it, though.’ In another case the output was gibberish.
Suppressing output of built-in entity references has been written but not checked in; suppression of default attributes is on JH's ‘to-do’ list.
CP: using SX has been relatively smooth, as the data are pretty simple and pretty well normalized. She points out that her parser complains about "<l/>" in the output.
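Until osx can suppress default attributes itself, the same effect could be approximated after conversion. The Python sketch below assumes a table of attribute defaults extracted from the DTD's ATTLIST declarations; the table shown (using TEIform) and the function strip_default_attributes are hypothetical and not part of osx. Because the XML output no longer records which values were defaulted and which were explicit, this also drops explicit values that happen to equal the default. That is harmless for a DTD-aware processor, which supplies the default again, but not for one that ignores the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults table: element name -> {attribute: default value}.
# In practice this would be extracted from the DTD's ATTLIST declarations.
DEFAULTS = {
    "div": {"TEIform": "div"},
    "p": {"TEIform": "p"},
    "hi": {"TEIform": "hi"},
}


def strip_default_attributes(root, defaults=DEFAULTS):
    """Remove attributes whose value equals the declared default.

    Returns the number of attributes removed.
    """
    removed = 0
    for elem in root.iter():
        declared = defaults.get(elem.tag, {})
        for name, default in declared.items():
            if elem.get(name) == default:
                del elem.attrib[name]
                removed += 1
    return removed


if __name__ == "__main__":
    doc = ET.fromstring('<div TEIform="div" type="chapter">'
                        '<p TEIform="p">text</p></div>')
    print(strip_default_attributes(doc), ET.tostring(doc, encoding="unicode"))
```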
JW asks if putting up lists of entity names & Unicode codepoints for n2x would be helpful. SB says yes, but not much. SB points out that users find it difficult to find the ISO entity sets for Unicode on the website. Consensus is that MI W 03 should contain an explicit reference.
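A list like the one JW proposes could be generated mechanically. The Python sketch below uses the standard library's html.entities.name2codepoint, which covers the HTML 4 entity names only; these overlap with, but are not the same as, the ISO entity sets used by TEI, so this illustrates the format rather than replacing those sets. The function name entity_declarations is hypothetical. The XML predefined entities are skipped: they need no declaration, and amp and lt cannot be declared as simple character references anyway.

```python
from html.entities import name2codepoint

# XML predefines these five; they need no declaration, and amp/lt may not be
# declared as a simple character reference.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entity_declarations(names=None):
    """Return XML entity declarations for the given HTML 4 entity names.

    Each declaration maps the name to a hexadecimal numeric character
    reference and records the Unicode code point in a comment.
    With no argument, every HTML 4 entity name is used, sorted.
    """
    if names is None:
        names = sorted(name2codepoint)
    decls = []
    for name in names:
        if name in XML_PREDEFINED:
            continue
        cp = name2codepoint[name]  # raises KeyError for unknown names
        decls.append(f'<!ENTITY {name} "&#x{cp:04X};"> <!-- U+{cp:04X} -->')
    return decls


if __name__ == "__main__":
    for line in entity_declarations(["eacute", "alpha"]):
        print(line)
```

Characters with no Unicode code point at all, such as some of the notation FW raised, cannot be listed this way; they remain a PUA question.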
NS: TEILite, quite well normalized. Easy translation. Had used osx & xsltproc.
CP & CR bring up a company called Intellex; NS mentions Apex. CR thinks they might be helpful in taking a look at our documents and providing feedback. Perhaps raw SGML with lots of minimizations.
JW: VWWP is also TEILite, well normalized; they created new entity sets using the XHTML versions as a base and added Greek with diacritics themselves.
- persName fix
- globincl -> Incl
- dual-purposing of entity sets (using %TEI.XML;)
LB reports that information about the BNC migration is now on the website (http://www.tei-c.org/MI/Samples/BNC/); not much to add. OTA is working on this problem, but LB is not equipped to report on it. (A lot of OTA material is not in TEI anyway.)
CR: main barrier to conversion is error-prone SGML on input. LB suggests that we have a discussion of document management issues: e.g., keeping track of changes (e.g., in <revisionDesc> ).
CR discusses what a repository rep report should contain.
MI W 06 will be the collected case reports, "Migration Case Study Reports".
Next meeting tentatively Mon 16 & Tue 17 Jun 2003 (all day Mon 16, half day Tue 17) in Alicante, Spain. Thanks to AB for volunteering to host the next meeting. Note that Alicante is having a big festival on Mon 23 & Tue 24.
On the subject of MI W 06, ‘conversion that maximizes XML tool usability’ seems to mean whether you convert external entities to XInclude or similar mechanisms.
- LB sends mail to projects on the activities page
  - Action 13: LB to send draft letter to lists, 2003-02-28
  - Action 14: LB to provide CR with list of e-mail addresses of those on the activities page, 2003-03-14
- To those that use P3, SB & CR send a letter asking for general info
- Those who are willing to help out will be divvied up among the MIGR group members, who will send 'em the survey, with the expectation of an ongoing dialogue.
Appendix A:
Adjourned ~16:15.
Appendix B: Outline of MI W 02
MIW02: Strategic Considerations in Migration of TEI Documents from SGML to XML
- Introduction [from beginning of MIW03d] [AB]
- Challenges, opportunities, and motivation [SS,JW]
  - Motivation
    - Why it's a good idea
    - Why do this now?
    - Will make conversion to P5 possible
    - Comparison to cost of converting proprietary formats
  - Opportunities
    - Easier to find programmers and tech people
    - Reduced production costs
    - Leverage related standards (X...)
    - Software
  - Challenges
    - Expanded file size
    - People time (including training)
    - New software, processing system (new procedures, workflow)
- Types or scope of migration [CP]
  - P3->P4
  - P4->P4
  - Levels of encoding (e.g., TEI Lite v. full P3 or P4)
- Areas of migration [JH]
  - Document instances
  - DTD extensions
  - Catalog files
  - Processing environment
- General recommendations [AB]
  - Things to think about before you start
    - Workflow issues
    - Training
    - Consider resources (staff, software, time)
  - Use checklist
  - Make a backup
  - Use an incremental approach
  - Check your migrated docs in your new processing environment
- Special considerations in migration [CR]
  - Easy conversion
  - "Minimally invasive" conversion
  - Conversion that maximizes XML tool usability
- Appendix: potential impact of future versions of the Guidelines on migration issues [Eds.]