Analysis and Interpretation Committee
 
        Minutes of the meeting held at Oxford University Press,
 
                           21 September 1989
 
 
                              Lou Burnard
 
                            10 October 1989
 
                      Document Number:  TEI AI M1
 
Present:  Robert Amsler (RA);  Steve Anderson (SA);  Bran Boguraev (BB);
Lou Burnard  (LB);  Nicoletta  Calzolari (NC);   Nancy Ide  (NI);  Terry
Langendoen (TL),  chair;  Winfried Lenders (WL);  Nelleke Oostdijk (NO);
Bill Poser (BP);  Beatrice Santorini (BS), replacing Mitch Marcus;  Gary
Simons (GS); Michael Sperberg-McQueen (MSM); Donald Walker (DW); Antonio
Zampolli (AZ)
 
                            Final 10 Oct 89
 
 
 
 
                         INTRODUCTORY BUSINESS
 
TL welcomed  members of the  committee and  drew their attention  to the
agenda previously circulated.
 
 
Expenses
 
Committee members are reminded to send all receipts for travel to MSM at
UIC.   A  maximum per  diem of  $20 is  payable without  receipts;  with
receipts,  up to 50% of estimated travel costs would be reimbursed. (The
ceilings are thus $650 for North Americans  and 315 ECU for Europeans at
this meeting.   -Ed.) North Americans should send originals and would be
reimbursed (up  to 50%)  in US  dollars;  Europeans may send  copies and
would be reimbursed in ECUs.
 
 
Timetable
 
   Three meetings were scheduled for the current funding cycle.  A draft
of the committee's  guidelines was due by March 1990.   The next meeting
could conveniently (except  for some Europeans)  be held at  the time of
the LSA and MLA meetings in Washington (Dec.  28-30).  The final meeting
could be held in Tucson at the end of February.  MSM gave an
overview of the TEI timetable;  funding for the second cycle had already
been requested of NEH and a decision would be made by May 1990.
 
 
Committee Brief
 
   TL stressed that  the committee's task in the current  phase was more
to decide  what to mark up  than exactly how to  do it.  He had  had the
opportunity of discussing  SGML with the chair of  the Metalanguage Com-
mittee (David  Barnard)  and saw no  technical reason why it  should not
support the committee's requirements, however imperspicuously.  The com-
mittee's work would undoubtedly take SGML into corners it had not as yet
visited,  but the Metalanguage Committee would have the job of identify-
ing and  then supporting any extensions  to the standard which  might be
required. There was no requirement that the committee adopt SGML for its
own internal working:   its job was  to produce tagsets.  He likened the
committee's role to that of the body which defined IPA: it had to define
a universal language for text structure,   capable of supporting any one
of a number  of possibly inconsistent or  mutually exclusive theoretical
frameworks.  No unification of theories was expected of it, nor would it
be responsible for implementing any part  of its proposals.  As with IPA
again,  the  object was to define  an interchange format rather  than an
application-specific one.
 
 
 
 
                           GENERAL DISCUSSION
 
   RA drew a parallel between the  committee's task and that of defining
purely typographic  markup.  It should focus  on tagsets rather  than on
theoretical discussions. BP asked whether linguistic rules should not be
tagged as such,  since they would necessarily form a part of the content
of some texts to be tagged.  TL said that a full analysis of formulae as
such should be deferred to the second  cycle.  The goal should be to tag
such things adequately to ensure their correct representation,  indepen-
dent of  any particular  formatter.  If two  theories are  disjoint with
respect to a particular feature (traces for example),  it should be pos-
sible but  not obligatory to supply  it.  BP stressed the  importance of
making provision  for less  orthodox theories  using entirely  different
representations.  TL said subcommittees should aim to be catholic, iden-
tifying as many theories as possible.  NC asked whether when considering
existing markup schemes,  e.g.   TOSCA or LOB,  it would be necessary to
identify conversion rules, grouping tags according to their function. TL
said that such mappings were the responsibility of the Metalanguage Com-
mittee.  The  main focus should be  on linguistic issues  and alphabetic
texts.  BP asked about polylinguistic texts and non-Roman alphabets, for
example Gardiner's  Egyptian Grammar,  or  Japanese texts  with embedded
Chinese characters.  TL  replied that the Text  Representation committee
would address these  problems,  although the question  of syntactic fea-
tures linking parallel structures was  within the A&I committee's brief.
MSM said that the question of synchronised structures was a good example
of an area where several committees would  need to work,  as were cross-
reference and the markup of discontinuous  text segments and of segments
with unclear boundaries.  He stressed  the importance of good communica-
tion between  the committee heads and  the editors in this  respect.  TL
stated that linguistic markup should include  all of semantics and prag-
matics,  and acknowledging the point of view of certain linguists,  such
as George Lakoff,  that all domain-specifying categories are artificial,
contended that  the markup should  make it  possible to smear  all these
distinctions.
 
   WL asked whether the intention  was to provide a language-independent
superset of tagsets,  citing the MATER standard (ISO 6156)  as one which
had found the need to include a German-specific appendix. TL stated that
his personal preference was for a superset.  DW pointed out that no par-
ticular tagset  would use all available  tags and MSM that  the Steering
Committee had  already decided  tagsets should  be extensible,   and re-
nameable.
 
 
 
 
                           SUBCOMMITTEE WORK
 
Dictionary Encoding Subcommittee
 
   RA reported on the Dictionary Encoding Standard worked out in collab-
oration with  Frank Tompa at Bellcore,   and circulated copies  of their
joint paper  describing it.   RA  said that  the paper needed  many more
examples and more descriptive text.  He invited comments from the
committee and indicated that he would circulate any comments received
on extensions to it (see also item 4 under "Actions").  RA is chair
of the dictionary subcommittee;  BB and NC are
members.   Two areas of activity were proposed for the subcommittee: one
was to broaden the work to include non-English and multilingual diction-
aries and the other to consider  etymology.  Frank Abate had volunteered
to investigate the latter.  Alain Pierrot was collaborating with NI on a
DTD for Hachette's dictionaries.   John  Fought and Carol Van Ess-Dykema
were developing a multilingual  standard,  with language-specific exten-
sions for each language.
 
   AZ  asked whether  this subcommittee  would also  deal with  machine-
tractable dictionaries, or electronic lexica.  RA replied that there was
considerable overlap of interests among the proposed membership. A brief
discussion of the wisdom, or lack of it, of recoding lexica expressed in
LISP in SGML ensued.  AZ opined  that it was organisationally preferable
for this subcommittee to focus only on printed dictionaries, aiming at a
neutral interchangeable  format.  Output from other  subcommittees (mor-
phology for example) would be useful at a later stage in the project.(1)
 
   Other  members  proposed  for the  subcommittee  were  Susan  Warwick
(ISSCO, Geneva),  Carol Van Ess-Dykema (NSA),  John Fought (U Penn)  and
additional representatives from the groups  at Bellcore,  IBM,  Pisa and
IKP (Bonn).  Communication  between these and commercial  publishers was
important.
 
 
Phonetics/Phonology Subcommittee
 
   The work of this group,  chaired by Bill Poser,  would to some extent
overlap with that of the Text  Representation committee.  It would addi-
tionally address such issues as hesitation, intonation, and overlapping,
and the correspondence between phonemes and graphemes, but would need to
prioritize these carefully.  AZ reminded the committee of the importance
of supporting the  needs of the speech synthesis community  in this con-
text.  RA asked whether the dictionary encoding should attempt to define
phonemic equivalences in IPA or something  else:  his view was that they
should not.
 
   After lunch,   the following were  suggested as possible  members for
this subcommittee:  Ken Church and Mark Liberman (AT&T);  Henry Thompson
(Edinburgh); Jared Bernstein (SRI); Lauri Lamell (MIT); Janet Pierrehum-
bert (Northwestern);  Brian MacWhinney  (Carnegie-Mellon);  Paul Roossin
(IBM); Bob Mercer (IBM); Jan Svartvik (Lund); John McCarthy (U Massachu-
setts).
 
   BP said  that the work of  the subcommittee should  include "gestural
stuff":  its task was not to propose a Klatt-style "ARPAbet" but to make
it possible for anyone who wished to use  one to define such a code in a
portable way. He asked where the kind of phonemic markup employed should
be specified and  whether its semantics should be  specified with refer-
ence to e.g.  IPA.  MSM saw this  as another area where this committee's
work overlapped with that of  another:  the Text Documentation committee
would provide  a space  into which  declarations of  this kind  could be
placed, but little more. If texts included application-specific data (for
example F0 values)  it was clearly necessary to provide portable ways of
interpreting them.
 
 
Morphology Subcommittee
 
   This subcommittee  currently comprises Steve Anderson  (chair),  Win-
fried Lenders and Gary Simons.  It  will address such standard issues as
the delimiting  and classifying of  words,  aiming at  generic solutions
rather than value lists.  SA  suggested that many substantive categories
such as  dialectal or usage variants  are not morphological  but lexical
information.  The subcommittee's focus should be on the representation of
the internal structure of words,  recognising however that simply delim-
iting morphemes would be inadequate for discontinuous segments (e.g.  in
Arabic)  or for the use of such  tricks as ablaut in Germanic languages,
or metathesis in Salish to render aspectual distinctions.  SA suggested
that the  most promising line was  to identify and generalise  the rela-
tionships existing between different forms,  regarding morphology as the
internal syntax of words.
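
   (By way of illustration only,  the point about discontinuous segments
can be sketched in Python.  The sketch assumes a simplified romanisation
of the Arabic root k-t-b and a hypothetical helper, `interdigitate`;  it
records each form as a relation between a root and a pattern rather than
as a string of delimited morphemes.  Nothing of the kind was proposed at
the meeting.  -Ed.)

```python
def interdigitate(root, pattern):
    """Fill each 'C' slot in the pattern with the next root consonant."""
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in pattern)


# Each form is stored as a (root, pattern) relation, not as delimited morphemes.
# The glosses and patterns are illustrative assumptions, not a proposed tagset.
forms = {
    "wrote": ("ktb", "CaCaCa"),
    "book": ("ktb", "CiCaaC"),
    "writer": ("ktb", "CaaCiC"),
}

for gloss, (root, pattern) in forms.items():
    print(gloss, interdigitate(root, pattern))
```

The root consonants never appear as one contiguous substring of the sur-
face form,  which is why simple morpheme delimiters cannot mark them.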
 
   Members  proposed for  the  subcommittee  included:  Martin  Chodorow
(Hunter C, CUNY);  Richard Sproat (AT&T);  Kimmo Koskenniemi (Helsinki);
Lauri Karttunen (Xerox PARC); Jorge Hankamer (UC Santa Cruz);  Mark Aro-
noff (SUNY  Stony Brook);  John McCarthy  and Lisa Selkirk  (U Massachu-
setts); Burghardt Schaeder (Siegen).
 
   There was some discussion of  the level of generalisation appropriate
to the subcommittee's work.  NC and AZ  pointed out that for most people
identifying the lexical item (lemma)  appropriate  to a surface form was
of far  more importance than its  internal structure.  AZ  asked whether
compound words would  also be considered.  SA replied that  these were a
special case of the  general rule.  TL said it was  important to support
different levels of analysis.  RA suggested  that some redundancy in the
encoding would be a helpful way of supporting this.  He also recommended
that as much language-specific information as possible should be identi-
fied and shared amongst members of the subcommittee.  BP remarked on the
existence of many large corpora  of Amerindian languages exhibiting many
unusual features.  RA recommended the  use of consultants with expertise
in these areas, mentioning David Nash for Australian Aboriginal languag-
es.
 
   Resuming the  earlier discussion,   MSM pointed  out that  the coding
schemes used by existing tagged corpora often blurred lexical, syntactic
and morphological distinctions.  He felt that  it was enough to identify
places where a value could be recorded without attempting to unravel its
semantics.  RA  noted that the DEI  often specified alternative  ways of
encoding a given feature;  GS that tags were treated isomorphically with
data in  the Brown  Corpus.  AZ was  firmly of the  opinion that  a well
defined set of  values should be identified,  for example,   for part of
speech,  rather than an open-ended set.  RA remarked that SGML gave us a
better  notation than  that available  to earlier  projects which  often
needed to attach attribute values to every token because they lacked the
notion of markup distributed throughout a text.
 
 
Syntax Subcommittee
 
   BS reported on behalf of Mitch Marcus who had been asked to form this
subcommittee,  together with herself,  NO and Hans Uszkoreit.  They felt
that whatever was to be provided should  be able to specify both ambigu-
ous and hierarchic syntactic structures,  to  cope with a variety of re-
analysis phenomena and other syntactic ambiguities.  A single word (e.g.
the Japanese causative)  might require a bi-clausal analysis.   Multiple
simultaneous representations of  a string might be  needed,  for example
"(take advantage [of) John]."  TL remarked that David Barnard had stated
that such things could be managed by SGML.  SA asked whether it was also
capable of Postal-style arc-pair grammar.  On ambiguity,  RA highlighted
the  need to  distinguish  the deliberately  ambiguous  from the  merely
vague,  contrasting "the  duck is ready to eat" with  "light house keep-
ing".   NO  asked whether idiomatic  and figurative phrases  belonged in
this subcommittee: most present agreed with her that they did not.  Idi-
omatic phrases formed a convenient unit,   but were not in fact phrases.
They also had multiple class membership.   TL mentioned the need to sup-
port inheritance of properties within  a hierarchy by placing attributes
as high as possible in the tree:  if  tense is only marked as a property
of verbs, it becomes difficult to deduce the tense of sentences.
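
   (As a minimal sketch of the inheritance point,  and not something put
forward at the meeting,  the following Python fragment assumes a hypo-
thetical `Node` class in which an attribute is recorded once,  as high
in the tree as it applies,  and is found from any lower node by search-
ing upwards.  The labels and the attribute name are illustrative assump-
tions.  -Ed.)

```python
class Node:
    """A node in a syntactic tree carrying its own attributes."""

    def __init__(self, label, attrs=None, children=()):
        self.label = label
        self.attrs = dict(attrs or {})
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def lookup(self, name):
        """Return the nearest value for `name`, searching upward from this node."""
        node = self
        while node is not None:
            if name in node.attrs:
                return node.attrs[name]
            node = node.parent
        return None


# Tense is recorded once, on the sentence, rather than only on the verb.
verb = Node("V")
sentence = Node("S", {"tense": "past"}, [Node("NP"), Node("VP", children=[verb])])

print(sentence.lookup("tense"))  # read directly from the sentence node
print(verb.lookup("tense"))      # inherited from the enclosing sentence
```

With the attribute placed on the sentence,  both queries succeed;  had
it been placed only on the verb,  the sentence-level query would have
required a search downwards through the tree.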
 
   The following  people were suggested  for this  subcommittee:   Annie
Zaenen (Xerox PARC); James Pustejovsky (Brandeis);  Geoffrey Leech (Lan-
caster); Geoffrey Sampson;  Robin Fawcett (Cardiff);  Beth Levin (North-
western); Eric Wehrli (UCLA); Gerald Gazdar (Sussex); Don Hindle (AT&T);
Elisabet Engdahl (Edinburgh).
 
 
 
 
                                ACTIONS
 
1.    Subcommittee chairs should  co-opt people to their  committees and
      produce interim reports by 1 November, 1989.
 
2.    Subcommittee chairs were requested to  save and precis for distri-
      bution all correspondence with potential  members.  TL offered the
      services of his office for assistance with TEI work,  particularly
      in sending documents out to subcommittee members.
 
3.    All working documents  should be sent to TL who  would assign num-
      bers and post them on the TEI-ANA server.
 
4.    The Amsler/Tompa paper  will be given a number  shortly,  and com-
      ments  were requested  as  soon  as possible,   particularly  with
      respect to  extending its  scope to  include polylingual  and non-
      English dictionaries and to discussing the etymology problem.
 
5.    The committee  should agree  the structure  of an  overall interim
      report.  It was agreed to defer  discourse analysis to the second
      cycle.
 
6.    MSM requested that draft documents  be distributed using some form
      of descriptive markup  to simplify their later reuse.    He and LB
      are working on  a tagset for this purpose,   but committee members
      should not await its appearance before putting finger to keyboard.
      These minutes reflect this (mostly LB's  doing,  but TL has lent a
      hand).
 
7.    Minutes of the  meeting to be distributed by the  end of September
      1989.
 
8.    TL to try to contact Hans Uszkoreit.
 
9.    NI to circulate details of the TEI  workshop at the MLA meeting in
      Washington in December,  so that our next committee meeting can be
      definitely scheduled (see "Remaining Meetings").
 
 
 
 
                           REMAINING MEETINGS
 
   The next meeting will be held  in Washington,  DC in conjunction with
LSA and MLA meetings,  either 28 or  29 December,  so as not to conflict
with the TEI workshop at MLA.
 
   The final meeting will be held in Tucson, AZ in late February or ear-
ly March, 1990.
 
-------------------------
 
(1) Since the meeting, RA has suggested adding Robert Ingria as co-chair
    of the Dictionary subcommittee and  extending its mandate to include
    the development of recommendations for encoding electronic lexica.
 