<!-- includes some corrections requested by And Rosta
      in his note of 28 Aug -->
<ldoc status=final docnum=AI2M1>
<front>
<title>TEI AI2 M1:
Workgroup on Spoken Texts: Minutes of meeting held
at University of Oslo: 9-10 August 1991
<author>Lou Burnard
<date>11 Aug 1991
<present>
Lou Burnard (LB), Jane Edwards (JE), Stig Johansson (SJ; chair),
And Rosta (AR).
</front>
<text>
<div1>Day 1: Preliminary
<p>
SJ welcomed the workgroup members and recapitulated its work plan
as described in document TEI AI2 P1. A working draft which would
form the basis of the group's report (TEI AI2 W1) had been
circulated previously. SJ noted that the final version should be
ready by 1 October. In view of the short time scale for affecting
the final version of the TEI Guidelines, and the fact that the
group was concerned with an area not previously addressed at all
by the TEI, he felt that the aim should be to propose as many
tags (etc) as possible, rather than simply recommend new work
groups. Areas for further work would however also be identified,
as stated in the current working paper.
<p>
LB summarised briefly the methods of claiming expenses and
accepted the charge of preparing a record of the meeting. He
asked that electronic copies of all working papers be lodged with
the TEI secretariat in Chicago as soon as possible. SJ felt that
a further revision of the current working paper should be carried
out first.
<div1>Review of existing encoding practices
<p>
Discussion began by reviewing the list of relevant research
communities in the draft charge to the WG (document TEI AI2 P1).
SJ noted that only English material and corpus linguistics had
been fully covered in his working paper. For natural language
recognition, JE mentioned material from the ZUE (?) system; for
sociolinguistics, Gumperz et al; for language acquisition,
Childes. Phonology was not adequately covered: JE referred to the
multi-level analyses being undertaken at Bell labs by Liberman et
al; AR agreed that multi-level analysis was essential for our
purposes; LB referred the group to the mechanisms developed by
the AI working committee, in particular the unit/level scheme as
described in TEI P1. SJ was concerned that we should not spend
too much time recapping on work being done in other groups. For
anthropology and ethnology, JE agreed to provide examples from
Tedlock (<q>spoken word</q>, 1983) and LB from the Oxford Text
Archive's holdings of SIL texts. SJ, noting that these would
otherwise be the only non-English examples, agreed further to
provide some examples from Swedish corpora. On rhetoric, speech,
drama & journalism, it was noted that drama was now the subject
of a different work group. Doubts were expressed as to the
relevance of this kind of material.
<p>
The research typology presented in the draft working paper was
accepted in preference to that of AI2P1. Reviewing the topics
listed there, it was agreed that further examples were needed as
follows:
<gl>
<gt>Lexicography<gd>LB agreed to get examples and assistance from
Steve Crowdie (Longmans)
<gt>Sociolinguistics and discourse analysis<gd>Examples from Du
Bois and Gumperz were tabled by JE
<gt>Second language studies<gd> JE referred to the
<q>guestworkers speech archive</q> at Nijmegen (EALA); SJ to a
project in error analysis (<q>PIF</q>) carried out by Claus
Faerch.
<gt>Speech recognition<gd>AR agreed to get some material from the
SCRIBE project.
</gl>
It was agreed that examples of current practice would form an
important part of the draft report but should be included as a
separate appendix to the main document. Their provision, and its
analysis in section 6 of the current working paper, constituted
the WG's fulfilment of the charge to document existing practice.
Section 7 would constitute its fulfilment of the charge to assess
the current provisions of TEI P1, and to propose new tags or
extensions. As noted above, the charge to propose new work groups
was not accepted. The charge to respond to comments on P1 routed to
the group was easily accepted, as none
of the comments so far received on P1 was directly relevant to
the WG's remit, though LB noted that the omission of spoken texts
from P1 had already been commented on adversely by one or two
people.
<p>
JE proposed that a summary table comparing different ways in
which the same features had been encoded would be useful. SJ felt
that detailed description of a small number of <q>significant</q>
schemes would be preferable. LB preferred a feature-based list.
It was agreed that there would be room for both.
<p>
Examples from the following schemes would be added to those
already present:
<ul>
<li>Roger Brown, Childes (child language)
<li>Jefferson
<li>HIAT (Ehlich)
<li>Gumperz (Santa Barbara)
<li>ESF <note>Don't know what this is LB</note>
<li>Survey of English Usage, London-Lund, ICE
<li>NatCorp proposals for spoken texts (Crowdie)
</ul>
JE was able to provide suitable examples for almost all of these
during the meeting, several being included in her forthcoming
book <cit>Talking Language: transcription and coding of spoken
discourse</cit>. LB undertook to send copies of a draft of this
volume to all members of the WG. AR would provide examples from
the Survey texts, and LB from the British National Corpus texts.
SJ stressed the urgency of receiving these.
<p>
As well as samples of encoded texts, brief descriptions of the
markup's meaning and bibliographic
references should be provided.
<div1>Review of AI2W1 I
<p>
Following a break, the group proceeded to go through SJ's draft.
Major points on which further revision was felt necessary are
noted below.
<p>
JE was concerned that the manageability, readability and usability of
encoding schemes should not be overlooked as these were highly
important for ease of learning and understanding. LB agreed but
noted that most of these factors were beyond the remit of an SGML
encoding scheme. SJ agreed to cite JE's views on readability in
the report. The word <q>tractability</q> was proposed and
accepted as a substitute for <q>manipulability</q> in section 2.1.
<p>
In section 3, some short statement of the encoding needs of
individual groups (as demonstrated by the example texts) should be
included.
<p>
In section 4, rather than <q>levels</q> of transcription, the
group preferred the term <q>dimensions</q>. An alternative
typology was proposed, for discussion purposes:
<ul><li>Lexicalisation (i.e. how words, or representations of words,
are defined: orthographically, possibly extended by non-standard
spellings, phonemically, phonetically, etc)
<li>Temporal aspects (prosody, intonation, pausing etc)
<li>Inter-speaker co-ordination (overlap, truncation, latching,
attribution)
<li>Units of analysis: (turns, syntagms, tone units etc)
<li>Non-verbal features (anthrophonics, gestures, events etc)
<li>Text documentation (recording details, transcription details
etc)
</ul>
<p>
Section 5. The need for this further typology was questioned. AR
noted that the notion of authorship provided an additional
complication. SJ said this would be noted in section 7.1. It was
agreed to change <q>conversation</q> on p. 4 into
<q>interaction</q> in order to loosen the sense. Problems of oral
narrative (story telling etc) should not be ignored, even if no
specific proposals were made by the wg. The list was not intended
as a typology but as an indication of the variety of sources
involved. It was agreed that an item <q>spoken to be written</q>
(e.g. dictation) should be added to the list on p. 4.
<p>
Tables showing contrastive treatments for the same kind of
feature should be added throughout section 6.
<p>In 6.1 the importance of including documentation of the
context for spoken texts was felt to need more emphasis.
Separating documentation from data makes both less tractable.
<p>
Sections 6.2 and 6.3 were discussed together because the units
defined by reference points (<q>text-units</q> in ICE terms) are
a special case of the more basic units, differing chiefly in that
they carry a reference number. Given that units might be syntax-,
intonation-, pause-based, or a mixture, the present text was felt
to be rather too evaluative. It should also note that units can be
cross-cutting and that some systems (e.g. DuBois) use more than
one kind. 6.2 should be moved to follow 6.3.
<p>
In section 6.4, mention should be made of the various means taken
to preserve speaker anonymity, to indicate unknown speakers and
to document the degree of speaker awareness of being recorded.
<p>
A reference to the HIAT and ICE schemes should be added in
section 6.5.
<p>
The tractability problems raised by the mixing of orthographic
and phonetic transcription principles should be highlighted in
section 6.6.
<p>
LB queried the distinction between features treated in 6.6 and
6.10, noting that in ICE they are grouped together. SJ said that
the difference was that one group had conventional
representations such as <q>Mmm</q>, while the other did not, and
proposed as an alternative name <q>Non-lexical vocalisation</q>.
AR suggested <q>quasi-lexical</q>, which was agreed.
<p>
The meeting formally closed at 1800, though discussion continued
further.
<div1>Review of AI2W1 - II
<p>Opening the second day of the meeting, SJ proposed that
sections 6 and 7 of the document should be discussed in parallel
to speed up progress. This was agreed, though in practice the
effect was largely to discuss section 7.
<div2>Extensions to the TEI header (7.1)
<p>
LB said that proposals for changes to the TEI header would be
subject to review by the Text Documentation committee, which
would be meeting sometime in the autumn.
<p>
Agreed: the title statement should not be optional.
<p>
There was some discussion of the difficulties of identifying boundaries
of some kinds of text: phone conversations were clearly delimited, but
radio broadcasts were not.
<p>
The proposed <tag>interaction.type</tag> tag belonged in the
<tag>encoding.declaration</tag>s (not discussed in the working
paper) rather than the <tag>file.description</tag>.
<p>
LB noted the parallel between the notions of
<tag>recording.statement</tag> and
<tag>transcription.statement</tag>  and that of the
existing <tag>source.description</tag>. This suggested
that recordings in which other recordings were embedded (a
problem raised by AR) could be handled by nesting
<tag>recording.statement</tag> elements.
<p>
In discussion of the question of surreptitious recordings, a need
for both a general-level mechanism (were all participants aware?) and a
low-level mechanism (did this participant know?) was identified.
<p>
A need for a grouping tag for participants was identified and
<tag>list.of.participants</tag> was proposed. This should be
distinguished from the need for a tag identifying a number of
participants operating as a group, e.g. the audience of a radio
show, for which the tag <tag>participant.group</tag> was
proposed. The latter had the same characteristics as
<tag>participant</tag>, and an additional attribute <q>size</q>.
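By way of illustration, the two proposed grouping tags might be
combined as follows; the tag names follow the proposals above, but the
participant descriptions and the <q>size</q> value are invented for
the example:
<xmp>
<list.of.participants>
  <participant id=A1>Interviewer</participant>
  <participant.group id=G1 size=250>Studio audience</participant.group>
</list.of.participants>
</xmp>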
<p>
A need for ways of formally stating relationships between
participants was identified, LB suggested that if each
participant had a unique id, then their relationships could be
expressed by a number of <tag>relation</tag> elements contained
within the <tag>participant</tag> element, linked by means of a
<q>target</q> attribute.
For example:
<xmp>
  <participant id=M1>Mary Jones</participant>
  <participant id=F1>Fred Jones
    <relation target=M1>spouse</relation>
  </participant>
</xmp>
There was some discussion of ways in which this could be extended to
cater for reflexive, one- or two-way relationships etc.
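One purely illustrative possibility, using a hypothetical <q>mutual</q>
attribute that was not agreed at the meeting, would be to let a single
<tag>relation</tag> element assert a relationship in both directions:
<xmp>
  <participant id=F1>Fred Jones
    <relation target=M1 mutual=y>spouse</relation>
  </participant>
</xmp>
This would avoid repeating an inverse <tag>relation</tag> element
within the entry for M1.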
<p>
JE proposed that age and sex of participant would be more economically
handled as attributes rather than elements. This was felt to be
appropriate for the latter but not the former. It would be useful to be
able to specify a range or minima and maxima for age: this could be
conveniently done by allowing for attributes with numeric values. AR
proposed that in general, where an exhaustive list of attribute values
could be specified, this was preferable to leaving the options open.
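On this basis a participant description might be sketched as follows;
the sex value is drawn from a closed list as AR proposed, while the
<tag>age</tag> element and its numeric <q>min</q> and <q>max</q>
attributes are invented to illustrate the treatment of age ranges:
<xmp>
<participant id=M1 sex=F>Mary Jones
  <age min=25 max=35>
</participant>
</xmp>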
<p>
It was possible that some or all of the tags under
<tag>transcription.statement</tag> belonged in the encoding
declarations. For the moment they would remain where they were,
though <tag>transcription.type</tag> should definitely move.
<p>
AR suggested the tag <tag>channel</tag> should be included within
the <setting> element, to include information about the means of
delivery of the speech being transcribed, e.g. by telephone,
two-way radio etc.
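AR's suggestion might be sketched as follows; the element content is
invented, and no fixed list of channel values was agreed:
<xmp>
<setting>
  <channel>telephone</channel>
</setting>
</xmp>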
<p>
JE questioned the order of components within the header,
suggesting that the source should come first. LB commented that
this would involve a substantial departure from existing practice
in the TEI header.
<p>
The group then reviewed the components so far identified for the
header and agreed that information under the following headings
should be strongly recommended for inclusion wherever possible:
<ul>
<li>title and editor
<li>time or date of recording (normally the same as that of the setting)
<li>participant information
<li>circumstance of data capture (e.g. location, situation,
activities)
</ul>
It was noted that information could be presented informally, as
running text, rather than formally categorised.
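For illustration only, part of a header meeting these recommendations
might look as follows; the tag names are those proposed elsewhere in
these minutes, but the structure and content are invented, and a title
statement and capture circumstances would be recorded similarly:
<xmp>
<recording.statement>
  <date>9 Aug 1991
  <setting>family kitchen, at breakfast
</recording.statement>
<list.of.participants>
  <participant id=A1>Mary Jones</participant>
  <participant id=A2>Fred Jones</participant>
</list.of.participants>
</xmp>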
<div2>Units of analysis
<p>LB queried the need for <tag>u</tag> to mark individual
utterances. How did this differ from the general purpose
<tag>s</tag>? It was agreed that <tag>u</tag> tags had different
attributes (notably, <q>speaker</q>) and should be retained.
There was some discussion of the general validity of the
back-channel/turn distinction currently emphasised in the draft.
<p>
For examples of the use of multiple hierarchies and the concur
feature, implied by the need for multiply nested segmentation, LB
referred the meeting to the discussion in P1, pages 141-4, and
also to a fully worked out example of multi-level analysis of an
Eskimo story provided by Gary Simons, which he agreed to
distribute, after checking with its author.
<p>
It was agreed that an utterance was defined as a stretch of
discourse from a single speaker. If two participants spoke
simultaneously, this should be regarded as two utterances.
Where speaker attribution was dubious, a list of possible speaker
identifiers should be supplied as the value for the speaker
attribute. A certainty attribute could be supplied, defaulting to
YES in the case of a single speaker, and always having the value
NO in the case of multiple speakers.
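Under this proposal, an utterance whose speaker might be either A or B
would be marked along the following lines; the wording of the
utterance is invented:
<xmp>
<u speaker="A B" certainty=NO>no it wasn't me</u>
</xmp>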
<p>
There was some discussion of the need to distinguish the role of
a speaker as author or participant in the case of scripted
material. LB hypothesized the case of an utterance such as <q>Are
we rolling Bob? Good evening. Tonight Mr Gorbachev said We will
bury you...</q> in which he distinguished (a) the newsreader
speaking in propria persona (b) the newsreader reading from a
script prepared by someone else which happens to quote (c) a
third party's speech. It was agreed that the appropriate tag for
case (c) was the <tag>q</tag> tag already present in the
Guidelines. SJ stated that the role of a speaker with reference
to the text should be documented in the header, which should
distinguish scripted and unscripted material, the distinction
being that scripted material can be departed from.
<p>
After further discussion, it was agreed that a
<tag>script.statement</tag> should be included in the header to
provide information which could be associated with a given utterance by
means of an IDREF supplied on a <q>SCRIPT</q> attribute to the
<tag>U</tag> element. (This implies that in the example above, the
switch from unscripted to scripted remarks in fact indicates the start
of a new, possibly nested,  utterance by the same speaker - LB)
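The newsreader example above might then be sketched as follows, with
the scripted portion as a nested utterance pointing at the
<tag>script.statement</tag>; the identifiers and the content of the
statement are invented:
<xmp>
<script.statement id=S1>evening news script prepared by
programme staff</script.statement>
<u speaker=A>Are we rolling Bob?
  <u speaker=A script=S1>Good evening. Tonight Mr Gorbachev said
  <q>We will bury you</q>...</u>
</u>
</xmp>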
<p>
The question of back-channelling and interruptions was discussed
further. SJ agreed to reconsider the matter. As an example, the
group discussed the following
<xmp>
   <u>This is <u type=back>uh huh</u> my turn </u>
</xmp>
which was generally felt to be unsatisfactory, as it obscured the
fact that <q>This is my turn</q> was a single utterance.
<p>
It was noted that truncation was not necessarily associated with
interruption, either of segments or words, since it could also be
indicated by intonation patterns. The group initially proposed
simply an attribute <q>trunc</q> with values Y or N which could
be attached to utterance or segment tags. Thus in the following
example
<xmp>
You know how they do that, so you can't s- ha- --
you dont have any balance (J&J 1.4.1)
</xmp>
the intonation unit beginning <q>so you can't...</q> is
truncated, as are the two partial words with which it ends. This
could be rendered as
<xmp>
<s type=IU trunc=y>
so you can't <s type=W trunc=y>s</s><s type=W trunc=y>ha</s>
</s>
</xmp>
Preference was expressed for a special purpose
<tag>truncated.word</tag>, by analogy with existing tags such as
<tag>foreign</tag> or <tag>highlighted</tag>. SJ felt that units
should not be marked as segments simply in order to carry a
truncation tag. The above example would thus become
<xmp>
<s type=IU trunc=y>
so you can't <truncated.word>s</truncated.word>
<truncated.word>ha</truncated.word>
</s>
</xmp>
<p>
An interruption could be regarded as an overlap associated with
truncation, or which coincided with a pause. Returning therefore
to the problem of overlapping segments, the group focused again
on the example of overlap given above. In the London-Lund corpus this
would be marked up as follows:
<xmp>
A This is
B uh uh
(a) my turn
</xmp>
LB proposed the following alternative:
<xmp>
<u sp=A>This is <point id=a1>my turn
<u sp=B><point same=a1>uh uh
</xmp>
where the <q>same</q> attribute was used to point from one
<tag>point</tag> to another, indicating synchrony of utterance.
It was noted that this synchronised only the start of each
utterance. Although overlapped segments clearly had extent, using
a true SGML element (say, <tag>olap</tag> or even <tag>s
type=olap</tag>) would not work. If, for example, A's turn was
overlapped by two speakers, a formulation such as
<xmp>
<u sp=A>This <olap id=A1>is <olap id=A2>my</olap> turn</olap>
<u sp=B><olap same=A1>uh uh</olap>
<u sp=C><olap same=A2>No it's mine</olap>
</xmp>
where segments A1 and A2 are overlapped by different speakers was
ambiguous (and illegal SGML). One alternative would be to define
a concurrent hierarchy for each speaker thus
<xmp>
<u sp=A>This <(b)olap id=A1>is <(c)olap id=A2>my</(b)olap>
turn</(c)olap>
<u sp=B><(b)olap same=A1>uh uh</(b)olap>
<u sp=C><(c)olap same=A2>No it's mine</(c)olap>
</xmp>
but this would require as many concurrent views as there were
overlapping speakers and would also lead to some processing
difficulties with currently available SGML software.
<p>
As an alternative, LB proposed that an <tag>end.point</tag> could
be used to mark the alignment of places where overlap finished in
an utterance. For completeness, this could be linked to its
corresponding <tag>point</tag> by a further pointer attribute
named <q>start</q>, thus
<xmp>
<u sp=A>This <point id=A1>is <point id=A2>my
<end.point id=A3 start=A1> turn<end.point id=A4 start=A2>
<u sp=B><point same=A1 id=B1>uh uh<end.point same=A4 start=B1>
<u sp=C><point same=A2 id=C1>No it's mine<end.point same=A4
start=C1>
</xmp>
<p>
This formulation could be automatically derived from the simpler
input conventions proposed by JE and others. Summarising the
discussion, it was agreed that an utterance is a stretch of
spoken language from one speaker and a segment is anything
smaller. It may have a type (e.g. macrosyntagm, tone unit, turn,
arbitrary text unit etc) and occurrences can be nested. To cope
with the SGML prohibition on crossing of such nested segments, we
recommend the use of milestone tags <tag>point</tag> and
<tag>end.point</tag>, to mark synchronisation points where
overlap begins and ends, which take the following attributes:
<gl>
<gt>id<gd>provides an arbitrary identifier<note>In later
discussion, it was noted that this could conveniently be derived
from a timeline. Alternatively, the values chosen could act as
pointers to discrete points on the timeline</note>
<gt>same<gd>identifies a point in another overlapping utterance
<gt>start<gd>identifies a point in the same utterance where
overlap begins (used only for end.point)
</gl>
<div2>Other features discussed
<p>In section 7.6 it was noted that issues of truncation had
already been addressed. It was additionally suggested that the
cause of a deletion or truncation should be specified using an attribute.
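Such an attribute might be sketched on the <tag>truncated.word</tag>
tag agreed earlier; the attribute name <q>cause</q> and its value are
invented for the example:
<xmp>
so you can't <truncated.word cause=interruption>s</truncated.word>
</xmp>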
<p>
It was noted that pauses may occur both within utterances and
between them. A need for a <q>units</q> attribute, with values
such as <q>seconds</q> or <q>syllables</q> was identified.
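A pause marked up along these lines might look as follows; the tag
name <tag>pause</tag> and the <q>dur</q> attribute are assumptions for
the purpose of illustration, only the <q>units</q> attribute having
been proposed:
<xmp>
you know how they do that <pause dur=2 units=seconds> so you can't
</xmp>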
<p>
There was some discussion as to whether paralinguistic phenomena
should be partitioned into vocal non-verbal
actions (coughs, umms, sneezes etc) and others (gestures, passing
trucks etc). It was suggested that the former might be regarded
as utterances, and a tag <tag>action</tag> or <tag>event</tag>
used for the latter. Treating coughs etc. as utterances would
imply that a cough by speaker A during an utterance by speaker B
would have to be regarded as a case of overlap. SJ would prefer
to include non-verbal involuntary noises within the utterance
where they occur, possibly with a speaker attribute. This would
preclude their representation by entity reference.
<p>
The full ramifications of encoding non-verbal actions were not
explored; it was noted that as well as a description of the
event, lists of identifiers for the participants would be needed,
as well as (probably) an alignment map. LB referred the meeting
to the discussion of movements in paper TEI MLW18 for some
examples.
<p>
In a brief discussion on performative features such as pitch,
speed and vocalisation, LB asked if these could not be regarded
as analogous to rendition in written texts and treated in a
similar way. It was generally felt that it would be better to
mark these using milestone tags such as <tag>pitch.change</tag>,
<tag>speed.change</tag> etc.
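Such milestone tags might appear as follows; the <q>new</q> attribute
and its values, like the utterance itself, are invented for
illustration:
<xmp>
and then he <pitch.change new=high>just disappeared
<pitch.change new=normal>completely
</xmp>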
<p>
It was noted that the list of kinesic features in section 7.11
was not intended to be exhaustive but just to provide
suggestions. Evaluative preferences should not be included in it.
It was suggested that an attribute <q>iterated</q> might be
useful.
<div1>Conclusions
<p>
The group felt that substantial progress had been made, but
identified the following topics as needing considerable further
work:
<ul>
<li>quasi-vocal things such as laughter
<li>quasi-lexical things such as <q>mm</q>
<li>prosody
<li>parallel and discontinuous segments
<li>uncertainty of transcription, uncertainty in general
</ul>
JE would be out of touch till 27 August. SJ will work on revising
the draft and circulate copies of the chosen set of examples
as soon as possible. LB will circulate minutes of the meeting
before 15 August. It was felt that funding for a second meeting
should be sought, perhaps adjacent to the NOED conference in
Oxford at the end of September. LB agreed to host the meeting and
SJ to seek authorization to hold it.
</ldoc>