On Lexical Ambiguity

TEI-AI-W-21
March 24, 1990
D. Terence Langendoen

Department of Linguistics
University of Arizona
Tucson, AZ 85721

Suppose we wish to mark up the example

    Wash sinks.

simply for lexical ambiguity, using a standard feature-structure interpretation of the Lund corpus markup tags. First, we can give it a textual markup along the lines of my proposal in TEI-AI-W-20, as follows (again, details concerning character markup are omitted):

    <s id=s1>
    <w id=w1>Wash</w>&rbl;<w id=w2>sinks</w><c id=c11>.</c>
    </s>

Pointers to the lexicon are not included here, since they are considered part of the analysis.
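As a rough illustration of how such textual markup could be produced mechanically, here is a minimal Python sketch. The function name mark_up_sentence, the tokenization rule (a run of word characters is a <w> element, any other non-blank character is a <c> element), and the reading of &rbl; as the entity for the blank between words are all my assumptions. So is the sequential numbering of <c> elements, which yields c1 where the example above has c11.

```python
import re

# Hypothetical tokenizer: group 1 = word, group 2 = punctuation, group 3 = blanks.
TOKEN = re.compile(r"(\w+)|([^\w\s])|(\s+)")


def mark_up_sentence(text: str, sid: str = "s1") -> str:
    """Wrap words in <w>, punctuation in <c>, and encode blanks as &rbl;."""
    parts = [f"<s id={sid}>"]
    n_words = n_chars = 0
    for m in TOKEN.finditer(text):
        word, punct, _blank = m.groups()
        if word:
            n_words += 1
            parts.append(f"<w id=w{n_words}>{word}</w>")
        elif punct:
            n_chars += 1
            parts.append(f"<c id=c{n_chars}>{punct}</c>")
        else:
            parts.append("&rbl;")  # assumed: one entity per run of blanks
    parts.append("</s>")
    return "".join(parts)


print(mark_up_sentence("Wash sinks."))
# <s id=s1><w id=w1>Wash</w>&rbl;<w id=w2>sinks</w><c id=c1>.</c></s>
```

Note that no lexp pointers are generated here, in keeping with the point that they belong to the analysis rather than to the text.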

Next, we provide an analysis for each word. I propose that multiple analyses be grouped under a single tag, for which I suggest the name analysis-list. This tag should be allowed to have attributes specifying whether the alternatives are ranked and, if so, on what basis. Each analysis is tagged with analysis, as in my earlier proposals. A rank attribute on this tag specifies the ranking; it takes numerical values, with rank=1 being the highest rank, rank=2 the next highest, and so on, and rank=0 meaning that the analysis should be ignored. I assume that each analysis is a single feature structure, tagged with fstruct, which may carry, in addition to its own ID attribute, one or more IDREF attributes pointing to entries in the lexicon. I will call this latter attribute lexp, as in TEI-AI-W-20.
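The ranking convention can be modelled with a small data structure. The sketch below is a hypothetical Python rendering (the class and method names are mine, not part of the proposal); it records only the rule that rank=0 analyses are ignored and that, when ranking=yes, the remaining analyses are taken in ascending rank order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analysis:
    rank: int                   # 1 = highest, 2 = next highest, ...; 0 = ignore
    fstruct: str                # feature structure, here just an entity name such as "NC"
    lexp: Optional[str] = None  # IDREF pointing at a lexicon entry, if any


@dataclass
class AnalysisList:
    id: str
    ranking: bool = True        # stands in for ranking=yes / ranking=no
    analyses: List[Analysis] = field(default_factory=list)

    def usable(self) -> List[Analysis]:
        """Drop rank=0 analyses; if the list is ranked, return the rest best first."""
        live = [a for a in self.analyses if a.rank != 0]
        if self.ranking:
            # sorted() is stable, so equal ranks keep their document order
            return sorted(live, key=lambda a: a.rank)
        return live


# The analyses of "Wash" (w1) from the example that follows.
w1 = AnalysisList("w1", True, [
    Analysis(1, "NC", "e2"),
    Analysis(0, "NP"),
    Analysis(2, "VA0", "e2"),
])
print([a.fstruct for a in w1.usable()])  # ['NC', 'VA0']
```

Whether rank=0 analyses should also be dropped from an unranked list is left open by the proposal; the sketch drops them in both cases.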

I also assume entity definitions with the following effect, each entity on the left standing for the markup on the right. (I don't specify these in SGML because I don't quite know how.)

    &noun;:      <feature><fname>cat</fname><fstruct>noun</fstruct></feature>
    &common;:    <feature><fname>subcat</fname><fstruct>common</fstruct></feature>
    &proper;:    <feature><fname>subcat</fname><fstruct>proper</fstruct></feature>
    &unmarked;:  <feature><fname>subsub</fname><fstruct>unmarked</fstruct></feature>
    &plural;:    <feature><fname>subsub</fname><fstruct>plural</fstruct></feature>
    &verb;:      <feature><fname>cat</fname><fstruct>verb</fstruct></feature>
    &main;:      <feature><fname>subcat</fname><fstruct>main</fstruct></feature>
    &base;:      <feature><fname>subsub</fname><fstruct>base</fstruct></feature>
    &s-form;:    <feature><fname>subsub</fname><fstruct>s-form</fstruct></feature>
    &NC;:        &noun;&common;&unmarked;
    &NC2;:       &noun;&common;&plural;
    &NP;:        &noun;&proper;&unmarked;
    &VA0;:       &verb;&main;&base;
    &VA3;:       &verb;&main;&s-form;
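To make the effect of these definitions concrete, the following Python sketch expands each abbreviation into the feature-value pairs it stands for, and back into the <feature> markup listed above. In SGML proper these would presumably be general entity declarations; the dictionaries ATOMIC and COMPOSITE and the functions expand and to_markup are my own illustrative names, and the check that no feature is set twice is an added assumption about what a well-formed combination looks like.

```python
# Atomic entities: each stands for one <feature> with a name and an atomic value.
ATOMIC = {
    "noun": ("cat", "noun"),
    "verb": ("cat", "verb"),
    "common": ("subcat", "common"),
    "proper": ("subcat", "proper"),
    "main": ("subcat", "main"),
    "unmarked": ("subsub", "unmarked"),
    "plural": ("subsub", "plural"),
    "base": ("subsub", "base"),
    "s-form": ("subsub", "s-form"),
}

# Composite entities: each is a concatenation of atomic ones.
COMPOSITE = {
    "NC": ["noun", "common", "unmarked"],
    "NC2": ["noun", "common", "plural"],
    "NP": ["noun", "proper", "unmarked"],
    "VA0": ["verb", "main", "base"],
    "VA3": ["verb", "main", "s-form"],
}


def expand(name):
    """Expand an entity name into an ordered {feature-name: value} dictionary."""
    fs = {}
    for atom in COMPOSITE.get(name, [name]):
        fname, value = ATOMIC[atom]
        if fname in fs:  # assumed constraint: each feature is set at most once
            raise ValueError(f"&{name}; sets feature {fname!r} twice")
        fs[fname] = value
    return fs


def to_markup(name):
    """Spell out the <feature> markup that an entity abbreviates."""
    return "".join(
        f"<feature><fname>{f}</fname><fstruct>{v}</fstruct></feature>"
        for f, v in expand(name).items()
    )


print(expand("VA3"))  # {'cat': 'verb', 'subcat': 'main', 'subsub': 's-form'}
```

Keeping the composite entities as lists of atomic ones mirrors the way &NC; and the others are defined above in terms of &noun;, &common;, and so on.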

Here is a representation of the various analyses.

    <analysis-list id=w1 ranking=yes>
      <analysis rank=1><fstruct lexp=e2>&NC;</fstruct></analysis>
      <analysis rank=0><fstruct>&NP;</fstruct></analysis>
      <analysis rank=2><fstruct lexp=e2>&VA0;</fstruct></analysis>
    </analysis-list>
    <analysis-list id=w2 ranking=yes>
      <analysis rank=1><fstruct lexp=e1>&VA3;</fstruct></analysis>
      <analysis rank=2><fstruct lexp=e1>&NC2;</fstruct></analysis>
    </analysis-list>
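One way to read the two analysis lists together is to enumerate the lexical readings of the whole sentence. The proposal does not say how per-word ranks combine, so the sum-of-ranks ordering in this Python sketch is purely an assumption, as are the names readings and label. The sketch also does no syntax, so it lists combinations such as wash:VA0 sink:VA3 that a parser would reject. The LEXICON dictionary mirrors the lexicon given at the end of this note.

```python
from itertools import product

# (rank, entity, lexp) triples transcribed from the analysis lists above.
W1 = [(1, "NC", "e2"), (0, "NP", None), (2, "VA0", "e2")]  # "Wash"
W2 = [(1, "VA3", "e1"), (2, "NC2", "e1")]                   # "sinks"
LEXICON = {"e1": "sink", "e2": "wash"}                      # lexicon entries by ID


def readings(*words):
    """Cross the usable analyses of each word, best total rank first.

    rank=0 analyses are discarded. Ordering by the sum of the per-word ranks
    is an assumption; sorted() keeps ties in their original order.
    """
    live = [[a for a in word if a[0] != 0] for word in words]
    return sorted(product(*live), key=lambda combo: sum(a[0] for a in combo))


def label(analysis):
    """Render one analysis as lemma:entity, following its lexp pointer."""
    _, entity, lexp = analysis
    return f"{LEXICON[lexp]}:{entity}" if lexp else entity


for combo in readings(W1, W2):
    print(sum(a[0] for a in combo), " ".join(label(a) for a in combo))
# 2 wash:NC sink:VA3
# 3 wash:NC sink:NC2
# 3 wash:VA0 sink:VA3
# 4 wash:VA0 sink:NC2
```

Under this assumed scheme the top reading is the noun-plus-verb one ("the wash sinks"), which matches the rank=1 choices for each word; a different combination rule could of course order the tied readings differently.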

Finally, here is the lexicon; as before, the lexicon tag simply marks where the lexicon section starts.

    <lexicon>
    <entry id=e1>sink</entry>
    <entry id=e2>wash</entry>

Terry Langendoen               phone: (+1 602) 621-6898
Department of Linguistics      bitnet: langendt@arizvm1
University of Arizona          internet: langendt@arizvm1.ccit.arizona.edu
Tucson, AZ 85721 USA           fax: (+1 602) 621-9424