Notes on the Encoding of Linguistic Analysis
<author>D. Terence Langendoen
<address>
<aline>Department of Linguistics
<aline>University of Arizona
<aline>Tucson, AZ 85721 USA
<aline>E-mail: langendt@arizvm1 (bitnet)
<aline>Phone: (602) 621-6898
</address>
<date>18 January 1990
</titlep>
<body>
<h1>Sample markup possibilities for the English word 'unpacked'
<p>Assuming a fully specified lexicon and (word-formation) grammar, here are schematic markups for the three interpretations of the English word 'unpacked'. The markups assume that none of these interpretations is entered in the lexicon, but that there are entries for the following: 'pack', 'unpack', 'un' (two different ones), 'ed' (two different ones).
<ol>
<li id=anal1>
<xmp>
<![ CDATA [
<word id=w999>
  <text>unpacked
  <analysis>
    <gloss g='contents not put in'>
    <adjective arglist=subject rolelist=source typelist=container>
    <rule id=un1r>
    <lexitem id=un1l>
    <analysis>
      <gloss g='contents put in'>
      <adjective>
      <rule id=ed2r>
      <lexitem id=pack3l>
      <lexitem id=ed2l>
    </analysis>
  </analysis>
</word>
]]>
</xmp>
<lp>The category tag (here <tag>adjective</tag>) also contains attributes identifying its argument structure and selectional restrictions.
<lp>Rule un1r is the rule for forming 'negative adjectives' from adjectives.
<lp>The rule has two parts: the prefix identified in the lexicon as un1l, and something which is itself the result of an analysis.
<lp>The lexical item un1l is prefixed to an adjective, which in this case is composed by the rule ed2r, which suffixes the form lexically identified as ed2l to the lexical entry identified as pack3l. That entry may in fact be a subentry under the lemma 'pack', to be identified by a mechanism such as Gary Simons suggests in AIW12.
<lp>No attributes are provided here, though they could be.
<li id=anal2>
<xmp>
<![ CDATA [
<word id=w999>
  <analysis>
    <gloss g='contents removed'>
    <verb tense=no form=past-participle voice=passive arglist=subject rolelist=source typelist=container>
    <rule id=ed3r>
    <lexitem id=unpack1l>
    <lexitem id=ed2l>
  </analysis>
</word>
]]>
</xmp>
<lp>Rule ed3r forms passive past participles from verb stems. I assume for purposes of illustration that the rule is distinct both from the rule that forms active past participles and from the rule that forms adjectives from verbs with the same morphology.
<lp>I assume for this illustration that the verb 'unpack' is listed directly in the lexicon and does not have to be formed by a morphological rule from the prefix un2l and the verb stem pack3l.
<li id=anal3>
<xmp>
<![ CDATA [
<word id=w999>
  <text>unpacked
  <analysis>
    <gloss g='removed contents'>
    <verb tense=past arglist=(subject direct-object) rolelist=(agent source) typelist=(person container)>
    <rule id=ed2r>
    <lexitem id=unpack1l>
    <lexitem id=ed2l>
  </analysis>
</word>
]]>
</xmp>
<lp>I use parentheses to enclose list items, pending clarification as to the correct SGML syntax to use in this situation.
<lp>The three preceding analyses could be provided together, with or without a ranking. Presumably, ids should be provided for each of the constituent analysis tags. If the ranking is omitted, the alternatives are assumed to be equally ranked.
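<lp>On the list-valued attributes in the third analysis above: one possible rendering, offered as a sketch rather than as settled syntax, assumes that these attributes are declared with the SGML declared value NAMES, so that each value is written as a quoted list of names separated by spaces. The attribute list declaration shown is hypothetical and is not part of any agreed DTD. Like the tag names already used in these examples, it also assumes a concrete syntax that permits names longer than eight characters.
<xmp>
<![ CDATA [
<!-- Hypothetical declaration (an assumption, not an agreed DTD).   -->
<!-- NAMES: the value is a list of names separated by spaces.       -->
<!ATTLIST verb
          tense    NAME  #IMPLIED
          form     NAME  #IMPLIED
          voice    NAME  #IMPLIED
          arglist  NAMES #IMPLIED
          rolelist NAMES #IMPLIED
          typelist NAMES #IMPLIED>

<!-- The verb tag of the third analysis, with quoted lists in place -->
<!-- of the parenthesized ones.                                     -->
<verb tense=past
      arglist="subject direct-object"
      rolelist="agent source"
      typelist="person container">
]]>
</xmp>
<lp>Under this assumption the items of the three lists correspond by position: the first argument (subject) pairs with the first role (agent) and the first type (person), and so on. Single-item values such as arglist=subject can remain unquoted.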
<li id=comb1>
<xmp>
<![ CDATA [
<word>
  <text>unpacked
  <analysis id=a1 rank=1>
    <gloss g='contents not put in'>
    <adjective arglist=subject rolelist=source typelist=container>
    <rule id=un1r>
    <lexitem id=un1l>
    <analysis>
      <gloss g='contents put in'>
      <adjective>
      <rule id=ed2r>
      <lexitem id=pack3l>
      <lexitem id=ed2l>
    </analysis>
  </analysis>
  <analysis id=a2 rank=2>
    <gloss g='contents removed'>
    <verb tense=no form=past-participle voice=passive arglist=subject rolelist=source typelist=container>
    <rule id=ed3r>
    <lexitem id=unpack1l>
    <lexitem id=ed2l>
  </analysis>
  <analysis id=a3 rank=3>
    <gloss g='removed contents'>
    <verb tense=past arglist=(subject direct-object) rolelist=(agent source) typelist=(person container)>
    <rule id=ed2r>
    <lexitem id=unpack1l>
    <lexitem id=ed2l>
  </analysis>
</word>
]]>
</xmp>
<lp><tag>Analysis</tag> is an embedded analysis-tag.
<lp>We now give a variant of <liref refid=anal1> in which we flatten the analysis; that is, we provide an analysis in which we identify the rules and morphological elements that combine, but do not indicate the order of combination.
<li id=flat1>
<xmp>
<![ CDATA [
<word>
  <text>unpacked
  <analysis>
    <gloss g='contents not put in'>
    <adjective arglist=subject rolelist=source typelist=container>
    <rule id=un1r>
    <rule id=ed2r>
    <lexitem id=un1l>
    <lexitem id=pack3l>
    <lexitem id=ed2l>
  </analysis>
</word>
]]>
</xmp>
<lp>If we now eliminate reference to rules, we have a representation that indicates only the morphological parts. These parts could in fact be identified directly as character strings, as in:
<li id=simple1>
<xmp>
<![ CDATA [
<word>
  <text>unpacked
  <analysis>
    <gloss g='contents not put in'>
    <adjective arglist=subject rolelist=source typelist=container>
    <part>un
    <part>pack
    <part>ed
  </analysis>
</word>
]]>
</xmp>
</ol>
</gdoc>
