ESSLLI99, Butt/Frank/Kuhn:
Development of large scale LFG grammars --
Linguistics, Engineering and Resources
Collection of Exercises
If the problem is that a word is not listed in the lexicon, a message to this effect will appear in the XLE window, in addition to the morphology window appearing:
% parse "a monkey in the garden devours a banana"
parsing {a monkey in the garden devours a banana}
Chart unconnected because of unknown words
Word possibly causing problem: garden
0 solutions, 0.01 CPU seconds, 0 subtrees
0
%
Add the missing entry.
Extend the lexicon and the VP-rule to cover temporal adverbs:
In terms of f-structure, their contribution should end up in the same set as those of local PPs.
Make sure that the grammatical functions you are using are defined in the CONFIG section.
Your analysis should avoid adding a spurious ambiguity to ordinary transitive sentences:
How do you have to modify the f-annotation of the PPs in the verb rule?
Again, don't forget to add the new grammatical function in the CONFIG section.
What are the options for fixing the problem?
Restructure the lexicon using (a hierarchy of) templates.
Using the handful of existing templates as a model, write more templates for the following (and use these names):
Name | Function |
---|---|
CASE | Case Assignment |
PROPERNOUN | For Proper Nouns |
NOUN-PL | For plural common nouns |
DET | For Determiners |
TENSE | Tense Marking |
ASPECT | Aspect Marking |
TRANS | For transitive verbs |
INTRANS | For intransitive verbs |
PREP | For prepositions |
V-S-RAISING | For modals |
The use of the templates is illustrated with the noun girl and the pronouns he, she, it, them.
If you want to write more templates or organize things differently, you are most welcome to do so.
Note: In this exercise we have introduced the use of the variable %stem. This is a very useful device within XLE which allows you to make XLE do the work of figuring out the stem (based on the headword in the lexical entry you have written). This avoids cut-and-paste errors as in the monkey and park situation of the first exercise. However, it is really useful only in conjunction with a morphological analyzer that will return the "real" stems (eat as opposed to eats). This will be introduced at the end of the week.
A template for passivizable stems is already in the TEMPLATE section, but you still have to fill in the rewrite rules that make up the core of the lexical rule for passive.
First, write the rule that will suppress the subject, and make the object the subject.
Then add an alternative where the underlying subject is realized as an oblique PP with by. (Constrain this disjunct appropriately.)
The template should also work for ditransitive verbs. Add the template call to the appropriate lexical entries.
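The effect of the two disjuncts can be pictured outside XLE as a simultaneous renaming of grammatical functions in a subcategorization frame. The following Python sketch only illustrates that idea and is not XLE notation; the function name `apply_rewrites`, the two rewrite tables, and the label `OBL-AG` for the by-phrase are assumptions made for this example.

```python
def apply_rewrites(frame, rewrites):
    """Rename all grammatical functions in `frame` simultaneously.

    A function mapped to "NULL" is suppressed; functions without a
    rewrite are kept unchanged.
    """
    return [rewrites.get(gf, gf) for gf in frame]


# Hypothetical rewrite tables for the two disjuncts of the passive rule.
SHORT_PASSIVE = {"SUBJ": "NULL", "OBJ": "SUBJ"}
BY_PASSIVE = {"SUBJ": "OBL-AG", "OBJ": "SUBJ"}

transitive = ["SUBJ", "OBJ"]
ditransitive = ["SUBJ", "OBJ", "OBJ2"]

print(apply_rewrites(transitive, SHORT_PASSIVE))    # ['NULL', 'SUBJ']
print(apply_rewrites(transitive, BY_PASSIVE))       # ['OBL-AG', 'SUBJ']
print(apply_rewrites(ditransitive, SHORT_PASSIVE))  # ['NULL', 'SUBJ', 'OBJ2']
```

Note that the renaming is applied in one step: rewriting OBJ to SUBJ first and then SUBJ to NULL in a second pass would wipe out the new subject as well.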
Add a new lexical rule for dative shift. Make sure it interacts appropriately with the passive lexical rule. (Which one has to apply first, i.e., which one should be embedded in the other?)
Run the testfile and make sure you understand the treatment of topicalization via functional uncertainty (as was demonstrated in class).
Relative clauses are another candidate for treatment via functional uncertainty. At the moment, the relative pronoun can only be interpreted locally (i.e., without modals or embedded clauses).
Parse the two sentences in the testfile (repeated here) and make sure you understand the analysis of relative clauses.
[a.] The boy who laughed saw a monkey.
[b.] The boy who she saw is sleeping.
Add a functional uncertainty analysis to cover the following data.
[a.] The boy who should laugh is sleeping.
[b.] The banana which the monkey should eat is in the park.
[c.] The boy who she thinks she sees is in the park.
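A functional uncertainty path is a regular expression over attribute names, so its coverage can be explored with an ordinary regex engine. The sketch below is a Python illustration, not XLE notation. It assumes, for the sake of the example, the path `{XCOMP|COMP}* {SUBJ|OBJ}` and that modals embed their complement as XCOMP; the path your grammar needs may differ.

```python
import re

# Assumed uncertainty path for this illustration: {XCOMP|COMP}* {SUBJ|OBJ}
UNCERTAINTY = re.compile(r"(?:(?:XCOMP|COMP) )*(?:SUBJ|OBJ)")


def path_allowed(path):
    """Return True if the attribute path is an instance of the pattern."""
    return UNCERTAINTY.fullmatch(" ".join(path)) is not None


print(path_allowed(["SUBJ"]))                  # True  (who laughed)
print(path_allowed(["XCOMP", "SUBJ"]))         # True  (who should laugh)
print(path_allowed(["XCOMP", "OBJ"]))          # True  (which the monkey should eat)
print(path_allowed(["COMP", "OBJ"]))           # True  (who she thinks she sees)
print(path_allowed(["OBJ", "XCOMP"]))          # False
```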
Optional: Add a treatment of relative pronouns in PPs.
[a.] The monkey of whom she is thinking is in the park.
[b.] The monkey of whom she should think is in the park.
[c.] The monkey of whom she thinks that he is sleeping is in the park.
What should the f-structure under the TOPIC be when the PP is a relative phrase? (Recall that this phenomenon has been called "Pied Piping": the fronted relative pronoun is joined by the preposition, which can't be stranded.)
Make sure that you can't parse sentences like
[a.] *The boy which is in the park is sleeping.
[b.] *The banana who the monkey should eat is in the park.
Hint: This will involve introducing a feature ANIM with the values + or -. Use constraining equations to make sure you get only the right analyses.
Start from the grammar mini-french2.lfg
The testfile is mini-french-tests
The grammar comes with a number of predefined template definitions. Introduce templates for the following phenomena. Try to use the following template names:
num, gend, case,
common_noun, proper_noun,
n_agr, n_xx_agr,
pron_pers, spec, prep
v_s, v_s_o, v_s_o_o2, v_s_obl, v_s_xcomp_rais,
v_s_compfin
s_v_agr
Introduce templates and lexicon entries for raising verbs such as devoir and vouloir.
Il doit venir.
* Il doit vient.
The grammar contains a restricted analysis of relative clauses. Look at the analysis of relative clauses in your testfile mini-french-tests. Only subject and object relative pronouns that act as TOPIC of the local relative clause are captured.
Extend the analysis to cover the following phenomena:
Jean voit le chat que il doit donner à Marie.
Jean pense au chat que la fille doit aimer.
How do you proceed for subject relative pronouns?
Jean voit la fille qui doit venir.
Jean voit la fille à laquelle il pense.
Note that the head noun and the relative pronoun must agree in number and gender.
Again, consider the treatment of long dependencies via functional uncertainty:
Jean voit la fille à laquelle il doit penser.
Jean voit le chat que aime Marie.
Jean voit la fille que doit aimer Marie.
Start from the grammar fu-ex-ger.lfg
A testfile is in fu-ex-ger-testfile
Note that the TOPIC of the relative clause plays a role in checking for gender agreement with the noun that the relative clause modifies (look at the NP rule where CPrel is introduced).
In this light, what has to be the f-structure that you want under TOPIC with a PP as a relative phrase? (Recall that this phenomenon has been called "Pied Piping": the fronted relative pronoun is joined by the preposition, which can't be stranded in German.)
Start from the small grammar coord-ex-engl.lfg
The testfile is coord-ex-testfile
parse {NP: Mary and John}
Is there any case information?
Now parse
parse {I saw Mary and John}
How does the case information get there?
(Click on the c-structure nodes with both mouse buttons to get the f-structure that's projected from that node.
If you hold down the control key and click with both buttons, you'll see the annotations that are in the rules for that node.)
Now, extend the scheme to cover coordinations with more than two conjuncts:
(^ CONJ-FORM) = and.
First, try the (default) annotation ^=!
Where does the CONJ-FORM feature end up for
What happens with
Next, try ! $ ^
Compare the effect of
parse {NP: Mary and John}
and
parse {I saw Mary and John}
Finally, define CONJ-FORM as a non-distributive feature. See where the information ends up in the f-structure. You should now be able to parse all previous sentences.
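The difference between distributive and non-distributive features can be pictured with a toy model. In the Python sketch below, a coordinate f-structure is a dict with an `ELEMENTS` list; this representation and the names `assert_feature` and `make_coord` are assumptions made for illustration only. Unification clashes are not modelled: values are simply stored.

```python
def assert_feature(fstruct, attr, value, nondistributive=()):
    """Assert attr=value of an f-structure, modelled as a dict.

    A coordinate structure is modelled as a dict with an "ELEMENTS" list.
    Distributive features are passed on to every element; features named
    in `nondistributive` are stored on the coordinate structure itself.
    """
    if "ELEMENTS" in fstruct and attr not in nondistributive:
        for element in fstruct["ELEMENTS"]:
            assert_feature(element, attr, value, nondistributive)
    else:
        fstruct[attr] = value
    return fstruct


def make_coord():
    """Hypothetical f-structure for the coordination 'Mary and John'."""
    return {"ELEMENTS": [{"PRED": "Mary"}, {"PRED": "John"}]}


distributed = assert_feature(make_coord(), "CASE", "acc")
kept_on_set = assert_feature(make_coord(), "CONJ-FORM", "and",
                             nondistributive=("CONJ-FORM",))
print(distributed)   # CASE ends up inside each conjunct
print(kept_on_set)   # CONJ-FORM stays on the coordinate structure
```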
(Hint: try saying something like - either I'm in a structure where nothing is said about number, or I'd like to be plural.)
What about
(In the testfile, you can find a special annotation to mark ungrammatical sentences - the first number followed by an exclamation mark denotes the number of expected readings.)
Optional additional exercise:
When you're finished with this exercise sheet, build the coordination analysis into the larger English, German, or French grammar.
NP: Jean et Marie
NP: la fille et les chats
Jean et Marie viennent souvent.
* La femme et la fille dort.
La femme ou la fille doit dormir.
La femme ou la fille doivent dormir.
Jean et moi viennent souvent.
Il voit la fille qui dort et le chat qui court.
La fille dort à l'école et sur le bateau.
* La fille pense au vacances et sur le bateau.
Il voit la table sur laquelle ou sous laquelle elle dort.
Jean, Pierre et Marie viennent souvent.
Le chat court dans la rue, dans le jardin, sur les toits et sur la table.
NP: le format et l'orientation du papier
NP: le format et l'orientation du papier et la qualité de numérisation
Start from the small grammar morph-ex-engl.lfg
The testfile is morph-ex-testfile
The CONFIG is set up in such a way that you can use the morphological analyzer.
Try the command analyze-string (for one or two words). It shows you the output of the tokenizer.
analyze-string the
analyze-string leaves
analyze-string {leaves fall}
Hold down the control key and click on a tree node with the left mouse button in order to expand the tree display to show sublexical nodes.
You can now see which words in the sentence are still analyzed using full form entries.
Furthermore, you can look at the transition graph by choosing the menu item `Show Morph Window' in the `Commands' menu of the tree window (upper left-hand window).
(+SP is for `singular or plural' - how can you capture this in terms of f-structure?)
Note that you have to introduce the subcategorization frame for each verb somewhere. This information is not included in the morphology output. So, where can you attach it to?
You should be able to parse the following sentences:
Recall that a string of letters can be assigned several categories, divided by semicolons. So you can do the following:
fall N-S XLE @MY-TEMPLATE-FOR-NOUN-STEMS; V-S XLE @MY-TEMPLATE-FOR-VERB-STEMS.
The -Lunknown mechanism can be used to deal with input for which there is no lexicon entry. This is useful for the lemma forms output by the morphology. Write an entry for -Lunknown and assign it all sublexical stem categories you want to deal with in this way.
(For verbs you have to "guess" what the subcategorization is.)
What analyses do you now get for
Try to parse sentences with other common nouns and intransitive verbs - in principle, they should all work now! If you guess verbs to be either transitive or intransitive, it will also work for transitive verbs.
(Note that using %stem in the PRED values now gives you the lemma form as the predicate.)
You can now use sentence-initial capitalization for arbitrary words, since the tokenizer will turn it back into the lower-case variant. The tokenizer does a lot of such preprocessing tasks.
It is of course okay to keep around a few full form entries for particularly complicated words (e.g., auxiliaries).
For French and German, there are several issues that are more interesting than what occurs with the English morphology. We have some particular exercises for these languages - so if you're interested, ask us.
The grammar is now interfaced with tokenization and morphology for French. The relevant finite-state transducers are referenced in the grammar configuration.
For some categories you already find sublexical rules and lexical entries in the grammar sections FRENCH SUBLEX RULES and FRENCH SUBLEX LEXICON.
Lexical entries for nouns, verbs, prepositions and (new!) adverbs are now only specified for stem forms. Try to understand the differences that arise between the previous grammar version without sublexical rules and the new version, by looking at template definitions and e.g. the definition of agreement constraints.
Test how the grammar behaves now. Use "unknown" vocabulary for the categories that are described by generic Lunknown lexical entries. You may also introduce further generic Lunknown lexical entries.
Please add:
- to the CONFIG section: OPTIMALITYRANKING: NEUTRAL Unknown NOGOOD.
- to the LUnknown lexical entry for N: Unknown $ o::*
Define the morphology interface for clitics and determiners, following the model of the existing sublexical rules. The lexical entries will have to be marked by XLE instead of *.
The relevant morphological analyses for clitics and determiners are stated below. You can obtain these morphological analyses in XLE with the command analyze-string <string>.
Form | Morphological analysis |
---|---|
je | je+Nom+InvGen+SG+P1+PC |
tu | tu+Nom+InvGen+SG+P2+PC |
il | il+Nom+Masc+SG+P3+PC |
elle | il+Nom+Fem+SG+P3+PC |
ils | il+Nom+Masc+PL+P3+PC |
le | le+Masc+SG+Def+Det |
le | le+Acc+Masc+SG+P3+PC |
la | le+Acc+Fem+SG+P3+PC |
la | le+Fem+SG+Def+Det |
les | le+InvGen+PL+Def+Det |
les | le+Acc+InvGen+PL+P3+PC |
lui | lui+Dat+InvGen+SG+P3+PC |
leur | leur+Dat+InvGen+PL+P3+PC |
un | un+Masc+SG+Indef+Det |
une | un+Fem+SG+Indef+Det |
des | un+InvGen+PL+Indef+Det |
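Each analysis consists of a lemma followed by a sequence of tags, and the sublexical rules have to account for every one of these pieces. As a rough aid for reading the analyses, the following Python sketch splits an analysis string into its parts. The function name `split_analysis` is hypothetical, and the sketch assumes that the lemma itself contains no `+`.

```python
def split_analysis(analysis):
    """Split 'lemma+Tag1+Tag2' into the lemma and a list of '+Tag' strings."""
    lemma, *tags = analysis.split("+")
    return lemma, ["+" + tag for tag in tags]


lemma, tags = split_analysis("le+Acc+Fem+SG+P3+PC")
print(lemma)  # le
print(tags)   # ['+Acc', '+Fem', '+SG', '+P3', '+PC']
```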
Start from the small English context-free grammar proc-ex1-engl-cf.lfg
The chart can be displayed from the `Commands' menu in the tree window.
(What is the number of the chart edge for the N' category that's at the interface between the two non-interacting ambiguities?)
Click on the boxes representing edges with both mouse buttons to get a display of this particular "sub-forest".
The idea of this exercise is to compare two versions of a grammar of German: one making the clause type distinction at the level of f-structure (proc-ex2-german-f.lfg), the other at the level of c-structure (proc-ex2-german-c.lfg). The latter employs the concept of rule parametrization by complex category symbols.
The parametrization in german-c.lfg has not quite been finished. You still have to do it for the NP and PP rule. As you can see, for these categories the type information is still passed via the f-structure feature TYPE. Introduce complex categories for these categories.
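A rule written with a complex category stands for a family of ordinary rules, one per value of the parameter. The Python sketch below illustrates this expansion; it is not XLE notation. The helper name `instantiate` and the shape of the PP rule (P followed by NP) are assumptions made for this example, while the category values std, int and rel are the ones used in the parse commands of this exercise.

```python
def instantiate(rule_lhs, rule_rhs, values, param="_type"):
    """Expand a parametrized rule into one plain rule per parameter value."""
    return {
        rule_lhs.replace(param, value): [cat.replace(param, value)
                                         for cat in rule_rhs]
        for value in values
    }


# Hypothetical parametrized PP rule: PP[_type] --> P NP[_type]
rules = instantiate("PP[_type]", ["P", "NP[_type]"], ["std", "int", "rel"])
for lhs, rhs in rules.items():
    print(lhs, "-->", " ".join(rhs))
```

Because the parameter is shared between the mother and the NP daughter, the type is percolated through the rule at the level of c-structure and no TYPE feature is needed to carry it.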
It is best to proceed in two steps.
parse {NP[std]: der Mann}
parse {NP[std]: sie}
parse {NP[std]: diesen Mann}
parse {NP[int]: welchen Mann}
parse {NP[std]: ihn}
parse {NP[rel]: den}
In the disjuncts in which the TYPE feature was fixed to one particular value, you should also fix the _type parameter to this value.
Replace the (^ TYPE) = ... annotation by the new complex categories.
Make sure you pass in the correct instantiation of the parameter. (Sometimes this will be a formal parameter that is percolated through the rule, sometimes it will be a particular instance of the possible clause types.)
Run the testfile testfile-german with the former grammar:
parse-testfile testfile-german
Now, load your new grammar and compare the performance. To do so, you can run the parse-testfile command on the output of the previous test run:
parse-testfile testfile-german.new
(In case you didn't manage to solve the first part of this exercise entirely, you can use the solution grammar proc-ex2-german-c.sol.lfg.
You may also compare the performance with the grammar proc-ex2-multi-param.lfg, which is more heavily parametrized.)
The testfile is test-ambiguity
Modify the transitive verb template (v_s_o) to allow for object drop.
Check your analysis with the relevant examples in test-ambiguity (introduced by the comment #object drop).
Extend the present NP rule to account for headless NPs. In the headless construction, the f-structure for the missing noun head should contain a feature PRED= 'pro', with PRON-TYPE= null. The adjective is to be represented as an adjunct, as usual:
[f-structure diagram not reproduced]
First test your analysis with the NP le petit (`the small one').
Then analyze the following sentences: (also contained in the testfile)
(Context:
Deux boutons s'allument.
`Two buttons light up.')
Le vert indique si l'imprimante marche.
`The green (one) indicates whether the printer works.'
Appuyez sur le rouge dans la fenêtre droite.
`Press the red (one) in the right window.'
Ne pas appuyer sur le rouge du côté droit.
not press on the red on the side right
`Do not press the red (one) on the right-hand side'
The grammar contains a simple analysis for inherently reflexive verbs, e.g. se casser (break), s'évanouir (faint), s'allumer (light up) or se réveiller (wake up). These verbs are analyzed as intransitive reflexive verbs (VTYPE= reflexive). The reflexive clitic (se) is not represented as an argument. It is represented as part of the verbal morphology by the feature VMORPH= cl(itic).
What happens if you replace these constraining equations by simple defining equations? Try the following ungrammatical sentences:
* Marie évanouit. (`Mary faints')
* Marie se dort. (`Mary sleeps')
Reintroduce the constraining equations.
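The contrast between the two kinds of equations can be simulated in a few lines. The Python sketch below is a toy model, not XLE's solver: defining equations build the f-structure, and constraining equations are only checked against the result. The function name `solve` and the choice of VMORPH as the constrained feature are assumptions made for illustration.

```python
def solve(defining, constraining):
    """Build an f-structure from defining equations, then check constraints.

    Both arguments are lists of (attribute, value) pairs. Returns the
    f-structure as a dict, or None if the equations are inconsistent or a
    constraining equation is not satisfied.
    """
    fstruct = {}
    for attr, value in defining:
        if fstruct.get(attr, value) != value:
            return None  # clash between two defining equations
        fstruct[attr] = value
    for attr, value in constraining:
        if fstruct.get(attr) != value:
            return None  # required value was not supplied elsewhere
    return fstruct


clitic = [("VMORPH", "cl")]  # assumed contribution of 'se' when present

# Verb with a constraining equation, no clitic in the sentence: rejected.
print(solve(defining=[], constraining=[("VMORPH", "cl")]))      # None
# Same verb together with the clitic: accepted.
print(solve(defining=clitic, constraining=[("VMORPH", "cl")]))  # {'VMORPH': 'cl'}
# Verb rewritten with a defining equation: accepted even without clitic.
print(solve(defining=[("VMORPH", "cl")], constraining=[]))      # {'VMORPH': 'cl'}
```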
Note: The passive lexical rule for transitive verbs is different from the previous versions in that it only defines the passive alternation. The active construction is stated separately in the transitive template. Proceed in a similar way for reflexivization, and only provide a template with designator rewrites (lexical rule) for the reflexive construction.
The lexical rule for reflexivization suppresses the OBJect of the corresponding nonreflexive active construction (NULL). As in the case of inherently reflexive verbs, the reflexive clitic is not represented as a grammatical function subcategorized by the verb, but as a part of verbal morphology by the feature VMORPH= cl. As opposed to inherently reflexive verbs, which define a feature REFL = - (no semantic reflexivization), semantic reflexivization introduces the feature REFL= +. The argument binding of the suppressed argument (NULL) to the subject is supposed to take place in argument- or semantic structure, which are not used here.
[f-structure diagram not reproduced]
Check your analysis with the sentence Marie se voit (`Mary sees herself').
Le conducteur casse le moteur.
`The driver breaks the motor'
Le moteur se casse.
`The motor breaks'
Les feux s'allument.
`The lights go on'
Les enfants se réveillent.
`The children wake up'
Which readings of those that you obtain do you consider appropriate?
The NP rule contains an analysis for noun compounds (photocopie couleur (`color photocopy'), document papier (`paper document')), which is currently commented out.
Make the compound rule active and check the effect by reparsing the examples you analyzed before, using parse-testfile <num>.
In the following, it may be wise to put the noun-compound rule into comments again.
Make sure that your .xlerc file contains the following line:
set defaultSocketPorts(generator) 20<n>
where <n> is your student number, e.g. set defaultSocketPorts(generator) 2001.
Otherwise your generation process will prevent other users from starting their own generation processes.
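If you want to double-check the line for your own number, the pattern can be spelled out as follows. This Python sketch assumes that student numbers run from 1 to 99 and are written with two digits, which matches the example 2001 but is otherwise a guess; the function name `xlerc_line` is hypothetical.

```python
def xlerc_line(student_number):
    """Return the .xlerc line for a given student number (1-99 assumed)."""
    if not 1 <= student_number <= 99:
        raise ValueError("student number assumed to be between 1 and 99")
    return f"set defaultSocketPorts(generator) 20{student_number:02d}"


print(xlerc_line(1))   # set defaultSocketPorts(generator) 2001
print(xlerc_line(17))  # set defaultSocketPorts(generator) 2017
```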
Marie et Jean chantent. (`Mary and John sing.')
Marie et Jean se réveillent. (`Mary and John wake up')
Un grand chat et un chien paisible se rencontrent dans une rue sans issue.
A big cat and a dog peaceful (se) meet in a street without exit
`A big cat and a peaceful dog meet in a dead end street'
Marie danse dans le parc après le lever du jour.
`Mary dances in the park after sunrise'.
For each of them, go to the left-hand side f-structure window, click the Commands button, and choose Generate from this F-structure. The first time, the generation server needs some time to get started (cf. the Generator window that pops up). Go back to the XLE shell and see whether a sentence is generated. Which result do you get?
What do you observe with respect to the reflexive ambiguities and the PP attachment ambiguities in generation?
Try to state surface order constraints in the coordination macro to restrict the order of conjuncts in coordination. Test the results in generation.
Marie danse dans le parc après le lever du jour.
`Mary dances in the park after sunrise'.
Appuyez sur le rouge dans la fenêtre droite.
`Press the red one in the right window.'
NP: un grand chien affamé
a big dog starving (hungry)
`a big, starving dog'
Think about ways to constrain the order of adjectives in generation.
Extend your grammar slightly, to be able to analyze the following sentence:
Un grand joli chat affamé et un chien lourd, paisible et anxieux de
chats se rencontrent dans une rue sans issue.
A big pretty cat starving and a dog heavy, peaceful and anxious of cats
(se) meet in a street without exit
`A big, pretty, starving cat and a heavy, peaceful dog, anxious of cats
meet in a dead end street'
Proceed in the following way:
NP: un chien anxieux de chats
a dog, anxious of cats
NP: un chien [paisible] et [anxieux de chats]
a dog peaceful and anxious of cats
Un grand joli chat affamé et un chien lourd, paisible et anxieux de chats se rencontrent dans une rue sans issue.
Un grand joli chat affamé et un chien lourd et paisible se rencontrent dans une rue sans issue.
Which alternatives are generated? Try generation without conjunct and adjunct order constraints.
Start from the grammar french-ambig2.lfg
The testfile is test-ot
The CONFIG section of your grammar contains two lines which state keywords for OT-constraint ranking in analysis and generation.
OPTIMALITYRANKING: NEUTRAL UNGRAMMATICAL NOGOOD.
GENOPTIMALITYRANKING: NEUTRAL UNGRAMMATICAL NOGOOD.
The ot-marks you will define in the following exercises in the o-projection will have to be integrated and ranked relative to each other in this part of the configuration.
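The general idea of ranking marks relative to each other can be tried out with a small simulation. The Python sketch below is a simplification and not XLE's actual algorithm: it only handles dispreference marks, compares their counts lexicographically, and ignores the special keywords NEUTRAL, UNGRAMMATICAL and NOGOOD. The function name `optimal_analyses`, the analysis labels, and the relative order of the two marks are assumptions made for this example.

```python
def optimal_analyses(analyses, ranking):
    """Keep the analyses with the best profile of dispreference marks.

    `analyses` maps a label to the list of marks it carries; `ranking`
    lists marks from most to least severe. Profiles are compared
    lexicographically, so one more severe mark outweighs any number of
    less severe ones.
    """
    def profile(marks):
        return tuple(marks.count(mark) for mark in ranking)

    best = min(profile(marks) for marks in analyses.values())
    return [label for label, marks in analyses.items()
            if profile(marks) == best]


# Hypothetical competing analyses and an assumed ranking of two marks.
analyses = {
    "headed NP": [],
    "headless NP": ["HeadlessNP"],
    "headless NP with object drop": ["HeadlessNP", "ObjDrop"],
}
print(optimal_analyses(analyses, ["ObjDrop", "HeadlessNP"]))  # ['headed NP']
```

If every competing analysis carries some mark, the least severely marked one still survives, which is what distinguishes a ranked preference from a hard constraint.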
In XLE, o-descriptions are defined as follows: otmark $ o::*
The o-projection can be inspected by clicking the o:: button in the left-hand f-structure window.
For analyses that involve ot-marks, XLE displays the active ot-marks in their ranked order in the lower right-hand side window. You can reactivate suboptimal analyses by clicking on selected ot-marks, or you can activate all suboptimal analyses by choosing Unoptimal in the Options menu.
XLE displays the number of optimal vs. unoptimal solutions in the following way: <numopt>+<numsubopt>, e.g. 1+7.
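When you compare many test runs, it can help to read this count mechanically. The following Python sketch splits such a string into its two numbers; the function name `parse_solution_count` is hypothetical, and treating a plain number without `+` as "no suboptimal solutions" is an assumption of this sketch.

```python
def parse_solution_count(text):
    """Split '<numopt>+<numsubopt>' into a pair of integers."""
    optimal, _, suboptimal = text.partition("+")
    return int(optimal), int(suboptimal or 0)


print(parse_solution_count("1+7"))  # (1, 7)
```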
Define an OT mark ObjDrop for object drop that avoids the unwarranted object-drop ambiguity in:
Le vert indique si le moteur marche.
`The green one indicates whether the motor works.'
Do you still get an analysis for:
ne pas ouvrir. (`do not open')
Introduce an OT constraint HeadlessNP that filters unwarranted ambiguities in the following cases:
NP: le conducteur (`the driver')
NP: le moteur (`the motor')
Now reconsider the following sentences:
Le vert indique si l'imprimante marche.
`The green (one) indicates whether the printer works.'
Appuyer sur le rouge dans la fenêtre droite.
`Press the red (one) in the right window.'
If you reparse the previous sentence with the (currently commented-out) compound analysis in NPap activated, it gives you unwarranted ambiguities for la fenêtre droite (`the right window') due to the lexical ambiguity of la as a noun (meaning `tone A'). Introduce a lexicalized OT-constraint in a noun entry for la that marks la as a RareNoun, and thus avoids one of these ambiguities.
Reparse the following sentences, and then put the compound rule into comments again.
Appuyer sur le rouge dans la fenêtre droite.
`Press the red (one) in the right window.'
Ne pas appuyer sur le rouge du côté droit.
`Do not press the red (one) on the right-hand side.'
State an OT-constraint (using an ot-mark InhRefl) that filters unwarranted ambiguities for the following sentences, by expressing a preference for inherently reflexive verbs over semantic reflexivization:
Le moteur se casse. (`The motor breaks')
Les feux s'allument. (`The lights go on')
Again, consider the additional examples:
Les enfants se réveillent (mutuellement). (`The children wake (each other) up')
Marie se voit. ('Mary sees herself.')
Introduce an OT-constraint (using the mark PPobl) to prefer the oblique PP analysis over the adjunct reading. Test the results with the following sentences:
Marie renonce au voyage. ('Mary abandons the travel')
Marie renonce au premier essai. ('Mary abandons (at) the first attempt')
Le chat est tué par un chasseur. (`The cat is killed by a hunter.')
Le chat est tué par inadvertance. (`The cat is killed by accident.')
Try to state OT constraints (both in the grammar and in the lexicon) to cover the following generalization (in generation only):
Use the OT-marks Postnom and Prenom.
Check your constraint mechanism in generation with the following sentences:
NP: un grand joli chat (`a big pretty cat')
NP: un grand joli chat affamé (`a big, pretty, starving cat')
Les jolis chats de Marie dorment sous une grande table grise.
`Mary's pretty cats sleep under a big grey table'
Your generation preference constraints should determine the order of adjectives relative to the head correctly, i.e. as given in these example sentences.
Now execute parse-testfile test-ot.new and inspect the file test-ot.new.errors to see the effect of your constraints on the number of (optimal) analyses assigned.