ESSLLI99, Butt/Frank/Kuhn:

Development of large scale LFG grammars --

Linguistics, Engineering and Resources



Collection of Exercises


Exercises on rule annotation, subcategorization and constraining equations

Start from the grammar rule-anno-ex.lfg

Exercise 1 -- Missing Entries

If the problem is that a word is not listed in the lexicon, XLE prints a message to this effect in the XLE window and also opens the morphology window:

    
% parse "a monkey in the garden devours a banana"
parsing {a monkey in the garden devours a banana}  

 Chart unconnected because of unknown words
 Word possibly causing problem: garden 
0 solutions, 0.01 CPU seconds, 0 subtrees
0
%

Add the missing entry.

Exercise 2 - Extending c-structure rules to cover more adjuncts

Extend the lexicon and the VP-rule to cover temporal adverbs:

In the f-structure, their contribution should end up in the same set as that of the local PPs.

Exercise 3 - New verbs with different subcategorization


Exercise 4 - Constraining equations and underspecification



Solution

rule-anno-ex.sol.lfg


Exercises on templates, lexical rules and functional uncertainty (English)

Exercise 1 -- Templates

Restructure the lexicon using (a hierarchy of) templates.

Using the handful of existing templates as a model, write more templates for the following (and use these names):

Name          Function
CASE          Case Assignment
PROPERNOUN    For Proper Nouns
NOUN-PL       For plural common nouns
DET           For Determiners
TENSE         Tense Marking
ASPECT        Aspect Marking
TRANS         For transitive verbs
INTRANS       For intransitive verbs
PREP          For prepositions
V-S-RAISING   For modals

The use of the templates is illustrated with the noun girl and the pronouns he, she, it, them.

You are most welcome to write more templates or to organize things differently.

Note: This exercise introduces the variable %stem. This very useful XLE device lets XLE do the work of figuring out the stem, based on the headword of the lexical entry you have written. This avoids cut-and-paste errors like the monkey and park situation of the first exercise. However, it is only really useful in conjunction with a morphological analyzer that returns the "real" stems (eat as opposed to eats). This will be introduced at the end of the week.
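
To make templates and %stem concrete, here is a toy Python model (an analogy, not XLE notation). A template is a function from a stem to f-structure features, a template hierarchy is templates defined in terms of other templates, and the entry builder passes the headword in the way %stem does. The names TRANS, INTRANS and TENSE come from the table above; the helpers `pred` and `entry`, the feature values and the PRED string format are assumptions made for illustration.

```python
# Toy model of XLE-style templates; this is Python, not XLE notation.
# A template maps a stem to a dict of f-structure features.


def pred(stem, args):
    """Base template: builds a PRED value from the stem (like %stem)."""
    return {"PRED": f"'{stem}<{args}>'"}


def intrans(stem):
    """INTRANS: a one-place predicate, defined via the base template."""
    return pred(stem, "(^ SUBJ)")


def trans(stem):
    """TRANS: a two-place predicate, defined via the base template."""
    return pred(stem, "(^ SUBJ)(^ OBJ)")


def tense(value):
    """Parameterized TENSE template; the returned template ignores the stem."""
    return lambda stem: {"TENSE": value}


def entry(headword, *templates):
    """Combine templates, passing the headword in as the stem.

    The stem is never typed a second time, so a PRED value cannot be
    pasted in by mistake from another entry."""
    features = {}
    for template in templates:
        features.update(template(headword))
    return features


devour = entry("devour", trans, tense("pres"))
laugh = entry("laugh", intrans)
print(devour)
print(laugh)
```

The hierarchy pays off when a detail changes: editing `pred` alone would change the PRED format of every verb entry at once.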

Exercise 2 -- Lexical Rules

Exercise 3 -- Functional Uncertainty

Run the testfile and make sure you understand the treatment of topicalization via functional uncertainty (as was demonstrated in class).

Relative clauses are another candidate for treatment via functional uncertainty. At the moment, the relative pronoun can only be interpreted locally (i.e., without modals or embedded clauses).

Parse the two sentences in the testfile (repeated here) and make sure you understand the analysis of relative clauses.

[a.] The boy who laughed saw a monkey.
[b.] The boy who she saw is sleeping.


Add a functional uncertainty analysis to cover the following data.

[a.] The boy who should laugh is sleeping.
[b.] The banana which the monkey should eat is in the park.
[c.] The boy who she thinks she sees is in the park.
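
The following toy Python model (an illustration, not XLE's actual algorithm) shows why a functional uncertainty path can find the gap in sentence [c.] while a purely local equation cannot. The f-structure for "who she thinks she sees" is hand-built, each PRED lists the governable functions it requires, and the uncertainty is written as a Python regular expression over attribute names. Treating think as taking a COMP, and modals such as should as taking an XCOMP, are assumptions about how the grammar is set up.

```python
import re

# Hand-built toy f-structure for the relative clause "who she thinks she sees".
# PRED values are (lemma, list of required governable functions).
rel_clause = {
    "PRED": ("think", ["SUBJ", "COMP"]),
    "SUBJ": {"PRED": ("pro", [])},
    "COMP": {
        "PRED": ("see", ["SUBJ", "OBJ"]),
        "SUBJ": {"PRED": ("pro", [])},
        # OBJ is absent: this is the gap the relative pronoun must fill.
    },
}


def open_slots(fs, path=()):
    """Yield paths to governable functions that some PRED requires
    but that are not filled yet (the possible landing sites)."""
    _, required = fs["PRED"]
    for gf in required:
        if gf in fs:
            yield from open_slots(fs[gf], path + (gf,))
        else:
            yield path + (gf,)


def resolve(fs, uncertainty):
    """Return the open slots whose attribute path, written as
    space-separated names, fully matches the regex `uncertainty`."""
    return [p for p in open_slots(fs) if re.fullmatch(uncertainty, " ".join(p))]


local_only = resolve(rel_clause, "(SUBJ|OBJ)")
with_uncertainty = resolve(rel_clause, "((XCOMP|COMP) )*(SUBJ|OBJ)")
print(local_only)
print(with_uncertainty)
```

The local pattern finds nothing, because the missing OBJ sits inside COMP; allowing any number of XCOMP or COMP steps first finds exactly one landing site. Under the raising analysis assumed here, the same pattern would find an XCOMP OBJ path for sentence [b.].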


Optional: Add a treatment of relative pronouns in PPs.

[a.] The monkey of whom she is thinking is in the park.
[b.] The monkey of whom she should think is in the park.
[c.] The monkey of whom she thinks that he is sleeping is in the park.

What should the f-structure under the TOPIC be when the PP is a relative phrase? (Recall that this phenomenon has been called "Pied Piping": the fronted relative pronoun is joined by the preposition, which can't be stranded.)

Exercise 4 -- Constraining Equations

Make sure that you can't parse sentences like

[a.] *The boy which is in the park is sleeping.
[b.] *The banana who the monkey should eat is in the park.



Hint: This will involve introducing a feature ANIM with the values + or -. Use constraining equations to make sure you get only the right analyses.
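
A toy Python model of the difference between defining and constraining equations (simplified; XLE's real solver works differently). Defining equations build the f-structure, while constraining equations are only checked against the finished structure and never add a value. For simplicity, the head noun's equations and the relative pronoun's ANIM constraint are treated as applying to one shared f-structure; this is an assumption of the sketch, not necessarily how your grammar links them.

```python
def solve(defining, constraining):
    """Toy evaluation of a single f-structure.

    Defining equations (attr, value) build the structure; a clash
    between two values makes the analysis fail. Constraining
    equations are checked only afterwards and never add a value."""
    fs = {}
    for attr, value in defining:
        if fs.setdefault(attr, value) != value:
            return None  # clash between defining equations
    for attr, value in constraining:
        if fs.get(attr) != value:
            return None  # constraining equation not satisfied
    return fs


boy = [("PRED", "boy"), ("ANIM", "+")]
banana = [("PRED", "banana"), ("ANIM", "-")]
unmarked = [("PRED", "thing")]  # hypothetical noun with no ANIM value
who = [("ANIM", "+")]           # "who" requires ANIM + in this toy

print(solve(boy, who))            # the boy who ...: succeeds
print(solve(banana, who))         # *the banana who ...: fails on the value
print(solve(unmarked, who))       # constraint finds no value: fails
print(solve(unmarked + who, []))  # as a defining equation it would succeed
```

The last two calls show why a constraining equation is needed here: a defining equation would quietly supply ANIM + to a noun that says nothing about animacy.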


Solution

fu-ex-engl.sol.lfg


Exercises on templates (French)

Start from the grammar mini-french2.lfg

The testfile is mini-french-tests

Exercise 1

The grammar comes with a number of predefined template definitions. Introduce templates for the following phenomena. Try to use the following template names:

num, gend, case,
common_noun, proper_noun, n_agr, n_xx_agr,
pron_pers, spec, prep
v_s, v_s_o, v_s_o_o2, v_s_obl, v_s_xcomp_rais, v_s_compfin
s_v_agr


Solution

french-templates.lfg


Exercises on functional uncertainty (French)

Start from the grammar french-templates.lfg

Exercise 1

Introduce templates and lexicon entries for raising verbs such as devoir and vouloir.

Il doit venir.
* Il doit vient.


Exercise 2

The grammar contains a restricted analysis of relative clauses. Look at the analysis of relative clauses in your testfile mini-french-tests. Only subject and object relative pronouns that act as TOPIC of the local relative clause are captured.

Extend the analysis to cover the following phenomena:


Solution

french-fu.lfg


German grammar - Exercises on templates, lexical rules and functional uncertainty

Start from the grammar fu-ex-ger.lfg

A testfile is in fu-ex-ger-testfile

1.
Restructure the lexicon using (a hierarchy of) templates. Look at how common nouns are done to get an initial idea. (You need not deal with every single entry, but do look at pronouns and verbs.)
2.
Run the testfile and try to understand the functional uncertainty analysis of topicalization.

3.
Add the rule parts required to extend the analysis to topicalization of prepositional objects:

4.
Currently, only a few relative clauses work, since the relative pronoun can only be interpreted locally, i.e. without modal verbs or embedded clauses. Add a functional uncertainty analysis to cover the following data:

5.
Add a treatment of relative pronouns in PPs.

Note that the TOPIC of the relative clause plays a role in checking for gender agreement with the noun that the relative clause modifies (look at the NP rule where CPrel is introduced).

In this light, what f-structure do you want under TOPIC when the relative phrase is a PP? (Recall that this phenomenon has been called "Pied Piping": the fronted relative pronoun is joined by the preposition, which can't be stranded in German.)


Exercises on Coordination (English)

Start from the small grammar coord-ex-engl.lfg
The testfile is coord-ex-testfile

1.
Add a simple coordination analysis in the NP rule. Assume for the moment that there's no f-structure contribution coming from the conjunction. Parse just a coordinated NP:

parse {NP: Mary and John}

Is there any case information?

Now parse

parse {I saw Mary and John}

How does the case information get there?

(Click on the c-structure nodes with both mouse buttons to get the f-structure that's projected from that node.

If you hold down the control key and click with both buttons, you'll see the annotations that are in the rules for that node.)

2.
Add coordination at the level of S. Look at how the f-structure information is propagated for an example that involves functional uncertainty and coordination.

3.
Use a rule macro to generalize all usages of the coordination scheme. Note: macro definitions go in the RULES section (not the TEMPLATES section).

Now, extend the scheme to cover coordinations with more than two conjuncts:

4.
Now, see what happens when you have the conjunctions introduce their own f-structure contribution, e.g. (^ CONJ-FORM) = and.
Check the different possibilities of annotating the conjunction in the coordination rule.

First, try the (default) annotation ^=!

Where does the CONJ-FORM feature end up for

What happens with

Next, try ! $ ^

Compare the effect of

parse {NP: Mary and John}

and

parse {I saw Mary and John}

Finally, define CONJ-FORM as a non-distributive feature. See where the information ends up in the f-structure. You should now be able to parse all previous sentences.
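
A toy Python model (not XLE's implementation) of how information assigned to a coordinate structure is spread. A distributive feature such as CASE is copied into every conjunct, which is one way the case of the object can reach Mary and John, whereas a non-distributive feature such as CONJ-FORM stays on the set itself. The class `CoordSet` and the function `assign` are hypothetical, and the assumption that the object position contributes CASE = acc is for illustration only.

```python
class CoordSet:
    """Toy coordinate f-structure: conjuncts plus the set's own
    (non-distributive) attributes."""

    def __init__(self, *conjuncts):
        self.conjuncts = [dict(c) for c in conjuncts]
        self.own = {}


def assign(fs, attr, value, nondistributive=frozenset()):
    """Assign attr = value to fs and report whether it succeeded.

    On a CoordSet, non-distributive attributes are stored on the set
    itself; all others are distributed into every conjunct."""
    if isinstance(fs, CoordSet):
        if attr in nondistributive:
            return assign(fs.own, attr, value)
        results = [assign(c, attr, value, nondistributive) for c in fs.conjuncts]
        return all(results)
    return fs.setdefault(attr, value) == value  # False on a value clash


nondist = {"CONJ-FORM"}
obj = CoordSet({"PRED": "Mary"}, {"PRED": "John"})
assign(obj, "CASE", "acc", nondist)        # e.g. from the object position
assign(obj, "CONJ-FORM", "and", nondist)   # from the conjunction
print(obj.conjuncts)
print(obj.own)
```

If CONJ-FORM is not listed as non-distributive, the same call copies it into each conjunct instead, which is the kind of effect to look for before declaring it non-distributive in the grammar.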

5.
What do you have to do to get the following data correct?

(Hint: try expressing something like: either I am in a structure where nothing is said about number, or I must be plural.)

What about

(The testfile uses a special annotation to mark ungrammatical sentences: the first number, followed by an exclamation mark, gives the number of expected readings.)

Optional additional exercise:
When you're finished with this exercise sheet, build the coordination analysis into the larger English, German, or French grammar.


Solution

coord-ex-engl.sol.lfg


Exercises on Coordination and Macros (French)

Start from the grammar french-sublex.lfg

Exercise 1: NP-/PP-coordination


Exercise 2: Parameterized macros for coordination


Solution

french-coord-macros.lfg


The Interface to Morphological Analyzers
- Exercises (English)

Start from the small grammar morph-ex-engl.lfg

The testfile is morph-ex-testfile
The CONFIG is set up in such a way that you can use the morphological analyzer.

1.
Play with the XLE command analyze-string (for one or two words). It shows you the output of the tokenizer and the morphological analyzer.

2.
Parse the sentence

Hold down the control key and click on a tree node with the left mouse button in order to expand the tree display to show sublexical nodes.

You can now see which words in the sentence are still analyzed using full form entries.

Furthermore, you can look at the transition graph by choosing the menu item `Show Morph Window' in the `Commands' menu of the tree window (upper left-hand window).

3.
Write a sublexical rule for the D (determiner) category and make sure there are "sublexical lexicon entries" for the tags occurring in the morphology output for determiners. (+SP stands for `singular or plural': how can you capture this in terms of f-structure?)

4.
Now also include the verb in the morphology-based part of the grammar. Where should the person and number information given by the morphology go in the f-structure?

Note that you have to introduce the subcategorization frame for each verb somewhere, since this information is not included in the morphology output. So where can you attach it?

You should be able to parse the following sentences:

Recall that a string of letters can be assigned several categories, divided by semicolons. So you can do the following:

  fall         N-S XLE    @MY-TEMPLATE-FOR-NOUN-STEMS;
               V-S XLE    @MY-TEMPLATE-FOR-VERB-STEMS.

5.
As a next step, make use of the -Lunknown mechanism to deal with input for which there is no lexicon entry. This is useful for the lemma forms output by the morphology. Write an entry for -Lunknown and assign it all sublexical stem categories you want to deal with in this way.

(For verbs you have to "guess" what the subcategorization is.)
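
A toy Python model of lexicon lookup with a -Lunknown fallback (illustrative only; the entry contents are assumptions). Stems with their own entry use it; every other stem receives all generic stem categories, with the stem substituted into the PRED as %stem would be. The category names N-S and V-S are taken from the fall example above; the `KNOWN` and `UNKNOWN` tables and `lookup` are hypothetical.

```python
# Toy lexicon: stems with an explicit entry (contents are illustrative).
KNOWN = {
    "fall": [("N-S", {"PRED": "'fall'"}),
             ("V-S", {"PRED": "'fall<(^ SUBJ)>'"})],
}

# Generic entries used for every other stem; "{stem}" stands in for %stem.
# Verbs are guessed to be either intransitive or transitive.
UNKNOWN = [
    ("N-S", "'{stem}'"),
    ("V-S", "'{stem}<(^ SUBJ)>'"),
    ("V-S", "'{stem}<(^ SUBJ)(^ OBJ)>'"),
]


def lookup(stem):
    """Return (category, features) pairs for a stem from the morphology."""
    if stem in KNOWN:
        return KNOWN[stem]
    return [(cat, {"PRED": pred.format(stem=stem)}) for cat, pred in UNKNOWN]


for analysis in lookup("giraffe"):
    print(analysis)
```

Each unknown stem thus becomes three-way ambiguous; in the grammar, the sublexical rules should then keep only the guesses whose category fits the tags the morphology returned.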

What analyses do you now get for

Try to parse sentences with other common nouns and intransitive verbs; in principle, they should all work now! If you guess that a verb can be either transitive or intransitive, transitive verbs will work too.

(Note that using %stem in the PRED values now gives you the lemma form as the predicate.)

6.
Further things the tokenizer will do for you

You can now use sentence-initial capitalization for arbitrary words, since the tokenizer will turn it back into the lower-case variant. The tokenizer does a lot of such preprocessing tasks.

7.
Optional exercise
Build the morphology interface into the larger English/French/German grammar (maybe just for some categories). You can either use your own grammar or the solution grammar coord-ex-engl.sol.lfg.

It is of course okay to keep around a few full form entries for particularly complicated words (e.g., auxiliaries).

For French and German, there are several issues that are more interesting than what occurs with the English morphology. We have some particular exercises for these languages - so if you're interested, ask us.


Solution

morph-ex-engl.sol.lfg


Exercises on morphology interface (French)

Start from the grammar french-sublex.lfg

Exercise 1

The grammar is now interfaced with tokenization and morphology for French. The relevant finite-state transducers are referenced in the grammar configuration.

For some categories you already find sublexical rules and lexical entries in the grammar sections FRENCH SUBLEX RULES and FRENCH SUBLEX LEXICON.

Lexical entries for nouns, verbs, prepositions and (new!) adverbs are now only specified for stem forms. Try to understand the differences that arise between the previous grammar version without sublexical rules and the new version, by looking at template definitions and e.g. the definition of agreement constraints.

Test how the grammar behaves now. Use "unknown" vocabulary for the categories that are covered by generic -Lunknown lexical entries. You may also introduce further generic -Lunknown lexical entries.

Exercise 2: Sublexical rules

Define the morphology interface for clitics and determiners, following the model of the existing sublexical rules. The lexical entries will have to be marked by XLE instead of *.
The relevant morphological analyses for clitics and determiners are stated below. You can obtain these morphological analyses in XLE with the command
analyze-string <string>.

je   je+Nom+InvGen+SG+P1+PC
tu   tu+Nom+InvGen+SG+P2+PC
il   il+Nom+Masc+SG+P3+PC
elle il+Nom+Fem+SG+P3+PC
ils  il+Nom+Masc+PL+P3+PC

le   le+Masc+SG+Def+Det
     le+Acc+Masc+SG+P3+PC
la   le+Acc+Fem+SG+P3+PC
     le+Fem+SG+Def+Det
les  le+InvGen+PL+Def+Det
     le+Acc+InvGen+PL+P3+PC

lui  lui+Dat+InvGen+SG+P3+PC
leur leur+Dat+InvGen+PL+P3+PC

un  un+Masc+SG+Indef+Det
une un+Fem+SG+Indef+Det
des un+InvGen+PL+Indef+Det
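
To see what the sublexical lexicon entries have to do, here is a toy Python mapping from the analyses above to f-structure features (a sketch, not XLE code). The feature names CASE, GEND, NUM, PERS and DEF, the category labels D and CL, and the assumption that the last tag gives the category are all hypothetical choices. Note how +InvGen contributes nothing, so gender is left unspecified and can unify with either a masculine or a feminine noun.

```python
# Hypothetical tag-to-feature table for the analyses listed above.
TAG_FEATURES = {
    "Nom": ("CASE", "nom"), "Acc": ("CASE", "acc"), "Dat": ("CASE", "dat"),
    "Masc": ("GEND", "masc"), "Fem": ("GEND", "fem"),
    "SG": ("NUM", "sg"), "PL": ("NUM", "pl"),
    "P1": ("PERS", "1"), "P2": ("PERS", "2"), "P3": ("PERS", "3"),
    "Def": ("DEF", "+"), "Indef": ("DEF", "-"),
}
# Tags missing from the table (here only InvGen) add no feature at all.
CATEGORY_TAGS = {"Det": "D", "PC": "CL"}  # last tag -> category (assumed)


def analyse(analysis):
    """Split e.g. 'le+Acc+Masc+SG+P3+PC' into lemma, category, features."""
    lemma, *tags = analysis.split("+")
    category = CATEGORY_TAGS[tags[-1]]
    features = {}
    for tag in tags[:-1]:
        if tag in TAG_FEATURES:
            attr, value = TAG_FEATURES[tag]
            features[attr] = value
    return lemma, category, features


for a in ["il+Nom+Fem+SG+P3+PC", "le+InvGen+PL+Def+Det"]:
    print(a, "->", analyse(a))
```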


Solution

french-sublex2.lfg


Processing Aspects, Part I
- Exercises

Start from the small English context-free grammar proc-ex1-engl-cf.lfg


Solution

proc-ex1-engl-cf.sol.lfg


Processing Aspects, Part II
- Exercises

The idea of this exercise is to compare two versions of a grammar of German: one making the clause type distinction at the level of f-structure (proc-ex2-german-f.lfg), the other at the level of c-structure (proc-ex2-german-c.lfg). The latter employs the concept of rule parametrization by complex category symbols.



Ambiguity, Generation and Overgeneration

Start from the grammar french-ambig.lfg

The testfile is test-ambiguity

Exercise 1: Object Drop

Modify the transitive verb template (v_s_o) to allow for object drop.

Check your analysis with the relevant examples in test-ambiguity (introduced by the comment #object drop).

Exercise 2: Headless NPs

Extend the present NP rule to account for headless NPs. In the headless construction, the f-structure for the missing noun head should contain a feature PRED= 'pro', with PRON-TYPE= null. The adjective is to be represented as an adjunct, as usual:


First test your analysis with the NP le petit (`the small one').

Then analyze the following sentences: (also contained in the testfile)

(Context:
Deux boutons s'allument. `Two buttons light up.')

Le vert indique si l'imprimante marche.
`The green (one) indicates that the printer works.'

Appuyez sur le rouge dans la fenêtre droite.
`Press the red (one) in the right window.'

Ne pas appuyer sur le rouge du côté droit.
not press on the red on the side right
`Do not press the red (one) on the right-hand side'


Exercise 3: Reflexive Constructions

The grammar contains a simple analysis for inherently reflexive verbs, e.g. se casser (break), s'évanouir (faint), s'allumer (light up) or se réveiller (wake up). These verbs are analyzed as intransitive reflexive verbs (VTYPE= reflexive). The reflexive clitic (se) is not represented as an argument. It is represented as part of the verbal morphology by the feature VMORPH= cl(itic).

Exercise 4: Lexical Ambiguities

The NP rule contains an analysis for noun compounds (photocopie couleur (`color photocopy'), document papier (`paper document')), which is currently commented out.

Make the compound rule active and check the effect by reparsing the examples you analyzed before, using parse-testfile <num>.

In the following, it may be wise to put the noun-compound rule into comments again.


Exercise 5: Generation

Make sure that your .xlerc-file contains the following line:

set defaultSocketPorts(generator) 20<n>, with <n> your student number,
e.g. set defaultSocketPorts(generator) 2001.

Otherwise your generation process will prevent other users from starting their own generation processes.


Solution

french-ambig2.lfg


Exercises on Constraint Ranking

Start from the grammar french-ambig2.lfg

The testfile is test-ot

CONFIG for OT-style constraint ranking

The CONFIG section of your grammar contains two lines which state keywords for OT-constraint ranking in analysis and generation.

OPTIMALITYRANKING: NEUTRAL UNGRAMMATICAL NOGOOD.
GENOPTIMALITYRANKING: NEUTRAL UNGRAMMATICAL NOGOOD.

The ot-marks that you define in the o-projection in the following exercises must be added to these two lines and ranked relative to each other.

In XLE, o-descriptions are defined as follows: otmark $ o::*

The o-projection can be inspected by clicking the o:: button in the left-hand f-structure window.

For analyses that involve ot-marks, XLE displays the active ot-marks in their ranked order in the lower right-hand window. You can reactivate suboptimal analyses by clicking on selected ot-marks, or you can activate all suboptimal analyses by choosing Unoptimal in the Options menu.

XLE displays the number of optimal vs. unoptimal solutions in the following way:
<numopt>+<numsubopt>, e.g. 1+7.
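
A toy Python model of how ranked ot-marks pick out the optimal analyses and yield the <numopt>+<numsubopt> count. It is simplified: it only handles ordinary dispreference marks and ignores NEUTRAL, UNGRAMMATICAL, NOGOOD and any other mark types XLE supports. The analyses and the ranking ObjDrop >> HeadlessNP are hypothetical; each analysis is compared on its violation counts, from the highest-ranked mark down.

```python
def evaluate(solutions, ranking):
    """Toy OT evaluation over a non-empty dict of analyses.

    `solutions` maps an analysis name to the ot-marks it incurs;
    `ranking` lists dispreference marks from highest to lowest rank
    (marks not in the ranking are ignored). Violation counts are
    compared lexicographically, so one violation of a higher mark
    outweighs any number of violations of lower ones."""
    def profile(name):
        return tuple(solutions[name].count(mark) for mark in ranking)

    best = min(profile(name) for name in solutions)
    optimal = [n for n in solutions if profile(n) == best]
    suboptimal = [n for n in solutions if profile(n) != best]
    return optimal, suboptimal, f"{len(optimal)}+{len(suboptimal)}"


ranking = ["ObjDrop", "HeadlessNP"]  # hypothetical ranking
print(evaluate({"a1": [], "a2": ["ObjDrop"], "a3": ["ObjDrop", "HeadlessNP"]}, ranking))
print(evaluate({"only": ["ObjDrop"]}, ranking))
```

The second call mirrors the ne pas ouvrir question in Exercise 1: a dispreferred analysis is kept when it has no better competitor.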


Exercise 1: Object Drop

Define an OT mark ObjDrop for object drop that avoids the unwarranted object-drop ambiguity in:

Le vert indique si le moteur marche.
`The green one indicates whether the motor works.'

Do you still get an analysis for:

ne pas ouvrir.   (`do not open')


Exercise 2: Headless Noun Phrases

Introduce an OT constraint HeadlessNP that filters unwarranted ambiguities in the following cases:

NP: le conducteur  (`the driver')
NP: le moteur   (`the motor')

Now reconsider the following sentences:

Le vert indique si l'imprimante marche.
`The green (one) indicates that the printer works.'

Appuyer sur le rouge dans la fenêtre droite.
`Press the red (one) in the right window.'


Exercise 3:

If you reparse the previous sentence with the (currently commented-out) compound analysis in NPap activated, you get unwarranted ambiguities for la fenêtre droite (`the right window') due to the lexical ambiguity of la, which can also be a noun (meaning `tone A'). Introduce a lexicalized OT-constraint in a noun entry for la that marks la as a RareNoun, and thus avoids one of these ambiguities.

Reparse the following sentences, and then put the compound rule into comments again.

Appuyer sur le rouge dans la fenêtre droite.
`Press the red (one) in the right window.'

Ne pas appuyer sur le rouge du côté droit.
`Do not press the red (one) on the right-hand side.'


Exercise 4: Reflexive Constructions

State an OT-constraint (using an ot-mark InhRefl) that filters unwarranted ambiguities for the following sentences, by expressing a preference for inherently reflexive verbs over semantic reflexivization:

Le moteur se casse.    (`The motor breaks')

Les feux s'allument.   (`The lights go on')

Again, consider the additional examples:

Les enfants se réveillent (mutuellement).   (`The children wake (each other) up')

Marie se voit.    ('Mary sees herself.')


Exercise 5: Oblique vs. Adjunct PPs

Introduce an OT-constraint (using the mark PPobl) to prefer the oblique PP analysis over the adjunct reading. Test the results with the following sentences:

Marie renonce au voyage.   (`Mary gives up the trip')

Marie renonce au premier essai.   (`Mary gives up (at) the first attempt')

Le chat est tué par un chasseur.    (`The cat is killed by a hunter.')

Le chat est tué par inadvertance.    (`The cat is killed by accident.')

OPTIONAL: Prenominal vs. postnominal APs

Try to state OT constraints (both in the grammar and in the lexicon) to cover the following generalization (in generation only):

Use the OT-marks Postnom and Prenom.

Check your constraint mechanism in generation with the following sentences:

NP: un grand joli chat    (`a big pretty cat')

NP: un grand joli chat affamé    (`a big, pretty, starving cat')

Les jolis chats de Marie dorment sous une grande table grise.
`Mary's pretty cats sleep under a big grey table'

Your generation preference constraints should determine the order of adjectives relative to the head correctly, i.e. as given in these example sentences.

Now execute parse-testfile test-ot.new and inspect the file test-ot.new.errors to see the effect of your constraints on the number of (optimal) analyses assigned.



Solution

french-ot.lfg