corpora of multiword expressions - version 1.2 (2020)
shared task on semi-supervised identification of verbal multiword expressions - edition 1.2 (2020)
Lexicalized components and open slots
Just like a regular verb, the head verb of a VMWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this VMWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise.
Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone is not. This definition of a lexicalized component naturally extends to any syntactic type of MWE. Namely, the head of a (nominal, adjectival, prepositional, etc.) MWE is lexicalized (always realized by the same lexeme), together with at least one component of at least one of its modifiers. The head verb of a VMWE is always considered lexicalized. When it can be replaced by another verb, as in to make/take a decision, we consider these to be two different, though possibly synonymous, VMWEs.
Conversely, a component of a compulsory argument that can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. The following VMWE examples (cited after Gross 1994) all have the same syntactic structure NP V NP Prep NP; the lexicalized components of each example are listed in parentheses:
- Max took the bull by the horns. (took, the bull, by, the horns)
- The news took John by surprise. (took, by, surprise)
- Bob took part in the inquiry. (took, part)
- Money burns a hole in Bob’s pocket. (burns, a hole, in, pocket)
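When several occurrences of the same VMWE are available, the difference between lexicalized components and open slots can be approximated by checking which positions are always filled by the same lemma. The following Python sketch assumes that the occurrences have already been lemmatized and aligned component by component (neither step is covered here); the function name classify_positions is hypothetical.

```python
from typing import List, Sequence, Tuple


def classify_positions(occurrences: Sequence[Tuple[str, ...]]) -> List[str]:
    """Label each aligned position of a VMWE (hypothetical helper).

    A position is 'lexicalized' if it is realized by the same lemma in every
    occurrence, and 'open slot' otherwise. Occurrences are assumed to be
    lemmatized and aligned position by position beforehand.
    """
    if not occurrences:
        return []
    width = len(occurrences[0])
    if any(len(occ) != width for occ in occurrences):
        raise ValueError("occurrences must be aligned to the same length")
    labels = []
    for i in range(width):
        lemmas = {occ[i] for occ in occurrences}
        labels.append("lexicalized" if len(lemmas) == 1 else "open slot")
    return labels


# Two aligned occurrences of "to take someone by surprise".
occurrences = [
    ("take", "John", "by", "surprise"),
    ("take", "Mary", "by", "surprise"),
]
print(classify_positions(occurrences))
# ['lexicalized', 'open slot', 'lexicalized', 'lexicalized']
```

This is only a heuristic: a position that happens to be filled by the same lexeme in every observed occurrence may still be an open slot in the sense defined above, since open slots are characterized by the size of the semantic class of possible fillers, not by the variation observed in a given corpus.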
Note on terminology: our definition of lexicalization applies to the component words of a VMWE, not to the whole VMWE. This might seem counter-intuitive, given the traditional definition of lexicalization as a diachronic process by which a lexeme (word or phrase) acquires the status of an autonomous lexical unit, that is, "a form which it could not have if it had arisen by the application of productive rules" (Bauer 1983, p. 50, apud Lipka et al. 2004, p. 6). In other words, traditional linguistic studies would apply the term "lexicalized" to the whole VMWE, since it has idiosyncratic behavior and must therefore be listed in the lexicon of the language. Our definition, however, stems from computational linguistics, and in particular from the parsing literature, where lexicalized rules are rules containing terminal lexemes attached to non-terminal symbols, and a lexicalized grammar is a grammar whose rules are lexicalized (Manning and Schütze 1999, p. 417; Jurafsky and Martin 2009, p. 507). In this sense, we regard VMWEs as syntactic subtrees in which some nodes are annotated with terminal symbols that are always realized by the same lexeme (the lexicalized components), while other nodes are non-terminals that can be realized by any lexeme from a larger class (the open slots).
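The view of a VMWE as a partially lexicalized syntactic subtree can be illustrated with a small matcher. In this Python sketch, which is an illustration rather than part of the annotation methodology, the Node class and the matches function are hypothetical: a pattern node carrying a lemma stands for a lexicalized component, and a pattern node with neither a lemma nor children stands for an open slot that accepts any subtree of the same category.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of a (hypothetical) constituency tree or VMWE pattern."""
    category: str                # syntactic category, e.g. "VP", "NP", "P"
    lemma: Optional[str] = None  # lemma of a terminal node, None otherwise
    children: List["Node"] = field(default_factory=list)


def is_open_slot(node: Node) -> bool:
    # In a pattern, a node with neither a lemma nor children is an open slot.
    return node.lemma is None and not node.children


def matches(pattern: Node, tree: Node) -> bool:
    """Check whether a sentence subtree instantiates a VMWE pattern."""
    if pattern.category != tree.category:
        return False
    if is_open_slot(pattern):
        # Open slot: any subtree of the right category is accepted.
        return True
    if pattern.lemma is not None and pattern.lemma != tree.lemma:
        # Lexicalized component: the lemma must be exactly the same.
        return False
    if len(pattern.children) != len(tree.children):
        return False
    return all(matches(p, t) for p, t in zip(pattern.children, tree.children))


# Pattern for "take NP by surprise": take, by and surprise are lexicalized,
# the direct object NP is an open slot.
take_by_surprise = Node("VP", children=[
    Node("V", "take"),
    Node("NP"),
    Node("PP", children=[Node("P", "by"), Node("NP", children=[Node("N", "surprise")])]),
])

# "took John by surprise", with lemmas assumed to come from a lemmatizer.
sentence = Node("VP", children=[
    Node("V", "take"),
    Node("NP", children=[Node("N", "John")]),
    Node("PP", children=[Node("P", "by"), Node("NP", children=[Node("N", "surprise")])]),
])

print(matches(take_by_surprise, sentence))  # True
```

Children are compared in a fixed order, so the sketch ignores word-order and syntactic variation (e.g. passivization), which a realistic matcher would have to handle.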
Prepositions have a special status with respect to the notion of lexicalization. In the first, second and fourth examples above, the prepositions by and in are lexicalized because they introduce lexicalized complements (the horns, surprise and pocket). In the third example, however, the preposition in introduces an open slot whose meaning combines compositionally with the meaning of the VMWE took part. In this case we say that the preposition is selected by the VMWE, i.e. it belongs to the valency properties of the verb. Selected prepositions were disregarded in edition 1.0 of the guidelines; they are now reintroduced, experimentally and optionally, through the category of inherently adpositional verbs (IAV). If a language team decides to take them into account, they are to be handled in the post-annotation step (step 4), i.e. once all other categories have been identified and categorized in the given sentence.
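The criterion above depends only on whether the head of the prepositional complement is lexicalized. A minimal Python sketch, assuming a hypothetical encoding in which tokens are identified by 1-based positions and the lexicalized components of the VMWE (other than the preposition itself) are given as a set of positions; the function name preposition_status is illustrative:

```python
from typing import Set


def preposition_status(complement_head: int, lexicalized: Set[int]) -> str:
    """Classify a preposition inside a VMWE (hypothetical helper).

    complement_head: position of the head of the prepositional complement.
    lexicalized: positions of the other lexicalized components of the VMWE.
    Returns 'lexicalized' if the complement head is lexicalized, and
    'selected' if the preposition introduces an open slot.
    """
    return "lexicalized" if complement_head in lexicalized else "selected"


# "The news took John by surprise": took=3, surprise=6
print(preposition_status(6, {3, 6}))        # lexicalized
# "Bob took part in the inquiry": took=2, part=3, inquiry=6
print(preposition_status(6, {2, 3}))        # selected
# "Money burns a hole in Bob's pocket": burns=2, a=3, hole=4, Bob's=6, pocket=7
print(preposition_status(7, {2, 3, 4, 7}))  # lexicalized
```

In the last example, Bob’s is an open slot, but the preposition is still lexicalized because the head of its complement, pocket, is lexicalized.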
Reflexive clitics in inherently reflexive verbs and possessive pronouns in verbal idioms also have a special lexicalization status (see also the note on more or less frozen determiners). In some languages, the same reflexive clitic or possessive pronoun is used regardless of person and number, inflecting for case only:
Bulgarian:
- смея се (laugh se.REFL) 'to laugh'
- намирам се (find se.REFL) 'to be (somewhere)'
Croatian:
- smijem se (laugh.1.SG self) 'I laugh'
- smiješ se (laugh.2.SG self) 'you laugh'
- smiju se (laugh.3.PL self) 'they laugh'
Polish:
- znajduję się (find.1.SG.PRES self) 'I find myself'
- znajdujesz się (find.2.SG.PRES self) 'you find yourself'
- znajdują się (find.3.PL.PRES self) 'they find themselves'
- pójdą na swoje (they will go on one's own) 'they will establish their own household'
- pójdziemy na swoje (we will go on one's own) 'we will establish our own household'
Slovene:
- smejim se (laugh.1.SG self) 'I laugh'
- smejiš se (laugh.2.SG self) 'you laugh'
- smejijo se (laugh.3.PL self) 'they laugh'
Serbian:
- радујем се [radujem se] (look.1.SG.PRES forward to) 'I look forward to'
- радујеш се [raduješ se] (look.2.SG.PRES forward to) 'you look forward to'
- радује се [raduje se] (look.3.SG.PRES forward to) 'she/he looks forward to'
In other languages, reflexive clitics and possessive pronouns agree with the subject and the verb:
Bulgarian:
- no examples found
German:
- sie wundert sich (she wonders self.3.SG) 'she wonders'
- ihr wundert euch (you.PL wonder.2.PL self.2.PL) 'you wonder'
Greek:
- Ο Γιάννης έκανε την πλάκα του (the John made the fun his) 'John had fun'
- Τα παιδιά έκαναν την πλάκα τους (the kids made the fun their) 'the kids had fun'
English:
- I will do my best
- They will do their best
Spanish:
- yo me quejo (I self.1.SG complain) 'I complain'
- tú te quejas (you self.2.SG complain) 'you complain'
French:
- je me trouve (I self.1.SG find) 'I find myself'
- tu te trouves (you self.2.SG find) 'you find yourself'
- je vide mon sac (I empty my bag) 'I express my secret feelings'
- elle vide son sac (she empties her bag) 'she expresses her secret feelings'
Italian:
- io mi meraviglio (I self.1.SG wonder) 'I wonder'
- tu ti meravigli (you self.2.SG wonder) 'you wonder'
Portuguese:
- eu me queixo (I self.1.SG complain) 'I complain'
- tu te queixas (you self.2.SG complain) 'you complain'
Romanian:
- eu mă gândesc (I Refl.Cl.1sg.Acc. think) 'I am thinking'
- tu te gândești (you Refl.Cl.2sg.Acc. think) 'you are thinking'
In this case, the clitic or the pronoun is realized by different lexemes depending on person, number and gender, so strictly speaking it is not lexicalized. However, we assume that, regardless of the language, the reflexive clitic and the possessive pronoun are each a single lexeme (with lemma się, se, sich, etc. or swój, son, one's) inflecting for person and number. They are thus lexicalized in inherently reflexive verbs and verbal idioms.
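The convention above amounts to mapping every inflected form of the reflexive clitic or possessive pronoun to one canonical lemma before comparing VMWE occurrences. In the following Python sketch, the CANONICAL_LEMMA table is hypothetical and deliberately partial (it covers only a few forms of German, Spanish, French and Romanian), and the function names canonical_lemma and same_component are illustrative:

```python
from typing import Dict

# Hypothetical, deliberately partial tables: inflected forms of the reflexive
# clitic or possessive pronoun mapped to one canonical lemma per language.
CANONICAL_LEMMA: Dict[str, Dict[str, str]] = {
    "de": {"mich": "sich", "dich": "sich", "sich": "sich", "uns": "sich", "euch": "sich"},
    "es": {"me": "se", "te": "se", "se": "se", "nos": "se", "os": "se"},
    "fr": {"me": "se", "te": "se", "se": "se", "nous": "se", "vous": "se",
           "mon": "son", "ton": "son", "son": "son", "notre": "son",
           "votre": "son", "leur": "son"},
    "ro": {"mă": "se", "te": "se", "se": "se", "ne": "se", "vă": "se"},
}


def canonical_lemma(lang: str, form: str) -> str:
    """Map a reflexive clitic or possessive form to its canonical lemma.

    Only meant for tokens already identified as the reflexive clitic of an
    IRV or the possessive of a verbal idiom: forms such as French 'nous' are
    ordinary pronouns elsewhere. Unknown languages or forms are returned
    lowercased.
    """
    form = form.lower()
    return CANONICAL_LEMMA.get(lang, {}).get(form, form)


def same_component(lang: str, form_a: str, form_b: str) -> bool:
    """True if both forms realize the same lexicalized component."""
    return canonical_lemma(lang, form_a) == canonical_lemma(lang, form_b)


print(same_component("de", "sich", "euch"))  # True: sie wundert sich / ihr wundert euch
print(same_component("fr", "mon", "son"))    # True: je vide mon sac / elle vide son sac
print(same_component("pl", "się", "się"))    # True: invariant clitic, no table entry needed
print(same_component("fr", "me", "mon"))     # False: clitic and possessive differ
```

Languages in which the clitic is invariant (e.g. się in Polish) need no table entry, since identical forms already compare equal.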