Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.1 (2018)


Verbal Multiword expressions

Multiword expressions (MWEs) are (continuous or discontinuous) sequences of words with the following compulsory properties:

  • They show some degree of orthographic, morphological, syntactic or semantic idiosyncrasy with respect to what are considered the general grammar rules of a language. Collocations, i.e. word co-occurrences whose idiosyncrasy is of a purely statistical nature (e.g. the graphic shows, drastically drop), are not annotated.
  • Their component words include a head word and at least one other syntactically related word. Most often the relation between them is a (direct or indirect) syntactic dependency, but it can also be, e.g., a coordination. Depending on the category of the head word, the whole MWE can be nominal, adjectival, prepositional, verbal, sentential, etc.
  • At least two components of such a word sequence have to be lexicalized. In this task we only annotate the lexicalized components and ignore open slots.
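
As an illustration of the last property, the sketch below shows one possible way to store an annotation as the set of token positions of the lexicalized components only. It is not part of the guidelines or of any official tooling: the class name MweAnnotation, its fields, and the example sentence are hypothetical and chosen only for this illustration. Open slots and inserted words are simply left out of the index set, so a discontinuous expression is one whose indices have gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MweAnnotation:
    """Hypothetical container: a tokenized sentence plus the 0-based
    positions of the lexicalized components of one MWE."""
    tokens: tuple
    lexicalized: frozenset

    def __post_init__(self):
        # At least two components have to be lexicalized.
        if len(self.lexicalized) < 2:
            raise ValueError("an MWE needs at least two lexicalized components")
        # Every index must point at a token of the sentence.
        if any(i < 0 or i >= len(self.tokens) for i in self.lexicalized):
            raise ValueError("lexicalized index outside the sentence")

    def components(self):
        """Return the lexicalized tokens in sentence order."""
        return [self.tokens[i] for i in sorted(self.lexicalized)]

    def is_continuous(self):
        """True when the lexicalized tokens are adjacent, with no gaps."""
        idx = sorted(self.lexicalized)
        return idx[-1] - idx[0] + 1 == len(idx)


# Illustrative sentence (an assumption, not taken from the corpus):
# the inserted modifier "proverbial" is not a lexicalized component,
# so its position is not part of the annotation.
sentence = ("He", "kicked", "the", "proverbial", "bucket")
annotation = MweAnnotation(tokens=sentence, lexicalized=frozenset({1, 2, 4}))
```

Here annotation.components() yields the three lexicalized tokens and annotation.is_continuous() reports that the expression is discontinuous, since position 3 is skipped.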

Probably the most salient property of MWEs is semantic non-compositionality. In other words, it is often impossible to deduce the meaning of the whole unit from the meanings of its parts and from its syntactic structure. For instance, while it is easy to interpret phrases like to kick the ball or to spill some water from the words that compose them, it is almost impossible to guess, without knowing it beforehand, that to kick the bucket means 'to die' and to spill the beans actually means 'to reveal a secret'.

However, as non-compositionality is a subjective notion, we use inflexibility as a proxy in the tests. Our underlying hypothesis is that (verbal) MWEs have some degree of semantic non-compositionality that implies limited flexibility.

Verbal MWEs (VMWEs) are simply multiword expressions whose syntactic head in the prototypical form is a verb.