Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.0 (2017)


Textual annotation scope

In this annotation task, all occurrences of all syntactic types of VMWEs are to be annotated in the text.

We annotate, as integral parts of VMWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated but case suffixes are not. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see VPC tests), while in (HU) döntést hoz valamiről (decision-ACC bring something-DEL) 'make a decision', only döntést hoz is annotated, even though the delative case suffix is also lexically determined.

Both continuous and discontinuous sequences of lexicalized components of VMWEs are annotated.

Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or verb-particle combinations. In this version of the guidelines, verb+preposition combinations like to rely on somebody and to come across something are no longer considered VMWEs.

The annotation considers only flat, tokenized sentences: annotators tag each token as being part of a VMWE or not. We do not annotate the internal syntactic structure of VMWEs. We do annotate, however, VMWEs embedded in other VMWEs. For instance, the VMWE to let the cat out of the bag contains the embedded VMWE let out, and both are to be annotated as separate VMWEs.
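
The flat, token-level view described above can be illustrated with a minimal sketch. It is not the shared task's file format: the function names, the 0-based token indices and the numeric VMWE identifiers are all assumptions made for illustration. Each VMWE is stored as a list of token positions, so a token may belong to several VMWEs (embedding) and the positions need not be adjacent (discontinuity).

```python
from typing import Dict, List


def token_memberships(n_tokens: int, vmwes: Dict[int, List[int]]) -> List[List[int]]:
    """For every token position, list the ids of the VMWEs it belongs to."""
    memberships: List[List[int]] = [[] for _ in range(n_tokens)]
    for vmwe_id in sorted(vmwes):
        for idx in vmwes[vmwe_id]:
            memberships[idx].append(vmwe_id)
    return memberships


def is_continuous(indices: List[int]) -> bool:
    """True if the lexicalized components occupy adjacent token positions."""
    ordered = sorted(indices)
    return ordered == list(range(ordered[0], ordered[-1] + 1))


def is_embedded(inner: List[int], outer: List[int]) -> bool:
    """True if every token of `inner` also belongs to the larger `outer`."""
    return set(inner) < set(outer)


# Hypothetical tokenization of an example sentence (0-based positions).
tokens = ["He", "let", "the", "cat", "out", "of", "the", "bag"]

# VMWE 1: let the cat out of the bag; VMWE 2: the embedded let ... out.
vmwes = {
    1: [1, 2, 3, 4, 5, 6, 7],
    2: [1, 4],
}

memberships = token_memberships(len(tokens), vmwes)
for token, ids in zip(tokens, memberships):
    print(token, ids if ids else "_")
```

In this sketch, let and out carry both identifiers, the subject He carries none, and no syntactic structure inside either VMWE is recorded.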

Once identified in a text, each VMWE is also to be assigned to exactly one of the categories described in the following sections. In this version of the guidelines, we no longer admit hesitation between two different categories. Hesitation can, however, be expressed in a comment and through the annotator's confidence value assigned to that particular VMWE occurrence.
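
The one-category rule, with hesitation moved into a comment and a confidence value, might be modelled as in the sketch below. The record layout, the field names, the 0.0 to 1.0 confidence scale and the category label OTHER_PLACEHOLDER are assumptions made for illustration; the actual category inventory is the one defined in the following sections.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class VMWEAnnotation:
    """One annotated VMWE occurrence (hypothetical record layout)."""
    token_indices: Tuple[int, ...]  # positions of the lexicalized components
    category: str                   # exactly one category label
    confidence: float = 1.0         # assumed scale: 0.0 (unsure) to 1.0 (sure)
    comment: str = ""               # free text, e.g. to record hesitation


def validate(annotation: VMWEAnnotation, allowed_categories: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not annotation.token_indices:
        problems.append("no tokens selected")
    if annotation.category not in allowed_categories:
        problems.append("unknown or combined category: " + annotation.category)
    if not 0.0 <= annotation.confidence <= 1.0:
        problems.append("confidence outside the assumed 0.0-1.0 scale")
    return problems


# Placeholder inventory; only the label VPC is taken from the text above.
allowed = {"VPC", "OTHER_PLACEHOLDER"}

# Tokens of "put it up": put = 0, it = 1, up = 2; only verb and particle are selected.
hesitant = VMWEAnnotation(
    token_indices=(0, 2),
    category="VPC",
    confidence=0.5,
    comment="Hesitating between two categories; chose VPC.",
)
two_labels = VMWEAnnotation(token_indices=(0, 2), category="VPC/OTHER_PLACEHOLDER")

print(validate(hesitant, allowed))
print(validate(two_labels, allowed))
```

A record that names two categories at once is rejected, while a hesitant annotator still commits to a single label and lowers the confidence value instead.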