Annotation guidelines
PARSEME corpora annotated for multiword expressions


Textual annotation scope

In this annotation task, all occurrences of all syntactic types of verbal multiword expressions (VMWEs) are to be annotated in the text.

We annotate, as integral parts of VMWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated, but case suffixes are only annotated if the noun they modify is also lexicalized. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see VPC tests), while in (HU) döntést hoz valamiről (decision-ACC bring something-DEL, 'make a decision'), only döntést hoz is annotated, even though the delative case suffix is also lexically determined.

Similarly, auxiliaries and modals accompanying the main verb of a VMWE are only annotated if they are themselves lexicalized, but not when they simply mark syntactic variants of the VMWE. For instance, the word will is lexicalized, and is to be annotated as such, in even a worm will turn ('even a meek person will resist if pushed too far'), but not in they will spill the beans.

Both continuous and discontinuous sequences of lexicalized components of VMWEs are annotated.

Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or verb-particle constructions. In this version of the guidelines, verb+preposition combinations such as to rely on somebody, to come across something or to put up with somebody are re-introduced, optionally and experimentally, via the category of inherently adpositional verbs (IAVs).

The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a VMWE or not. We do not annotate their internal syntactic structure. We do annotate, however, VMWEs embedded in other VMWEs. For instance, the VMWE to let the cat out of the bag contains the embedded VMWE let out and both are to be annotated as different VMWEs. Embeddings are discussed on each category's page, in the "Problematic cases and remarks" sections (e.g. IRVs overlapping with VIDs).
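The flat, token-level view described above can be sketched in Python. This is a minimal illustration, not the PARSEME file format: the class name VMWE, the helper membership, and the choice to store each VMWE as a set of token positions are all assumptions made for this example, as is the treatment of every word of let the cat out of the bag as lexicalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMWE:
    """One annotated VMWE, stored as the positions of its lexicalized tokens.

    Hypothetical representation: no internal syntactic structure is kept.
    """
    token_indices: frozenset

    def is_continuous(self) -> bool:
        # True when the lexicalized tokens are adjacent in the sentence.
        idx = sorted(self.token_indices)
        return idx == list(range(idx[0], idx[-1] + 1))

    def is_embedded_in(self, other: "VMWE") -> bool:
        # True when every token of this VMWE also belongs to `other`,
        # and `other` has at least one additional token.
        return self.token_indices < other.token_indices


def membership(tokens, vmwes):
    """For each token, list the positions in `vmwes` of the VMWEs it belongs to."""
    return [
        [k for k, v in enumerate(vmwes) if i in v.token_indices]
        for i in range(len(tokens))
    ]


# Example sentence built around the embedding case from the text above.
tokens = ["She", "let", "the", "cat", "out", "of", "the", "bag"]
outer = VMWE(frozenset({1, 2, 3, 4, 5, 6, 7}))  # let the cat out of the bag
inner = VMWE(frozenset({1, 4}))                 # let ... out (discontinuous)

tags = membership(tokens, [outer, inner])
```

In this sketch a token may belong to several VMWEs at once, which is how the embedded let out and the enclosing expression can both be recorded as different VMWEs over the same flat token sequence.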

Once identified in a text, VMWEs are also to be assigned to exactly one of the categories described in the following sections. We do not admit assigning two different categories to a single VMWE in order to express hesitation. A comment and a particular value of the annotator's confidence should be used instead.
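The one-category rule can be illustrated with a small validation sketch in Python. Everything here is hypothetical: the Annotation class, the 0 to 1 confidence scale, and the CATEGORIES set, which only reuses abbreviations that appear in this section and is not the full category inventory.

```python
from dataclasses import dataclass

# Illustrative subset of labels, taken from abbreviations used in this section.
CATEGORIES = {"VID", "IRV", "VPC", "IAV"}


@dataclass
class Annotation:
    """A VMWE annotation carrying exactly one category label."""
    token_indices: frozenset
    category: str
    confidence: float = 1.0  # assumed scale: 0.0 (unsure) to 1.0 (certain)
    comment: str = ""

    def __post_init__(self):
        # A combined label such as "VID|VPC" is rejected: hesitation is
        # expressed through `confidence` and `comment`, not a second category.
        if self.category not in CATEGORIES:
            raise ValueError(f"expected exactly one known category, got {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0.0 and 1.0")


# An annotator who hesitates lowers the confidence and leaves a comment.
hesitant = Annotation(
    token_indices=frozenset({1, 4}),
    category="VPC",
    confidence=0.5,
    comment="Could also be read as part of a larger idiom.",
)
```

Rejecting combined labels at construction time keeps each VMWE tied to a single category, while the confidence value and comment preserve the annotator's doubt.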





PARSEME corpora annotation guidelines, version 1.3.6 (stable version), last updated on September 20, 2022