Annotation guidelines (version 2.0)
Used by the corpora annotated for multiword expressions
Textual annotation scope
In this annotation task, all occurrences of all syntactic types of MWEs are to be annotated in the text.
We annotate, as integral parts of MWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated, but case suffixes are only annotated if the noun they modify is also lexicalized. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see IVPC tests), while in (HU) döntést hoz valamiről (decision-ACC bring something-DEL, 'make a decision'), only döntést hoz is annotated, even though the delative case suffix is also lexically determined.
Similarly, auxiliaries and modals accompanying the main verb of a MWE are only annotated if they are themselves lexicalized, but not when they simply mark syntactic variants of the MWE. For instance, will is lexicalized, and is to be annotated as such, in even a worm will turn ('even a meek person will resist if pushed too far') but not in they will spill the beans.
Both continuous and discontinuous sequences of lexicalized components of MWEs are annotated.
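The distinction between continuous and discontinuous sequences can be illustrated with a minimal Python sketch. It assumes that an MWE is stored as a set of 0-based token positions; this representation, the function name is_continuous and the example sentences are hypothetical and not prescribed by the guidelines.

```python
def is_continuous(token_positions):
    """Return True if the lexicalized components occupy adjacent token positions."""
    ordered = sorted(token_positions)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


# Hypothetical tokenized sentences (0-based positions).
sentence_a = ["They", "put", "the", "tent", "up"]
put_up = {1, 4}      # "put ... up": the components are separated by other tokens

sentence_b = ["He", "made", "a", "faux", "pas"]
faux_pas = {3, 4}    # "faux pas": the components are adjacent

print(is_continuous(put_up))    # False
print(is_continuous(faux_pas))  # True
```

Both kinds of sequence are annotated in the same way; the check only makes explicit what "discontinuous" means at the token level.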
Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially only when they form inherently reflexive verbs or idiomatic verb-particle constructions. Verb+preposition combinations like to rely on somebody, to come across something or to put up with somebody are annotated optionally and experimentally as inherently adpositional verbs (IAVs). On the other hand, prepositions selected by functional MWEs, such as in spite of, according to, etc., are considered lexicalized.
The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a MWE or not. We do not annotate their internal syntactic structure. We do annotate, however, MWEs embedded in other MWEs. For instance, the MWE to make a faux pas contains the embedded MWE faux pas, and both are to be annotated as different MWEs. Embeddings are discussed on some categories' pages, in the "Problematic cases and remarks" sections (e.g. IRVs overlapping with VIDs).
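As an illustration of flat, token-level tagging with an embedded MWE, the following Python sketch assigns MWE identifiers to token positions. The data layout, the function names token_tags and embedded_pairs, and the choice of which tokens of the outer MWE count as lexicalized are assumptions made for the example, not part of the guidelines.

```python
tokens = ["He", "made", "a", "faux", "pas"]

# Hypothetical annotation: MWE id -> set of 0-based token positions.
mwes = {
    1: {1, 3, 4},  # made ... faux pas (outer MWE; lexicalized tokens are assumed)
    2: {3, 4},     # faux pas (embedded MWE)
}


def token_tags(n_tokens, mwes):
    """For each token, list the ids of the MWEs it belongs to (empty list = no MWE)."""
    tags = [[] for _ in range(n_tokens)]
    for mwe_id, positions in sorted(mwes.items()):
        for pos in sorted(positions):
            tags[pos].append(mwe_id)
    return tags


def embedded_pairs(mwes):
    """Return (inner, outer) id pairs where the inner tokens are a proper subset of the outer ones."""
    return [(i, o) for i in mwes for o in mwes if i != o and mwes[i] < mwes[o]]


tags = token_tags(len(tokens), mwes)
for token, ids in zip(tokens, tags):
    print(token, ids)

print(embedded_pairs(mwes))  # [(2, 1)]
```

No syntactic structure is recorded: each token simply carries the identifiers of the MWEs it belongs to, and a token inside an embedded MWE carries more than one.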
Once identified in a text, MWEs are also to be assigned to exactly one of the categories described in the following sections. Assigning two different categories to a single MWE in order to express hesitation is not allowed. A comment and a particular value of the annotator's confidence should be used instead.
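The single-category constraint can be sketched as a small Python record that rejects combined labels and carries hesitation in separate fields. The class MweAnnotation, its field names, the 0.0 to 1.0 confidence scale and the label set are hypothetical; the label set contains only the abbreviations mentioned in this section, not the full inventory defined in the category sections.

```python
from dataclasses import dataclass

# Only the labels mentioned in this section; the complete list is an open assumption here.
KNOWN_CATEGORIES = {"VID", "IRV", "IVPC", "IAV"}


@dataclass
class MweAnnotation:
    token_positions: frozenset
    category: str            # exactly one label, never a pair such as "VID/IRV"
    confidence: float = 1.0  # hypothetical scale from 0.0 (unsure) to 1.0 (certain)
    comment: str = ""

    def __post_init__(self):
        if self.category not in KNOWN_CATEGORIES:
            raise ValueError(f"expected exactly one known category, got {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0.0 and 1.0")


# Hesitation is recorded through the confidence value and the comment, not a second category.
hesitant = MweAnnotation(
    frozenset({1, 4}),
    "IVPC",
    confidence=0.5,
    comment="possibly a compositional verb + particle",
)
```

Under this sketch, an attempt to create an annotation with a label such as "VID/IRV" raises a ValueError.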