Annotation guidelines (version 2.0; UNDER CONSTRUCTION)
Used by the PARSEME corpora annotated for multiword expressions


Annotation process

We propose the following methodology for MWE annotation:

  • Step 1 - identify a candidate, that is, a combination of at least two words which could form a MWE. Recall that a candidate can consist of a single token if that token contains several words (cf. the MWT tests). Find the neutral form of the candidate; the following steps apply to this neutral form. This step relies largely on the annotators' linguistic knowledge and on their intuition after reading this guide.
  • Step 2 - determine which components of the candidate (in its neutral form) are lexicalized, that is, the MWE no longer occurs if they are omitted. Corpus and web searches may be required to confirm intuitions about acceptable variants.
  • Step 3 - depending on the syntactic structure of the candidate's neutral form, formally check whether it is a MWE using the generic and category-specific decision diagrams and tests described below. Note that the intuitions you used in Step 1 to identify a candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
  • Step 4 (experimental and optional) - if your language team has chosen to annotate the IAV category experimentally, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied after the three previous steps are complete, i.e. the IAV annotation overlays the universal annotation.
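
The four steps can be pictured as a small pipeline. This is a minimal sketch under stated assumptions, not a PARSEME tool: the guidelines describe a manual procedure, so every judgment (identifying the candidate in Step 1, deciding whether the MWE still occurs after an omission in Step 2, running the tests of Steps 3 and 4) is represented by a caller-supplied function. `Candidate`, `find_lexicalized`, `annotate` and their parameters are hypothetical names, and the way the Step 4 result is combined with the Step 3 label (the IAV callback simply receives both) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Candidate:
    """One MWE candidate identified in Step 1 (hypothetical record type)."""
    neutral_form: list[str]                             # words of the neutral form
    lexicalized: set[int] = field(default_factory=set)  # indices filled in by Step 2


def find_lexicalized(cand: Candidate,
                     still_occurs: Callable[[list[str]], bool]) -> set[int]:
    """Step 2: a component is lexicalized if the MWE no longer occurs without it.

    `still_occurs` stands in for the annotator's judgment (possibly backed by
    corpus or web searches); it receives the neutral form minus one word.
    """
    lexicalized = set()
    for i in range(len(cand.neutral_form)):
        reduced = cand.neutral_form[:i] + cand.neutral_form[i + 1:]
        if not still_occurs(reduced):
            lexicalized.add(i)
    return lexicalized


def annotate(cand: Candidate,
             still_occurs: Callable[[list[str]], bool],
             step3_tests: Callable[[Candidate], Optional[str]],
             iav_tests: Optional[Callable[[Candidate, Optional[str]], Optional[str]]] = None,
             ) -> Optional[str]:
    """Run Steps 2-4 on a candidate; return a label, or None if it is not a MWE."""
    cand.lexicalized = find_lexicalized(cand, still_occurs)  # Step 2
    label = step3_tests(cand)                                # Step 3: mandatory tests
    if iav_tests is not None:                                # Step 4: only if the team opted in,
        label = iav_tests(cand, label)                       # and only after Step 3 is done
    return label


# Illustrative judgment: the expression survives as long as "make" and "decision" remain.
cand = Candidate(["make", "a", "decision"])
label = annotate(cand,
                 still_occurs=lambda words: {"make", "decision"} <= set(words),
                 step3_tests=lambda c: "LVC.full")  # illustrative label only
print(label, sorted(cand.lexicalized))  # LVC.full [0, 2]
```

Passing judgments in as functions keeps the sketch honest about where human decisions enter: the code only fixes the order of the steps and the omission logic of Step 2.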

The single entry point to Step 3 above is the following test:

Top test - [DIST] - Distribution

What is the distribution of the neutral form of the candidate in the particular context? This can be tested by replacing the MWE candidate with a single word of a given part of speech (POS) and checking whether the replacement, even if it changes the meaning, preserves grammaticality and acceptability. If the replacement test passes for a large class of single words of the same POS, the candidate is considered to have the distribution of this POS.
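
The replacement test can be summarized as counting, per POS, how many single-word substitutes the annotator judged acceptable. This is a minimal sketch assuming those judgments are already recorded; `dist_test` and `min_passes` are hypothetical, and the threshold of 3 is an arbitrary stand-in for "a large class of single words", which the guidelines do not quantify.

```python
def dist_test(substitutions: dict[str, dict[str, bool]],
              min_passes: int = 3) -> list[str]:
    """Return the POS tags whose single-word replacements pass often enough.

    `substitutions` maps a POS tag to {replacement word: judged grammatical
    and acceptable in context?}. `min_passes` is a hypothetical stand-in for
    "a large class of single words"; the guidelines give no number.
    """
    return sorted(
        pos for pos, judged in substitutions.items()
        if sum(judged.values()) >= min_passes  # each True counts as 1
    )


# Context: "She left in spite of the rain." Each word below replaces "in spite of".
judgments = {
    "ADP": {"despite": True, "before": True, "after": True, "during": True},
    "NOUN": {"house": False, "idea": False, "rain": False},
}
print(dist_test(judgments))  # ['ADP']
```

Here "in spite of" would be treated as having the distribution of an adposition, which routes it to the functional MWE tests.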

  • Determiner, conjunction, adposition or interjection ⇒ Apply the functional MWE tests. FuncMWE tests positive?
    • YES ⇒ Annotate with the FuncMWE subcategory determined via the guidelines
    • NO ⇒ It is not a MWE, exit
  • Adjectival or adverbial phrase ⇒ Apply the adjectival and adverbial MWE tests. AMWE tests positive?
    • YES ⇒ Annotate with the AMWE subcategory determined via the guidelines
    • NO ⇒ It is not a MWE, exit
  • Verb, verbal phrase or verbal clause ⇒ Apply the verbal MWE tests. VMWE tests positive?
    • YES ⇒ Annotate with the VMWE subcategory determined via the guidelines
    • NO ⇒ It is not a MWE, exit
  • Noun or nominal phrase ⇒ Apply the nominal MWE tests. NMWE tests positive?
    • YES ⇒ Annotate with the NMWE subcategory determined via the guidelines
    • NO ⇒ It is not a MWE, exit
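
The decision diagram is a dispatch from the distribution found by [DIST] to one family of tests. The sketch below assumes each family is available as a function returning a subcategory label when its tests are positive and None otherwise; `TEST_FAMILY`, `route` and the `tests` placeholders are hypothetical, and raising an error for distributions the diagram does not list is a design choice of the sketch, not a rule from the guidelines.

```python
from typing import Callable, Optional

# Distribution found by [DIST] -> family of tests to apply (one entry per branch above).
TEST_FAMILY = {
    "determiner": "FuncMWE", "conjunction": "FuncMWE",
    "adposition": "FuncMWE", "interjection": "FuncMWE",
    "adjectival phrase": "AMWE", "adverbial phrase": "AMWE",
    "verb": "VMWE", "verbal phrase": "VMWE", "verbal clause": "VMWE",
    "noun": "NMWE", "nominal phrase": "NMWE",
}


def route(distribution: str,
          tests: dict[str, Callable[[str], Optional[str]]],
          candidate: str) -> Optional[str]:
    """Apply the test family selected by the candidate's distribution.

    Each entry of `tests` is a placeholder for a family of category-specific
    tests: it returns a subcategory label if the tests are positive, or None
    ("it is not a MWE, exit").
    """
    family = TEST_FAMILY.get(distribution)
    if family is None:
        raise ValueError(f"distribution not covered by the diagram: {distribution!r}")
    return tests[family](candidate)


tests = {
    "FuncMWE": lambda c: None,
    "AMWE": lambda c: None,
    "VMWE": lambda c: "LVC.full",  # illustrative outcome only
    "NMWE": lambda c: None,
}
print(route("verbal phrase", tests, "make a decision"))  # LVC.full
print(route("noun", tests, "house"))                     # None
```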
