Annotation guidelines (version 2.0; UNDER CONSTRUCTION)
Used by the corpora annotated for multiword expressions
Annotation process
We propose the following methodology for MWE annotation:
- Step 1 - identify a candidate, that is, a combination of at least two words which could form a MWE. Recall that a candidate can be composed of only one token if it contains several words (cf. the MWT tests). Find the neutral form of the candidate. The following steps should be applied to this neutral form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
- Step 2 - determine which components of the candidate (in its neutral form) are lexicalized, that is, which components cannot be omitted without the MWE ceasing to occur. Corpus and web searches may be required to confirm intuitions about acceptable variants.
- Step 3 - depending on the syntactic structure of the candidate's neutral form, formally check whether it is a MWE using the generic and category-specific decision diagrams and tests described below. Note that the intuitions used in Step 1 to identify a given candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
- Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied only once the three previous steps are complete, i.e. the IAV annotation overlays the universal annotation.
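As an illustration only, the outcome of Steps 1 and 2 can be recorded as a small data structure: the words of the neutral form plus one flag per word saying whether it is lexicalized. The class name `Candidate`, its fields, and the example judgment for "make a decision" are assumptions made for this sketch; they are not prescribed by the guidelines or by any annotation tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record of a MWE candidate after Steps 1 and 2."""

    words: List[str]        # words of the neutral form (Step 1)
    lexicalized: List[bool]  # one flag per word (Step 2)

    def __post_init__(self) -> None:
        # Step 1: a candidate combines at least two words
        # (they may come from a single token).
        if len(self.words) < 2:
            raise ValueError("a candidate needs at least two words")
        if len(self.lexicalized) != len(self.words):
            raise ValueError("expected one lexicalization flag per word")

    def lexicalized_components(self) -> List[str]:
        """Return only the words marked as lexicalized."""
        return [w for w, keep in zip(self.words, self.lexicalized) if keep]


# Illustrative judgment, assumed for the example only.
candidate = Candidate(["make", "a", "decision"], [True, False, True])
print(candidate.lexicalized_components())
```

Keeping the flags separate from the words makes it easy to revise the Step 2 decision after a corpus or web search without touching the neutral form.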
The unique entry point to Step 3 above is the following test:
Top test - [DIST] - Distribution
What is the distribution of the neutral form of the candidate in the particular context? This can be tested by replacing the MWE candidate with a single word of a given part of speech (POS) and checking that the replacement, although it may change the meaning, does not lead to a loss of grammaticality or acceptability. If the replacement test passes for a large class of single words of the same POS, the candidate is considered to have the distribution of this POS.
- Determiner, conjunction, adposition or interjection ⇒ Apply the functional MWE tests ⇒ FuncMWE tests positive?
  - Yes ⇒ Annotate with the FuncMWE subcategory determined via the guidelines
  - No ⇒ It is not a MWE, exit
- Adjectival or adverbial phrase ⇒ Apply the adjectival and adverbial MWE tests ⇒ AMWE tests positive?
  - Yes ⇒ Annotate with the AMWE subcategory determined via the guidelines
  - No ⇒ It is not a MWE, exit
- Verb, verbal phrase or verbal clause ⇒ Apply the verbal MWE tests ⇒ VMWE tests positive?
  - Yes ⇒ Annotate with the VMWE subcategory determined via the guidelines
  - No ⇒ It is not a MWE, exit
- Noun or nominal phrase ⇒ Apply the nominal MWE tests ⇒ NMWE tests positive?
  - Yes ⇒ Annotate with the NMWE subcategory determined via the guidelines
  - No ⇒ It is not a MWE, exit
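The branching of the [DIST] test can be summarized as a small dispatch routine. The sketch below is a reading aid under stated assumptions, not an implementation of the tests themselves: the distribution is supplied by the annotator, each category-specific test is a hypothetical callable that returns a subcategory label or `None`, and the names `Distribution`, `annotate`, `STUB_TESTS` and the label `"SUBCATEGORY"` are invented for the example.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Distribution(Enum):
    """Outcome of the [DIST] test, mapped to the MWE family to test next."""

    FUNCTIONAL = "FuncMWE"  # determiner, conjunction, adposition, interjection
    ADJ_ADV = "AMWE"        # adjectival or adverbial phrase
    VERBAL = "VMWE"         # verb, verbal phrase or verbal clause
    NOMINAL = "NMWE"        # noun or nominal phrase


# Hypothetical signature: a category-specific test takes the neutral form and
# returns a subcategory label if the tests are positive, otherwise None.
CategoryTest = Callable[[str], Optional[str]]


def annotate(
    neutral_form: str,
    distribution: Distribution,
    tests: Dict[Distribution, CategoryTest],
) -> Optional[Tuple[str, str]]:
    """Run the tests selected by the distribution; None means 'not a MWE, exit'."""
    subcategory = tests[distribution](neutral_form)
    if subcategory is None:
        return None
    return (distribution.value, subcategory)


# Stub tests standing in for the human-applied decision diagrams.
STUB_TESTS: Dict[Distribution, CategoryTest] = {
    Distribution.FUNCTIONAL: lambda form: None,
    Distribution.ADJ_ADV: lambda form: None,
    Distribution.VERBAL: lambda form: "SUBCATEGORY",  # placeholder label
    Distribution.NOMINAL: lambda form: None,
}

print(annotate("some candidate", Distribution.VERBAL, STUB_TESTS))
print(annotate("some candidate", Distribution.NOMINAL, STUB_TESTS))
```

Each distribution leads to exactly one family of tests, so a candidate is either annotated with a subcategory of that family or rejected; it is never passed on to another family.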