Annotation guidelines (version 2.0; UNDER CONSTRUCTION)
Used by the corpora annotated for multiword expressions
Definitions and scope
This document aims to formalise idiomaticity in language via guidelines for the manual annotation of multiword expressions (MWEs) in running text. The guidelines were defined with several objectives in mind:
- Universality: the typology, terminology and methodology are unified across many languages (currently about 30), while leaving room for truly language-specific features.
- Tractability: the cross-linguistic formalisation of idiomaticity should be done in a computationally tractable way.
- Reproducibility: the annotation process should be as reproducible as possible.
- The annotation flow follows a decision diagram driven by linguistic tests. If two annotators examining the same MWE candidate give the same answers to the tests, the outcome of the annotation is also the same.
- Semantic non-compositionality is considered the major property of MWEs to be modeled. Linguistics tells us that non-compositionality is a matter of degree, but for the sake of tractability annotation decisions must be binary.
- Semantic non-compositionality is hard to test directly; it is therefore approximated by lexical and morpho-syntactic inflexibility.
- Inflexibility tests are partly driven by the syntactic structure, so there is a strong dependence on the underlying syntactic theory. PARSEME annotation largely relies on Universal Dependencies for the annotation of morpho-syntax, because both share the objective of universality.
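
The reproducibility principle above can be illustrated with a minimal sketch of a decision diagram whose outcome depends only on the yes/no answers to the tests. The test names (`lexically_inflexible`, `morphosyntactically_inflexible`), the shape of the diagram, and the helper `annotate` are hypothetical and serve only as an illustration; they are not the tests or the diagram defined by the guidelines.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    """One yes/no linguistic test in a decision diagram (hypothetical)."""
    test: str                       # name of the test asked at this node
    if_yes: Union["Node", bool]     # next node, or a final binary decision
    if_no: Union["Node", bool]


def annotate(node: Union[Node, bool], answers: Dict[str, bool]) -> bool:
    """Walk the diagram using the annotator's answers.

    Returns True if the candidate is annotated as an MWE, False otherwise.
    The result depends only on the answers, never on who gave them.
    """
    while not isinstance(node, bool):
        node = node.if_yes if answers[node.test] else node.if_no
    return node


# Illustrative diagram: inflexibility stands in for non-compositionality,
# and every path ends in a binary decision.
DIAGRAM = Node(
    test="lexically_inflexible",
    if_yes=True,
    if_no=Node(
        test="morphosyntactically_inflexible",
        if_yes=True,
        if_no=False,
    ),
)

annotator_a = {"lexically_inflexible": False,
               "morphosyntactically_inflexible": True}
annotator_b = dict(annotator_a)  # same answers to the same tests

# Identical answers always lead to the identical annotation outcome.
assert annotate(DIAGRAM, annotator_a) == annotate(DIAGRAM, annotator_b)
```

Because `annotate` is a pure function of the answers, any disagreement between annotators can be traced back to a specific test on which their answers differ.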