Annotation guidelines
corpora annotated for multiword expressions
Annotation process and decision tree
We propose the following methodology for VMWE annotation:
- Step 1 - identify a candidate, that is, a combination of a verb with at least one other word which could form a VMWE. Recall that a candidate can be composed of only one token if it contains several words (cf. the MWT tests). If the candidate has the structure of a meaning-preserving variant, find the corresponding canonical form. The following steps should be applied to this canonical form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
- Step 2 - determine which components of the candidate (or of its canonical form) are lexicalized, that is, which components cannot be omitted without the VMWE ceasing to occur. Corpus and web searches may be required to confirm intuitions about acceptable variants.
- Step 3 - depending on the syntactic structure of the candidate's canonical form, formally check whether it is a VMWE using the generic and category-specific decision trees and tests below. Note that the intuitions you used in Step 1 to identify a candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
- Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied once the three previous steps are complete, i.e. the IAV annotation overlays the universal annotation.
The decision tree below indicates the order in which tests should be applied in Step 3. The decision trees are a useful summary to consult during annotation, but they contain only very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.
Generic decision tree
If you are annotating Italian or Hindi, go to the Italian-specific decision tree or the Hindi-specific decision tree. For all other languages, follow the tree below.
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
  - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
    - YES ⇒ Annotate as a VMWE of category VID
    - NO ⇒ It is not a VMWE, exit
  - YES ⇒ Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
    - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
      - YES ⇒ Annotate as a VMWE of category VID
      - NO ⇒ It is not a VMWE, exit
    - YES ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
      - YES ⇒ Apply the VID-specific tests ⇒ VID tests positive?
        - YES ⇒ Annotate as a VMWE of category VID
        - NO ⇒ It is not a VMWE, exit
      - NO ⇒ Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
        - Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
          - YES ⇒ Annotate as a VMWE of category IRV
          - NO ⇒ It is not a VMWE, exit
        - Particle ⇒ Apply VPC-specific tests ⇒ VPC tests positive?
          - YES ⇒ Annotate as a VMWE of category VPC.full or VPC.semi
          - NO ⇒ It is not a VMWE, exit
        - Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
          - YES ⇒ Annotate as a VMWE of category MVC
          - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            - YES ⇒ Annotate as a VMWE of category VID
            - NO ⇒ It is not a VMWE, exit
        - Extended NP ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
          - YES ⇒ Annotate as a VMWE of category LVC
          - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            - YES ⇒ Annotate as a VMWE of category VID
            - NO ⇒ It is not a VMWE, exit
        - Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
          - YES ⇒ Annotate as a VMWE of category VID
          - NO ⇒ It is not a VMWE, exit
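For readers who find control flow easier to follow as code, the generic decision tree can be sketched roughly as below. This is only an illustration, not part of the guidelines: the `Candidate` record, its field names, the category strings such as "reflexive_clitic", and the `tests` dictionary are all hypothetical. The sketch also assumes that a negative answer to S.1 or S.2, or a positive answer to S.3, sends the candidate straight to the VID-specific tests. The category-specific tests themselves are linguistic judgments, so they are passed in as functions that return a category label or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Candidate:
    """Hypothetical record of an annotator's answers for one candidate."""
    unique_verb_head: bool     # answer to S.1 (1HEAD)
    one_lexicalized_dep: bool  # answer to S.2 (1DEP)
    lexicalized_subject: bool  # answer to S.3 (LEX-SUBJ)
    dep_category: str          # answer to S.4 (CATEG); labels are assumed names


# Each category-specific test returns a label (e.g. "VPC.semi") or None.
Test = Callable[[Candidate], Optional[str]]


def classify(c: Candidate, tests: Dict[str, Test]) -> Optional[str]:
    """Walk the generic tree; return a VMWE category, or None if not a VMWE."""
    vid = tests["VID"]
    if not c.unique_verb_head:       # S.1 negative (assumed polarity)
        return vid(c)
    if not c.one_lexicalized_dep:    # S.2 negative (assumed polarity)
        return vid(c)
    if c.lexicalized_subject:        # S.3 positive (assumed polarity)
        return vid(c)
    # S.4: branch on the morphosyntactic category of the dependent d.
    if c.dep_category == "reflexive_clitic":
        return tests["IRV"](c)
    if c.dep_category == "particle":
        return tests["VPC"](c)       # yields "VPC.full" or "VPC.semi"
    if c.dep_category == "verb":
        return tests["MVC"](c) or vid(c)   # fall back to VID tests
    if c.dep_category == "extended_np":
        return tests["LVC"](c) or vid(c)   # fall back to VID tests
    return vid(c)                    # any other category


# Usage with stub tests standing in for the annotator's judgments.
stub_tests: Dict[str, Test] = {
    "VID": lambda c: "VID",
    "IRV": lambda c: None,
    "VPC": lambda c: "VPC.full",
    "MVC": lambda c: None,
    "LVC": lambda c: None,
}
example = Candidate(True, True, False, "extended_np")
print(classify(example, stub_tests))  # LVC stub fails, VID stub passes
```

Note that only the MVC and LVC branches fall back to the VID-specific tests; a failed IRV or VPC test ends the procedure immediately, as in the tree above.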