Annotation guidelines
PARSEME corpora annotated for multiword expressions


Annotation process and decision tree

We propose the following methodology for VMWE annotation:

  • Step 1 - identify a candidate, that is, a combination of a verb with at least one other word which could form a VMWE. Recall that a candidate can be composed of only one token if it contains several words (cf. the MWT tests). If the candidate has the structure of a meaning-preserving variant, find the corresponding canonical form. The following steps should be applied to this canonical form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
  • Step 2 - determine which components of the candidate (or of its canonical form) are lexicalized, that is, which components cannot be omitted without the VMWE ceasing to occur. Corpus and web searches may be required to confirm intuitions about acceptable variants.
  • Step 3 - depending on the syntactic structure of the candidate's canonical form, formally check whether it is a VMWE using the generic and category-specific decision trees and tests below. Note that the intuitions you used in Step 1 to identify a candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
  • Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied once the three previous steps are complete, i.e. the IAV annotation overlays the universal annotation.

The decision tree below indicates the order in which tests should be applied in Step 3. The decision trees are a useful summary to consult during annotation, but they contain only very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.

Generic decision tree

If you are annotating Italian or Hindi, go to the Italian-specific decision tree or the Hindi-specific decision tree. For all other languages, follow the tree below.

  • Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
    • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
      • YES ⇒ Annotate as a VMWE of category VID
      • NO ⇒ It is not a VMWE, exit
    • YES ⇒ Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
      • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
        • YES ⇒ Annotate as a VMWE of category VID
        • NO ⇒ It is not a VMWE, exit
      • YES ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
        • YES ⇒ Apply the VID-specific tests ⇒ VID tests positive?
          • YES ⇒ Annotate as a VMWE of category VID
          • NO ⇒ It is not a VMWE, exit
        • NO ⇒ Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
          • Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
            • YES ⇒ Annotate as a VMWE of category IRV
            • NO ⇒ It is not a VMWE, exit
          • Particle ⇒ Apply VPC-specific tests ⇒ VPC tests positive?
            • YES ⇒ Annotate as a VMWE of category VPC.full or VPC.semi
            • NO ⇒ It is not a VMWE, exit
          • Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
            • YES ⇒ Annotate as a VMWE of category MVC
            • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
              • YES ⇒ Annotate as a VMWE of category VID
              • NO ⇒ It is not a VMWE, exit
          • Extended NP ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
            • YES ⇒ Annotate as a VMWE of category LVC
            • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
              • YES ⇒ Annotate as a VMWE of category VID
              • NO ⇒ It is not a VMWE, exit
          • Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            • YES ⇒ Annotate as a VMWE of category VID
            • NO ⇒ It is not a VMWE, exit
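
The control flow of the generic tree can be sketched in code to make the ordering of the tests explicit. The sketch below is only an illustration, not part of the guidelines: the `Candidate` class, its field names, and the `classify` function are hypothetical, and the outcome of each test is supplied by the annotator as a plain value, since the tests themselves rely on linguistic judgment. It assumes that a negative answer to S.1 or S.2, or a positive answer to S.3, sends the candidate to the VID-specific tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Annotator judgments for one candidate (all names are hypothetical)."""
    unique_verb_head: bool           # answer to S.1 [1HEAD]
    one_lexicalized_dependent: bool  # answer to S.2 [1DEP]
    lexicalized_subject: bool        # answer to S.3 [LEX-SUBJ]
    # answer to S.4 [CATEG]: "reflexive_clitic", "particle", "verb",
    # "extended_np", or any other string for "another category"
    dependent_category: str
    vid_positive: bool = False       # outcome of the VID-specific tests
    irv_positive: bool = False       # outcome of the IRV-specific tests
    vpc_label: Optional[str] = None  # "VPC.full", "VPC.semi", or None if the VPC tests fail
    mvc_positive: bool = False       # outcome of the MVC-specific tests
    lvc_positive: bool = False       # outcome of the LVC-specific decision tree


def vid_or_exit(c: Candidate) -> Optional[str]:
    """Shared fallback: annotate as VID if the VID tests pass, else not a VMWE."""
    return "VID" if c.vid_positive else None


def classify(c: Candidate) -> Optional[str]:
    """Return a category label, or None when the candidate is not a VMWE."""
    if not c.unique_verb_head:           # S.1 (assumed: NO leads to VID tests)
        return vid_or_exit(c)
    if not c.one_lexicalized_dependent:  # S.2 (assumed: NO leads to VID tests)
        return vid_or_exit(c)
    if c.lexicalized_subject:            # S.3 (assumed: YES leads to VID tests)
        return vid_or_exit(c)
    category = c.dependent_category      # S.4
    if category == "reflexive_clitic":
        return "IRV" if c.irv_positive else None
    if category == "particle":
        return c.vpc_label
    if category == "verb":
        return "MVC" if c.mvc_positive else vid_or_exit(c)
    if category == "extended_np":
        return "LVC" if c.lvc_positive else vid_or_exit(c)
    return vid_or_exit(c)                # another category
```

Note the asymmetry the sketch makes visible: a failed IRV or VPC test exits immediately, whereas a failed MVC or LVC test falls back to the VID-specific tests.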


PARSEME corpora annotation guidelines, version 1.3.6 (stable version), last updated on September 20, 2022