Annotation guidelines
PARSEME corpora of multiword expressions - version 1.2 (2020)
PARSEME shared task on semi-supervised identification of verbal multiword expressions - edition 1.2 (2020)


The following notational conventions are used throughout the document:

  • Italic is used to display example sentences and expressions.
  • Bold is used to highlight the lexicalized components of a candidate VMWE inside an example (positive or negative).
  • Underline is used to focus the reader's attention on the important part of an example.
  • An asterisk (*) precedes ungrammatical examples.
  • A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
  • Different colors are used to display examples:
    • Red is used for counter-examples, that is, expressions which look like VMWEs but are not, whatever the language.
    • Other examples, that is, positive examples of the phenomenon being discussed, are colored according to the language family:
      • Shades of green are used for positive examples in Germanic languages.
      • Shades of blue are used for positive examples in Romance languages.
      • Shades of orange are used for positive examples in Slavic languages.
      • Shades of pink are used for positive examples in other language families.
  • Examples are preceded by the 2-letter language code in parentheses.
  • Examples can be shown and hidden using the toggle buttons in the header.