Corpora of multiword expressions - version 1.2 (2020)
Shared task on semi-supervised identification of verbal multiword expressions - edition 1.2 (2020)
The following notational conventions are used throughout the document:
- Italic is used to display example sentences and expressions.
- Bold is used to highlight the lexicalized components of a candidate VMWE inside an example (positive or negative).
- Underline is used to draw the reader's attention to the important part of an example.
- An asterisk (*) precedes ungrammatical examples.
- A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
- Different colors are used to display examples:
  - Red is used for counter-examples, that is, expressions which look like VMWEs but are not, regardless of the language.
  - Other examples, that is, positive examples of the phenomenon being discussed, are colored according to the language family:
    - Shades of green are used for positive examples in Germanic languages.
    - Shades of blue are used for positive examples in Romance languages.
    - Shades of orange are used for positive examples in Slavic languages.
    - Shades of pink are used for positive examples in other language families.
- Examples are preceded by the two-letter language code in parentheses.
- Examples can be shown and hidden using the toggle buttons in the header.