Annotation guidelines
corpora of multiword expressions - version 1.2 (2020)
shared task on semi-supervised identification of verbal multiword expressions - edition 1.2 (2020)
Identifying multiword tokens
The relation between words and tokens is not always 1-to-1. If a single token contains more than one word, it is a potential MWE. For the purpose of MWE annotation it is therefore important to provide as clear-cut a definition of a word as possible. This section contains language-specific tests for identifying multiword tokens (MWTs). Currently the tests cover only Swedish.
Swedish-specific tests for identifying MWTs
Test MWT.SV.1 - [VERB-MWT] - Verbal MWT
Does the candidate token function as a verb?
- NO: we do not have to decide if it is an MWT (for the purpose of VMWE annotation)
  mätredskap measuring-tool measuring instruments
  sysselsättning task-setting employment
- YES: go to the next test
  tillhandahålla to-hand-hold provide
  förklara for-clear explain
  klargöra clear-make clarify
Test MWT.SV.2 - [SPLIT-MWT] - Splittable MWT
Split the candidate token into its component parts. Can it be used as an expression in the split form (possibly with slightly shifted semantics)?
- YES: it is an MWT
  tillvarata to-be-take take care of, ta till vara take to be take care of
  avbryta off-break cancel, bryta av break off break off
- NO: go to the next test
Test MWT.SV.3 - [CRAN-MWT] - Cranberry component in an MWT
If you split the token into its component words, is any of them a cranberry word (i.e. a word that cannot be used on its own with the same part of speech)?
- YES: it is not an MWT
  beklaga be-complain lament → be is possible as a verb but not as a particle
  erbjuda er-offer offer → er is possible as a pronoun but not as a particle
  försvåra for-difficult make difficult → svåra is possible as an adjective but not as a verb
  jämföra compare → jäm is not used as a stand-alone word
- NO: it is an MWT
  på|peka on|point point out
  för|klara for|clear explain
  klar|göra clear|make clarify
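
The three Swedish tests form a short decision cascade, which can be sketched in Python as below. This is only an illustration of the order in which the tests apply, not part of the guidelines: the names `Verdict`, `Component` and `classify_token` are hypothetical, and the linguistic judgments (whether the token is a verb, whether the split form is usable, whether a component is a cranberry word) are assumed to be supplied by the annotator rather than computed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    NO_DECISION_NEEDED = "not a verb, MWT status not needed for VMWE annotation"
    MWT = "multiword token"
    NOT_MWT = "not a multiword token"


@dataclass(frozen=True)
class Component:
    form: str
    # Annotator's judgment: the component cannot stand alone
    # with the same part of speech (hypothetical field).
    is_cranberry: bool = False


def classify_token(
    is_verb: bool,
    usable_when_split: bool,
    components: Sequence[Component],
) -> Verdict:
    # Test MWT.SV.1: non-verbal tokens need no MWT decision.
    if not is_verb:
        return Verdict.NO_DECISION_NEEDED
    # Test MWT.SV.2: a token that also works in split form is an MWT.
    if usable_when_split:
        return Verdict.MWT
    # Test MWT.SV.3: a cranberry component rules out MWT status.
    if any(c.is_cranberry for c in components):
        return Verdict.NOT_MWT
    return Verdict.MWT


# Judgments below follow the examples given in the tests above.
examples = {
    "mätredskap": classify_token(False, False, [Component("mät"), Component("redskap")]),
    "avbryta": classify_token(True, True, [Component("av"), Component("bryta")]),
    "jämföra": classify_token(True, False, [Component("jäm", True), Component("föra")]),
    "påpeka": classify_token(True, False, [Component("på"), Component("peka")]),
}

for token, verdict in examples.items():
    print(f"{token}: {verdict.value}")
```

Keeping the judgments as plain inputs reflects the fact that the tests rely on the annotator's knowledge of Swedish; the code only fixes the order in which the questions are asked.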