Annotation guidelines (version 2.0)
Used by the PARSEME corpora annotated for multiword expressions


Identifying multiword tokens

The relation between words and tokens is not always 1-to-1. If a single token contains more than one word, it is a potential MWE. For the purpose of MWE annotation it is therefore important to provide as clear-cut a definition of a word as possible. This section contains language-specific tests for identifying multiword tokens (MWTs). Currently, the tests concern only Swedish.

Swedish-specific tests for identifying MWTs

Test MWT.SV.1 - [NNC-MWT] - Noun+noun compound

Is the candidate a noun+noun compound, i.e. does it function as a noun, and consist of two (or more) components that are all nouns (and that can function as stand-alone nouns)? Note that modifier nouns may occur in a compounding form.

  • YES: it is an MWT
    • jord|gubbe (earth man) 'strawberry'
    • skol|boks|hylla (school book shelf) 'school bookshelf'
  • NO: go to the next test
    • mät|redskap (measure tool) 'measuring tool': 'mät' comes from the verb 'mäta' (measure); it is not a noun
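
The noun check in MWT.SV.1 can be sketched as a lookup over segmented components. This is only an illustration: the lexicon `NOUNS`, the table `COMPOUNDING_FORMS`, and both function names are hypothetical and not part of the guidelines, and the sketch assumes the annotator has already established that the candidate as a whole functions as a noun.

```python
# Toy lexicon of stand-alone nouns (hypothetical data, for illustration only).
NOUNS = {"jord", "gubbe", "skola", "bok", "hylla", "redskap"}

# Compounding forms mapped to their stand-alone noun (assumed entries).
COMPOUNDING_FORMS = {"skol": "skola", "boks": "bok"}


def is_noun_component(part):
    """Return True if the component, or its stand-alone form, is a known noun."""
    return COMPOUNDING_FORMS.get(part, part) in NOUNS


def is_noun_noun_compound(segmented):
    """Apply the component check of MWT.SV.1 to a '|'-segmented candidate."""
    parts = segmented.split("|")
    # A compound needs at least two components, and all of them must be nouns.
    return len(parts) >= 2 and all(is_noun_component(p) for p in parts)
```

With this toy data, `is_noun_noun_compound("jord|gubbe")` and `is_noun_noun_compound("skol|boks|hylla")` are true, while `is_noun_noun_compound("mät|redskap")` is false because 'mät' is not listed as a noun.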

Test MWT.SV.2 - [SPLIT-MWT] - Splittable MWT

Split the candidate token into its component parts. Can it be used as an expression in the split form (possibly with slightly shifted semantics)? In some cases, a direct split is not possible. In such cases, it is permissible to change the word order and to insert function words, but not any additional content words.

  • YES: it is an MWT
    • tillvarata (to-be-take) 'take care of'; split form: ta till vara (take to be) 'take care of'
    • avbryta (off-break) 'cancel'; split form: bryta av (break off) 'break off'
  • NO: go to the next test
    • ut|bilda (out educate) 'educate': '*bilda ut' ('out educate') is not a valid expression

Test MWT.SV.3 - [DEVERBAL_SPLIT-MWT] - Splittable as a deverbal expression

For candidates that can potentially have a deverbal form (nominal, adjectival and adverbial expressions), is the deverbal form either split, or splittable (according to the definition in MWT.SV.2)?

  • NO: it is not an MWT. Note that the answer might be no in two cases:
    • The current expression cannot be deverbal
      • med|föra (with bring) 'entail': 'medföra' is already a verb
      • allaredan (all ready) 'already': 'allaredan' is an adverb that cannot be verbalized
    • The deverbalization either does not result in an existing expression, or it cannot be split with the semantics kept the same (or with a slight shift)
  • YES: it is an MWT, but only the decision rules for deverbal expressions should be applied
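
The three Swedish tests form a cascade, which can be sketched as follows. This is a minimal illustration, not an official implementation: the `Outcome` enum and the `classify` function are hypothetical names, and the three boolean arguments stand for judgments an annotator has already made for each test.

```python
from enum import Enum


class Outcome(Enum):
    MWT = "MWT"
    MWT_DEVERBAL_RULES = "MWT, apply only the decision rules for deverbal expressions"
    NOT_MWT = "not an MWT"


def classify(is_nn_compound, is_splittable, deverbal_splittable):
    """Run MWT.SV.1 to MWT.SV.3 in order on the annotator's yes/no answers."""
    if is_nn_compound:        # MWT.SV.1: noun+noun compound
        return Outcome.MWT
    if is_splittable:         # MWT.SV.2: usable as an expression in split form
        return Outcome.MWT
    if deverbal_splittable:   # MWT.SV.3: deverbal form is split or splittable
        return Outcome.MWT_DEVERBAL_RULES
    return Outcome.NOT_MWT
```

For example, 'jordgubbe' passes the first test (`classify(True, False, False)`), 'avbryta' passes the second (`classify(False, True, False)`), and 'medföra' fails all three (`classify(False, False, False)`), so it is not an MWT.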
