Annotation guidelines
PARSEME corpora annotated for multiword expressions


Identifying multiword tokens

The relation between words and tokens is not always one-to-one. If a single token contains more than one word, it is a potential MWE. For the purpose of MWE annotation it is therefore important to provide as clear-cut a definition of a word as possible. This section contains language-specific tests for identifying multiword tokens (MWTs). Currently the tests concern Swedish.

Swedish-specific tests for identifying MWTs

Test MWT.SV.1 - [VERB-MWT] - Verbal MWT

Does the candidate token function as a verb?

  • NO: we do not have to decide if it is an MWT (for the purpose of VMWE annotation)
    • mätredskap (lit. measuring-tool) 'measuring instruments'
      sysselsättning (lit. task-setting) 'employment'
  • YES: go to the next test
    • tillhandahålla (lit. to-hand-hold) 'provide'
      förklara (lit. for-clear) 'explain'
      klargöra (lit. clear-make) 'clarify'

Test MWT.SV.2 - [SPLIT-MWT] - Splittable MWT

Split the candidate token into its component parts. Can it be used as an expression in the split form (possibly with slightly shifted semantics)?

  • YES: it is an MWT
    • tillvarata (lit. to-be-take) 'take care of', ta till vara (lit. take to be) 'take care of'
      avbryta (lit. off-break) 'cancel', bryta av (lit. break off) 'break off'
  • NO: go to the next test

Test MWT.SV.3 - [CRAN-MWT] - Cranberry component in an MWT

If you split the token into its component words, is any of these words a cranberry word (i.e. a word that cannot be used as a standalone word with the same part of speech)?

  • YES: it is not an MWT
    • beklaga (lit. be-complain) 'lament': be is possible as a verb but not as a particle
      erbjuda (lit. er-offer) 'offer': er is possible as a pronoun but not as a particle
      försvåra (lit. for-difficult) 'make difficult': svåra is possible as an adjective but not as a verb
      jämföra 'compare': jäm is not used as a stand-alone word
  • NO: it is an MWT
    • på|peka (lit. on|point) 'point out'
      för|klara (lit. for|clear) 'explain'
      klar|göra (lit. clear|make) 'clarify'
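
The three tests form a short decision sequence, which can be sketched in code. The following is only an illustrative sketch, not part of the guidelines: the names Judgments, Outcome and classify are hypothetical, and the annotator's answers to the three test questions are supplied by hand rather than computed. It assumes the outcomes map to answers as the examples suggest: non-verbal tokens such as mätredskap exit at the first test, tokens usable in split form such as avbryta are MWTs, and tokens with a cranberry component such as beklaga are not.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NO_DECISION_NEEDED = "not a verb; no decision needed for VMWE annotation"
    MWT = "it is an MWT"
    NOT_MWT = "it is not an MWT"


@dataclass
class Judgments:
    """An annotator's answers for one candidate token (hypothetical structure)."""
    functions_as_verb: bool        # answer to MWT.SV.1
    usable_in_split_form: bool     # answer to MWT.SV.2
    has_cranberry_component: bool  # answer to MWT.SV.3


def classify(j: Judgments) -> Outcome:
    # MWT.SV.1: non-verbal tokens need no MWT decision for VMWE annotation.
    if not j.functions_as_verb:
        return Outcome.NO_DECISION_NEEDED
    # MWT.SV.2: a token that also works as a split expression is an MWT.
    if j.usable_in_split_form:
        return Outcome.MWT
    # MWT.SV.3: a cranberry component rules out MWT status.
    if j.has_cranberry_component:
        return Outcome.NOT_MWT
    return Outcome.MWT


# Answers below are read off the examples given under each test.
avbryta = Judgments(True, True, False)      # bryta av exists as an expression
beklaga = Judgments(True, False, True)      # be is not usable as a particle
papeka = Judgments(True, False, False)      # på|peka, listed as an MWT under MWT.SV.3
matredskap = Judgments(False, False, False) # functions as a noun

for name, j in [("avbryta", avbryta), ("beklaga", beklaga),
                ("påpeka", papeka), ("mätredskap", matredskap)]:
    print(f"{name}: {classify(j).value}")
```

The order of the checks matters: a token only reaches the cranberry test if it is verbal and cannot be used in split form, which is why the later fields are ignored once an earlier test has settled the outcome.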


PARSEME corpora annotation guidelines version 1.3.6 stable version, last updated on September 20, 2022