Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.0 (2017)


Glossary

Cranberry word

A cranberry word is a token that does not have the status of a stand-alone word, and has no proper distribution and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in other contexts, like einjagen, abstatten, astray, martel, odsądzić and kvit in the examples below:

  • jemandem Angst einjagen (lit. to-someone chase-in fear) 'to frighten someone'
    jemandem einen Besuch abstatten 'to pay someone a visit'
  • to go astray
  • se mettre martel en tête (lit. SELF put a hammer in head) 'to worry a lot'
  • odsądzić kogoś od czci i wiary (lit. to refuse honor and faith to someone) 'to drag someone's name through the mire/mud', i.e. to damage someone's reputation by saying insulting things about them
  • biti si kvit 'to owe nothing to somebody', i.e. each party got what it deserved/asked for

Candidate VMWE

A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it should be annotated as a true VMWE or whether it is a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.

Syntactic operator

A syntactic operator is a verb that only bears grammatical features (person, number, tense and mood) but adds no semantics to its complement. This definition is more restrictive than the traditional notion of a light verb. Notably, aspectual light verbs (which add aspectual semantics to the complement), as in to start a walk, to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:

  • eine Entscheidung treffen 'to make a decision'
    Angst haben 'to have fear'
    ein Verbrechen begehen 'to commit a crime'
  • to make a decision
    to have fear
    to commit a crime
  • oddać hołd (lit. to give-back tribute) 'to pay tribute'
  • priti v poštev (lit. to come into consideration) 'to consider'

Collocation

A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:

  • eine Anfrage beantworten 'to answer a request', das Diagramm zeigt 'the diagram shows', mit einem Bus fahren 'to take a bus'
  • the graphic shows
    drastically drop
  • zalać rynek (lit. to flood the market) 'to dominate the market'
  • občutno zmanjšati 'significantly reduce'
    drastično zmanjšati 'drastically reduce'

Canonical form

The canonical form of a candidate VMWE is a prototypical verbal phrase preserving the same meaning.

  • the canonical form of das Herz, welches er bricht 'the heart which he breaks' is er bricht ihr das Herz 'he breaks her heart'
    the canonical form of Wortbruch (lit. word-break) 'a promise which has not been kept' is Wort brechen (lit. to break the word) 'not to keep a promise'
  • the canonical form of the heart which he broke is he broke (her) heart
    the canonical form of making an impression on him is (she) makes an impression on him
  • the canonical form of decyzje, które podjął 'decisions which he took' is podjął decyzję 'he took a decision'
  • the canonical form of decisão nunca antes tomada 'decision never before taken' is tomar uma decisão 'to take a decision'
  • the canonical form of odločitev, ki jo je sprejel 'the decision which he took' is sprejeti odločitev 'to take a decision'

Reflexive clitics

Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines of IReflV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which are a relatively small closed class of words:

  • mich, dich, sich, uns, euch
  • me, te, se, nous, vous
  • mi, ti, si, ci, vi
  • się, sobie
  • me, te, se, nos, vos
  • se, si

Particles

Particles are hard to distinguish from homographic prepositions:

  • ich schlage vor allen zu verzeihen 'I propose to forgive everyone'
    ich schlage vor allen Dingen die Sahne 'first of all, I whip the cream'
  • to get up a petition
    to get up a hill
  • n.a.
  • n.a.
  • sem za njen predlog 'I support her proposal'
    sem za hišo 'I'm standing behind the house'

The fundamental property to capture is that a preposition governs a complement, together with which it forms a prepositional group, while a particle functions as an adverbial. In some languages, particles can also be homographic with verbal prefixes:

  • das Schild um|fahren 'to drive over the sign'
    den See umfahren 'to drive around the lake'

Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.

Unexpected change in meaning

An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification and checking whether it yields an unexpected shift in acceptability or meaning, that is, a shift beyond what the modification itself would be expected to cause. In order to judge whether a shift in acceptability or meaning is unexpected, one can apply the same modification, by analogy, to a similar compositional construction. For example, book and word have semantically related words such as notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:

  • Ich gebe dir mein Buch 'I give you my book'; Ich gebe dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe 'I give you my publication/thesis/chapter/novel/edition'
  • I give you my book; I give you my notebook/novel/volume/publication
  • daję ci książkę 'I give you a book'; daję ci zeszyt/powieść/tom/publikację 'I give you a notebook/novel/volume/publication'
  • dam ti knjigo 'I give you a book'; dam ti zvezek/roman/publikacijo 'I give you a notebook/novel/publication'

the same does not hold for:

  • Ich gebe dir mein Wort 'I give you my word', i.e. I promise; #Ich gebe dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe 'I give you my publication/thesis/chapter/novel/edition'
  • I give you my word; #I give you my notebook/novel/volume/publication
  • daję ci słowo (lit. I give you a word) 'I give you my word'; #daję ci wyraz/sylabę/czasownik 'I give you a word/syllable/verb'
  • dam ti besedo (lit. I give you a word) 'I give you my word'; #dam ti izraz/zlog/glagol 'I give you a word/syllable/verb'

That is, the latter replacement produces an unexpected change in meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test 2 [LEX] applies and:

  • jmdm. sein Wort geben 'to give one's word to someone'
  • to give one's word to someone
  • dać komuś słowo (lit. to give someone a word) 'to give one's word to someone'
  • n.a.

is a VMWE.

Similarly, Test 22 [V+PART-DIFF-SENSE] refers to an unexpected change in the meaning of the verb caused by the addition of the particle. This is checked by testing whether the situation described by the verb with the particle implies the one described by the verb without the particle:

  • Ich fange das Buch an 'I begin to read the book' does not imply Ich fange das Buch 'I catch the book'
    Ich lege das Buch auf dem Tisch ab 'I put down the book on the table' implies Ich lege das Buch auf den Tisch 'I put the book on the table'
  • to check in upon arrival does not imply to check upon arrival (it is a VPC)
    to look up into the sky implies to look into the sky (it is not a VPC)
  • n.a.
  • n.a.

Ungrammaticality

Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We assume that judging grammaticality is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).