Annotation guidelines (version 2.0)
Used by the PARSEME corpora annotated for multiword expressions


Welcome to the official annotation guidelines of the PARSEME corpora version 2.0.

This version extends the annotation guidelines to all syntactic types of multiword expressions. For previous versions, you can check the index of versions. See also what is new in the guidelines version 2.0 as compared to version 1.3.

Here, you'll find detailed definitions, examples and linguistic tests to guide your decision as to whether a given combination in your language is a multiword expression. Use the table of contents on the left to navigate between sections and the header buttons to show/hide examples.

In addition to these general guidelines, language teams may also provide extra documentation, like lists of borderline cases and decisions taken concerning them. They should all be compatible with these general guidelines.

If you spot errors or if something remains unclear after reading the guidelines, please contact us and we'll do our best to correct the problems.

Authors and contributors (alphabetical order)

Chérifa Ben Khelil, Archna Bhatia, Claire Bonial, Marie Candito, Fabienne Cap, Silvio Cordeiro, Kaja Dobrovoljc, Vassiliki Foufi, Polona Gantar, Voula Giouli, Najet Hadj Mohamed, Carlos Herrero, Uxoa Iñurrieta, Mihaela Ionescu, Iskandar Keskes, Alfredo Maldonado, Stella Markantonatou, Verginica Mititelu, Johanna Monti, Joakim Nivre, Mihaela Onofrei, Viola Ow, Carla Parra Escartín, Manfred Sailer, Carlos Ramisch, Renata Ramisch, Monica-Mihaela Rizea, Agata Savary, Nathan Schneider, Ivelina Stonayova, Sara Stymne, Ashwini Vaidya, Veronika Vincze, Abigail Walsh, Hongzhi Xu.

Developers (alphabetical order)

Quentin Barrouyer, Carlos Ramisch, Agata Savary, Baptiste Souche


Section 1

Definitions and scope

This document aims at formalising idiomaticity in language via guidelines for manual annotation of multiword expressions (MWEs) in running texts. They were defined with several objectives in mind:

  • Universality: the typology, terminology and methodology are unified across many languages (currently about 30), while leaving room for truly language-specific features
  • Tractability: the cross-linguistic formalisation of idiomaticity should be done in a computationally tractable way
  • Reproducibility: the annotation process should be as reproducible as possible.
These objectives imply several principles and constraints:
  • The annotation flow follows a decision diagram driven by linguistic tests. For two annotators examining the same MWE candidate, if their answers to the tests are the same, the outcome of the annotation is also the same.
  • Semantic non-compositionality is considered as the major property of MWEs to be modeled. From linguistics we know that non-compositionality is a matter of scale but for the sake of tractability annotation decisions must be binary.
  • Semantic non-compositionality is hard to test directly, therefore it is approximated by lexical and morpho-syntactic inflexibility.
  • Inflexibility tests are partly driven by the syntactic structure, therefore there is a strong dependence on the underlying syntactic theory. PARSEME annotation largely relies on Universal Dependencies for the annotation of morpho-syntax, due to the shared objective of universality.
Previous versions (1.0 to 1.3) of these guidelines focused on verbal MWEs (VMWEs), which are of particular interest to the PARSEME COST action since they frequently introduce morphosyntactic variability, discontinuity and long-distance dependency issues. This document extends these previous efforts to MWEs of all syntactic types. It defines the extended terminology, MWE typology and annotation methodology based on decision diagrams upon linguistic tests, illustrated with examples in various languages.


Section 1.1

Notation

The notational convention used throughout the document is the following:

  • Italic is used to display example sentences and expressions.
  • Bold is used to highlight the lexicalized components of a candidate MWE inside an example (positive or negative).
  • Underline is used to focus the reader's attention on the important part of an example
  • An asterisk (*) precedes ungrammatical examples.
  • A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
  • Different colors are used to display examples:
    • Red is used for counter-examples, that is, expressions which look like MWEs but are not, whatever the language.
    • According to the language, different colors are used for other examples, that is, positive examples of the phenomenon being discussed:
      • Shades of green are used for positive examples in Germanic languages.
      • Shades of blue are used for positive examples in Romance languages.
      • Shades of orange are used for positive examples in Slavic languages.
      • Shades of pink are used for positive examples in other language families.
  • Examples are preceded by the 2-letter language code in parentheses
  • Examples can be shown and hidden using the toggle buttons in the header.

Section 1.2

Words and tokens

While the definition of an MWE inherently relies on the notion of a word, manual annotation is performed on texts which are automatically tokenized. It is therefore important to understand the distinction between words and tokens in the context of MWEs.

A word is a linguistically (notably semantically) motivated unit. The detection of words is, thus, language-dependent and annotation experts should have a clear idea of how to define it for their own language (even if this definition proves hard in general).

See also the UniDive task on harmonizing the definition of a “syntactic word” across languages.

A token is a technical and pragmatic notion, defined according to more or less linguistically motivated clues and depending on the particular tokenization tool at hand. Note that the notion of a token is ambiguous in NLP. It can also mean an individual occurrence of a certain linguistic unit, as opposed to a type, i.e. the set of all surface realisations of a unit. In these guidelines, we refrain from using this second sense.

Tokens should ideally be as close as possible to words. However, in practice, due to the difficulty of the (automatic) tokenization task, the relation between tokens and words is not always 1-to-1. The following cases occur:

  • A token coincides with a word:
    • مدهشة surprising, نزهة walk, ب with, قام to do
    • вземам, решение, наяве, бял, на, се, д-р
    • mít, hlad, se, úžas
    • einen, Spaziergang, machen, Überraschung
    • κάνω kano make
      παίρνω perno take
      ένας enas a
      απόφαση apofasi decision
    • take, a, walk, astonishment
    • dar, un, paseo, sorpresa, maldecir, bienvivir
    • ibilaldi, bat, egin, ezuste
    • faire, une, promenade, étonnement
    • tóg, siúl, ionadh
    • δίδωμι didо̄mi give
      καλός kalos beautiful
      περί peri about
    • napraviti, jedan, šetnja, začuđenost
    • tesz, egy, séta, meglepetés
    • mengambil, sebuah, berjalan, heran
    • fare, una, passeggiata, sorpresa
    • 取る, その, 歩く, 驚き
    • წერს cers writes
      ხატავს xatavs draws
      ჩვენ čʻven we
      გული guli heart
    • iet, māja, tālāk, un
    • ferħ, libes, sabiħ
    • een, wandeling, maken, verrassing
    • robić to do, na on, dokładność precision
    • comer eat, uma a, guarda-chuva umbrella, antessala anteroom
    • face, o, plimbare
    • iti, na, en, sprehod, začudenost
    • bëj, një, shëtitje, papritur
    • седети sedeti to sit, скрштених skrštenih crossed, руку ruku hands
    • gå, på, promenad, förvåning
    • 采取, 一个, 步行, 惊愕
  • Several tokens build up one word, like in abbreviations, possessive markers, words with "accidental" separators, inflected or derived forms of foreign names, etc. In this case we speak of a multitoken word (MTW). The pipe symbol '|' indicates token separation in these examples:
    • قرار|ها| decision-her her decision
    • т|.|н|. etc.
      год|. year
    • z|.|B|. for instance
      Wie geht|'|s How goes it How are you
    • κ. κύριος Mister
      υπΔρ υποψήφιος διδάκτορας PhD candidate
    • M|. Mister
      pp|. pages
      Pandora|'|s
    • A|/|A|. a la atención de for the attention of
      a|/|f|. a favor in favor
      Rte|. remitente sender
    • etab|. eta abar and so on
    • می|-|روم، آیت|-|الله، کتاب|-|ها
    • aujourd|'|hui today
    • οἷον τ'εἰμί hoion t'eimi of.what.sort and be.1SG I am able to
    • danas today
    • időjárás|-|jelentés weather forecast
    • vice|-|presidente vice-president
    • ე.ი. e|.i|. ესე იგი, i.e.
      გაერო gaero United Nations Organization, UN
      ბ-ნი b|-ni ბატონი, Mister
    • libs|et she wore
    • a|.|u|.|b|.| please
      Pandora|'|s Pandora's
    • Chomsky|'|ego of Chomsky
      SMS|-|ować to write an SMS
    • vice|-|presidente vice-president
    • prim|-|ministru prime minister
      d|-|voastră polite "you"
    • g|. Mister
      str|. pages
      le|-|to
    • s'ka not has doesn't have
    • FIFA|-|у FIFA|-|u FIFA.ACC
      tweet|-|овање tweet|-|ovanje to write tweets
    • EU|:|s EU's
  • One token can contain several words, like in contractions and compounds. In this case we speak of a multiword token (MWT). Identifying MWTs is important because they can be potential candidates for MWEs. However, defining what is a word and what is a MWT is a hard question, and language-specific MWT tests are needed to this end. The precise word forms cannot always be straightforwardly deduced from the MWT containing them and vice versa, as in don't, della, du, etc. See also the representation of MWTs in Universal Dependencies. Examples of MWTs include:
    • وسيكتبوناها = و + س + يكتبون + ها and they are going to write it
    • вагон-ресторант train carriage+restaurant train buffet
    • Schulaufgabe = Schule+Aufgabe school+exercise homework
      Apfelbaum = Apfel+Baum apple tree
    • στον = σε+τον ston = se+ton
    • don't = do+not
    • del = de+el of the from/of the
      al = a+el to+the to the
      compárese = compare+se compare SE_PARTICLE be it compared
      suicidarse = suicidar+se suicide SELF to commit suicide
    • sudurluze = sudur+luze nose+long long-nosed
      jarleku = jar(ri)+leku sit+place seat
    • کتابش=کتاب+ش
    • du = de+le from the
    • sa = i+an in the
      b'fhearr = ba+fhearr be.COND better prefer
    • καίτοι = καί + τοι kaitoi = kai + toi and indeed
    • uzbrdo = uz+brdo uphill
    • della = di+la of the
    • სახლშია = სახლ+ში+ა saxlšia = saxl+ši+a house+in+is, is in the house
      მაგიდაზე = მაგიდა+ზე magidaze = magida+ze table+on, on the table
    • huiswerk = huis+werk home+work homework
      appelboom = appel+boom apple tree
      pannenkoek = pan + koek pancake
    • Białymstoku=Białym+stoku white+slope Białystok.INST (a city name)
      robiłem=robi+łem do.3.SG.PRES+be.1.SG.PAST.AGLI did
      żeśmy = że+śmy that+be.1.PL.AGL that-we
    • neles = em+eles on them
    • într-o = într-+o in a
    • nanj = na+njega on him
    • ma = më + e to me + it
    • напоље = на + поље napolje = na + polje outside
      новосадски = ново + садски novosadski = novo + sadski Novi Sad (an adjective from a city name)
    • arvsmassa = arv+massa genetic stock
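
The token/word distinction above can be made concrete with the ID-range convention that Universal Dependencies uses for multiword tokens in CoNLL-U files: a line whose ID is a range (e.g. 1-2) carries the surface token, and the following lines with single IDs carry its words. The sketch below is only an illustration under stated assumptions: the function name group_tokens is hypothetical, the sample lines are hand-written and keep only the ID and FORM columns, and the Spanish example del = de+el is taken from the list above.

```python
def group_tokens(conllu_lines):
    """Pair each surface token with the word(s) it contains.

    Assumes simplified CoNLL-U lines: tab-separated, ID in column 1,
    FORM in column 2 (remaining columns, if any, are ignored).
    """
    tokens = []          # list of (surface_form, [word_forms])
    covered_until = 0    # last word ID covered by the current range line
    for line in conllu_lines:
        if not line.strip() or line.startswith("#"):
            continue     # skip blank lines and comments
        cols = line.split("\t")
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            # Range line: one token spanning several words (an MWT).
            _, end = map(int, tok_id.split("-"))
            tokens.append((form, []))
            covered_until = end
        elif "." in tok_id:
            continue     # empty nodes are not surface tokens
        else:
            idx = int(tok_id)
            if idx <= covered_until:
                tokens[-1][1].append(form)   # word inside the current MWT
            else:
                tokens.append((form, [form]))  # token coincides with a word
    return tokens


# Hand-written sample (hypothetical): "del libro" with del = de+el.
SAMPLE = ["1-2\tdel", "1\tde", "2\tel", "3\tlibro"]

for surface, words in group_tokens(SAMPLE):
    kind = "MWT" if len(words) > 1 else "token = word"
    print(surface, words, kind)
```

Multitoken words (several tokens, one word) are not covered by this convention and would need a separate, language-specific treatment.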

While a MWE always contains at least two words, the relation between MWEs and tokens can be twofold:

  • A MWE contains several tokens, whether each of them coincides with a word or not:
    • نزهة ب قام make with walk make a walk (2 words , 2 tokens)
    • вземам решение make a decision (2 words, 2 tokens)
      прочитам от корица до корица to read from cover to cover (5 words, 5 tokens)
    • eine Rede halten (2 words, 2 tokens) a speech hold to give a speech
      wie geht's (2 words, 4 tokens) how goes it how are you
    • παίρνω μία απόφαση perno mia apofasi take a decision to decide (2 words, 2 tokens)
      παίζω στα δάχτυλα pezo sta dachtyla play in-the fingers to know very well (3 words, 4 tokens)
    • to take a walk (2 words, 2 tokens)
      to open Pandora's box (3 words, possibly 5 tokens)
    • dar un paseo 2 words, 2 tokens to give a walk to take a walk
      dar por sentado 3 words, 3 tokens to give for seated to take for granted
      irse de rositas 3 words, 4 tokens to go_self of little_roses to get off scot free
    • ibilaldia egin (2 words, 2 tokens)
    • دستور داد (2 words, 2 tokens)
    • b'fhearr liom (2 words, 4 tokens) I would prefer
    • τοὺς λόγους ποιέομαι tous logous poieomai the word do.1SG to speak
    • dignuti ruke to raise hands to give up (2 words, 2 tokens), otvoriti Pandorinu kutiju open Pandora's box to face with problems (3 words, 3 tokens)
    • sétát tesz to take a walk (2 words, 2 tokens)
    • tenere un discorso (2 words, 2 tokens) hold a speech to give a speech
      cavalcare l'onda (3 words, 4 tokens) ride the wave ride the wave
    • რა თავში იხლის ra tʻavši ixlis what head+in heat 'What good will it do them' (3 words, 4 tokens)
      ფარდას ჩამოაფარებს pʻardas čʻamoapʻarebs Will cover it with a curtain 'Will make it invisible to others' (2 words, 2 tokens)
    • kien idur fuq il-fatt turns on the fact
    • een wandeling maken (2 words, 2 tokens) a walk make to take a walk
    • robi z igły widły make.3.SG a pitchfork out of a needle he makes a mountain out of a molehill (4 words, 4 tokens)
      robił|em z igły widły made.3.SG.M1+be.1.SG.AGL a pitchfork out of a needle I made a mountain out of a molehill (4 words, 5 tokens)
    • dar uma caminhada to give a walk (2 words, 2 tokens)
      cair de pára-quedas to fall with parachute to arrive unprepared in the middle of a situation (3 words, possibly 5 tokens). Note: according to the new orthography rules, this word would be written 'paraquedas'; the old spelling may still be found in annotated texts, though.
      queixar-se-ia complain-self-would would complain (2 words, possibly 5 tokens)
    • a da ortul popii to die (3 words, 3 tokens)
    • klicati jelene to call deer to vomit (2 words, 2 tokens)
      vreči puško v koruzo throw a rifle in the corn to give up (4 words, 4 tokens)
    • marr vendim (2 words, 2 tokens) take decision make a decision
      hedh një sy (3 words, 3 tokens) throw an eye take a look
    • данути душом danuti dušom to breathe soul to feel relieved (2 words, 2 tokens)
      причати на|памет pričati na|pamet to talk by heart to talk not relying on facts (3 words, 2 tokens)
    • hålla ett tal (2 words, 2 tokens) hold a speech to give a speech
    • 一 个 决定 (2 words, 2 tokens) do one CL decision to make a decision
  • A MWE contains one (multiword) token:
    • no example found for Arabic
    • no example found for Bulgarian
    • vorbereiten to pre-arrange to prepare
      anfangen at-catch to begin
    • έδωσα-πήρα gave-1SG took-1SG I tried hard
    • to pretty-print
    • suicidarse suicide_self to commit suicide
    • n.a.
    • court-circuiter to short circuit
    • προσ-άγω pros-agо̄ towards lead.1SG to lead towards
    • pripremiti to pre-arrange to prepare
    • kinyír out.cut to kill
    • corto-circuitare to short circuit
      suicidarsi suicide_self to commit suicide
    • voorbereiden to pre-arrange to prepare
      aanvangen at-catch to begin
    • no example found for Polish
    • queixar-se-ia complain-SELF-would would complain
    • a se-ndura RCLI.ACC-have.the.heart to have the heart
    • pripraviti to pre-arrange to prepare
    • keqkuptoj bad_understand misunderstand
    • видео-линк video-link
    • klargöra clear-make clarify
      påpeka on-point point out
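
The token counts quoted in the examples above can be reproduced mechanically from the pipe notation. The following is a minimal sketch, not part of any PARSEME tooling: count_tokens and count_units are hypothetical helper names, and the pipes in the sample strings were placed by hand following the MTW examples earlier in this section. Whitespace-separated units are counted only as a rough stand-in for words; this does not hold when a unit is a multiword token.

```python
import re


def count_tokens(expression):
    """Count tokens in an expression written with the pipe notation.

    Tokens are separated either by whitespace or by the pipe symbol '|',
    which marks a token boundary inside a multitoken word.
    """
    pieces = re.split(r"[\s|]+", expression.strip())
    return len([p for p in pieces if p])


def count_units(expression):
    """Count whitespace-separated units (an approximation of words)."""
    return len(expression.split())


# Hand-annotated samples (pipe placement is an assumption of this sketch).
for expr in ["wie geht|'|s", "open Pandora|'|s box", "robił|em z igły widły"]:
    print(expr, "->", count_units(expr), "units,", count_tokens(expr), "tokens")
```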

Note finally that multitoken words are not considered MWEs since they contain one (multitoken) word only:

  • no example found for Bulgarian
  • αερολογώ aerologo air+talk to talk aimlessly
  • n.a.
  • odolustu blood+empty to bleed
  • λογοποιέομαι logopoieomai word-do to speak
  • SMS-ati to write an SMS
  • anteporre to put + in front of
  • სისხლისღვრა sisxlisġvra blood+spill, 'bleeding'
  • SMS-ować to write an SMS
  • pós-datar to post-date
  • a binedispune well-dispose to cheer up
  • SMS-jati to write an SMS
  • no example found for Albanian
  • SMS-овати SMS-ovati to write an SMS

Whenever the distinction between a word and a token is judged by a particular language team as hard to tackle, a possible option is to consider these two notions equivalent for the needs of corpus annotation.


Section 1.3

Multiword expressions

A multiword expression (MWE) is a (continuous or discontinuous) sequence of words with the following compulsory properties:

  • It contains at least two component words which are lexicalised, i.e. always realized by the same lexemes. Only these lexicalized components are annotated. For instance, in he paid several important visits to the president, we annotate only the components highlighted in bold (paid and visits).
  • Its neutral form forms a weakly connected graph, i.e., in its dependency graph, every (lexicalized) component is reachable from every other component if the directions of the dependencies are disregarded. For instance, in the following MWE [figure: non-neutral form] the highlighted components do not form a weakly connected graph, but this form is not a neutral one. When it is transformed to a neutral form [figure], the connectivity condition is fulfilled.
  • It shows some degree of orthographic, morphological, syntactic and/or semantic idiosyncrasy with respect to what is considered general grammar rules of a language. This condition is tested by the decision diagrams documented in sections 5 to 9. Collocations, i.e. word co-occurrences whose idiosyncrasy is of a purely statistical nature (e.g. the graphic shows, drastically drop), are not considered MWEs.
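
The connectivity condition can be checked automatically once a dependency analysis is available. The sketch below is one possible reading, under explicit assumptions: the condition is interpreted as requiring the subgraph induced by the lexicalized components alone to be connected when edge directions are ignored; the function name is_weakly_connected is hypothetical; and the head indices are a hand-written, UD-style analysis of the example sentence, not taken from any PARSEME corpus.

```python
from collections import deque


def is_weakly_connected(heads, components):
    """Check that the lexicalized components form a connected subgraph.

    heads: dict mapping each word ID to the ID of its syntactic head
           (0 for the root), as in a dependency tree.
    components: IDs of the lexicalized components of the MWE candidate.
    Only dependencies linking two components are kept, and their
    direction is disregarded.
    """
    components = set(components)
    if not components:
        return False
    # Undirected adjacency restricted to the components.
    adjacency = {c: set() for c in components}
    for dependent, head in heads.items():
        if dependent in components and head in components:
            adjacency[dependent].add(head)
            adjacency[head].add(dependent)
    # Breadth-first search from an arbitrary component.
    start = next(iter(components))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen == components


# Hand-written analysis (assumption) of:
# she(1) paid(2) several(3) visits(4) to(5) the(6) president(7)
HEADS = {1: 2, 2: 0, 3: 4, 4: 2, 5: 7, 6: 7, 7: 2}

print(is_weakly_connected(HEADS, {2, 4}))  # paid + visits
print(is_weakly_connected(HEADS, {2, 3}))  # paid + several
```

In this analysis visits depends directly on paid, so the pair passes the check, whereas paid and several are linked only through the non-lexicalized visits and fail it.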

Probably the most salient property of MWEs is semantic non-compositionality. In other words, it is often impossible to straightforwardly deduce the meaning of the whole unit from the meanings of its parts and from its syntactic structure. For instance, while it is easy to interpret phrases like to kick the ball or to spill some water from the words that compose them, it is almost impossible to guess, without knowing it beforehand, that to kick the bucket means 'to die' and to spill the beans actually means 'to reveal a secret'.

However, as non-compositionality is a subjective notion and is hard to test directly, we use inflexibility as a proxy in the tests. Our underlying hypothesis is that MWEs have some degree of semantic non-compositionality that implies limited flexibility.

Depending on the distribution of its neutral form, a MWE can be verbal, nominal, adjectival, adpositional, etc.

Verbal MWEs

A verbal MWE (VMWE) is a multiword expression whose neutral form is such that: (i) it has a distribution of a verb, a verbal phrase or a verbal clause, (ii) its syntactic head is a verb.

  • she paid several visits to the president
  • pūst miglu acīs to blow mist into eyes to lie, to talk nonsense
  • władza czerpie z tego korzyści propagandowe the authorities draw propaganda benefits from this the authorities reap benefits from this for propaganda

Note that reasoning in terms of neutral forms is crucial here. A MWE may occur in a variant whose distribution is non-verbal. But when its neutral form is retrieved, the verbal distribution becomes apparent, and such a MWE is considered verbal.

  • the visits which she paid to the president - the distribution of this MWE is nominal but this is not a neutral form; when neutralized the verbal distribution and the verb headedness are restored
  • czerpane z tego korzyści propagandowe - the distribution of this MWE is nominal but this is not a neutral form; when neutralized the verbal distribution and the verb headedness are restored

Conversely, some MWEs derive from VMWEs but their neutral forms are not verbal. Such MWEs are considered deverbal nominal, adjectival or adverbial MWEs:

  • Wortbruch word-break a promise which has not been kept - nominal MWE deriving from ein Wort brechen
  • a take-off - nominal MWE deriving from to take off
    (a) run-down (apartment) - adjectival MWE deriving from to run down
  • la prise en compte the fact of taking into account - nominal MWE deriving from prendre en compte take into account
    une mise à disposition the fact of making available - nominal MWE deriving from mettre à disposition make available
  • zabawa czyimś kosztem a play at someone else's expense - nominal MWE derived from bawić się czyimś kosztem to enjoy oneself at someone else's expense

Some other MWEs contain verbs but are not derived from VMWEs and have a non-verbal distribution (nominal, adjectival, adverbial, etc.). These candidates are assigned the category which conforms with their distribution: nominal MWEs, modifier MWEs or functional MWEs.

  • Vergiss-mein-nicht forget-me-not forget-me-not - nominal MWE
  • forget-me-not - nominal MWE
  • peut-être may-be maybe - adverbial MWE
    porte-feuille carry-sheets wallet - nominal MWE
    couru d'avance run in advance foregone/predictable - adjectival MWE
  • pūt un palaid blow and let go frivolous, absent-minded - adjectival MWE
  • vergeet-mij-niet forget-me-not forget-me-not - nominal MWE
  • (zrobić coś za) Bóg-zapłać (do something for a) God-pay to do something for free - nominal MWE

Nominal MWEs

A nominal MWE (NMWE) is a multiword expression whose neutral form has a distribution of a noun.

  • I’ll have a hot dog for lunch.
    This was a real wild goose chase a foolish and hopeless search for or pursuit of something unattainable.
  • Leurs armes blanches sont en acier inoxydable Their white weapons are made of stainless steel Their bladed weapons are made of stainless steel
  • zili brīnumi blue wonder something unusual, surprising
  • Hij lust geen blinde vink He doesn't like 'blinde vink' He doesn't like blinde vink (Dutch meat)
  • Ostatnia transakcja okazała się dla firmy gwoździem do trumny The last transaction turned out for the company a nail to the coffin The last transaction turned out to be an event that caused the failure of the company
    W antykwariacie znalazła kilka białych kruków In the antique shop she found a few white ravens In the antique shop she found a few very rare books

It may or may not be headed by a noun:

  • Vergiss-mein-nicht forget-me-not
  • forget-me-not
  • porte-feuille carry-sheets wallet
  • vergeet-mij-niet forget-me-not forget-me-not
  • (zrobić coś za) Bóg-zapłać (do something for a) God-pay to do something for free

A major challenge in annotating NMWEs is to distinguish them from proper names and multiword terms. Proper names have a special semantic status because they function as names of entities rather than their descriptions. Proper names may contain MWEs and vice versa but most proper names do not pass the linguistic tests proposed here and thus we do not consider them MWEs. We defined specific tests (SPECIF-REF, NAMING-CONV and SEM-TYPE) to distinguish proper names from MWEs.

  • John Smith - entity name, not a MWE
    UN Secretary-General - entity name containing a NMWE
  • Agnieszka Kownacka - entity name, not a MWE
    Jego Królewska Mość Król Belgii His Royal Majesty the King of Belgium - entity name containing a NMWE

Multiword terms overlap with MWEs. Examples include:

  • white gold an alloy consisting of gold and platinum or nickel
  • pied d'athlète athlete's foot skin infection of the feet caused by a fungus
  • acs ābols apple of an eye eyeball
  • biały metal white metal alloy containing approximately 88% of tin
    rok świetlny light year a distance covered by a light ray in 1 year

But many multiword terms do not pass inflexibility tests either and we consider them semantically compositional (i.e. non-MWEs), as in:

  • bipolar disorder
  • affection respiratoire aiguë acute respiratory affection acute respiratory disease
  • ēšanas traucējumi eating disorders
  • obiektowy język programowania object programming language object-oriented programming language

Note that some MWEs whose internal structure is the one of a nominal phrase have a distribution of an adverb, an adposition or an adjective, etc. Those should not be annotated as NMWEs but as functional/adjectival/adverbial:

  • sailing head to wind sailing with the bow of the boat facing directly into the wind
  • Ils marchent main dans la main They are walking hand in hand They are walking holding each other's hand - adverbial MWE
    je fais ça toute seule, les doigts dans le nez I do it alone, fingers in my nose I do it easily
  • par mata tiesu - adverbial MWE
  • Zij lopen hand in hand They are walking hand in hand They are walking holding each other's hand - adverbial MWE
  • Wygrali tę wojnę psim swędem They won this war by a dog's stench/itch They won this war by a lot of luck - adverbial MWE

Recall that a MWE may be a multiword token. Deciding what is a word is notoriously difficult, especially in languages exhibiting frequent closed compounds, like Germanic languages. Closed compounds (i.e. compounds in which components are spelled together, possibly with some phonological changes on the border of morphemes) can be idiomatic:

  • Meer|schweinchen little sea pig guinea pig
  • passerby - inflects like a nominal phrase: passers|by
  • bonhomme good man fellow - inflects like a nominal phrase: bons|hommes
  • rzeczpospolita thing popular republic - inflects like a nominal phrase: rzeczy|pospolitej

or fully compositional:

  • Schul|jahr school year
  • school|jaar school year

or partly idiomatic and partly compositional:

We consider closed compounds as containing several words, submit them to the PARSEME decision diagrams, and annotate them as NMWEs if the tests are passed. We hypothesize that, most of the time, it is straightforward for annotators to identify word boundaries in a closed compound. If this is not the case, language-specific rules must be added. Splitting closed compounds directly in the corpus, if they are not split already, is not recommended, so as to keep the tokenization consistent with the underlying morpho-syntactic annotation.

See also the UniDive task on harmonizing the definition of a “syntactic word” across languages.

It happens that only part of a closed compound is idiomatic. For such cases, a UD/PARSEME white paper proposes subtoken spans, e.g.:

  • Hauptrolle spielen to play the main role - Rolle spielen to play a role is a VMWE, but the noun Rolle role can be freely modified, which yields a closed compound like Hauptrolle main role

This feature is not implemented yet. In the meantime, we suggest annotating the whole token as belonging to the MWE.

We consider that nominal MWEs embrace pronominal MWEs:

  • I saw just a few
    I expect no one to come
    we love each other
  • dažs labs few good somebody
  • powtarzał ciągle to samo he repeated always this the same he repeated always the same

Similarly to functional MWEs (below), pronominal MWEs constitute closed lists of cases, and their inflexibility is hard to test. They are also frequently ambiguous with idiomatic determiners.

  • I saw a few - a PronID
    I saw a few examples - a DetID
  • dažs labs jūtas svarīgs few good feels important somebody feels important - a PronMWE
    dažs labs šoferis jūtas svarīgs few good driver feels important some drivers feel important - a DetMWE
  • Ik gaf een paar voorbeelden I gave a few examples - a DetID
  • powtarzał ciągle to samo he repeated always this the same he repeated always the same - a PronMWE
    powtarzał ciągle to samo pytanie he repeated always this the same question he repeated always the same question - a DetMWE

Adjectival and adverbial MWEs

The class of adjectival and adverbial MWEs (AMWEs) includes adjectival idioms (AdjID) and adverbial idioms (AdvID). Those are multiword expressions whose neutral form has a distribution of an adjective or an adverb, respectively.

  • larger than life behaving in a way that is more exciting than other people to attract - AdjMWE
  • dzimis laimes krekliņā born in a shirt of luck lucky - an AdjMWE
    aiz restēm behind bars in prison - an AdvMWE
  • fris en fruitig raring to go - AdjMWE
  • urodzona w niedzielę born on Sunday lazy - an AdjMWE
    średnio na jeża averagely on a hedgehog not great - an AdvMWE

They do not have to be headed by adjectives or adverbs, as in:

  • the other way round - an AdvMWE headed by a noun
  • pūt un palaid blow and let go frivolous - an AdjMWE containing no adjectives
  • out of the box - an AdvMWE headed by a noun
  • na potęgę on power very much - an AdvMWE containing no adverb

Additionally, we cover AMWEs which derive from verbal MWEs but whose neutral form has an adjectival/adverbial distribution (see above), rather than a verbal one. The extent of such MWEs is yet unknown.

Functional MWEs

A functional MWE (FuncMWE) is a multiword expression whose neutral form has a distribution of a function word. We consider four subcategories of FuncMWEs:

  • determiner idiom (DetID)
    • I work from home roughly every other day
    • tas pats cilvēks that self person the same person
      katru otro dienu every second day every other day
    • zadałem sobie to samo pytanie I asked myself this same question I asked myself the same question
      przekaż mu te oto słowa transfer him these here words transfer him these words
  • adposition idiom (AdpID)
    • in front of the station
    • op basis van based on
    • gwarancji nie ma nawet w przypadku arcymistrza there is no guarantee even in the case of a grandmaster
      co do pierwszego pytania what to the first question as to the first question
  • conjunction idiom (ConjID)
    • she was fortunate in that she had friends to help her
    • la cérémonie sera projetée sur grand écran afin que tout le monde puisse suivre the ceremony will be projected on a big screen so that everyone can follow
    • lai gan although
      neskatoties uz not looking at nevertheless
    • zmęczony mimo że dzień się dopiero zaczynał tired although that the day was only beginning tired although the day was only beginning
  • interjection idiom (IntjID)
    • damn it!
    • bon sang! good blood! damn it!
    • pie velna! at the devil! Damn it!
    • do diabła! To the devil! Damn it!

Functional MWEs constitute relatively short closed lists of cases. We recommend establishing such lists for each language and applying them consistently to corpus annotation (while paying attention to possible ambiguity), like in:

  • By the way, are you coming to Budapest?
    I recognized her by the way she was walking.
  • met betrekking tot regarding
  • co do pierwszego pytania what to the first question as to the first question
    rozumiesz co do ciebie mówię? do you understand what to you I say? do you understand what I'm telling you?

Of course, we still need criteria to decide which candidates should occur in such lists. But testing functional MWE candidates for non-compositionality is notoriously hard because they contain few content words (nouns, verbs, adjectives or adverbs) and have syntactic structures in which little flexibility is allowed, even when no idiomaticity is present. The solution is to be consistent with the FuncMWE-specific decision diagram (which is deterministic, whenever the answers to atomic tests remain stable), even if it does not fully conform to our intuitions.
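
The category definitions given in this section can be summarised as a lookup from the distribution of the neutral form to the top-level category. This is only a rough sketch under stated assumptions: the function name mwe_category and the distribution labels (plain strings such as "noun" or "adposition") are hypothetical, and None is returned wherever the definitions above do not name a category. It does not replace the decision diagrams, which also test whether a candidate is a MWE at all.

```python
# Function-word distributions named in the FuncMWE subcategories above.
FUNCTION_WORD_DISTRIBUTIONS = {
    "determiner", "adposition", "conjunction", "interjection",
}


def mwe_category(distribution, head_is_verb=False):
    """Map the distribution of a MWE's neutral form to a top-level category.

    distribution: hypothetical label for the distribution of the neutral form.
    head_is_verb: whether the syntactic head of the neutral form is a verb
                  (required, together with a verbal distribution, for VMWEs).
    """
    if distribution == "verb":
        # Both conditions (i) and (ii) of the VMWE definition must hold.
        return "VMWE" if head_is_verb else None
    if distribution in {"noun", "pronoun"}:
        # Nominal MWEs embrace pronominal MWEs; the head need not be a noun.
        return "NMWE"
    if distribution in {"adjective", "adverb"}:
        return "AMWE"
    if distribution in FUNCTION_WORD_DISTRIBUTIONS:
        return "FuncMWE"
    return None


# forget-me-not contains a verb but has a nominal distribution.
print(mwe_category("noun", head_is_verb=True))
# in front of has the distribution of an adposition.
print(mwe_category("adposition"))
```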


Section 1.4

Neutral forms of MWEs

MWEs occurring in a corpus can have various syntactic structures. For instance, to take someone by surprise can be inflected (they took me by surprise), negated (they did not take me by surprise), passivised (I was taken by surprise), or subject to extraction (the surprise by which I was taken). Similarly, a brain washing can be transformed into a structure with a nominal-adpositional modifier (washing of a brain), an extraction (brain whose washing [did not succeed]), etc.

Since the linguistic tests are structure-driven (cf. e.g. structural tests), there is a necessity to neutralize variation before the tests are applied. In this section we introduce definitions answering these needs.

Neutral form

A neutral form (previously called canonical form) of a MWE or a MWE candidate is its least syntactically marked form which preserves its meaning. We consider that:

  • a form with a finite verb is less marked than with an infinitive, a participle, an analytical tense or a modal
    • she will take him by surprise - the neutral form is she takes him by surprise [- this is the plan for the future]
      she has taken him by surprise - the neutral form is she took him by surprise [just now]
      she was taking him by surprise - the neutral form is she took him by surprise [and this happened at the same time as ...]
      she wants to take him by surprise - the neutral form is she takes him by surprise [, that's her plan]
    • lo stava prendendo in giro - the neutral form is lo prese in giro
    • pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her is a neutral form
      będą ją pociągać do odpowiedzialności they will pull her to responsibility they will accuse her - the neutral form is pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her
      będą ją pociągali do odpowiedzialności they will pull her to responsibility they will accuse her is a neutral form
      chcą ją pociągnąć do odpowiedzialności they want to pull her to responsibility they want to accuse her - the neutral form is pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her
      pociągnęli ją do odpowiedzialności they pulled her to responsibility they accused her is a neutral form
      bo by pociągnęli ją do odpowiedzialności they would pull her to responsibility they would accuse her is a neutral form
      pociągnęliby ją do odpowiedzialności they would pull her to responsibility they would accuse her is a neutral form; only the finite verb in the conditional form is annotated
      pociągając ją do odpowiedzialności pulling her to responsibility accusing her - the neutral form is pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her
  • active voice is less marked than passive and other diathesis alternations,
    • he was taken by surprise - the neutral form is someone took him by surprise
    • décisions importantes se prennent en groupes important decisions take themselves in groups important decisions are taken in groups - the neutral form is on prend des décisions importantes en groupes one takes important decisions in groups
    • lui fu preso in giro - the neutral form is qualcuno lo prese in giro
    • została pociągnięta do odpowiedzialności she was pulled to responsibility she was accused - the neutral form is pociągnęli ją do odpowiedzialności they pulled her to responsibility they accused her
      w takich warunkach decyzje podejmują się same under such circumstances decisions take themselves on their own under such circumstances no effort is needed to take decisions - the neutral form is w takich warunkach ludzie podejmują decyzje [bez wysiłku]
  • a non-negated form is less marked than a negated one,
    • they did not take him by surprise - the neutral form is it is not true that they took him by surprise
    • non lo presero in giro - the neutral form is non è vero che lo presero in giro.
    • nie pociągną jej do odpowiedzialności they will not pull her to responsibility they will not accuse her - the neutral form is pociągnądo odpowiedzialności they will pull her to responsibility they will accuse her
  • a form with an extraction is more marked than one without it,
    • the surprise by which they took him - the neutral form is they took him by surprise[, this surprise ...]
      the brain whose washing did not succeed - the neutral form is [there was] a brain washing[, it did not succeed for a brain]
    • la conferenza a cui ho preso parte - the neutral form is ho preso parte a una conferenza
      la decisione che è stata presa è giusta - the neutral form is hanno preso la decisione giusta
    • decyzja, którą podjęłam the decision which I made - the neutral form is podjęłam decyzję I made a decision
  • a form with an adpositional modifier is more marked than one without it,
    • the washing of a brain - the neutral form is brain washing
    • pranie przeznaczone dla mojego mózgu the washing dedicated to my brain - the neutral form is pranie mózgu [przeznaczone dla mnie] brain washing [dedicated to me]
  • a form with interposed complex determiners and quantifiers is more marked than one without it,
    • they took a significant number of steps - the neutral form is they took steps whose number was significant
    • Il governo ha intrapreso molteplici azioni. - the neutral form is il governo ha intrapreso azioni che sono molteplici
    • dostali połowę spadku they received a half of the heritage - the neutral form is dostali spadek[, ale nie cały tylko połowę] they received a heritage [but half of it rather than the whole]
      nie mieli cienia wątpliwości they didn't have a shade of a doubt - the neutral form is [nie jest prawdą, że] mieli jakąkolwiek wątpliwość it is not true that they had any doubt
  • a form with coordination is more marked than one without it,
    • a guide to red and yellow cards in soccer - the two neutral forms are a guide to red cards and to yellow cards
    • l'arbitro ha dato cartellini gialli e rossi - the neutral form is l'arbitro ha dato cartellini gialli e cartellini rossi.
    • dwie czerwone i cztery żółte kartki two red and four yellow cards - the two neutral forms are dwie czerwone kartki i cztery żółte kartki
This reasoning may be applied several times until the least syntactically marked form preserving the meaning is found:
  • a bunch of decisions which were made by her - the form contains passivization, extraction and a complex determiner; the neutral form is she made decisions [which were quite numerous]
  • una serie di decisioni che furono prese da lei - the neutral form is lei prese delle decisioni che furono numerose.
  • wiele decyzji i działań, które zostały podjęte many decisions and actions which were taken - the form contains passivization, extraction and coordination; the neutral form is podjęli decyzje i podjęli działania [których było wiele] they took decisions and they took actions [which were many]
In some cases, transforming a MWE or a MWE candidate to a less syntactically marked form does not preserve its meaning. In this case, a more syntactically marked form is considered neutral.
  • the die is cast the point of no retreat has been reached - this is a neutral form on its own rather than they cast the die
  • les carottes sont cuites the carrots are cooked it's too late - this is a neutral form on its own rather than j'ai cuit les carottes I have cooked the carrots
  • il dado è tratto - this is a neutral form on its own rather than tirare il dado
  • kości zostały rzucone the dice have been thrown alea iacta est - this is the neutral form on its own rather than ktoś rzucił kości someone threw the dice
    nie od razu Kraków zbudowano Cracow was not built at once Rome was not built in a day - this is the neutral form on its own rather than Zbudowali Kraków od razu they built Cracow at once
    metoda kija i marchewki the method of a stick and a carrot offer people things in order to persuade them to do something and punish them if they refuse to do it - this is the neutral form on its own rather than metoda kija i metoda marchewki the method of a stick and the method of a carrot
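The repeated reduction described above can be sketched in Python as follows. The feature labels are informal names for the bullets of this section, and the `meaning_bearing` argument stands for the annotator's judgement that removing a feature would not preserve the meaning; both are assumptions of this sketch rather than part of the guidelines.

```python
# Informal labels for the kinds of syntactic marking listed in this section.
MARKED_FEATURES = [
    "non_finite_or_analytic_verb",
    "passive",
    "negation",
    "extraction",
    "adpositional_modifier",
    "complex_determiner",
    "coordination",
]

def neutral_features(observed, meaning_bearing=frozenset()):
    """Return the marked features that remain in the neutral form.

    Every observed feature is stripped, one after another, unless the
    annotator has judged that stripping it would change the meaning.
    """
    remaining = set(observed)
    for feature in MARKED_FEATURES:
        if feature in remaining and feature not in meaning_bearing:
            remaining.discard(feature)
    return remaining

# "a bunch of decisions which were made by her" -> "she made decisions"
bunch_of_decisions = neutral_features(
    {"passive", "extraction", "complex_determiner"}
)

# "the die is cast": the active form loses the idiomatic meaning,
# so the passive stays in the neutral form.
die_is_cast = neutral_features({"passive"}, meaning_bearing={"passive"})
```

The sketch only tracks which kinds of marking survive; producing the actual neutral wording remains a manual step.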

Neutral form in MWEs containing deverbal forms

We consider that the existence of deverbal nouns, masdars, adjectives and adverbs in MWEs does not imply syntactic marking. For instance, a wild goose chase, a decision maker and a heartbreaking story are neutral forms on their own. Consequently, they are considered nominal and adjectival MWEs, rather than verbal MWEs. Their connection to the corresponding VMWEs, if any (make a decision and break hearts, in the last 2 cases) is made explicit through their subcategories (NV.VID, NV.IVPC.full and AV.VID, respectively).

Other examples of such cases include:

  • Vergiss-mein-nicht forget-me-not - this is a neutral form on its own (nominal MWE); it is not deverbal since vergiss mein nicht is not a VMWE
    Wortbruch word-break a promise which has not been kept - this is a neutral form on its own (a deverbal nominal MWE deriving from ein Wort brechen to break a word to fail to keep a promise)
  • forget-me-not - this is a neutral form on its own (nominal MWE); it is not deverbal since forget me not is not a VMWE
    a wild goose chase - this is a neutral form on its own (nominal MWE); it is not deverbal since chase a wild goose is not a VMWE
    during take-off and landing - this is a neutral form on its own (a deverbal nominal MWE, here NV.IVPC.full, deriving from took off)
    a run-down apartment - this is a neutral form on its own (a deverbal adjectival MWE, here AV.IVPC.full, deriving from run down)
  • peut-être may-be maybe - this is a neutral form on its own (adverbial MWE); it is not deverbal since peut être is not a VMWE
    porte-feuille carry-sheets wallet - this is a neutral form on its own (nominal MWE); it is not deverbal since porter des feuilles is not a VMWE
    couru d'avance run in advance forgone conclusion - this is a neutral form on its own (adjectival MWE); it is not deverbal since courir d'avance is not a VMWE
    la prise en compte the fact of taking into account - this is a neutral form on its own (a deverbal nominal MWE, here NV.VID, deriving from prendre en compte to take into account)
    une mise à disposition putting into disposal the fact of making available - this is a neutral form on its own (a deverbal nominal MWE, here NV.LVC.cause, deriving from mettre à disposition to put into disposal to make available)
  • nontiscordardimé - this is a neutral form on its own (nominal MWE); it is not deverbal since 'non ti scordare di me' is not a VMWE.
    una storia strappalacrime - this is a neutral form on its own (nominal MWE); it is not deverbal since 'strappare le lacrime' is not a VMWE.
  • zrobić coś za Bóg-zapłać do something for a God-pay to do something for free - this is a neutral form on its own (nominal MWE); it is not deverbal since Bóg zapłaci God will pay is not a VMWE
    zabawa czyimś kosztem a play at someone else's expense - this is a neutral form on its own (a deverbal nominal MWE, here NV.VID, deriving from bawić się czyimś kosztem to enjoy oneself at someone else's expense)

Note that it is notoriously hard to distinguish deverbal nouns, adjectives and adverbs from verbal inflected forms like gerunds, participles, etc.

  • all hearts broken by her - verbal MWE (VMWE) vs. broken hearts - deverbal nominal MWE (NV.VID)
    she was breaking his heart - verbal MWE (VMWE) vs. heart-breaking story - deverbal adjectival MWE (AV.VID)
  • tutti i cuori spezzati da lei - verbal MWE (VMWE) vs. cuori spezzati - deverbal nominal MWE (NV.VID)
    ogni volta mi rompi le scatole - verbal MWE (VMWE) vs. sei un rompiscatole - deverbal nominal MWE (NV.VID)
  • wszystkie serca, które zostały przez nią złamane all the hearts which were broken by her - verbal MWE (VMWE) vs. wszystkie złamane przez nią serca - deverbal nominal MWE (NV.VID)
    łamać serca to break hearts - verbal MWE (VMWE) vs. łamanie serc breaking of hearts - deverbal nominal MWE (NV.VID)

The underlying morpho-syntactic annotation might help in decision making.

Non-unicity of a neutral form

Note that a given MWE type often has more than one neutral form:
  • decisions made - the neutral form can be she makes decisions, I make decisions, she/I/we made decisions, etc.
  • le decisioni prese - the neutral form can be io prendo una decisione, lei prende una decisione, etc.
  • serca, które zostały złamane - the neutral form can be złamała serca, złamał serca, złamali serca, etc.
Thus, a neutral form is not the same thing as a lemma, i.e. a unique form representative of a MWE.
In previous versions of these guidelines, a neutral form was called canonical form.

Section 1.5

Lexicalized components and open slots

Just like a single word, notably a verb, the headword of a MWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this MWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise. Similarly, the possessive modifier is compulsory in the NMWE someone's right-hand man.

Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone/someone's is not. The headword of a MWE, in its neutral form, is always considered lexicalized. When it can be replaced by another word, like in to make/take a decision, we consider that these are two different MWEs, although possibly synonymous.

Conversely, a component of a compulsory argument which can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. In the following VMWE examples (cited after Gross 1994), all having the same syntactic structure NP V NP Prep NP, the lexicalized arguments are highlighted in bold:

  • Max took the bull by the horns.
  • The news took John by surprise.
  • Bob took part in the inquiry.
  • Money burns a hole in Bob’s pocket.

Note on terminology: our definition of lexicalization applies to the component words of a MWE, and not to the whole MWE. This might be counter-intuitive, given the traditional definition of lexicalization as a diachronic process by which a lexeme (word or phrase) acquires the status of an autonomous lexical unit, that is, "a form which it could not have if it had arisen by the application of productive rules" (Bauer 1983, p. 50, apud Lipka et al. 2004, p. 6). In other words, traditionally linguistic studies would use the term "lexicalized" to refer to the whole MWE, as it has idiosyncratic behavior and thus must be listed in the language's lexicon. Our definition, however, stems from computational linguistics and in particular from the parsing literature, in which lexicalized rules refer to rules containing terminal lexemes attached to non-terminal symbols, and a lexicalized grammar is a grammar in which the rules are lexicalized (Manning and Schütze 1999, p. 417; Jurafsky and Martin 2009, p. 507). In this sense, we regard MWEs as syntactic subtrees in which some of the nodes are annotated with the corresponding terminal symbols that are always realized by the same lexeme (i.e. the lexicalized components) and others are non-terminal nodes that can be realized by any lexeme taken from a larger class (i.e. the open slots).
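The view of a MWE as a structure in which some nodes are lexicalized and others are open slots can be sketched as a small data structure. This is an illustrative Python sketch only: the class and function names are invented for the example, and the flat list of components is a simplification of a syntactic subtree. The marking of "in" as non-lexicalized in the second example follows the treatment of selected adpositions in these guidelines.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Component:
    form: str
    lexicalized: bool  # False marks an open slot

def lexicalized_forms(components: List[Component]) -> List[str]:
    """Keep only the components that are always realized by the same lexeme."""
    return [c.form for c in components if c.lexicalized]

# "The news took John by surprise."
took_by_surprise = [
    Component("The news", False),
    Component("took", True),
    Component("John", False),
    Component("by", True),
    Component("surprise", True),
]

# "Bob took part in the inquiry."
took_part = [
    Component("Bob", False),
    Component("took", True),
    Component("part", True),
    Component("in", False),           # selected by the VMWE, not lexicalized
    Component("the inquiry", False),
]

surprise_lexicalized = lexicalized_forms(took_by_surprise)
part_lexicalized = lexicalized_forms(took_part)
```

Only the components returned by `lexicalized_forms` would be annotated as parts of the MWE; the open slots stay outside the annotation.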

Special case of adpositions

Adpositions have a special status with respect to the notion of lexicalization in verbal MWEs. In the first, second and fourth example above, the prepositions by and in are lexicalized since they introduce lexicalized complements (the horns, surprise and pocket). However, in the third case the preposition in introduces an open slot whose meaning compositionally combines with the meaning of the VMWE took part. We say in this case that the preposition is selected by the VMWE, i.e. it belongs to the valency properties of the verb and is not lexicalized. Other cases include:

  • pay tribute to someone
  • Marco prende parte a una lezione.
  • cackać się z kimś to clutter with someone to act too mildly and carefully with someone
Selected prepositions were discarded in edition 1.0 of the guidelines, and re-introduced in edition 1.1 experimentally and optionally via the inherently adpositional verbs (IAV). If the language team decides to take them into account, they are to be considered in the post-annotation step (step 4), i.e. when all other categories have previously been identified and categorized in the given sentence.

In functional MWEs, however, we consider that selected prepositions have a different status: they are lexicalized in FuncMWEs if they are always realized by the same lexemes. This concerns prepositions both preceding and following the headword:

  • now that
    so that
    given that
    a lot of
    in addition to
    in spite of
    in presence of (in its presence)
  • en dépit de
    au sein de
    suite à
    lors de
    avant que
    avant de
    alors que
    bien que
    en l'absence de (en son absence)
    à l'époque
    à l'époque de
  • cosicché
    dato ciò
    dato che

This difference in considering adpositions as lexicalized in functional MWEs, but not in verbal MWEs, is justified by several factors:

  • headwords in functional MWEs usually subcategorize for one preposition
  • a whole functional MWE, together with its selected prepositions, can most often be replaced by a single word (which shows the lexicalized character of the whole string), which is not the case with verbal MWEs
  • this choice better aligns with the principles of Universal Dependencies, where some of such functional MWEs, together with their selected prepositions, are annotated with the fixed relation

Special case of reflexive clitics

Reflexive clitics in inherently reflexive verbs and possessive pronouns in verbal idioms also have a special lexicalization status (see also the note on more or less frozen determiners). In some languages, the same reflexive clitic or possessive pronoun is used regardless of the person and number, inflecting for case only:

  • смея се laugh se.REFL to laugh
    намирам се find se.REFL to be (somewhere)
  • This category does not apply to Modern Greek.
  • This category does not apply to Ancient Greek.
  • smijem se laugh.1.SG self I laugh
    smiješ se laugh.2.SG self You laugh
    smiju se laugh.3.PL self they laugh
  • znajduję się find.1.SG.PRES self I find myself
    znajdujesz się find.2.SG.PRES self you find yourself
    znajdują się find.3.PL.PRES self they find themselves
    pójdą na swoje they will go on one's own they will establish their own household
    pójdziemy na swoje we will go on one's own we will establish our own household
  • smejim se laugh.1.SG self I laugh
    smejiš se laugh.2.SG self You laugh
    smejijo se laugh.3.PL self they laugh
  • радујем се radujem se look.1.SG.PRES forward to I look forward to
    радујеш се raduješ se look.2.SG.PRES forward to you look forward to
    радује се raduje se look.3.SG.PRES forward to She/He looks forward to

In other languages, reflexive clitics and possessive pronouns agree with the subject and the verb:

  • No examples found for Bulgarian.
  • sie wundert sich she wonders self.3.SG she wonders
    ihr wundert euch you.PL wonder.2.PL self.2.PL you wonder
  • Ο Γιάννης έκανε την πλάκα του O Yanis ekane tin plaka tu The John made the fun his John had fun
    Τα παιδιά έκαναν την πλάκα τους Ta pedia ekanan tin plaka tus The kids made the fun their The kids had fun
  • I will do my best, They will do their best
  • yo me quejo I self.1.SG complain I complain
    te quejas you self.2.SG complain You complain
  • je me trouve I self.1.SG find I find myself
    tu te trouves you self.2.SG find you find yourself
    je vide mon sac I empty my bag I express my secret feelings
    elle vide son sac she empties her bag she expresses her secret feelings
  • μου εἰς τὴν γνώμην εἰσῄει mou eis tēn gnōmēn eisēei I.GEN into the opinion.ACC come.into.IMPF.3sg it came to my mind
  • io mi suicido.
  • zij vergist zich she is mistaken self.3.SG she is mistaken
    wij vergissen ons we are mistaken self.1.PL we are mistaken
  • eu me queixo I self.1.SG complain I complain
    tu te queixas you self.2.SG complain You complain
  • eu mă gândesc I Refl.Cl.1sg.Acc. think I am thinking
    tu te gândești you Refl.Cl.2sg.Acc. think you are thinking

In this case, the clitic or the pronoun is realized by different lexemes, depending on the number and gender. Strictly speaking, it is not lexicalized. However, we admit that, regardless of the language, the reflexive clitic and the possessive pronoun are each a unique lexeme (with lemma się, se, sich, etc. or swój, son, one's) inflecting for person and number. They are thus considered lexicalized in inherently reflexive verbs and verbal idioms.
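The idea of treating agreeing clitics as one lexeme can be sketched by comparing lemmas instead of surface forms. The lookup table below is a hypothetical, hand-written fragment for French reflexive clitics, introduced for this example only; in practice the lemmas would come from the underlying morpho-syntactic annotation.

```python
from typing import Optional

# Hypothetical surface-form-to-lemma table for French reflexive clitics.
CLITIC_LEMMA = {"me": "se", "te": "se", "se": "se"}

def clitic_lemma(form: str) -> Optional[str]:
    """Return the lemma of a reflexive clitic, or None for unknown forms."""
    return CLITIC_LEMMA.get(form.lower())

def same_lexicalized_clitic(form_a: str, form_b: str) -> bool:
    """Two surface forms count as the same lexicalized component
    when they share a known lemma."""
    lemma_a = clitic_lemma(form_a)
    return lemma_a is not None and lemma_a == clitic_lemma(form_b)

# "je me trouve" vs. "tu te trouves"
me_vs_te = same_lexicalized_clitic("me", "te")
```

Under this view, je me trouve and tu te trouves are two occurrences of the same inherently reflexive verb, even though the clitic differs on the surface.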


Section 1.6

Multiword expressions versus collocations

Collocations are not considered MWEs in this task and should not be annotated. However, the boundary between both categories is not always easy to define and should be handled with care.

We understand collocations as combinations of words whose idiosyncrasy is purely statistical. In other words, tokens in collocations tend to co-occur with each other more often than expected by chance, but they show no substantial orthographic, morphological, syntactic and (most notably) semantic idiosyncrasy. In this way we oppose MWEs to collocations.
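The guidelines do not prescribe a particular statistic for "more often than expected by chance". As one common illustration, pointwise mutual information (PMI) compares the observed co-occurrence probability of two words with the probability expected if they were independent. The Python sketch below uses invented toy counts, not corpus data, and the function name is an assumption of this example.

```python
import math

def pmi(pair_count: int, w1_count: int, w2_count: int, total: int) -> float:
    """Pointwise mutual information in bits.

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities estimated
    as relative frequencies over `total` observations. The expression below
    is the same ratio with the common factor `total` simplified.
    """
    return math.log2(pair_count * total / (w1_count * w2_count))

# Toy counts (invented for illustration only).
associated = pmi(pair_count=50, w1_count=1000, w2_count=500, total=1_000_000)
independent = pmi(pair_count=1, w1_count=1000, w2_count=1000, total=1_000_000)
```

A clearly positive value signals that the pair co-occurs more often than chance would predict, while a value near zero is what independence would give. A high score alone does not make a combination a MWE in the sense of these guidelines, since collocations are defined here precisely by having only this statistical idiosyncrasy.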

Note that other authors understand collocations slightly differently. E.g. for Sag et al. (2002), collocations are any statistically significant cooccurrences, i.e. they include all forms of MWEs. For Baldwin and Kim (2010), collocations form a proper subset of MWEs. According to (Melcuk, 2010), collocations are binary semantically compositional combinations of words subject to lexical selection constraints, i.e. they intersect with what is here understood as MWEs.

Some combinations happen to be very frequent and are perceived as "frozen":

  • سؤال على أجاب answer a question
    كتاب إشترى buy a book
    وجبة قدم serve a meal
  • качвам цената raise the price
  • eine Frage beantworten to answer a question, die Graphik zeigt the graphic shows, einen Bus nehmen to take a bus
  • παίρνω το λεωφορείο perno to leoforio take-1SG the bus
  • drastically drop
    the graphic shows
    to take a bus
  • responder a una pregunta to answer a question
    el gráfico muestra the graphic shows
    coger el autobús to take the bus
  • interesa agertu interest show to show interest
    galdera bati erantzun question one-to answer answer a question
    autobusa hartu bus take to take the bus
  • riješiti dvojbu to solve a dilemma, pripremati jelo to prepare a meal
  • rispondere a una domanda to answer a question
    il grafico mostra the graphic shows
    prendere un bus to take a bus
  • de bus nemen to take the bus
  • zalać rynek to flood the market to dominate the market
  • bater um recorde to break a record (bater to beat has a regular sense of to overcome in addition to the literal sense)
    entrar em cartaz enter into poster arrive in theaters (for a movie) (the MWE is em cartaz in poster in theaters, the verb just usually collocates with this MWE)
  • lua un autobuz take a bus
  • drastičen upad drastic drop, graf prikazuje graphic shows, vzeti taksi to take a taxi
  • графикон приказује grafikon prikazuje the graph displays, дијаграм илуструје dijagram ilustruje the diagram illustrates
  • 古人 云 the-ancient say the ancient people said
    据 报道 according-to report according to what is reported

However, applying regular lexical alternations to them does not markedly impact their meaning.

  • الاستبيان على أجاب answer a questionnaire
    فطور القدم serve a breakfast
    جريدة إشترى buy a newspaper
  • вдигам цената raise the price, увеличавам цената raise the price, качвам залога raise the bet, качвам температурата raise the temperature
  • eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to go by bus
  • παίρνω το πλοίο perno to plio take the ship
    παίρνω το τραίνο perno to treno take the train
  • significantly drop, drastically decrease, the diagram shows, the graphic illustrates, to take a coach
  • responder a una petición to answer a request
    el diagrama muestra the diagram shows
    coger el tren to take the train
  • interesa erakutsi interest show to show interest →'erakutsi' and 'agertu' are synonyms in this context in Basque
    zalantza bati erantzun doubt one-to answer answer a doubt
    trena hartu train take to take the train
  • riješiti dilemu to solve a dilemma, pripremati obrok to prepare a meal
  • rispondere a una richiesta to answer a request
    il diagramma mostra the diagram shows
  • met de bus gaan to go by bus
  • zdominować/zarzucić/zapełnić/nasycić rynek to dominate/overwhelm/fill/saturate the market
  • quebrar/bater/ultrapassar/estabelecer um recorde to break/beat/overcome/establish a record
    o recorde foi quebrado the record was broken
    entrar/estar/permanecer/ficar/continuar/ter em cartaz enter/be/remain/stay/continue/have in poster
  • lua o mașină
  • občuten upad significant drop, drastično zmanjšanje drastic decrease, diagram prikazuje diagram shows, slika prikazuje picture shows
  • насецкати лук / насецкати першун / исецкати сланину naseckati luk / naseckati peršun / iseckati slaninu to chop onions / to chop parsley / to chop bacon
  • 古人 说 the-ancient say the ancient people said 圣者 云 the-saint say the saint said 据 称 according-to report according to what is reported 有 报道 have report there are reports

The difficulty of distinguishing collocations from MWEs lies in the fact that lexical variability is relevant to some MWEs:

  • نصيحة أعطى/أسدى to give / weave an advice, كلمة/خطاب ألقى threw a word / speech give a word/speech
  • нямам пукната пара/пукнат грош to not have a single penny, to be very poor
    имам твърда/дебела глава to have a thick head, to be stubborn and not listen to advice
  • einen Willen/Menschen brechen to break a will/person
  • παίρνω / λαμβάνω απόφαση perno / lamvano apofasi take / take decision to decide
  • to come in handy/useful, to stand firm/fast, to break someone's spirit/will, to take the cake/biscuit
  • dar un paseo/una vuelta give a walk / a turn to go for a walk
    darse/tomar una ducha give.self/take a shower take a shower
  • min eman/egin pain give/do to hurt (somebody)
    eskola/klasea eman class give to give a class →'eskola' and 'klasea' are synonyms in Basque
  • περὶ πολλοῦ / ἐλάττονος ποιέομαι peri pollou / elattonos poieomai above much.GEN / more.GEN / little.GEN do.1SG to hold in high / higher / low esteem
  • slomiti čiju/čiji volju/duh to break someone's will/spirit
  • cogliere/prendere di sorpresa, dare/fornire un contributo
  • zapisać się złotymi literami/zgłoskami to record itself with golden letters/syllables to be remembered and commemorated for a merit
    zamarznąć na kość/lód/sopel to freeze to bone/ice/icicle to freeze strongly
  • levar em conta/consideração take into account/consideration
    chutar o balde/pau da barraca to kick the bucket/the tent's stick to act irresponsibly
  • lua o decizie/hotărâre make a decision
  • imeti nekaj na voljo/razpolago to have something available/at disposal, odpreti nekomu pot/vrata to open a way/a door (for someone) to give someone an opportunity to do something
  • крити нешто као змија ноге/крити нешто као гуја ноге kriti nešto kao zmija noge/kriti nešto kao guja noge to hide (sth.) like a snake hides its legs/to hide (sth.) like a serpent hides its legs to hide something with extreme caution

However, the extent of the vocabulary concerned by this variability is different for collocations and MWEs. Namely, a head verb in a collocation usually selects a whole semantic class for each of its required arguments. For instance, the verb to take, in the sense "to use a vehicle to travel", selects a whole semantic class of means of transport. Similarly, the verb to drop can select a large set of adverbs describing the degree: drastically/significantly/remarkably/slightly/reasonably drop. Conversely, lexical variability in a MWE is limited to a closed list of lexemes, sometimes only loosely semantically related. For instance, the MWEs to take the cake/biscuit and to stand firm/fast do not keep their idiomatic readings with semantically close complements: #to take the cookie/wafer, *to stand hard/rigid/solid, etc. See also Test VID.2 [add a link to the similar NID test].
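The contrast between a closed list of variants and an open semantic class can be sketched as a simple membership check. The variant sets below repeat the English examples of this section; the dictionary layout and the function name are assumptions of this sketch, and deciding which variants belong to the list remains a linguistic judgement.

```python
# Closed variant lists taken from the English examples above.
IDIOM_VARIANTS = {
    "take the": {"cake", "biscuit"},
    "stand": {"firm", "fast"},
}

def keeps_idiomatic_reading(frame: str, complement: str) -> bool:
    """True only if the complement belongs to the closed list for this frame.

    Semantically close words outside the list (cookie, wafer, solid) are
    rejected, unlike in a collocation, where a whole semantic class fits.
    """
    return complement in IDIOM_VARIANTS.get(frame, set())

cake_ok = keeps_idiomatic_reading("take the", "cake")
cookie_ok = keeps_idiomatic_reading("take the", "cookie")
```

For a collocation such as to take a bus, no such closed list can be written down: any means of transport is acceptable.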

Some light-verb constructions (LVCs) and multiverb constructions (MVCs), as well as the corresponding deverbal nominal, adjectival and adverbial MWEs (VMWENom, VMWEAdj and VMWEAdv [add links to the pages of these categories]), belong to the gray zone between MWEs and collocations in the sense that some operator (light) verbs seem to select large classes of nouns, as in to make a speech/declaration/remark/etc. However, some studies (e.g. Bonial 2014) show that there is no such thing as truly productive light verbs (e.g. to give a look vs. to give a stare). Therefore, we do include LVCs and MVCs in our annotation scope.


Section 1.7

Multiword expressions versus metaphor

Another phenomenon closely related to MWEs is metaphor. According to (Shutova 2010), "a metaphor occurs when one concept is viewed in terms of the properties of the other. In other words it is based on similarity (presence of common characteristics) between two concepts".

Many MWEs, especially idioms, are based on metaphors. For instance, to take the bull by the horns means to address a problem (the bull) starting with its most challenging aspect (the horns). To set the world on fire is to do something extraordinary and get the admiration (set on fire) of other people (the world), to put all one's eggs in one basket means to rely on one particular course of action (a basket) for success rather than giving oneself several possibilities.

However, verbal metaphors are not always MWEs. Consider the newspaper title "simple steps to lift your dark cloud of stress", and the extract of a poem by Wordsworth, cited by Shutova: "and then my heart with pleasure fills, and dances with the daffodils". The metaphorical expressions to lift a dark cloud of stress (to relax) and my heart ... dances with the daffodils (I am happy) are not semantically compositional. These expressions, however, were probably constructed for the needs of one article/poem only and are not sufficiently established in the common vocabulary to be considered MWEs.

The distinction between MWEs and metaphors is a relatively unstudied and open question. There are few precise tests, other than statistical, which would allow human annotators to resolve it reliably. Gross (1982) gives some clues on the reproducibility and predictability of metaphors. We suggest that the annotators take notes of such cases and discuss them within their communities, both local and international.


Section 2

Textual annotation scope

In this annotation task, all occurrences of all syntactic types of MWEs are to be annotated in the text.

We annotate, as integral parts of MWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated but case suffixes are only annotated if the noun they modify is also lexicalized. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see IVPC tests), while in (HU) döntést hoz valamiről decision-ACC bring something-DEL make a decision, only döntést hoz is annotated, even if the delative case suffix is also lexically determined.

Similarly, auxiliaries and modals accompanying the main verb of a MWE are only annotated if they are themselves lexicalized, but not when they simply mark syntactic variants of the MWE. For instance, will is lexicalized, and is to be annotated as such, in even a worm will turn (even a meek person will resist if pushed too far), but not in they will spill the beans.

Both continuous and discontinuous sequences of lexicalized components of MWEs are annotated.

Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or idiomatic verb-particle constructions. Verb+preposition combinations like to rely on somebody and to come across something or to put up with somebody are annotated optionally and experimentally as inherently adpositional verbs (IAVs). On the other hand, prepositions selected by functional MWEs, such as in spite of, according to, etc. are considered lexicalized.

The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a MWE or not. We do not annotate their internal syntactic structure. We do annotate, however, MWEs embedded in other MWEs. For instance, the MWE to make a faux pas contains the embedded MWE faux pas and both are to be annotated as different MWEs. Embeddings are discussed on the pages of some categories, in the "Problematic cases and remarks" sections (e.g. IRVs overlapping with VIDs).

Once identified in a text, MWEs are also to be assigned to exactly one of the categories described in the following sections. We do not admit assigning two different categories to a single MWE in order to express hesitation. A comment and a particular value of the annotator's confidence should be used instead.
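The constraints above (flat tokens, possibly discontinuous components, embedded MWEs, exactly one category per MWE, hesitation recorded through a confidence value and a comment) can be pictured with a small data structure. This is only a minimal sketch: the class and field names are hypothetical and not part of the guidelines or of any PARSEME tool, and the tokens and categories chosen for to make a faux pas are assumed for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MweAnnotation:
    token_ids: List[int]      # indices of lexicalized tokens; gaps are allowed
    category: str             # exactly one category label per MWE
    confidence: Optional[float] = None  # hypothetical way to record hesitation
    comment: str = ""         # free-text note instead of a second category


@dataclass
class Sentence:
    tokens: List[str]         # flat tokenized sentence, no internal structure
    mwes: List[MweAnnotation] = field(default_factory=list)

    def annotate(self, token_ids, category, confidence=None, comment=""):
        ids = sorted(set(token_ids))
        if not ids or ids[0] < 0 or ids[-1] >= len(self.tokens):
            raise ValueError("token indices must point into the sentence")
        mwe = MweAnnotation(ids, category, confidence, comment)
        self.mwes.append(mwe)
        return mwe

    def categories_of(self, token_id):
        # A token may belong to several MWEs when one is embedded in another.
        return [m.category for m in self.mwes if token_id in m.token_ids]


sent = Sentence(["She", "made", "a", "terrible", "faux", "pas"])
# Outer, discontinuous MWE (category assumed for illustration).
sent.annotate([1, 4, 5], "LVC.full", confidence=0.7, comment="category to be checked")
# Embedded MWE, annotated as a separate MWE (category assumed for illustration).
sent.annotate([4, 5], "NID")

print(sent.categories_of(4))  # ['LVC.full', 'NID']
print(sent.categories_of(3))  # []
```

Keeping one annotation object per MWE, rather than one label per token, is what lets a token such as faux belong to both the outer and the embedded expression.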


Section 3

Categories of MWEs

The top level of MWE categories is motivated by a mixture of morphosyntactic and functional criteria, inspired from the classification of syntactic relations in Universal Dependencies, and includes:

  • verbal MWEs (VMWEs), with several subcategories (defined and annotated in versions 1.0 to 1.3 of these guidelines)
  • nominal MWEs (NMWEs), including nominal idioms and nominal MWEs derived from VMWEs
  • adjectival and adverbial MWEs (AMWEs), including adjectival and adverbial idioms, with separate subcategories for those derived from VMWEs
  • functional MWEs (FuncMWEs), including multiword determiners, adpositions, conjunctions and interjections

This classification, covering all syntactic types of MWEs, is new in version 2.0 of the guidelines. Previous versions covered verbal MWEs only. For a summary of changes with respect to edition 1.3, see the what's new file.

In practice, to identify and categorize MWEs during manual annotation, one must start at the unique entry point and follow the decision diagrams specific for the distribution of a MWE candidate:


Section 3.1

Categories of verbal MWEs

We distinguish the following categories of verbal MWEs:

  • Two universal categories, i. e. valid for all languages participating in the task:
    • Light verb constructions (LVCs) with two subcategories:
      • LVCs in which the verb is semantically totally bleached (LVC.full)
        • حكم أصدر pronounce judgment he pronounced a judgment
        • държа под контрол to keep under control
        • eine Rede halten a speech hold to give a speech
        • (OEG) 𓇋𓁹 𓊨𓏏 𓎡 ꞽr ś.t ⸗k Make (ꞽr) your (⸗k) place (ś.t)! Take your place! (PT 651d, T)
        • παίρνω μία απόφαση perno mia apofasi take-1SG a decision to decide
          δίνω μια εξήγηση dino mia exigisi give.1SG an explanation to explain
          ασκώ κριτική asko kritiki to criticise
        • to give a lecture
        • hacer una promesa to_make a promise to make a promise
        • min hartu pain take to hurt oneself
          lo egin sleep do to sleep
        • avoir du courage to have courage
        • bain triail as extract trial from try
        • διάνοιαν ἔχειν dianoian ekhein thought.ACC have.INF to have a thought
          τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish
          λόγοις χράομαι logois khraomai words.DAT use.1SG I speak
          ἐν νῷ ἔχω en nо̄ ekhо̄ in mind.DAT have.1SG I have in mind
          ἐν ὀργῃ ἔχω en orgē ekhо̄ in anger.DAT have.1SG I am angry
        • držati govor hold a speech to give a speech
        • fare un discorso to_make a speech to give a speech
          fare una promessa to_make a promise to make a promise
        • გავლენას ახდენს gavlenas axdens he/she performs influence he/she affects
          ზიანს აყენებს zians aqenebs he/she puts damage he/she harms
        • pieņemt lēmumu to take a decision to make a decision
        • ħa deċizjoni took a decision
        • een toespraak houden a speech hold to give a speech
        • podjąć decyzję to take a decision
        • fazer uma promessa to make a promise
        • a lua o decizie to take a decision to make a decision
        • imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion
        • jap mësim give lesson give a lecture
          bëj një premtim do a promise make a promise
        • донети одлуку doneti odluku to bring a decision to take a decision
        • hålla ett tal hold a speech to give a speech
        • 做 讲座 do speech to give a speech
      • LVCs in which the verb adds a causative meaning to the noun (LVC.cause)
        • قيمه أعطى give a value to give a value to something or someone
        • давам възможност give an opportunity
        • (OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
        • δίνω προτεραιότητα
        • to grant rights
          to give a headache
          to provoke the destruction of the building
        • dar dolor de cabeza to_give pain of head to give a headache
          hacer ilusión to_make excitement to make excited/to look forward to
        • cuir lúcháir ar put joy on give delight to
        • τιμωρίαν ἀποδίδωμι timо̄rian apodidо̄mi punishment.ACC give.1SG I inflict punishment
          ὀργὰς παρασκευάζομαι orgas paraskeuazomai anger.ACC.PL cause.1SG I make angry
          δίκην ἐπιτίθημι dikēn epitithēmi justice.ACC impose.1SG I fine (sb)
          τιμωρίαν ποιέω timо̄rian poieо̄ punishment.ACC do.1SG I inflict punishment
        • zadati glavobolju komu to give a headache to someone, izazvati nezadovoljstvo to cause dissatisfaction
        • dare il mal di testa to_give pain of head to give a headache
          dare noia to_give trouble to annoy
        • nest nelaimi to carry misfortune to bring misfortune
        • rechten verlenen rights grant to grant rights
        • nakłada obowiązek na użytkowników put a duty on the users
          dać prawo to give the right to grant the right
          narazić na straty expose to losses
          stawiać komuś cel to put an aim to someone to set a goal to someone
        • da cuiva bătăi de cap give sb. a hard time
        • dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)
        • jap të drejtë give the right grant rights
        • задати главобољу некоме zadati glavobolju nekome to give a headache to someone to make problems to someone
          створити прилику stvoriti priliku nekome create an opportunity
        • 授予 权力 give power to grant power
    • verbal idioms (VIDs):
      • إجتماع عقد tie a meeting to lead a meeting
      • правя се на дръж ми шапката to behave myself as 'hold my hat' pretend to be naive and innocent
        цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
        река и отсека to say and cut to say firmly, decisively
      • schwarz fahren to drive black take a ride without a ticket, in Kraft treten into force step to come into effect, in die Waagschale werfen in the weighing pan throw to bring to bear
        einen drauf setzen going one better
      • (OEG) 𓐣𓂝𓏝 𓃹𓈖𓇋𓋴 𓌃𓅱𓏝 𓈖 𓋹𓈖𓐍𓅱 wč̣ꜥ Wnꞽś mṭw n ꜥnḫ.w Unas (Wnꞽś) shall-separate (wč̣ꜥ) the word (mṭw) for (n) the living (ꜥnḫ.w). Unas shall judge the living (PT 273b, W)
      • κόβω φλέβες kovo fleves cut veins to be in a state of complete boredom
        απορώ και εξίσταμαι wonder1SG.PST and be-amazed1SG.PST to wonder
        παίρνω των ομματιών μου perno ton omation mu take the eyes mine to leave (in despair)
        χάνω τα αυγά και τα καλάθια chano ta avga ke ta kalathia lose-1SG the eggs and the baskets to be at a complete and utter loss
        κόβει το μάτι μου kovi to mati mu cut.3SG the.SG.NOM eye.SG.NOM my to be sharp-eyed
        παίρνουν τα μυαλά μου αέρα pernun ta miala mu aera take.3PL the.PL.NOM brain.PL.NOM air.SG.ACC to become arrogant
        δεν δίνω του αγγέλου μου νερό den dino tu agelu mu nero not give my angel water to be stingy
      • to go bananas
        fortune favors the bold
        to drink and drive
        to voice act
        to pretty-print
        to short-circuit
        to tumble dry
      • hacer de tripas corazón make of intestines heart to pluck up the courage
        dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
        dar gato por liebre to_give cat for hare to rip off, to take for a ride
      • adarra jo horn play to pull (somebody's) leg, to be kidding
        burua hautsi head break to rack one's brains, to think very hard
        ikusi eta ikasi see and learn
        hortxe dago koska just-there is the-crux that's the crux of the matter
      • défendre son bifteck defend one's beefsteak to defend one's interests
        court-circuiter to short-circuit
      • ag cur is ag cúiteamh arguing and debating arguing back and forth
      • περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem
        οἷον τ'ἦν hoion t’ēn of.what.sort.NOM and was.3SG it was possible
        δίκην δίδωμι dikēn didо̄mi justice.ACC give.1SG I get punished
      • mlatiti praznu slamu to beat empty straw to talk aimlessly, mazati komu oči to blur eyes to someone to cheat someone
      • gettare le perle ai porci to_throw the pearls to the pigs to waste something good on someone who doesn't care about it
        andare e venire to_go and to_come back and forth
        corto-circuitare to short-circuit
      • შიშს ჭამს šišs čams he/she eats horror to be startled, to panic; to be horrified / deeply shocked
        უარს აცხადებს uars acʻxadebs he/she declares a refusal to refuse
        აღმართს ახვნევინებს aġmartʻs axvnevinebs he/she makes (someone) plow uphill he/she forces their will on others
        გაივლის ვინმეს ხელში gaivlis vinmes xelši he/she will pass through someone's hand he/she will go through someone's control or possession
      • atstiept kājas to stretch one's legs to die
      • għasfur żgħir qalli a bird small told me to hear something from the grapevine
        iqum u joqgħod jump and stay to fidget
      • het ijs breken ice break to break the ice
      • rzucać grochem o ścianę throw peas against a wall to try to convince somebody in vain
        pluć i łapać to spit and catch to be lazy, to do nothing useful
      • fazer das tripas coração transform the tripes into heart to try everything possible
        pintar e bordar paint and knit to abuse
      • a trage pe sfoară to pull on rope to fool
        a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
      • ubiti dve muhi na en mah to kill two flies with one strike to achieve two aims at once, spati kot ubit to sleep like dead to sleep soundly
      • i bie murit me kokë hit the wall with head to try the impossible
        i flakën to it put flame to cause trouble
      • држати реч držati reč to hold a word to keep a promise
        храбре срећа прати hrabre sreća prati fortune follows the bold fortune favors the bold
        китити се туђим перјем kititi se tuđim perjem decorate oneself with someone else's feathers steal someone's thunder / take credit for someone else's accomplishments
      • 吃 闭门羹 eat closed-door-soup to be locked out
        哑巴 吃 黄连 dumb-person eat bitter-medicine a dumb person eats bitter medicine, and he cannot speak out the bitterness
  • Three quasi-universal categories, valid for some language groups or languages but non-existent or very exceptional in others:
    • inherently reflexive verbs (IRV):
      • усмихвам се to smile
      • sich bemühen to endeavour, sich enthalten himself contain to abstain
      • (OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. V 141, 14).
      • - NA in Modern Greek
      • to find oneself in a difficult situation
        to help oneself to the cookies
      • suicidarse to suicide
        quejarse to complain
      • n.a.
      • se suicider to suicide
        se soucier to worry
      • n.a.
      • This category does not apply to Ancient Greek.
      • smijati se to laugh
      • suicidarsi to suicide
        lamentarsi to moan
      • zich bemoeien to get involved, zich vergissen to be mistaken
      • bać się to fear SELF to be afraid
      • se queixar to complain
      • a se gândi to think
      • bati se to be afraid, smejati se to laugh, drzniti si to dare to do something
      • gëzohem rejoice myself to be happy
        pendohem repent myself to regret
        kujdesem to care myself to take care
      • бојати се bojati se to be afraid
        коцкати се kockati se to gamble
    • idiomatic verb-particle constructions (IVPC) with two subcategories:
      • fully non-compositional IVPCs (IVPC.full), in which the particle totally changes the meaning of the verb
        • not applicable to Bulgarian
        • er gibt auf he gives up, er wirft ihr das vor he throws her that against he reproaches her for that
        • μπαίνω μέσα get in get in to go bankrupt
          βάζω μπρος vazo bros put forward to start
        • to do in
        • n.a.
        • n.a.
        • cas chuig turn towards happen to have
        • This category does not apply to Ancient Greek.
        • postaviti za to set for to appoint
        • buttare giù to_throw down to swallow
        • hij geeft op he gives up
        • not applicable to Polish
        • jogar fora This seems to be the only VPC in Portuguese. We annotate it as ID and do not use the VPC category.
        • n.a.
        • n.a.
        • hedh poshtë
        • n.a.
      • semi non-compositional IVPCs (IVPC.semi), in which the particle adds a partly predictable but non-spatial meaning to the verb
        • not applicable to Bulgarian
        • κάνω πίσω kano piso do back to back off
        • to eat up
        • n.a.
        • tabhair suas give up
        • This category does not apply to Ancient Greek.
        • andare avanti to_go forward to move on
        • opeten to eat up
          opdrinken to drink up
        • n.a.
        • n.a.
        • eci para
        • n.a.
        • 把握 住 机会 grasp hold opportunity to grasp the opportunity successfully → a Chinese Resultative Verbal Construction (RVC)
    • multi-verb constructions (MVC):
      • will sagen want to say that is to say
      • (MEG) 𓁹𓏏 𓀀 𓈝𓅓𓏏𓂻 𓅓 𓏃𓈖𓏏𓇋𓇋𓏏𓊛 ꞽr.t (⸗ꞽ) šm.t m ḫnt.yt My (⸗i) making (ir.t) of going (šm.t) southwards (m ḫnt.yt) I made a departure southwards. (Sin. B 5-6)
      • έχω να κάνω echo na kano have to do to cope
        έδωσα πήρα edosa pira give.1PST take.1PST I struggled
      • to let go
        to make do
      • querer decir to_want to_say to mean
      • ?
      • laisser tomber let fall to give up
        vouloir dire want say to mean
      • ?
      • φθάνουσι ἐρχόμενοι phthanousi erkhomenoi overtake.3PL go.PTC they go first
        τυγχάνουσι ἐρχόμενοι tugkhanousi erkhomenoi get.3PL go.PTC they happen to go
      • može biti can be it is possible
      • lasciar andare to_let go to unhand
        voler dire to_want say to mean
      • wil zeggen want to say that is to say
        laten vallen let fall to give up
        leren kennen to learn know to become acquainted
      • dać komuś żyć to let someone live not to bother someone
        można wytrzymać one can stand the situation is reasonably good
      • querer dizer want say to mean
        ouvir falar hear speak to know/remember vaguely
      • n.a.
      • n.a.
      • do të thotë
      • дај шта даш daj šta daš give what you give to be satisfied with small (from someone)
        ићи куда некога ноге носе ići kuda nekoga noge nose to go where one's feet carry someone to go without an aim
      • 排列 成 arrange become to arrange to be
        试试 看 try see to try and see
  • language-specific categories, defined for a particular language in a separate documentation.

We also introduce an optional experimental category which (if admitted by the given language) is to be considered in the post-annotation step:

  • inherently adpositional verbs (IAVs)
    • فيرغب want to he has a desire to do something
    • излизам пред някого/нещо come in front of someone/something to surpass, to outdo
      излизам със становище come out with a statement
    • n.a.
    • to come across
      to rely on
    • confiar en to_trust in to trust in entender de to_understand of to know about
    • n.a.
    • caith anuas ar throw down on belittle
    • This category does not apply to Ancient Greek.
    • suočiti s to face with
    • confidare su to_trust in to trust in intendersi di to_understand of to know about
    • behoren tot to belong to
    • godzić się na każde warunki to agree on any condition
      mieć do czynienia z czymś to have to do with sth
      odwieść kogoś od czegoś to dissuade someone from doing sth
    • conta pe count on
    • dati skozi give through to go through, gre za it goes about it is about
    • mbështetem në
    • n.a.

Section 3.2

Categories of nominal MWEs

We distinguish three classes of nominal MWEs (NMWEs):

  • Nominal idiom (NID) - a universal category, characterized by lexical, morphological or syntactic irregularity:
    • (OEG) 𓇓𓏏 𓆤𓏏 nsw - bꞽtꞽ The king of Upper Egypt (𓇓𓏏) and Lower Egypt (𓆤𓏏). The king of Egypt (PT 776a, P) → For the meaning of nsw-bꞽtꞽ see Schenkel, Das Wort für 'König' (von Oberägypten), 1986.
    • φακός επαφής fakos epafis lens contact.GEN.SG contact lens
    • a big fish an important person
      a hot dog a sandwich with a hot sausage
    • un pesce grosso
      il braccio destro
    • აბრამის ბატკანი abramis batkani Lamb of Abraham Completely innocent, simple person Lamb of God
      ადამის ჟამის adamis žamis Of Adam's time Old, very old person
    • baltais zvirbulis the white sparrow a person who stands out from the crowd
    • blinde vink small roll of minced meat, wrapped in a slice of veal or beef
      hotdog a sandwich with a hot sausage
      zwarte markt black market
    • biały kruk a white raven a rare thing
    • kokë e madhe big head an important person
    • осиње гнездо osinje gnezdo wasps' nest dangerous place
    • акула бізнесу akula biznesu business shark an agile, goal-oriented person with excellent business skills and undeniable advantages over competitors
      біла ворона bila vorona white crow a person who is different from the rest
      об’ємний звук ob’jemnyj zvuk surround sound sound coming from all directions
      холодна війна xolodna vijna cold war a period of prolonged tension between countries that did not involve direct military action but included economic and political competition, espionage, etc.
  • Pronominal idiom (PronID) - a universal category constituting a closed list of cases:
    • (OEG) 𓅱𓌡𓏤 𓊪𓈖 𓇋𓅓 𓎡 wꜥ pn ꞽm(.ꞽ) ⸗k this (pn) one (wꜥ) who-is-in (ꞽm(.ꞽ)) you (⸗k). This one who is in you. (PT 254a)
    • I saw just a few
      I expect no one to come
      we love each other
    • je ne suis pas capable de manger quoi que ça soit I am not able to eat what that this be I cannot eat anything
    • ci amiamo l'un l'altro
    • viens otrs one other some people
      tas pats that self the same
      kaut kas something
      dažs labs few good somebody; some people
    • powtarzał ciągle to samo he repeated always this the same he repeated always the same
      coś tam jeszcze something there more something more
      byłoby to co innego it would be what different it would be something else
    • Ne duam njëri-tjetrin. We love each other. We love each other.
    • сам по себи sam po sebi by itself
    • кохаємо один одного koxajemo odyn odnoho love each other means that two or more people feel deep love, affection and mutual love for each other
      сама собою sama soboju by herself of course
  • Deverbal nominal MWE (NV) with subcategories corresponding to the categories of VMWEs from which the nominal MWE can be derived:
    • universal subcategories:
      • Deverbal nominal stemming from an LVC.full (NV.LVC.full)
        • a decision maker - deriving from the LVC.full to make a decision
        • lēmuma pieņēmējs a decision maker - derives from the LVC.full pieņemt lēmumu to take a decision to make a decision
        • sianie zgorszenia sowing scandal provoking scandal - derives from the LVC.full siać zgorszenie sow scandal provoke scandal
        • marrës vendimesh "a decision maker" → from "marr një vendim" ("to make a decision").
        • пружање подршке pružanje podrške providing support - derives from the LVC.full пружати подршку pružati podršku to provide support
      • Deverbal nominal stemming from an LVC.cause (NV.LVC.cause)
        • a doubt-raiser - deriving from the LVC.cause to raise doubts
        • nelaimes nešana bringing of misfortune - derives from the LVC.cause nest nelaimi to bring misfortune
        • dostarczanie wrażeń delivering of impressions giving impressions - derives from the LVC.cause dostarczać wrażeń deliver impressions give impressions
        • ngjallës dyshimesh raiser of doubts a doubt-raiser derives from the LVC.cause ngjall dyshime (to raise doubts)
        • изазивање реакције izazivanje reakcije provoking a reaction - derives from the LVC.cause изазивати реакцију izazivati reakciju provoke a reaction
      • Deverbal nominal stemming from a VID (NV.VID)
        • Wortbruch word-break a promise which has not been kept - derives from the VID ein Wort brechen word break not to keep a promise
        • (OEG) 𓅓𓎕 𓄣 𓈖 𓇓𓏏𓈖 mḥ ꞽb n(.ꞽ) nsw the-one-who-fills (mḥ) the heart (ꞽb) of (n(.ꞽ)) the king (nsw) The king's confidant (Urk. I 190, 11) = > mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) '(My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ)' 'My lord trusted me' → It is an NV.VID.full.
        • a heart breaker - deriving from the VID to break one's heart
        • la prise en compte the fact of taking into account - derives from the VID prendre en compte take into account
          une mise à disposition the fact of making available - derives from the VID mettre à disposition make available
        • uno spezzacuori - deriving from the VID spezzare un cuore
        • kāju atstiepšana stretching of one's legs dying - derives from the VID atstiept kājas to stretch one's legs to die
        • zabawa czyimś kosztem a play at someone else's expenses - derives from the VID bawić się czyimś kosztem to enjoy oneself at someone else's expenses
        • thyerës zemrash breaker of hearts heartbreaker derives from the VID thyej zemrën (break the heart)
        • долазак на свет dolazak na svet coming to the world birth - derives from the VID dolaziti na svet
          одузимање живота oduzimanje života depriving of life deprivation of life - derives from the VID одузети живот oduzeti život take a life
    • quasi-universal subcategories:
      • Deverbal nominal stemming from an IRV (NV.IRV)
        • cackanie się z przestępcami dealing too mildly with bandits - derives from the IRV cackać się dealing too mildly with someone
      • Deverbal nominal stemming from an IVPC.full (NV.IVPC.full)
        • a take-off - deriving from the IVPC.full to take off
      • Deverbal nominal stemming from an IVPC.semi (NV.IVPC.semi)
      • Deverbal nominal stemming from an MVC (NV.MVC)
    • optional experimental subcategory:
      • Deverbal nominal stemming from an IAV (NV.IAV)

    Section 3.3

    Categories of adjectival and adverbial MWEs

    We distinguish three classes of adjectival and adverbial MWEs (AMWEs, previously also called modifier MWE or ModMWEs):

    • Adjectival idiom (AdjID) - a universal category, characterized by lexical, morphological or syntactic irregularity:
      • (OEG) 𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹 𓋴𓆓𓄔𓏏𓅓 𓌃𓅱𓂧 𓇋𓍘𓅱 pśč̣.(w)t śč̣m.t mṭw ꞽtꞽ.w The Enneads (pśč̣.(w)t) which-hear (śč̣m.t) the word (mṭw) of the monarch (ꞽtꞽ.w). The Enneads which interrogate the monarch (PT 511c, W)
      • a well-worn coat
        to be up in arms to be very angry
        a bottom-up algorithm an algorithm starting from details and moving on to more general principles
      • un argomento trito e ritrito repeated over and over
        una fregatura bella e buona a real fraud
      • мртав пијан mrtav pijan dead drunk dead drunk
        нов новцијат nov novcijat new new brand new
        вредан помена vredan pomena worthy of mention worth mentioning
    • Adverbial idiom (AdvID) - a universal category, characterized by lexical, morphological or syntactic irregularity:
      • (OEG) 𓆓𓏏𓇿 𓂋 𓈖𓅘𓎛𓎛 č̣.t r nḥḥ for the linear-eternity (č̣.t) to (r) the circular-eternity (nḥḥ). for ever and ever (PT 414c, W)
      • by and large generally speaking
      • par la force des choses by the strength of the things inevitably
      • tutto sommato everything summed up all in all
      • ულუკმოდ დარჩენილი ulukmod darčʻenili without a bite left left hungry; left starving
      • aiz restēm behind bars in prison
        droši vien sure only sure enough
      • zrobić coś raz dwa to do something one two to do something quickly
        pod kluczem under the key in prison
      • очас посла očas posla immediately work in the blink of an eye
      • зробити для галочки zrobyty dlja haločky to do something just for show to do something perfunctorily
        збирати по крихтах zbyraty po kryxtax collecting crumbs assemble something in small, often insignificant parts
        останнім часом ostannim časom recently it is a phrase that indicates a certain period of time that has recently ended
    • Deverbal adjectival/adverbial MWE (AV) - with subcategories corresponding to the categories of VMWEs from which the AMWE can be derived:
      • universal subcategories:
        • Deverbal AMWE stemming from an LVC.full (AV.LVC.full)
          • żołnierz wzięty do niewoli a soldier taken into custody imprisoned soldier - wziąć do niewoli is an LVC.full
          • донет закон donet zakon brought law passed law - derives from the LVC.full донети закон doneti zakon to pass a law
        • Deverbal AMWE stemming from an LVC.cause (AV.LVC.cause)
        • Deverbal AMWE stemming from a VID (AV.VID)
      • quasi-universal subcategories:
        • Deverbal AMWE stemming from an IRV (AV.IRV)
        • Deverbal AMWE stemming from an IVPC.full (AV.IVPC.full)
          • a run-down apartment - adjectival MWE deriving from the IVPC.full to run down
        • Deverbal AMWE stemming from an IVPC.semi (AV.IVPC.semi)
        • Deverbal AMWE stemming from an MVC (AV.MVC)
      • optional experimental subcategory:
        • Deverbal AMWE stemming from an IAV (AV.IAV)
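      The deverbal labels listed in Sections 3.2 and 3.3 follow a regular pattern: a prefix (NV for nominal, AV for adjectival/adverbial) followed by the label of the VMWE category the expression stems from. A minimal sketch of that naming scheme is given below. The function names are hypothetical, and language-specific categories are left out because they are defined in separate documentation.

```python
# VMWE category labels as they appear in the guidelines' deverbal subcategories.
VMWE_CATEGORIES = [
    "LVC.full", "LVC.cause", "VID",           # universal
    "IRV", "IVPC.full", "IVPC.semi", "MVC",   # quasi-universal
    "IAV",                                    # optional, experimental
]
DEVERBAL_PREFIXES = ("NV", "AV")


def deverbal_label(prefix, vmwe_category):
    """Build a deverbal label such as 'NV.LVC.full' (hypothetical helper)."""
    if prefix not in DEVERBAL_PREFIXES:
        raise ValueError(f"unknown deverbal prefix: {prefix}")
    if vmwe_category not in VMWE_CATEGORIES:
        raise ValueError(f"unknown VMWE category: {vmwe_category}")
    return f"{prefix}.{vmwe_category}"


def split_deverbal_label(label):
    """Return (prefix, source VMWE category) for a deverbal label."""
    prefix, _, vmwe_category = label.partition(".")
    if prefix not in DEVERBAL_PREFIXES or vmwe_category not in VMWE_CATEGORIES:
        raise ValueError(f"not a deverbal label: {label}")
    return prefix, vmwe_category


print(deverbal_label("NV", "LVC.full"))       # NV.LVC.full
print(split_deverbal_label("AV.IVPC.full"))   # ('AV', 'IVPC.full')
```

      For instance, a run-down apartment would receive the label built from AV and IVPC.full, mirroring the example given for AV.IVPC.full above.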

      Section 3.4

      Categories of functional MWEs

      We distinguish four classes of functional MWEs (FuncMWEs), all of them universal:

      • Determiner idiom (DetID) :
        • I work from home roughly every other day
        • tas pats cilvēks that self person the same person
          katru otro dienu every second day every other day
        • zadałem sobie to samo pytanie I asked myself this same question I asked myself the same question
          przekaż mu te oto słowa transfer him these here words transfer him these words
        • той чи інший бік toj čy inšyj bik one side or the other one of several
          той чи той випадок toj čy toj vypadok this or that case in each of the two options
      • Adposition idiom (AdpID):
        • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth (Śtẖ) from Seth (PT 65b, N)
        • in front of the station
        • di fronte alla stazione
        • განზე გაგონილი ganze gagonili heard from the side heard unintentionally
          თითზე ჩამოსათვლელი tʻitʻze čʻamosatʻvleli countable on (one’s) fingers a few
        • līdz pat until even up to; until
        • hij speelde één grastoernooi ter voorbereiding op het Grand Slam he played one grass tournament in preparation for the Grand Slam
        • gwarancji nie ma nawet w przypadku arcymistrza there is no guarantee even in the case of a grandmaster
        • смештај у близини луке smeštaj u blizini luke accommodation in proximity of the port accommodation near the port
        • під час вечері pid čas večeri during dinner when an action or event is in progress
          у межах співпраці u mežax spivpraci within the framework of cooperation within the framework of something
          за допомоги друзів za dopomohy druziv with the help of friends using something or someone to achieve a goal
      • Conjunction idiom (ConjID):
        • (OEG) 𓈖 𓈖𓏏𓏏 n-n.tt for (n) (the fact) that (n.tt) because (PT 716e, T)
        • she was fortunate in that she had friends to help her
        • la cérémonie sera projetée sur grand écran afin que tout le monde puisse suivre the ceremony will be projected on a big screen so that everyone can follow
        • lei è fortunata in quanto ha amici che la aiutano
        • vārdnīca, arī locījumu tabula a dictionary, as also an inflection table a dictionary as well as an inflection table
        • zmęczony mimo że dzień się dopiero zaczynał tired although that the day was only beginning tired although the day was only beginning
        • Пазите само да не оштетите кип. Pazite samo da ne oštetite kip. Just be careful not to damage the statue
        • для того, щоб dlja toho, ščob in order to with the aim of, in order to, in order to achieve something
          не тільки, але й ne til'ky, ale j not only, but also not only ... but also ...
          чи то…, чи то… čy to…, čy to… either..., or... indicates the possibility of several options, but is not precisely defined
      • Interjection idiom (IntjID):
        • damn it!
        • bon sang! good blood! damn it!
        • mannaggia!
        • ვაი შენს ტყავს! vai šens tqavs! Wow to your skin! You're in trouble! or, Oh, poor you!
        • pie velna! at the devil! Damn it!
        • do diabła! To the devil! Damn it!
        • алал вера alal vera blessing faith congratulations
        • Слава Богу! Slava Bohu! Thank God! expresses gratitude or relief when something good has happened or danger has passed
          До дідька! Do did'ka! Damn it! 1) a lot. 2) used to express dissatisfaction with someone's behavior, actions, deeds, etc. 3) goes away
          Дідька лисого! Did'ka lysoho! Damn bald guy! 1) used as a categorical denial of something 2) absolutely nothing
          Якщо хочете, … Jakščo xočete, … If you want, … a polite suggestion or invitation to do something

      Section 4

      Annotation process

      We propose the following methodology for MWE annotation:

      • Step 1 - identify a candidate, that is, a combination of at least two words which could form a MWE. Recall that a candidate can be composed of only one token if it contains several words (cf. the MWT tests). Find the neutral form of the candidate. The following steps should be applied to this neutral form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
      • Step 2 - determine which components of the candidate (in its neutral form) are lexicalized, that is, if they are omitted, the MWE does not occur any more. Corpus and web searches may be required to confirm intuitions about acceptable variants.
      • Step 3 - depending on the syntactic structure of the candidate's neutral form, formally check if it is a MWE using the generic and category-specific decision diagrams and tests described below. Notice that your intuitions used in Step 1 to identify a given candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
      • Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied once the 3 previous steps are complete, i.e. the IAV annotation overlays the universal annotation.

      The unique entry point to Step 3 above is the following test:

      Top test - [DIST] - Distribution

      What is the distribution of the neutral form of the candidate in the particular context? This can be tested by replacing the MWE candidate with a single word having the given part of speech, and checking if such a replacement, although possibly changing the meaning, does not lead to a loss of grammaticality or acceptability. If such a replacement test passes for a large class of single words of the same POS, the candidate is considered as having the distribution of this POS.

      • Determiner, conjunction, adposition or interjection ⇒ Apply the functional MWE tests. FuncMWE tests positive?
        • Yes ⇒ Annotate with the FuncMWE subcategory determined via the guidelines
        • No ⇒ It is not a MWE, exit
      • Adjectival or adverbial phrase ⇒ Apply the adjectival and adverbial MWE tests. AMWE tests positive?
        • Yes ⇒ Annotate with the AMWE subcategory determined via the guidelines
        • No ⇒ It is not a MWE, exit
      • Verb, verbal phrase or verbal clause ⇒ Apply the verbal MWE tests. VMWE tests positive?
        • Yes ⇒ Annotate with the VMWE subcategory determined via the guidelines
        • No ⇒ It is not a MWE, exit
      • Noun or nominal phrase ⇒ Apply the nominal MWE tests. NMWE tests positive?
        • Yes ⇒ Annotate with the NMWE subcategory determined via the guidelines
        • No ⇒ It is not a MWE, exit
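
The dispatch performed by the DIST test can be pictured as a simple lookup. The sketch below is only an illustration: the distribution labels and the function name `select_test_family` are assumptions made for this example and are not identifiers defined by the guidelines.

```python
from typing import Optional

# Distribution established by the DIST test -> family of tests to apply next.
# The keys are illustrative labels, not official tags.
TEST_FAMILY_BY_DISTRIBUTION = {
    "determiner": "FuncMWE",
    "conjunction": "FuncMWE",
    "adposition": "FuncMWE",
    "interjection": "FuncMWE",
    "adjectival": "AMWE",
    "adverbial": "AMWE",
    "verbal": "VMWE",
    "nominal": "NMWE",
}


def select_test_family(distribution: str) -> Optional[str]:
    """Return the test family for a distribution, or None if no branch applies."""
    return TEST_FAMILY_BY_DISTRIBUTION.get(distribution)


print(select_test_family("verbal"))
print(select_test_family("adverbial"))
```

A positive outcome of the selected tests yields the subcategory to annotate; a negative outcome means the candidate is not a MWE.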

      Section 5

      Specific tests for categorizing verbal MWEs

      Once a candidate VMWE has been pre-identified in steps 1 and 2 of the annotation process, and its distribution was established as verbal, the confirmation of its status as a VMWE, as well as its categorization, is done according to the decision diagrams and tests described in the following sections:

      Additionally, language-specific categories (LS) can be defined and tests for them can be used to annotate them in a given language or language group only.


      Section 5.1

      Generic structural tests for verbal MWEs (S)

      Structural tests are quite simple preliminary tests that help determine the syntactic structure of the VMWE candidate. This is required in order to point to the right category-specific identification tests.

      The decision diagram below indicates the order in which the structural tests should be applied when the candidate MWE has a verbal distribution established in the DIST test. The decision diagrams are a useful summary to consult during annotation, but contain very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.

      Generic decision tree for verbal MWE candidates

      If you are annotating Italian or Hindi, go to the Italian-specific VMWE decision diagram or Hindi-specific decision diagram. For all other languages follow the tree below.

      • Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
        • No ⇒ Apply the VID-specific tests. VID tests positive?
          • Yes ⇒ Annotate as a VMWE of category VID
          • No ⇒ It is not a VMWE, exit
        • Yes ⇒ Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
          • No ⇒ Apply the VID-specific tests. VID tests positive?
            • Yes ⇒ Annotate as a VMWE of category VID
            • No ⇒ It is not a VMWE, exit
          • Yes ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
            • Yes ⇒ Apply the VID-specific tests. VID tests positive?
              • Yes ⇒ Annotate as a VMWE of category VID
              • No ⇒ It is not a VMWE, exit
            • No ⇒ Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
              • Reflexive clitic ⇒ Apply IRV-specific tests. IRV tests positive?
                • Yes ⇒ Annotate as a VMWE of category IRV
                • No ⇒ It is not a VMWE, exit
              • Particle ⇒ Apply IVPC-specific tests. IVPC tests positive?
                • Yes ⇒ Annotate as a VMWE of category IVPC.full or IVPC.semi
                • No ⇒ It is not a VMWE, exit
              • Verb with no lexicalized dependent ⇒ Apply MVC-specific tests. MVC tests positive?
                • Yes ⇒ Annotate as a VMWE of category MVC
                • No ⇒ Apply the VID-specific tests. VID tests positive?
                  • Yes ⇒ Annotate as a VMWE of category VID
                  • No ⇒ It is not a VMWE, exit
              • Extended NP ⇒ Apply LVC-specific decision tree. LVC tests positive?
                • Yes ⇒ Annotate as a VMWE of category LVC
                • No ⇒ Apply the VID-specific tests. VID tests positive?
                  • Yes ⇒ Annotate as a VMWE of category VID
                  • No ⇒ It is not a VMWE, exit
              • Another category ⇒ Apply the VID-specific tests. VID tests positive?
                • Yes ⇒ Annotate as a VMWE of category VID
                • No ⇒ It is not a VMWE, exit
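
The order of the structural tests in the generic tree can be summarised as a small routing function. This is a minimal sketch under stated assumptions: the `Candidate` fields, the category strings ("reflexive", "particle", "verb", "np", "other") and the `tests` dictionary of callables are hypothetical names introduced for illustration. The category-specific tests themselves rely on linguistic judgement and are represented here only as placeholders that return a label or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Candidate:
    has_unique_head_verb: bool          # outcome of S.1
    lexicalized_dependents: int = 0     # counted for S.2
    dependent_is_subject: bool = False  # outcome of S.3
    dependent_category: str = "other"   # outcome of S.4


# Each category-specific test returns a category label, or None if negative.
Tests = Dict[str, Callable[[Candidate], Optional[str]]]


def categorize(c: Candidate, tests: Tests) -> Optional[str]:
    """Route a verbal candidate through S.1-S.4; None means 'not a VMWE'."""
    def vid() -> Optional[str]:
        return tests["VID"](c)

    if not c.has_unique_head_verb:      # S.1 negative
        return vid()
    if c.lexicalized_dependents != 1:   # S.2 negative
        return vid()
    if c.dependent_is_subject:          # S.3 positive
        return vid()
    if c.dependent_category == "reflexive":
        return tests["IRV"](c)
    if c.dependent_category == "particle":
        return tests["IVPC"](c)
    if c.dependent_category == "verb":
        return tests["MVC"](c) or vid()  # fall back to VID tests
    if c.dependent_category == "np":
        return tests["LVC"](c) or vid()  # fall back to VID tests
    return vid()


# Placeholder outcomes, chosen only to exercise the routing.
stub_tests: Tests = {
    "VID": lambda c: "VID",
    "IRV": lambda c: "IRV",
    "IVPC": lambda c: "IVPC.full",
    "MVC": lambda c: None,
    "LVC": lambda c: "LVC",
}
print(categorize(Candidate(True, 1, False, "particle"), stub_tests))
```

Only the MVC and LVC branches fall back to the VID-specific tests on a negative outcome; a negative IRV or IVPC outcome discards the candidate directly.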

      Test S.1 - [1HEAD] - Syntactic head

      Does the candidate contain a unique verb functioning as the functional syntactic head of the whole?

      • Apply the VID-specific tests
        • تنلاصبر be patient you get if you stay patient you will get what you want → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • цъфна и вържа → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • leben und leben lassen live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • έδωσε πήρε edose pire gave3SG.PA took3SG.PA he succeeded → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • to pretty-print → there is an unusual case of an adjective modifying a verb
          to drink and drive → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • coser y cantar to_sew and to_sing easy as pie, a piece of cake
        • ikusi eta ikasi see and learn → none of the verbs is clearly the head
        • ag cur is ag cúiteamh arguing and debating arguing back and forth → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • ἠντεβόλει καὶ ἱκετεύε ēntebolei kai iketeue supplicate.3SG and beseech.3SG he begged and beseeched
        • žariti i paliti to stoke and to burn to be powerful, vedriti i oblačiti to brighten and to cloud to be powerful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • vivi e lascia vivere live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • stāvēt un krist to stand and to fall to be very sure (of something); to defend with confidence → none of the verbs is clearly the head, they are coordinated
        • leven en laten leven live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • pluć i łapać to spit and catch to be lazy, to do nothing useful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • pintar e bordar paint and knit to abuse
        • živi in pusti živeti to live and let live to live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • hyr e dil come and go come and go → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • ведрити и облачити vedriti i oblačiti to brighten and cloud to be very powerful
          што не иде не иде što ne ide ne ide what doesn't go, doesn't go don't force something → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
        • det knallar och går it trots and walks it is OK/as usual → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • continue to the next test
        • ريح للرجليه أسلم he gave his feet to the wind he runs away so fast أسلم to give is the head and the NP depends on it
        • гушна букета to hug the bunch of flowers to die гушна is the head and the NP depends on it
          правя на салата to make into salad to scold правя is the head and the PP depends on it
        • eine Fratze ziehen a grimace pull to make a face ziehen is the head and the NP depends on it
          er gibt auf he gives up gibt is the head and auf is the particle depending on it
        • κάνω γκριμάτσα kano grimatsa to make grimace to make a face κάνω is the head and the NP depends on it
          παίρνω μία απόφαση perno mia apofasi take a decision to make a decision, to decide παίρνω is the head and the NP depends on it
          βάζω μπρος vazo bros put forward to start βάζω is the head and μπρος depends on it
        • to make a face make is the head and the NP depends on it
          to give up give is the head and up is a particle depending on it
        • dar la cara to_give the face face the consequences dar is the head and the NP depends on it
          hacer muecas to_make grimaces to make a face hacer is the head and the NP depends on it
        • lan egin work do to work → the verb egin is the head and the NP depends on it
        • éirigh as rise out of quit → the verb éirigh is the head and the particle as depends on it
        • χάριν ἔχει kharin ekhei gratitude.ACC have.3SG he is grateful → ἔχει is the head and the NP depends on it
        • složiti facu make a face to show reaction složiti is the head and the NP depends on it
        • fare le linguacce to_make the grimaces fare is the head and the NP depends on it
          far fuori to_make out to kill fare is the head and fuori is a particle depending on it
        • naar de bekende weg vragen for the known road ask vragen is the head and naar de bekende weg is the extended NP depending on it
        • zbijać bąki to smash farts to fool around, to do nothing useful zbijać is the head and the NP bąki depends on it
          dać komuś popalić to let someone smoke to make someone's life hard dać is the head and the infinitive popalić depends on it
        • bater as botas bater is the head and the NP depends on it
          criar vergonha na cara criar is the head and the two NPs depend on it
        • a face baie to make bath to bath face is the head and the NP depends on it
          a ieși înainte to go forth to greet ieși is the head and înainte is a particle depending on it
        • imeti krompir to have potatoes to be lucky imeti is the head and the NP depends on it
        • heq dorë remove hand give up heq is the head, and dorë depends on it.
        • обесити нос obesiti nos hang one's nose to feel down обесити is the head and the NP нос depends on it
          седети скрштених руку to sit with arms crossed to be inactive, without initiative седети is the head and the NP (in the instrumental case) скрштене руке depends on it
        • att ge upp to give up ge is the head and upp is the particle depending on it

      The aim of this test is to categorize (as VID or no VMWE) those candidates which have no single clearly identified head verb. This is necessary because all other tests refer to the single head verb v and its dependents. Note that the test should be applied to the neutral form of each candidate. This is required because there may be no verb or the verb may not be the syntactic head in such a non-neutral variant.

      • قرارال أخذ to make a decision passes the test → variants like هأخذ الذيقرار ال the decision that he made, قراراتال أخذ making decisions, مأخوذةقرارات decisions made pass the test as well
      • вземам решение passes the test → variants like решението, което беше взето pass the test as well
      • eine Entscheidung treffen make a decision passes the test → variants like die Entscheidung wurde getroffen the decision was made, die Entscheidung, welche getroffen wurde the decision which was made, das Treffen der Entscheidung the making of the decision pass the test as well
      • παίρνω μία απόφαση make a decision passes the test → variants like η απόφαση που πήραμε, πάρθηκε απόφαση, παίρνοντας απόφαση pass the test as well
      • to make a decision passes the test → variants like the decision which was made, decision-making, the making of the decision pass the test as well
      • tomar una decisión passes the test → variants like la decisión fue tomada, tomando esa decisión, la decisión que tomaron pass the test as well
      • erabakia hartu decision take to make a decision passes the test → variants like hartutako erabakia the decision (which was) made, erabaki hura hartzea (the fact of) making that decision, erabakiak hartutakoan when the decisions were made pass the test as well
      • déan comhairle make counsel make a decision passes the test → variants like comhairle a dhéanamh counsel to make to make a decision, ag déanamh comhairle at making counsel making a decision pass the test as well
      • δόξαν ἔχουσι doxan ekhousi reputation.ACC have.3PL they have a reputation passes the test
        δόξαν ἣν ἔνιοι ἔχουσι περὶ doxan hēn enioi ekhousi peri opinion.ACC which some have.3PL about the opinion which some hold about is a variant and passes the test
      • donijeti odluku make a decision passes the test → variants like odluka donesena tada decision made then pass the test as well
      • prendere una decisione to_take a decision make a decision passes the test → variants like la decisione è stata presa the decision was made, la decisione, che è stata presa the decision which was made, prendendo la decisione taking the decision pass the test as well
      • een beslissing nemen to make a decision passes the test → variants like de beslissing werd genomen the decision was made, de beslissing, die genomen werd the decision which was made, het nemen van de beslissing the making of the decision pass the test as well
      • zbijać bąki to smash farts to fool around, to do nothing useful passes the test → variants like zbijanie bąków farts smashing fooling around, doing nothing useful, zbijający bąki smashing farts pass the test as well
      • tomar uma decisão make a decision passes the test → variants like a decisão que foi tomada the decision which was made, decisão tomada decision made pass the test as well
      • a lua o decizie make a decision passes the test → variants like decizia care a fost luată the decision which was made, luarea deciziei decision-making pass the test as well
      • zlomiti komu srce to break someone's heart to hurt someone's feelings badly passes the test → variants like srca, ki jih je zlomil hearts which he has broken (people's) feelings which he hurt badly, lomljenje src breaking (people's) hearts hurting (people's) feelings and nedavno zlomljeno srce recently broken heart pass the test as well
      • marr një vendim take a decision to make a decision passes the test → variants like vendimi që u mor (the decision that was made), marrja e vendimit (decision-making) pass the test as well
      • донети одлуку doneti odluku to bring a decision to make a decision passes the test → variants like одлука је донета odluka je doneta a decision has been made and доношење одлука donošenje odluka decision making pass the test as well

      Test S.2 - [1DEP] - Single dependent

      Does the VMWE contain exactly one lexicalized (functional) syntactic dependent d of the head verb v?

      • Apply the VID-specific tests
        • لسانهالقطأكل the cat ate his tongue said of someone known to talk a lot who suddenly falls silent → two dependents, لسانه his tongue and القط the cat
        • на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → two dependents, на стар краставичар (PP) and краставици (NP)
          прочитам от корица до корица to read from cover to cover → two dependents, от корица (PP) and до корица (PP)
          правя (нечий) живот черен make someone's life black to ruin someone's life → two dependents, (нечий) живот (NP) and черен (small clause)
        • die Katze aus dem Sack lassen to let the cat out of the bag → two dependents die Katze and aus dem Sack
        • κάνω την καρδιά μου πέτρα kano tin kardia mu petra make the heart mine stone → two dependents, την καρδιά and πέτρα
          δίνω τόπο στην οργή dino topo stin orγi give place to anger to hold in one's anger → two dependents, τόπο and στην οργή
        • to make ends meet → two dependents, ends and meet
          to let the cat out of the bag → two dependents, the cat and out of the bag
        • dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more → two dependents, con la miel and en los labios
          dar gato por liebre to_give cat for hare to rip off, to take for a ride → two dependents, gato and por liebre
        • odolkiak ordainetan eman black-puddings in-exchange give to do something as a response to something somebody has done to oneself (similar to 'what goes around comes around')
        • ići glavom kroz zid to go with head through the wall to be stubborn → two dependents glavom and kroz zid
        • mettere il carro davanti ai buoi to_put the cart in front of the oxen put the cart in front of the horse → two dependents carro and davanti ai buoi
        • pūst miglu acīs to blow mist into eyes to lie, to talk nonsense → two dependents, miglu and acīs
        • een kat in de zak kopen to buy a pig in a poke → two dependents kat and in de zak
        • chować głowę w piasek to hide head in sand to pretend not to see a problem → two dependents, głowę head and w piasek in sand
          bać się własnego cienia to fear SELF one's own shadow to be very timid → two dependents, się SELF and własnego cienia own shadow
        • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat → two dependents
        • a da bir cu fugiții to give tribute with fugitives the to disappear → two dependents, bir and cu fugiții
          a-i ieși ochii din cap to his come out eyes the from head to stare → three dependents, i, which is a non-RCLI, ochii, and din cap
        • skrivati glavo v pesek to hide head in sand to pretend not to see a problem → two dependents, glava head and v pesek in sand
          vlečeš me za nos you are pulling my nose you're pulling my leg → two dependents, me me and za nos my nose
        • I hedh benzinë zjarrit I throw gasoline on the fire to make a situation worse (aggravate a problem) → two dependents, benzinë and zjarrit
        • ићи линијом мањег отпора ići linijom manjeg otpora go down the line of less resistance to take the path of least resistance → two dependents, линијом linijom line and мањег отпора manjeg otpora less resistance
          продати рог за свећу prodati rog za sveću to sell a horn for a candle to deceive somebody on purpose → two dependents, рог rog horn and за свећу za sveću for a candle
        • att sätta sig upp mot någon to sit oneself up against someone to defy someone → two dependents, sig and upp
      • Continue to the next test
        • مثلاً ضرب hit an example to give an example → the single dependent is a noun phrase, مثلاً example
        • ритам камбаната kick the bell to die → the single dependent is a noun phrase, камбаната
          ставам на кайма turn into mince to be destroyed → the single dependent is a prepositional phrase, на кайма
          одирам жив skin alive to make someone suffer → the single dependent is a small clause (adjective), жив
        • eine Fratze ziehen a grimace pull to make a face → the single dependent is a noun phrase, Fratze
          in Betracht ziehen to take into consideration → the single dependent is a prepositional phrase, in Betracht
          er gibt auf he gives up → the single dependent is a particle auf
        • παίρνω σκληρά μέτρα perno sklira metra take hard measures take strict measures → the single dependent is a noun phrase, μέτρα
          φέρω βαρέως fero vareos bring heavily to resent → the single dependent is an adverb, βαρέως
        • to make a face → the single dependent is a noun phrase, face
          to take into account → the single dependent is a prepositional phrase, into account
          to take turns → the single dependent is a noun, turns
          to give up → the single dependent is a particle, up
        • hacer muecas to_make grimaces to make faces → the single dependent is a noun phrase, muecas
          tener en cuenta to_have in account to take into account → the single dependent is a prepositional phrase, en cuenta
        • min eman pain give to hurt (somebody) → the single dependent is a noun phrase, min
          kontuan hartu into-account take to take into account → the single dependent is a noun phrase with a postpositional suffix, kontuan
        • bain triail get trial try → the single dependent is a noun
          éirigh as rise out of quit → the single dependent is a particle
        • περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem → the single dependent is a prepositional phrase
          τιμωρίαν ποιέομαι timōrian poieomai punishment.ACC do.1SG I punish → the single dependent is an NP
        • imati osjećaj to have a feeling → the single dependent is a noun, osjećaj
        • fare le linguacce to_make the grimaces to make a face → the single dependent is a noun phrase linguacce
          prendere in considerazione to take into consideration → the single dependent is a prepositional phrase, in considerazione
          egli lo fa fuori he kills him → the single dependent is a particle fuori
        • atstiept kājas to stretch one's legs to die → the single dependent is a noun phrase, kājas
        • opgeven to give up → the single dependent is a particle, op
        • bić na alarm to strike on alarm to raise the alarm → the single dependent is a prepositional phrase, na alarm on alarm
          cholera wie cholera knows I have no idea → the single dependent is the nominal subject cholera
        • cometer um crime to commit a crime → one dependent
        • a face față to make face to deal with → the single dependent is a noun phrase, față
          a ieși înainte → the single dependent is an adverb, înainte
        • gre za it is about → the single dependent is a particle, za
          smejati se to laugh → the single dependent is a reflexive clitic, se
          imeti mačka to have a hangover → the single dependent is a noun, maček
        • hedh poshtë throw down to reject or dismiss → the single dependent is an adverb, poshtë
        • ићи као алва ići kao alva go like halva to sell well → the single dependent is a prepositional phrase, као алва kao alva as halva
          језик прегризао bite off your tongue do not foresee bad things → the single dependent is the NP језик jezik tongue
        • att ge upp to give up → the single dependent is the particle upp

      The test covers only lexicalized dependents. There may be other, non-lexicalized dependents, which the test ignores. We explicitly call the non-verbal elements dependents instead of arguments or complements because the argument-adjunct distinction is irrelevant here. The outcome of the test is positive if the verb has a single lexicalized dependent, which can be the subject, the direct or indirect object, but also an adverbial complement, adverb, particle, relative clause, etc.
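
The counting rule of test S.2 can be sketched as follows. The `Dependent` class and the function `passes_1dep` are hypothetical names used only for this illustration, and the open-slot object "smoking" is an invented filler standing in for any non-lexicalized dependent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dependent:
    form: str
    lexicalized: bool  # True if omitting it makes the MWE disappear


def passes_1dep(dependents: List[Dependent]) -> bool:
    """S.2 is positive when exactly one dependent of the head verb is lexicalized."""
    return sum(d.lexicalized for d in dependents) == 1


# "to give up (smoking)": the particle is lexicalized, the object is an open slot.
give_up = [Dependent("up", True), Dependent("smoking", False)]
# "to let the cat out of the bag": two lexicalized dependents.
cat_bag = [Dependent("the cat", True), Dependent("out of the bag", True)]

print(passes_1dep(give_up))
print(passes_1dep(cat_bag))
```

Non-lexicalized dependents do not affect the count, so the first candidate continues to the next test while the second is sent to the VID-specific tests.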

      Test S.3 - [LEX-SUBJ] - Lexicalized subject

      Is the single lexicalized (functional) syntactic dependent d of the head verb v its subject?

      • Apply the VID-specific tests
        • أوزارها الحرب وضعت the war put its weights the war is over الحرب is the subject of وضعت
        • чашата преля the glass overflowed this is the last straw чашата is the subject of преля
        • ein kleines Vöglein hat mir gezwitschert a little bird told me
        • μου είπε ένα πουλάκι mu ipe ena pulaki me told a little-bird a little bird told me → a little bird is the subject of told
        • a little bird told someone a little bird is the subject of told
        • ha llegado tu hora has arrived your time your time has come tu hora is the subject of ha llegado
          me lo ha dicho un pajarito it to_me has told a little_bird a little bird has told me un pajarito is the subject of ha dicho
        • txoritxo batek esan txoritxo batek is the subject of esan
        • ptičica mi je šapnula a little bird whispered to me ptičica is the subject of šapnula
        • me lo ha detto l'uccellino a little bird told me l'uccellino is the subject of ha detto
        • galva kūp the head is steaming to do something with great mental effort
        • boontje komt om zijn loontje he that mischief hatches, mischief catches
        • licho wie devil knows I have no idea
        • a sua hora chegou your time has arrived your time has come
          um passarinho me contou que ... a little-bird me.DAT told that ... little bird told me that...
        • a șoptit o păsărică whispered a bird little a little bird told someone
        • srce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something srce heart is the subject of pade falls
          sekira pade v med komu (someone's) hatchet falls in honey one gets lucky sekira hatchet is the subject of pade falls
        • zuri koka My head caught me I got a headache Koka (head) is the single lexicalized dependent, functioning as the subject of the verb zuri (caught).
        • иде некоме карта ide nekome karta the card goes for someone to have luck карта is the subject of иде
          пасти некоме камен са срца pasti nekome kamen sa srca a stone falls from one's heart to feel relieved камен is the subject of пасти
      • Continue to the next test
        • زيارة ب قام he did with visit to make a visit زيارة is the object of قام
        • обичам чашката love the glass to be an alcoholic
          вземам назаем take in loan to borrow
          намирам се find SELF to be situated
        • κάνω μια ευχή kano mia efchi do a wish to make a wish μια ευχή is the object of κάνω
        • to make a wish a wish is the object of make
        • pedir un deseo to_ask a wish to make a wish un deseo is the object of pedir
        • hitz eman hitz is the object of eman
        • λόγοις χράομαι logois khraomai word.DAT use.1SG I speak λόγοις is the object of χράομαι
        • napraviti prekršaj to make an offense prekršaj is the object of napraviti
        • dare spettacolo to_make a scene spettacolo is the object of dare
        • een toespraak houden toespraak is the object of houden
        • bać się fear SELF to be afraid
          chodzić prostą drogą to go (on) a straight road.INST to avoid complications
          zacząć od zera to start from zero to start from scratch
        • plouă cu găleata rains with bucket-the It rains heavily cu găleata is the adverbial of plouă
        • imeti glavo na ramenih to have head on shoulders to be sensible glava head is the object of imeti have
        • marr hua take loan to borrow hua (loan) is the single lexicalized dependent, functioning as the object of the verb marr (take).
        • тврдити пазар tvrditi pazar to secure shopping to pretend not to be interested in order to gain more пазар is the object of тврдити
          обрати бостан obrati bostan to pick melon to be ruined бостан is the object of обрати

      This test captures the fact that VMWEs with lexicalized subjects always belong to the VID category. Note that the test should be applied to the neutral form of a VMWE. This is required because there may be no verb or the verb may not be the syntactic head in a non-neutral variant.

      Test S.4 - [CATEG] - Category of the dependent

      What is the morphosyntactic category of the (functional) dependent d that co-occurs with the head verb v?

      • Reflexive clitic - apply IRV tests. If the outcome is negative, discard the VMWE candidate.
        • Arabic does not have IRV expressions
        • страхувам се fear myself.REFL to be afraid
          радвам се feel joy myself.REFL to feel joy
        • sich wundern to wonder, sich schämen to be ashamed
        • Modern Greek does not have IRV expressions
        • help yourself to the apples
          I found myself in a difficult situation
        • suicidarse to suicide, quejarse to complain
        • n.a.
        • se suicider to suicide, s'évanouir to faint
        • This category does not apply to Ancient Greek.
        • čuditi se to wonder, penjati se to climb
        • suicidarsi to suicide, vergognarsi to be ashamed
        • zich vergissen to be mistaken, zich schamen to be ashamed
        • bać się fear SELF to be afraid
        • suicidar-se to suicide, queixar-se to complain
        • a se sinucide to commit suicide with obligatory ACC reflexive clitic
          a se holba to stare with obligatory ACC reflexive clitic
        • čuditi se to wonder, smejati se to laugh, onesvestiti se to faint
        • mërzitem bore myself get bored, kujtohem remember myself remember
        • знојити се znojiti se sweat SELF to sweat
          откравити се otkraviti se to melt SELF to relax, to cheer up
      • Particle (as opposed to an adposition) - apply IVPC tests. If the outcome is negative, discard the VMWE candidate.
        • Bulgarian does not have VPC expressions
        • anfangen to begin, er fängt an he begins, er hat angefangen he has begun → in German, VPCs may occur separated or within one word; we annotate all occurrences!
          ich schlage vor I propose
        • παίρνω μπρος perno bros take forward to get started
        • to give up, to look forward to
        • n.a.
        • n.a.
        • This category does not apply to Ancient Greek.
        • biti na to be onto to look like
        • far fuori to_make out to kill, lo fa fuori he kills him, lo ha fatto fuori he killed him
        • aanvangen to begin, iets vangt aan sth begins → in Dutch, VPCs may occur separated or within one word; we annotate all occurrences!
          ik stel voor I propose
        • Polish does not have IVPC expressions
        • jogar fora to-throw outside to discard, throw away
        • Romanian does not have VPC expressions
        • n.a.
        • Albanian does not have VPC expressions.
        • n.a.
      • Verb with no lexicalized dependent - apply MVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
        • не искам и да чуя don't want to even hear to oppose strongly и да чуя is a VP
        • will sagen want to say that is to say
        • έχω να κάνω have to do concern
        • to let go
          to make do
        • querer decir to_want to_say to mean
        • n.a.
        • laisser tomber let fall to give up
          vouloir dire want say to mean
        • τυγχάνουσι ἐρχόμενοι tugkhanousi erkhomenoi get.3PL go.PTC they happen to go
        • pustiti koga živjeti to let someone live not to bother someone, znati raditi to know to work to be capable
        • lasciar andare to_let go to unhand
          voler dire want say to mean
        • wil zeggen want to say that is to say
        • dać komuś żyć to let someone live not to bother someone
          można wytrzymać one can stand the situation is reasonably good
        • querer dizer want say to mean
          ouvir falar hear speak to know/remember vaguely
        • n.a.
        • n.a.
        • може бити može biti can be it is possible though unlikely
      • Adposition (preposition or postposition, as opposed to a particle) - in step 3 of the annotation process adpositions are not annotated unless they introduce a lexicalized dependent. Adpositions are covered optionally and experimentally in the post-annotation step (step 4), following the inherently adpositional verb (IAV) guidelines.
        • разчитам на to rely on
          излизам със to come out with
        • Modern Greek does not have IAV expressions
        • to come across
          to rely on
        • confiar en to_trust in to trust in, entender de to_understand of to know about
        • n.a.
        • This category does not apply to Ancient Greek.
        • izlaziti s kim to go out with someone
        • confidare su to_trust in to trust in, intendersi di to_understand of to know about
        • behoren tot to belong to
        • conta pe count on
        • n.a.
      • Extended nominal phrase (possibly including modifiers, prepositions, postpositions or case markers) - apply LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
        • زيارة ب قام make a visit ب زيارة is a noun phrase composed of preposition and a noun
        • ритам камбаната kick the bell to die камбаната is a noun phrase composed of a single noun
          давам зелена светлина give green light to allow зелена светлина is a noun phrase composed of an adjective and a noun
          ставам на кайма turn into mince to be destroyed на кайма is a prepositional phrase composed of a preposition governing a noun
        • die Nase rümpfen the nose wrinkle turn up one's nose at sth. die Nase is a noun phrase composed of a determiner and a noun
          in Kraft treten into
        • κάνω μία ευχή kano mia efchi make a wish to make a wish μία ευχή is a noun phrase composed of a determiner and a noun
          δίνω εξηγήσεις dino exigisis give explanations to explain εξηγήσεις is a noun phrase composed of a single plural noun
        • to make a wish a wish is a noun phrase composed of a determiner and a noun
          to take turns turns is a noun phrase composed of a single plural noun
        • pedir un deseo un deseo is a noun phrase composed of a determiner and a noun
          entrar en vigor en vigor is a prepositional phrase composed of a preposition and a noun
        • kontuan hartu into-account take to take into account the NP, kontuan, is composed of a noun (kontu), a determiner (a) and a postposition (-n)
          urratsak egin steps do to take steps the NP, urratsak, is composed of a single plural noun (urrats+ak)
        • τὴν ἴσην χάριν αποδίδωμι tēn isēn kharin apodidо̄mi the same gratitude.ACC give.1SG I show the same gratitude → τὴν ἴσην χάριν is an NP composed of a DP and an adjective
        • doći do zaključka to come to a conclusion, to conclude do zaključka to a conclusion is a prepositional phrase composed of a preposition governing a noun
        • prendere in considerazione take into account in considerazione is a prepositional phrase composed of a preposition and a noun
          rompere il silenzio to break the silence il silenzio is a noun phrase composed of an article and singular noun
          mettere radici radici is a noun phrase composed of a single plural noun
        • een wandeling maken to take a walk een wandeling is a noun phrase composed of a determiner and a noun
          te koop zetten to put for sale te koop is an extended noun phrase composed of a preposition and a noun
          in aanmerking komen in comment come to qualify in aanmerking is an extended noun phrase composed of a preposition and a noun
        • podjąć decyzję to take a decision decyzję decision is a nominal phrase composed of a single noun
          chodzić prostą drogą to go (on) a straight road.INST to avoid complications prostą drogą (on) a straight road is a noun phrase composed of an adjective and a noun (in the instrumental)
          bujać w obłokach to swing in the clouds to fantasize w obłokach in the clouds is a prepositional phrase composed of a preposition and a noun
        • tomar banho to take a shower banho is a noun phrase composed of a single noun
        • a rupe tăcerea to break silence the to start talking tăcerea is a noun phrase composed of a single noun
          a face baie to do bath to take a shower baie is a noun phrase composed of a single noun
        • biti v dvomih to be in doubts to doubt v dvomih in doubts is a prepositional phrase composed of a preposition governing a noun
          klicati jelene to call deer to vomit jeleni deer is a noun phrase composed of a single plural noun
        • узети маха uzeti maha to take swing/moment to spread маха maha swing/moment is a nominal phrase composed of a single noun
          дати часну реч dati časnu reč to give an honorable word to promise firmly часну реч časnu reč honorable word is a noun phrase composed of an adjective and a noun (in the accusative)
          пасти на ум некоме pasti na um nekome to drop on one's mind to get an idea на ум na um on mind is a prepositional phrase composed of a preposition and a noun
      • (Hindi-specific) Adjective which is morphologically identical to an eventive noun: Apply the LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • Adjective: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
        • излизам сух от водата to come out dry from the water to avoid taking responsibility
          одирам жив skin alive to make someone suffer
          гоня дивото chase the wild.ADJ to take risks дивото is a substantive
        • rot sehen to see red
        • τα βάφω μαύρα them-NE.PL.ACC paint-1.SG black-NE.PL.ACC be very sad
        • to stand firm, to see red
        • me las vi negras me the saw black I saw myself in trouble
          ponerse negro put.self black to get/become irritated
          poner verde put green to criticise (someone)
        • zuriak eta beltzak aditu white and black hear to hear all sorts of things
        • voir rouge to see red to be very angry
        • ostati svoj to stay one's own to be consistent
        • vedere nero to see black
        • blauw zien van de kou to be blue/perished with the cold
          zwartrijden black drive to take a ride without a ticket
        • zrobić swoje to do one's own to do what one is supposed to do
        • pensar grande to think big
        • a vedea roșu to see red
          a o face lată to CL.ACC make wide to party
        • narediti svoje to do one's own to do what one is supposed to do
        • бити зелен biti zelen to be green to be young, inexperienced
      • Adverb: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
        • изваждам наяве take out in the open to uncover
          хващам натясно catch in a tight place to coerce, to pressure
        • φέρω βαρέως fero vareos bring heavily to resent
        • to get well
        • caer bien fall well to be liked by
        • alferrik galdu uselessly get-lost to ruin, to spoil
        • καλῶς εἶχεν kalо̄s eikhen beautifully have.IMPF.3SG he was well
        • dobro proći to go well to be successful
        • fare passi avanti to_make steps forward to make progress
        • beter worden to get well
        • chcieć dobrze to want well to have good intentions
          robić komuś dobrze to do someone.DAT well to please someone
          źle/marnie skończyć badly finish to come to a bad end
        • cair bem fall well to be appropriate
        • a se face bine to himself make well to get well
          a face bine to make well to help
        • obrniti se na bolje to turn for better to be better, iti predaleč to go too far to demand too much or to do something inappropriate
        • добро доћи dobro doći to come well to be useful
          боље рећи bolje reći to say better to say in other words, more precisely
      • Pronoun: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
        • мързи ме (it feels) lazy me.ACC to be lazy
        • τα καταφέρνω ta kataferno them achieve to make it
          την πατάω tin patao her step-on to fail
        • to make it
        • jugársela play.self.it to risk it
        • elkar hartu each-other take to get on with somebody, to agree
        • suarekin jolasean ibili with-fire playing be to play with fire
        • le faire it make to be enough/successful
        • farcela to make it to manage
        • het maken it make to be successful
        • No example found in Polish
        • dá-lhe João! give to him/her, João! show them what you got, João!
        • a o coti CL.ACC.F.3SG turn to turn (with the non-anaphoric feminine clitic 'o' functioning as an expletive)
        • imeti ga pod kapo to have him under one's hat to be drunk, mahniti jo to hit her to start going (somewhere)
        • n.a.
      • Verb with lexicalized dependents including fully lexicalized clauses: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
        • السيف العذل سبق The sword preceded the blame said when someone does something without thinking and regrets it
        • не мога да кажа две думи на кръст cannot say two words on a cross to not be able to speak or express oneself
          правя сам да си говори make someone talk to himself to drive someone crazy
        • ανοίγω τον ασκό του Αιόλου open the bag of Aeolus to open the floodgates
          και οι τοίχοι έχουν αυτιά ke i tichi echun aftia and walls have ears everyone might be listening
        • to make ends meet, to know on which side the bread is buttered

        • hacer de tripas corazón make of intestines heart to pluck up the courage
          dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
          dar gato por liebre to_give cat for hare to rip off, to take for a ride
        • n.a.
        • okretati se kako vjetar puše to turn how the wind blows to be inconsistent
        • sbarcare il lunario to_land the living to make ends meet
          non avere peli sulla lingua do not have hair on the tongue to be outspoken
        • lachen als een boer die kiespijn heeft laughing on the other side of his/her face/mouth
        • wiedzieć, co w trawie piszczy to know what in the grass squeaks to know what is going on, to be well informed
        • vedeti, koliko je ura to know what time it is to realize the truth
        • знати у ком грму лежи зец znati u kom grmu leži zec to know in which bush the rabbit lies to know what is going on, to be well informed
      • Other: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.

      The aim of this test is to determine which category-specific identification tests should be applied. Note that the test should be applied to the neutral form of a VMWE candidate, because in a non-neutral variant there may be no verb, or the verb may not be the syntactic head.
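
The routing described by this test can be sketched as a lookup table. This is only an illustration, not part of the guidelines: the category keys, the `TEST_SETS_BY_DEPENDENT` table and the `annotate` function are hypothetical names, and only the dependent categories listed in this part of the test are covered. Adpositions map to an empty sequence because they are not annotated in step 3 and are handled optionally under the IAV guidelines in step 4.

```python
from typing import Callable, Optional

# Hypothetical category keys; each maps to the test sets to try, in order.
TEST_SETS_BY_DEPENDENT = {
    "adposition": (),                          # step 3: not annotated (IAV is optional, step 4)
    "extended_nominal_phrase": ("LVC", "VID"),
    "hindi_eventive_adjective": ("LVC", "VID"),
    "adjective": ("VID",),
    "adverb": ("VID",),
    "pronoun": ("VID",),
    "verb_with_lexicalized_dependents": ("VID",),
    "other": ("VID",),
}

def annotate(category: str, passes: Callable[[str], bool]) -> Optional[str]:
    """Return the first test set the candidate passes, or None to discard it.

    `passes(test_set)` stands for the annotator's outcome on that test set.
    An unknown category raises KeyError rather than being guessed.
    """
    for test_set in TEST_SETS_BY_DEPENDENT[category]:
        if passes(test_set):
            return test_set
    return None  # negative outcome on every applicable test set: discard

# A nominal-phrase candidate that fails the LVC tests but passes the VID tests.
print(annotate("extended_nominal_phrase", lambda t: t == "VID"))  # VID
```

Keeping the order inside each tuple mirrors the instruction to apply the LVC tests first and fall back to the VID-specific tests only on a negative outcome.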


      Section 5.2

      Light verb constructions (LVC)

      Light verb constructions (LVC) constitute a universal category. We retain the following key characteristics:

      1. They are formed by a verb v and a (single or compound) noun n, which either directly depends on v (and possibly contains a case marker or a postposition), or is introduced by a preposition.
        In the case of Hindi, the noun can be replaced by an adjective which is morphologically identical to an eventive noun. If you annotate Hindi, wherever this page refers to the noun, read it as the noun or the adjective.
        • إتخذ إجراء make action → verb + direct-object noun
          قام بزيارة make a visit → verb + prepositional-object noun
          أدى التحية العسكرية do the military salute salute → verb + compound noun
        • вземам решение to make a decision
          държа под контрол to keep under control
        • zum Einsatz kommen to the use come to be called into action
          eine Rede halten a speech hold to give a speech
        • (OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
        • παίρνω μία απόφαση perno mia apofasi make a decision to decide → verb + direct-object noun
          δίνω στα νεύρα dino sta nevra give to-the nerves cause to be nervous → verb + prepositional-object noun
          έχω στην κατοχή μου echo stin katochi mu have.1SG to-the possession my to possess → verb + prepositional-object noun
        • to give a lecture → verb + direct-object noun
          to come into bloom → verb + prepositional-object noun
          to make a high five → verb + compound noun
        • hacer una promesa make a promise to make a promise
          poner en peligro put in danger endanger, jeopardise → verb + prepositional-object noun
          tener dolor de cabeza have pain of head to have a headache → verb + compound noun
        • lan egin work do to work, aurrera egin front-to do to go ahead
        • faire une présentation make a presentation → verb + direct-object noun
          procéder à une analyse proceed to an analysis to make an analysis → verb + prepositional-object noun
          faire un faux pas make a faux-pas → verb + compound noun
        • ἐν ὀργῃ ἔχω en orgē ekhо̄ in anger.DAT have.1SG I am angry
          τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish
        • stupiti na snagu step into force come into force
          držati predavanje to hold a speech to give a speech
        • chiamare in causa to_call in cause to single out
          fare una passeggiata to_make a walk to have a walk
        • een toespraak houden a speech hold to give a speech → verb + direct-object noun
          in bloei staan in bloom stand to be in bloom → verb + prepositional-object noun
        • odnieść sukces carry-away success to be successful
          mieć wyrzuty sumienia to have reproaches of conscience to blame oneself
          wykonać rzut karny to perform a penalty kick
        • fazer um aborto to make an abortion → verb + direct-object noun
          estar com fome be with hunger to be hungry → verb + prepositional-object noun
          fazer uma mesa redonda make a table round to have a round table (discussion) → verb + compound noun
        • a duce dorul to carry yearning.the to miss somebody
          a da divorț to give divorce to divorce
          a da în clocot to give in boil to come to the boil
          a da în fiert to give in boil to come to the boil
        • biti v dvomih to be in doubts to doubt → verb + prepositional-object noun
          imeti predavanje to give a lecture → verb + direct-object noun
        • дати на знање dati na znanje give on knowledge to inform
          поднети жалбу podneti žalbu to submit an appeal to file a complaint
      2. The (single or compound) noun n is predicative and refers to an event (e.g. decision, visit) or a state (e.g. fear, courage). Predicative nouns are nouns that have semantic arguments, that is, they express predicates whose meaning is only fully specified by their semantic arguments:
        • قرار أخذ make a decision → noun refers to an event, there are 2 arguments: a decider and a decision
          كلمة ألقى to give a word → noun refers to an event, there are 2 arguments: the talker and the speech
        • вземам решение to make a decision → noun refers to an act or event
          давам съгласие to give permission → noun refers to an act or event
          имам притеснения to have concerns → noun refers to a feeling or state
          имам готовност to be ready → noun refers to a feeling or state
        • eine Entscheidung treffen to make a decision → noun refers to an event
          Angst haben to have fear → noun refers to a state
        • (OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W)
        • παίρνω μία απόφαση perno mia apofasi take decision to decide → noun refers to an event
          κάνω βόλτα kano volta make walk to walk → noun refers to an event
          έχω αγωνία echo agonia have anxiety to be anxious → noun refers to a state
          κάνω κουράγιο kano kuragio make courage to be courageous → noun refers to a state
        • to make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
          to pay a visit → noun refers to an event, there are 2 arguments: a visitor and a visited place/person
          to have fear → noun refers to a state, there are 2 arguments: somebody who is afraid and something frightening
          to have courage → noun refers to a state, there is 1 argument: the courageous person
        • dar un consejo give an advice to give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
          tener valor to have courage → noun refers to a state, there is 1 argument: the courageous person
        • negar egin cry do to cry → noun refers to an act or event
          lo egin sleep do to sleep → noun refers to a state
        • donner un conseil give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
          avoir du courage to have courage → noun refers to a state, there is 1 argument: the courageous person
        • μου εἰς τὴν γνώμην εἰσῄει mou eis tēn gnо̄mēn eisēei I.GEN into the opinion.ACC come.into.IMPF.3sg it came to my mind → noun refers to a state
          ἐξέτασιν ποιέομαι exetasin poieomai inspection.ACC do.1SG I inspect → noun refers to an event
        • donijeti odluku to bring a decision to make a decision → noun refers to an event
          imati osjećaj to have feeling → noun refers to a state
        • fare una domanda to ask a question → noun refers to an event
          avere paura to have fear, avere coraggio to have courage → noun refers to a state
        • een beslissing nemen to make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
          moed hebben to have courage → noun refers to a state, there is 1 argument: the courageous person
        • prowadzić rozmowy to lead conversations to lead negotiations → the noun refers to an event
          mieć rację to have right to be right → the noun refers to a state
        • fazer uma prece to make a prayer → noun refers to an event, there are 2 arguments: the prayer and the thing she/he prays for
          ter sintomas to have symptoms → noun refers to a state, there are two arguments: the person having symptoms and the disease causing these symptoms
        • a lua o decizie to make a decision, a face o vizită to pay a visit → noun refers to an event
          a avea curaj to have courage → noun refers to a state
        • biti v dvomih to be in doubts to have doubts → noun refers to a state
          imeti predavanje to give a lecture → noun refers to an event
        • kam frikë to have fear
          kam kurajë to have courage
        • донети одлуку doneti odluku to bring a decision to make a decision (to decide) → the noun refers to an event
          имати право imati pravo to have right to be right → the noun refers to a state
      3. We retain two sub-categories of verbs, which define two sub-categories of LVCs:
        • The verb v is "light" in that it contributes to the meaning of the whole only by bearing morphological features: person, number, tense, mood, as well as morphological aspect. This implies that v's syntactic subject is n's semantic argument. In this case, we annotate the construction as LVC.full.
          • نصيحةأسدى to weave an advice to give advice
            تاريخالصنع fabricate the history to make history
            إستراتيجية ال وضع put a strategy to make a strategy
          • давам изявление give a statement to make a statement
            нанасям щети spread damages to cause damages
          • (OEG) 𓇋𓁹 𓊨𓏏 𓎡 ꞽr ś.t ⸗k Make (ꞽr) your (⸗k) place (ś.t)! Take your place! (PT 651d, T)
          • κάνω μία παρουσίαση kano mia parusiasi make presentation to present
            κάνω επίσκεψη kano episkepsi make visit to pay a visit, to visit
            παίρνω απόφαση perno apofasi take decision to decide
          • to make a presentation
            to pay a visit
            to have rights
            to have a headache
            to carry out a destruction
          • dar un paseo give a walk to go for a walk
            tener valor to have courage
            tener dolor de cabeza have pain of head to have a headache
          • faire une présentation to make a presentation
            faire une visite to make a visit
            avoir le droit to have the right
            avoir un mal de tête to have a headache
          • ἐλπίδα / ἐλπίδας ἔχω elpida / elpidas ekhо̄ hope.SG / hope.PL have.1SG I have hope(s)
          • napraviti pogrešku to make a mistake
          • fare una presentazione to make a presentation
            fare una visita to make a visit
            avere il diritto to have the right
            avere un mal di testa to have a headache
          • een presentatie geven to give a presentation
            een bezoek brengen to make a visit
            onder stress staan under stress stand to be stressed
          • odnieść sukces carry-away success to be successful
            mieć rację to have right to be right
            cierpieć na anemię to suffer from anemia
          • realizar uma apresentação to make a presentation
            fazer uma visita to make a visit
            ter um direito to have a right
            ter dor de cabeça have pain of head to have a headache
          • a face o prezentare to make a presentation
            a face o vizită to pay a visit
          • imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion, biti v pomoč to be in help to be helpful, delati razlike to make differences to differentiate
          • jap një shfaqje to give a show
            kam dhimbje koke to have a headache
          • вршити претрес vršiti pretres to do a search to conduct a search
            имати право imati pravo to have right to be right
        • The verb v is "causative" in that it indicates that the subject of v is the cause or source of the event or state expressed by n. In other words, the noun has semantic arguments expressed as non-subject elements in the sentence, and the subject of the verb adds information by indicating the cause or source of the event/state. In this case, we annotate the construction as LVC.cause. These constructions are expected to be less idiomatic than other VMWEs and can be understood as complex predicates with a causal support verb.
          • حربالأعلن to declare war
            حقوق أعطى to give rights
            أمل أعطى to give hope
          • давам възможност to give an opportunity
            нося късмет to bring luck
          • (OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
          • δίνω ικανοποίηση dino ikanopiisi give satisfaction to satisfy
            προκαλώ καταστροφή cause destruction
            δίνω χαρά dino chara give joy to make happy
          • to grant rights
            to give a headache
            to provoke a reaction
          • dar derecho to grant the right
            dar vértigo give vertigo to make dizzy
            causar un accidente to provoke an accident
          • donner le droit to grant the right
            donner le vertige give the vertigo to make dizzy
            provoquer un accident to provoke an accident
          • ἐλπίδα / ἐλπίδας παρέχω elpida / elpidas parekhо̄ hope.SG / hope.PL give.1SG I make hope(s)
          • dati mogućnost to give an opportunity
          • dare il diritto to grant the right
            dare le vertigini to_give the vertigo to make dizzy
            causare un incidente to provoke an accident
          • rechten verlenen to grant the right
            een ongeluk veroorzaken to provoke an accident
          • to sprawia nam kłopot this causes us trouble
            nakłada obowiązek na użytkowników put a duty on the users
            dać prawo to give the right to grant the right
            narazić na straty expose to losses
            stawiać komuś cel to put an aim to someone to set a goal to someone
          • dar o direito to grant the right
            dar tontura give vertigo to make dizzy
            provocar um acidente to provoke an accident
          • a da dureri de cap to give pains of head to give a headache
          • dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)
          • provokoj një debat to provoke a debate
            bëj aksident to cause an accident
          • изнети мишљење izneti mišljenje to take out one's opinion to state one's opinion
            задати главобољу zadati glavobolju to cause a headache to give a headache

      The following decision tree should be applied to decide whether a candidate should be annotated as LVC.full, as LVC.cause, or not as an LVC at all.

      LVC-specific decision tree:

      • Apply test LVC.0 - [N-ABS: Is the noun abstract?]
        • NO → It is not an LVC, exit
        • YES → Apply test LVC.1 - [N-PRED: Is the noun predicative?]
          • NO → It is not an LVC, exit
          • YES → Apply test LVC.2 - [V-SUBJ-N-ARG: Is the subject of the verb a semantic argument of the noun?]
            • YES → Apply test LVC.3 - [V-LIGHT: Does the verb only add meaning expressed as morphological features?]
              • NO → It is not an LVC, exit
              • YES → Apply test LVC.4 - [V-REDUC: Can a verbless NP-reduction refer to the same event/state?]
                • NO → It is not an LVC, exit
                • YES → It is an LVC.full
            • NO → Apply test LVC.5 - [V-SUBJ-N-CAUSE: Is the subject of the verb the cause of the noun?]
              • NO → It is not an LVC, exit
              • YES → It is an LVC.cause
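
The decision tree can also be read as a short procedure. The sketch below is an illustration only, not an official implementation. The function name `classify_lvc`, the `answer` callback and the sample judgements are hypothetical. The yes/no branch assignments are inferred from the layout of the tree and from the definitions of LVC.full and LVC.cause given earlier in this section.

```python
from typing import Callable

NOT_LVC = "not an LVC"

def classify_lvc(answer: Callable[[str], bool]) -> str:
    """Walk the LVC-specific decision tree.

    `answer(test_id)` returns the annotator's yes/no judgement for one test
    and is only called for the tests that are actually reached.
    """
    if not answer("LVC.0"):      # N-ABS: is the noun abstract?
        return NOT_LVC
    if not answer("LVC.1"):      # N-PRED: is the noun predicative?
        return NOT_LVC
    if answer("LVC.2"):          # V-SUBJ-N-ARG: subject is an argument of the noun
        if not answer("LVC.3"):  # V-LIGHT: verb adds only morphological features
            return NOT_LVC
        return "LVC.full" if answer("LVC.4") else NOT_LVC  # V-REDUC
    return "LVC.cause" if answer("LVC.5") else NOT_LVC     # V-SUBJ-N-CAUSE

# Hypothetical annotator judgements for two of the English examples.
make_a_decision = {"LVC.0": True, "LVC.1": True, "LVC.2": True,
                   "LVC.3": True, "LVC.4": True}
give_a_headache = {"LVC.0": True, "LVC.1": True, "LVC.2": False, "LVC.5": True}

print(classify_lvc(make_a_decision.__getitem__))  # LVC.full
print(classify_lvc(give_a_headache.__getitem__))  # LVC.cause
```

Because the callback is consulted lazily, an annotator only has to answer the tests on the path actually taken, which matches how the tree is meant to be applied by hand.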

      Note: test 10 [N-SEM] from the previous version of the guidelines (1.0) was considered unnecessary and has been abandoned in the current version of the guidelines.

      Note: LVC tests are often hard to apply. If you hesitate at some intermediate test, continue to the next one, since the last tests of LVC.full and LVC.cause will help you reach your final decision.

      Test LVC.0 - [N-ABS] Noun is abstract

      Is the noun n abstract?

      • continue to next test
        • ... قرار decision ، علم science ، أمل hope ، إجتماع meeting
        • проблем problem, възможност opportunity, изявление statement, план plan
        • (OEG) 𓈖𓂋𓃭𓅱 nr.w fear fear (PT § 302c-d, W)
        • απουσία apusia absence
          θυμός thimos anger
          αγάπη aγapi love
          δυσκολία δiskolia difficulty
          υπόσχεση iposchesi promise
          παρουσίαση parusiasi presentation
          εμφάνιση emfanisi appearance
        • priority, anger, love, opinion, difficulty, speech, presentation, birth
        • paseo walk, derecho right, ilusión excitement, fe faith, duelo grief
        • pas step, édition edition, discours speech, explication explanation, lutte fight
        • ὀργή orgē anger anger
          τιμωρία timо̄ria punishment punishment
          πίστις pistis trust trust
        • problem problem, mogućnost opportunity, ideja idea
        • priorità priority, rabbia anger, amore love, opinione opinion, difficoltà difficulty, discorso discourse, presentazione presentation
        • 所有 possession, 検討 examination, 名誉会長 honorary chairman
        • liefde love, mening opinion, strijd fight
        • kłopot problem, wysokość height, praca work, prawo right, zysk profit
        • prioridade priority, festa party, fé faith, nascimento birth, distinção distinction, problema problem, gol goal (soccer)
        • răspuns answer, prezentare presentation
        • dvom doubt, mnenje opinion, ime name, vloga role, odločitev decision
        • dëshirë wish, mendim opinion, vështirësi difficulty, fjalim speech, përparësi priority, zemërim anger
        • мишљење mišljenje opinion, претрес pretres search, побуна pobuna rebellion, одлука odluka decision
      • it is not an LVC
        • طاولة table، ورقة paper، شخص person ، يد hand
        • правя торта to make a cake → a cake is a physical entity (not abstract)
          давам пари to give money → money is a physical entity (not abstract)
          подавам ръка to give out hand to help in a difficult situation → hand is a physical entity (not abstract)
        • (OEG) 𓊹 nčr god god (PT 460a-b, W)
        • καρέκλα karekla chair, τραπέζι trapezi table, χέρι cheri hand, άνθρωπος anθropos human
        • chair, keyboard, hand, person
        • mesa table, silla chair, mano hand, foto picture
        • aulki chair, teklatu keyboard, esku hand, pertsona person
        • chaise chair, clavier keyboard, main hand, personne person
        • παῖς pais child child
          οἶκος oikos house house
          ἀγορά agora market square market square
        • stol table, ruka hand, kruna crown
        • sedia chair, tastiera keyboard, mano hand, persona person
        • house, car, 家族 family
        • stoel chair, hand hand, persoon person
        • złożyć kartkę to fold a sheet → a sheet is a physical entity (not abstract)
          złożyć broń to lay down arms → arms is a physical entity (not abstract)
          bić pianę to beat foam to exaggerate about a problem → foam is a physical entity (not abstract)
          wystawić fakturę to issue a bill → a bill is a physical entity (not abstract)
          mieć brata to have a brother → a brother is a physical entity (not abstract)
        • cadeira chair, teclado keyboard, mão hand, pessoa person, pedra rock
        • scaun chair, pian piano
        • oseba person, mačka cat, kapa hat, avtomobil car, roka hand
        • karrige chair, tastierë keyboard, dorë hand, njeri person
        • изнети јело izneti jelo to take out a dish → a dish is a physical entity (not abstract)

      Some concrete nouns may be predicative (test LVC.1). For instance, a relational noun such as daughter is semantically incomplete without its argument: daughter of X, so daughter is predicative. However, concrete predicative nouns should not pass test LVC.0.

      Some nouns may have both concrete and abstract interpretations. For instance, money is concrete when it refers to banknotes (paper money, bills): I didn't have money so I paid by credit card. However, money is abstract when referring to a conventional value used in transactions between people: He spent a lot of money in the mall. If one cannot be sure that the noun is used in its concrete interpretation, test LVC.0 passes.
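
The rule for ambiguous nouns amounts to failing test LVC.0 only when the concrete reading is certain. A minimal sketch, assuming a three-way annotator judgement (the `NounReading` enum and `passes_lvc0` function are hypothetical names introduced for illustration):

```python
from enum import Enum

class NounReading(Enum):
    """Hypothetical three-way judgement on the noun in context."""
    ABSTRACT = "abstract"
    CONCRETE = "concrete"
    UNSURE = "unsure"    # cannot be sure the concrete reading is intended

def passes_lvc0(reading: NounReading) -> bool:
    """Test LVC.0 fails only when the noun is clearly used concretely."""
    return reading is not NounReading.CONCRETE

# "money" as banknotes is concrete; when the reading is unclear, the test passes.
print(passes_lvc0(NounReading.CONCRETE))  # False
print(passes_lvc0(NounReading.UNSURE))    # True
```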

      Test LVC.1 - [N-PRED] Noun is predicative

      Does the noun n have at least one semantic argument, implying that it is a predicative noun?

      • continue to next test
        • إجتماععقد tie a meeting to lead a meeting → event with 2 arguments: the meeting and the person who organizes the meeting
          حوار أجرى make a discussion → event with 2 arguments: the discussion and the person who contributes to the discussion
        • поставям акцент to emphasize → event, with two arguments: the agent and the object being emphasized
          имам право to have the right → property, with one semantic argument: the possessor of the property
        • einen Besuch abstatten to pay a visit → event, with two arguments: the visitor and the visitee
          Angst haben to have fear → property with one semantic argument: the entity having fear
          einen Blick auf etwas werfen a glance at sth. throw to take a glance at sth → an event with two arguments: the entity glancing and the entity glanced at
        • (OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → property with one semantic argument: the entities having fear.
        • κάνω μία επίσκεψη kano mia episkepsi to-make a visit pay a visit, visit → event, with two arguments: the visitor and the visitee
          έχω τη δυνατότητα echo ti δinatotita have.1SG the ability to be able → property, with two core semantic arguments: the entity having the ability and the object of the ability
          έχω μίσος echo misos have hate to hate → state, with two arguments: the entity being in the state of hating and the entity hated
          βγάζω λόγο vγazo loγo take-out.1SG speech to make a speech → event, with one obligatory argument: the entity making the speech
          παίρνω απόφαση perno apofasi take decision to decide → event, with two arguments: the entity taking the decision and the decision
        • pay a visit → event, with two arguments: the visitor and the visitee
          have strength → property, with one semantic argument: the entity having strength
          take a glance at something → event, with two arguments: the entity glancing and the entity glanced at
          make a contribution → event, with two arguments: the contributor and the beneficiary (notice that contribution could refer to both the event and the thing being contributed, but we always prefer the former reading when possible)
        • hacer una visita make a visit to pay a visit → event, with two arguments: the visitor and the visitee
          tener valor to have courage → property, with one semantic argument: the entity having courage
          echar un vistazo a algo give a glance to something to take a quick look at something → event, with two arguments: the entity glancing and the entity glanced at
        • bisita egin visit do to pay a visit → event, with two arguments: the visitor and the visitee
          itxaropena ukan hope have to hope, to have hope → event, with one single argument: the person who hopes
        • avoir du courage to have courage → state (property), with one argument: the entity having courage
        • προσέχω τὸν νοῦν prosekhо̄ ton noun hold.to.1SG the thought I pay attention (to sth/sb) → event, with two arguments: the entity paying attention and the entity paid attention to
          ἐν ὀργῃ ἔχω en orgē ekhо̄ in anger.DAT have.1SG I am angry → property, with one semantic argument: the entity being angry
        • imati osjećaj to have a feeling → property, with one semantic argument: the entity having the feeling
          otići u posjet to go to a visit to someone to pay a visit → event, with two arguments: the visitor and the visitee
        • fare una visita → event, with two arguments: the visitor and the visitee
          avere forza → property, with one semantic argument: the entity having strength
          dare uno sguardo a qualcosa → event, with two arguments: the entity glancing and the entity glanced at
        • 評価する evaluation.make evaluate
          評価を得る evaluation.acc obtain obtain an evaluation
        • een bezoek brengen to pay a visit → event, with two arguments: the visitor and the visitee
        • złożyć wizytę to submit a visit to pay a visit → event, with two arguments: the visitor and the visitee
          złożyć skargę to submit a complaint to make a complaint → event, with two arguments: the complaining person and the one he/she complains about
          mieć prawo to have the right → state, with two arguments: the person having the right and the thing (s)he has the right to
          budzić zastrzeżenia to wake-up reservations to provoke reservations → state, with two arguments: the person having reservations and the object of the reservations
        • ter fome to have hunger to be hungry → property, with one argument: the entity that is hungry
          ter idade para fazer algo to have age (to do something) to be old enough (to do something) → state, with one argument: the entity that is old enough
          In PT, we consider that the following classes of predicative nouns pass the test: diseases (gripe, trombose, infarto), physical sensations (fome, sede, sono), emotions (medo, paixão, nojo), cognitive entities internal to the cognizer (ideia, opinião, preocupação), characteristics (coragem, teimosia, fraqueza), relations (contato, conflito, amizade) and nouns expressing communication or speech acts (conversa, discussão, briga, conselho).
        • a face o vizită to make a visit to pay a visit → event, with one argument: the entity that visits
          a avea curaj to have courage → property, with one semantic argument: the entity having courage
        • imeti predavanje to give a lecture → event, with two arguments: a lecturer and the people who are attending the lecture
        • jap një kontribut
          kam fuqi
          i hedh një shikim
        • поднети жалбу podneti žalbu to submit an appeal to file a complaint → event, with two arguments: the complaining person and the one he/she complains about
          имати право imati pravo to have the right → state, with two arguments: the person having the right and the thing (s)he has the right to
      • it is not an LVC
        • كتابه أحمد أعطى gave Ahmed his book Ahmed gave his book → the noun كتاب is a physical entity that does not pass test LVC.0, even though أحمد could be considered its semantic argument
          إعصارًا أحمد شهد Ahmed experienced a tornado → the noun إعصارًا tornado is an event, but has no semantic arguments
        • Иван хвърли боклука Ivan threw out the garbage → physical entity (not event/state)
        • Joe macht einen Kuchen → physical entity (not event/state), even though Joe could be considered a semantic argument
        • (OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire (PT 376b, W)
        • Ο Γιάννης παίρνει τα ρούχα του O Yanis perni ta rucha tu The John take.3SG the clothes his → the noun is a physical entity (not event/state) that does not pass test LVC.0
          Ο Γιάννης έχει ωραίο σπίτι O Γianis echi oreo spiti The John has nice house → the noun is a physical entity (not event/state) that does not pass test LVC.0
        • Joe makes a cake → the noun is a physical entity that does not pass test LVC.0, even though Joe could be considered its semantic argument
          Joe experienced a tornado → the noun is an event, but has no semantic arguments
          Joe has a lot of money → the noun is abstract and Joe could be considered its semantic argument, but we consider that money (as well as other goods such as car and bananas) can exist independently of a possessor, so the possessor (owner) should not be considered a semantic argument of money
        • Ana tiene una bicicleta Ana has a bicycle → noun is not abstract, so it does not pass test LVC.0
          Ana hace una foto Ana takes a picture → noun is not abstract, so it does not pass test LVC.0
        • pastela egin cake make to make a cake → physical entity (not event/state)
        • Anna a un vélo Anna has a bicycle → noun is not abstract, so it does not pass test LVC.0
          Anna affronte la tempête Anna faces the storm → noun is abstract but has no arguments
        • ἔχει δύναμιν καὶ πεζὴν καὶ ἱππικην καὶ ναυτικήν ekhei dunamin kai pezēn kai hippikēn kai nautikēn have.3SG force.ACC and on.foot.ACC and on.horseback.ACC and naval.ACC he has an (army force) on foot, on horseback, and at sea → the noun is a physical entity (not event/state)
        • Ivan ima olovku Ivan has a pencil → noun is not abstract, so it does not pass test LVC.0
        • Joe fa un dolce → physical entity (not event/state), even though Joe could be considered its semantic argument
          Joe ha vissuto un tornado → event, but has no semantic argument
        • Jan maakt een taart → physical entity (not event/state), even though Jan could be considered a semantic argument
        • przetrwać burzę to survive a storm → burza storm has no semantic arguments although it is abstract
        • quebrar a cabeça to break one's head to rack one's brain → physical entity, does not pass test LVC.0
          In PT, we consider that the following classes of abstract nouns do not pass this test: informational content that does not require agents (informações, notícias), natural phenomena (chuva, neve, tornado).
        • Joe a făcut o prăjitură Joe made a cake → physical entity (not event/state), even though Joe could be considered its semantic argument
        • Janez ima avto → the person that has a car could be considered as a semantic argument, but the car is not an event or a state
        • Joe bën një ëmbëlsirë
          Joe ka shumë para
        • преживети земљотрес preživeti zemljotres to survive the earthquake → земљотрес zemljotres earthquake has no semantic arguments although it is abstract

      We only retain nouns n that have at least one semantic argument, which we define as a semantically mandatory and specific participant of the event or state expressed by the predicative noun.

      Sometimes, it might be useful to consider verbs and adjectives derivationally related to the noun to reason about its semantic arguments.
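The retention criterion above can be pictured as a small data check. The following Python sketch is only an illustration: `NounFrame` and `has_semantic_argument` are hypothetical names, not part of any PARSEME tool, and the argument lists simply record the annotator's judgement for two of the English examples above.

```python
from dataclasses import dataclass, field


@dataclass
class NounFrame:
    """Hypothetical record of an annotator's judgement about a noun."""
    noun: str
    kind: str  # e.g. "event", "state", "property", "physical entity"
    arguments: list = field(default_factory=list)  # semantic arguments


def has_semantic_argument(frame: NounFrame) -> bool:
    """Return True if the noun has at least one semantic argument."""
    return len(frame.arguments) >= 1


# "pay a visit": an event with two arguments (from the examples above)
visit = NounFrame("visit", "event", ["visitor", "visitee"])
# "experience a tornado": an event without semantic arguments
tornado = NounFrame("tornado", "event", [])

print(has_semantic_argument(visit))    # True
print(has_semantic_argument(tornado))  # False
```

The check itself is trivial; the linguistic work lies in deciding which participants are semantically mandatory and specific enough to be listed in `arguments`.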

      Test LVC.2 - [N-SUBJ-N-ARG] Verb's subject is noun's semantic argument

      Is the subject of the verb a semantic argument of the noun? In other words, is the verb linking the predicative noun to one of its semantic arguments that occurs as the subject of the verb?

      • continue to next test
        • لصديقه نصيحة أحمد قدم gave Ahmed an advice to his friend Ahmed gave advice to his friend → أحمد Ahmed is the subject of the verb and a semantic argument (the adviser) of the noun
        • Иван изнесе доклад Ivan presented a report → Иван is the subject of the verb and a semantic argument (agent) of the activity
          Президентът получи покана за посещение в Германия The president received an invitation to visit Germany → Президентът president is the subject of the verb and a semantic argument (the receiver) of the invitation
          Президентът получи награда The president received an award → Президентът president is the subject of the verb and a semantic argument (the receiver) of награда award
        • (OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → Unas is the subject of the verb and a semantic argument (the contender) of the noun.
        • ο Γιάννης έκανε μία παρουσίαση στο αφεντικό του O Yanis ekane mia parusiasi sto afentiko tu The John made a presentation to-the boss his John made a presentation to his boss → ο Γιάννης is the subject of the verb and a semantic argument (the presenter) of the noun παρουσίαση
          Ο Γιάννης πρόβαλε αντίσταση στις αρχές o γianis provale antistasi stis arches The John presented resistance to the authorities John resisted the authorities
        • John made a presentation to his boss → John is the subject of the verb and a semantic argument (the presenter) of the noun
        • María dio un paseo María went for a walk → María is the subject of the verb and a semantic argument (the walker) of the noun
        • Max fait une promenade Max takes a walk → Max is the subject of the verb and a semantic argument (the walker) of the noun
        • Κῦρος ἐξέτασιν ποιεῖται τῶν Ἑλλήνων καὶ τῶν βαρβάρων Kuros exetasin poieitai tо̄n Hellēnо̄n kai tо̄n barbarо̄n Cyrus inspection.ACC do.3SG the.GEN Greeks.GEN and the.GEN barbarians.GEN Cyrus inspected the Greeks and the barbarians
        • Helena je otišla u posjet prijateljici Helena paid a visit to a friend → Helena is the subject of the verb and a semantic argument (the visitor) of the visit
          Susjed je dobio dozvolu za gradnju The neighbour received permission for construction → The neighbour is the subject of the verb and a semantic argument (the receiver) of the permission
        • 彼が聴衆から高い評価を受けた(こと) he.nom audience.source high evaluation.acc received (the fact) He received a high evaluation from the audience → The subject is the recipient of praise
          聴衆が彼を高く評価した(こと) audience.nom he.acc highly evaluation.made (the fact) The audience gave him high praise → The subject is a 'praiser'
        • Max maakte een wandeling Max takes a walk → Max is the subject of the verb and a semantic argument (the walker) of the noun
        • Jan złożył wizytę Marii Jan paid a visit to Maria → Jan is the subject of the verb and a semantic argument (the visitor) of the visit
          Piotr dostał pozwolenie na budowę Piotr received a permission for construction → Piotr is the subject of the verb and a semantic argument (the receiver) of the permission
          Beata ma marzenia o spokoju Beata has dreams about peace → Beata is the subject of the verb and a semantic argument (the possessor) of the dreams
          wyborcy ponoszą za to winę the electorate bears the responsibility for this → wyborcy electorate is the subject of the verb and a semantic argument (the agent) of the guilt
          ustawa budzi zastrzeżenia the law wakes-up reservations the law raises reservations → ustawa law is the subject of the verb and a semantic argument (the theme) of zastrzeżenia reservations
        • Felipe tomou dois banhos Felipe took two showers → Felipe is the subject of the verb and a semantic argument (the person taking a shower) of the noun
        • Ion i-a făcut o prezentare șefului său Ion made a presentation to his boss → Ion is the subject of the verb and a semantic argument (the presenter) of the noun
        • In Janezovo predavanje o slovenski kulturi za študente prevajalstva, the 3 syntactic arguments are expressed as a modifier with a possessive marker (Janezovo Janez's) and prepositional phrases (o slovenski kulturi on Slovene culture and za študente prevajalstva for students of translation)
        • Бранко је добио постављење Branko je dobio postavljenje Branko was appointed to a position → Branko is the subject of the verb and a semantic argument of the appointment (receiver)
          Јелена је Бранку узвратила посету Jelena je Branku uzvratila posetu Jelena returned Branko's visit. → Jelena is the subject of the verb and a semantic argument of the visit (visitor)
      • Go to test LVC.5
        • خطاب المراسل ال قاطع The journalist has interrupted the speech → المراسل the journalist, that is, the subject of the verb, is not a semantic argument of خطاب the speech, since a speech does not necessarily have an interrupter
        • Приятелят на Мария прекъсна нейния доклад Maria's friend interrupted her report → Maria's friend, that is, the subject of the verb, is not a semantic argument of the report, since a report does not necessarily have an interrupter
        • (OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire. (PT 376b, W) → the passive verb form (w)ṭ(.w) is linking its subject (śnčr) with an adverbial argument (ḥr śč̣.t)
        • το αφεντικό του Γιάννη διέκοψε την παρουσίασή του John's boss interrupted his presentation → το αφεντικό του Γιάννη (John's boss), that is, the subject of the verb διέκοψε, is not a semantic argument of the noun predicate παρουσίαση presentation, since a presentation does not necessarily have an interrupter
        • John's boss interrupted his presentation → John's boss, that is, the subject of the verb, is not a semantic argument of the presentation, since a presentation does not necessarily have an interrupter
          The report provides information about the economy → information only has one argument, which is its topic. The provider/source of the information is not one of its semantic arguments.
        • El periodista interrumpió el discurso The journalist interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
          El informe facilita información clave the report provides crucial information → information only has one argument, which is its topic. The provider/source of the information is not one of its semantic arguments.
        • Le journaliste a interrompu le discours The journalist has interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
          Le rapport fournit des informations cruciales the report provides crucial information → information only has one argument, which is its topic. The provider/source of the information is not one of its semantic arguments.
        • ὁ δὲ ἐμπιμπλὰς ἁπάντων τὴν γνώμην ho de empimplas hapantо̄n tēn gnо̄mēn he satisfy.PTC all.GEN the.ACC expectation.ACC he, having satisfied everyone’s expectation → the subject of the verb (he) is not the subject of the noun (all)
        • Učenici su prekinuli predavanje Students have interrupted the lecture → Students, that is, the subject of the verb, are not a semantic argument of the lecture, since a lecture does not necessarily have an interrupter
        • 演奏が彼に聴衆の高い評価をもたらした(こと) performance.nom he.dat audience.gen high evaluation.acc brought (the fact) His play brought him a high evaluation from the audience
        • De journalist heeft de toespraak onderbroken The journalist has interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
        • Marek dał mi prawo wyboru Marek gave me the right to choose → Marek is the subject of the verb but not a semantic argument of the right (a right usually does not need to be granted)
          Incydent ten podważył zaufanie wyborców do kandydata This fact undermined the electorate's confidence in the candidate → Incydent event is the subject of the verb but not a semantic argument of the confidence
          komisja przeprowadziła wybory the committee carried out the vote → komisja committee is the subject of the verb but not a semantic argument of wybory vote, which only requires the voters and the matter of the vote
        • O jornalista interrompeu a inauguração The journalist has interrupted the inauguration → The journalist, that is, the subject of the verb, is not a semantic argument of an inauguration, since an inauguration does not necessarily have an interrupter
          O relatório traz informações polêmicas the report provides polemic information → information only has one argument, which is its topic. The provider/source of the information is not one of its semantic arguments.
        • To define a predavanje lecture one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a lecture implies the existence of its arguments.
        • Демонстранти су прекинули говор Demonstranti su prekinuli govor Protesters interrupted the speech → Protesters are the subject of the verb but not a semantic argument of the speech (a speech does not necessarily have an interrupter)
          комисија је спровела гласање komisija je sprovela glasanje the committee carried out the vote → комисија komisija committee is the subject of the verb but not a semantic argument of гласање glasanje vote, which only requires the voters and the matter of the vote

      It is not always easy to determine if the verb's subject is an argument of the noun. You can use the former syntactic version of this test to verify your intuitions.
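The branching of test LVC.2 can be summarised in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `lvc2_next_step` and the role labels are hypothetical, and both the role of the verb's subject and the noun's semantic arguments are supplied by the annotator, not computed.

```python
def lvc2_next_step(subject_role, noun_arguments):
    """Return the test to apply after LVC.2.

    subject_role:   role the annotator assigns to the verb's subject
    noun_arguments: semantic arguments of the predicative noun
    """
    if subject_role in noun_arguments:
        return "LVC.3"  # the subject is an argument of the noun: continue
    return "LVC.5"      # otherwise the guidelines redirect to test LVC.5


# Only the role named in the examples above is listed here;
# a full frame for "presentation" would contain further arguments.
presentation_arguments = {"presenter"}

# John made a presentation to his boss
print(lvc2_next_step("presenter", presentation_arguments))    # LVC.3
# John's boss interrupted his presentation
print(lvc2_next_step("interrupter", presentation_arguments))  # LVC.5
```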

      Test LVC.3 - [V-LIGHT] Verb with light semantics

      Is v semantically light, that is, is the semantics that v adds to n restricted to: (i) what stems from its morphological features (e.g. future, plural, perfective aspect, etc.), (ii) pointing at the semantic role of n played by v's subject?

      • continue to next test
        • قرار أخذ take a decision أخذ take adds no meaning to قرار decision besides that of performing an activity
          معروف قدم present a favor to give a favor قدم to give adds no meaning to معروف favor besides that of performing an activity
          زيارة ب قام to do a visit to pay a visit قام to do adds no meaning to visit زيارة besides that of performing an activity
        • вземам решение make a decision вземам adds no meaning to решение decision besides that of performing an act
          държа реч to make a speech държа adds no meaning to реч besides that of performing an act
          поемам отговорност to take responsibility поемам adds no meaning to отговорност besides that of having a property
        • eine Entscheidung treffen a decision meet to make a decision treffen adds no meaning to Entscheidung besides that of performing an activity
          Angst haben to have fear haben adds no meaning to Angst besides that of having a property.
        • (OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → (w)ṭ.n adds no meaning to fear (nr.w) besides that of performing an action.
        • κάνω μία βόλτα take a walk κάνω make adds no meaning to βόλτα walk besides that of performing an activity
          παίρνω μία απόφαση παίρνω take adds no meaning to απόφαση decision besides that of performing an activity
          δίνω μία απάντηση δίνω give adds no meaning to the noun απάντηση besides that of performing an activity
          διενεργώ έλεγχο perform a check διενεργώ perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
          διαπράττω ένα έγκλημα διαπράττω commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
          ασκώ δριμεία κριτική ασκώ exercise adds no meaning to the noun κριτική besides that of performing a cognitive activity
          νιώθω πολύ άγχος νιώθω feel adds no meaning to άγχος besides that of being in a mental state
          έχω άγχος have anxiety έχω have adds no meaning to άγχος anxiety besides that of being in a mental state
          προβαίνω σε καταγγελία to make a complaint, to complain προβαίνω make adds no meaning to καταγγελία complaint besides that of performing an activity
        • take a walk take adds no meaning to walk besides that of performing an activity
          make a decision make adds no meaning to decision besides that of performing an activity
          have fear have adds no meaning to fear besides that of having a property
          perform a check perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
          commit a crime commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
          pay a visit → the verb in its usual sense means 'to spend some money on a visit', but here it is not used in this sense and does not add any semantics to the "visiting" event
          deliver a speech → the verb in its usual sense means 'to move from one place to another', but here it is not used in this sense and does not add any semantics to the "speech" event
          undergo a surgery undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
        • dar un paseo to take a walk dar adds no meaning to paseo besides that of performing an activity
          tomar una decisión to make a decision tomar adds no meaning to decisión besides that of performing an activity
          tener miedo to have fear tener adds no meaning to miedo besides that of having a property
        • usain egin smell do to smell, to sniff the verb egin adds no meaning to the noun usain besides that of performing an activity
          lo egin sleep do to sleep the verb egin adds no meaning to the noun lo besides that of performing an activity
        • ils ont du courage they have some courage have adds no meaning to courage besides that of having a property
          ils reçoivent l’ordre de partir they receive the order of leaving they are ordered to leave receive adds no meaning to order besides indicating that the subject is the recipient of the order
          il a subi une intervention chirurgicale he has undergone an intervention surgery he underwent surgery undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
        • γνώμην ἔχειν gnо̄mēn ekhein opinion.ACC have.INF to have an opinion → ἔχειν adds no meaning to γνώμην besides that of having a property.
          τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish → ποιέομαι adds no meaning to τιμωρίαν besides that of performing an activity
        • imati hrabrost to have courage imati have adds no meaning to hrabrost courage besides that of having a property
          donijeti odluku to make a decision donijeti in its usual sense means 'to bring', but here it is not used in this sense and does not add any semantics to the event
        • fare una passeggiata fare adds no meaning to passeggiata besides that of performing an activity
          prendere una decisione prendere adds no meaning to decisione besides that of performing an activity
          avere paura avere adds no meaning to paura besides that of having a property
          eseguire un controllo eseguire is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
          commettere un crimine commettere is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
          fare una visita → the verb in its usual sense means 'make', but here it is not used in this sense and does not add any semantics to the "visiting" event
          fare un discorso → the verb in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the "speech" event
        • 子が親に愛情を持つ child.nom parent.dat affection.acc have The child has affection for his parent(s) 持つ does not add meaning to 愛情 besides that of having a property
        • een beslissing nemen a decision take to make a decision nemen adds no meaning to beslissing besides that of performing an activity
          een wandeling maken to take a walk maken adds no meaning to wandeling besides that of performing an activity
          schrik hebben to have fear hebben adds no meaning to schrik besides that of having a property
        • oddać hołd to give-back tribute to pay tribute oddać give-back adds no meaning to hołd tribute besides that of performing an activity
          wystąpić z wnioskiem to stand out with a proposal to put forward a motion wystąpić z stand out with adds no meaning to wniosek motion besides that of performing an activity
        • mover uma ação judicial to move a lawsuit to sue to move adds no meaning to lawsuit besides that of performing an activity
          apresentar uma lesão present a lesion to have a lesion to present adds no meaning to lesion besides that of having a property
          estar com medo be with fear to be afraid to be with adds no meaning to fear besides that of being in a state
        • a avea curaj to have courage avea adds no meaning to curaj besides that of having a property
          a lua o decizie to make a decision lua adds no meaning to decizie besides that of performing an activity
        • Janez ima predavanje Janez lectures → Janez is the subject of the verb and a semantic argument of the noun (the lecturer)
        • одати почаст odati počast give away tribute to commemorate/pay tribute одати odati give away adds no meaning to почаст počast tribute besides that of performing an activity
          изрећи казну izreći kaznu to pronounce a sentence изрећи izreći to pronounce adds no meaning to казну kaznu sentence besides that of performing an activity
      • it is not an LVC
        • إنتباه شد grab attention to get attention شد to grab / to attract indicates that the attention starts
        • започвам играта start the game, start playing започвам start adds an aspectual meaning to the noun
        • eine Rede beginnen to begin a speech beginnen adds an aspectual meaning to the noun Rede
        • (OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire. (PT 376b, W) → (w)ṭ(.w) expresses the action of setting incense.
        • ξεκινάω μία προσπάθεια xekinao mia prospaθia start a trial → ξεκινάω adds an aspectual meaning to the noun
        • to start a walk start adds an aspectual meaning to the noun
        • comenzar un discurso to begin a speech comenzar adds an aspectual meaning to the noun discurso
        • oinez hasi foot-by start to start walking the verb hasi adds an aspectual meaning to the noun
        • donner du courage to give courage donner indicates the source of the courage (this would not pass test LVC.2)
          donner son avis to give one's opinion donner adds the information that the opinion is communicated
          Ce fait attire l'attention de la justice This fact attracts the attention of the justice attirer indicates that the attention starts
        • ἄρχειν τοῦ λόγου arkhein tou logou start the speech to begin speaking → ἄρχειν adds an aspectual meaning to the noun λόγου
          πολέμου παύσασθαι polemou pausasthai war end to stop fighting → παύσασθαι adds an aspectual meaning to the noun πολέμου
        • početi igru start the game početi start adds an aspectual meaning to the noun
        • cominciare un ballo to start a dance cominciare adds an aspectual meaning to the noun ballo
        • 子が手に荷物を持つ child.nom hand.loc luggage.acc have The child holds luggage in his hand(s) 持つ indicates the act of holding an object; it alternates with other verbs of holding, such as 抱える
        • een toespraak beginnen to begin a speech beginnen adds an aspectual meaning to the noun toespraak
        • wymierzyć sprawiedliwość to measure justice to do justice wymierzyć measure adds an aspectual meaning to sprawiedliwość justice; this expression still passes VID tests
          przejść na emeryturę to cross to retirement to take retirement przejść adds an inchoative (change-of-state) meaning to the noun
          dopełnić obowiązku to fulfill one's duty dopełnić fulfill adds a fulfillment meaning to obowiązek duty
        • entrar com uma ação judicial to enter with a lawsuit to file a lawsuit to enter adds an aspectual meaning to the noun
          dar uma opinião to give an opinion to give adds the meaning of communication, which is not present in the noun itself (one can ter uma opinião to have an opinion without communicating it).
        • a începe munca to start work the to start working începe adds an aspectual meaning to the noun
        • Študent je prekinil njegovo predavanje The student has interrupted his lecture → The student, that is, the subject of the verb, is not a semantic argument of the lecture, since a lecture does not necessarily have an interrupter
        • отићи у пензију otići u penziju to leave to retirement to take retirement отићи otići adds an inchoative (change-of-state) meaning to the noun
          испунити дужност ispuniti dužnost to fulfill one's duty испунити ispuniti fulfill adds a fulfillment meaning to дужност dužnost duty

      Note that this light semantics of the verb is either usual for that verb (i.e. the verb is a pure syntactic operator, like commit, perform), or occurs in the context of the particular noun (e.g. for pay in to pay a visit). Both types of verbs pass the test.

      In our view of LVCs, we do not require a light verb to be "bleached", as it is sometimes described in the literature. We simply do not take into account the relation between the verb's use as a light verb and its other uses. While the specific meanings added by light verbs to predicative nouns have been extensively studied and described (e.g. by Miriam Butt and Tafseer Ahmed), we do not adopt any fine-grained classification here. If you have a doubt about a verb's "lightness", proceed to the next test: if you can evoke the same event/state without using the verb, then it is considered light.
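Test LVC.3 can be read as a subset check: the verb is light when everything it contributes falls under the two permitted kinds of meaning. The Python sketch below illustrates this reading only; the labels `"morphological"`, `"subject_role"`, `"aspectual"` and `"communication"` and the function names are hypothetical, and the classification of each contribution remains the annotator's judgement.

```python
# The two kinds of meaning a light verb may add, following (i) and (ii) of LVC.3
LIGHT_CONTRIBUTIONS = {"morphological", "subject_role"}


def is_light(contributions):
    """True if the verb adds nothing beyond the permitted contributions."""
    return set(contributions) <= LIGHT_CONTRIBUTIONS


def lvc3_outcome(contributions):
    """Return the outcome of test LVC.3 for the judged contributions."""
    return "LVC.4" if is_light(contributions) else "not an LVC"


# take a walk: the verb only carries tense, mood, etc.
print(lvc3_outcome({"morphological"}))                   # LVC.4
# undergo a surgery: the verb also marks the subject as the patient
print(lvc3_outcome({"morphological", "subject_role"}))   # LVC.4
# to start a walk: the verb adds an aspectual meaning
print(lvc3_outcome({"morphological", "aspectual"}))      # not an LVC
```

Both a pure syntactic operator and a verb that is light only in the context of a particular noun yield the same outcome here, since the check looks only at what the verb adds in the candidate construction.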

      Test LVC.4 - [V-REDUC] - Verb reduction

      Try to build an NP without the verb, in which v's subject s becomes n's dependent. You might need to test several prepositions (of, by, for, from), possessives (my, her, somebody's), postpositions, case markers, as long as you use no verb. Can this verbless NP refer to the same event or state as the candidate v+n construction does?

      • annotate as LVC.full
        • دورا يلعب أحمد Ahmed plays a role دور أحمد Ahmed's role
          تحقيق أحمد ب قام Ahmed made an inquiry تحقيق أحمد Ahmed's inquiry
        • Иван пое отговорност Ivan took responsibility отговорността на Иван — both refer to the same property/event
          Иван взе решение Ivan made a decision решението на Иван — both refer to the same property/event
        • Paul hat eine Rede gehalten Paul has given a speech Paul's speech both refer to the same speech event
          Ich habe ihm einen Besuch abgestattet I have paid him a visit mein Besuch my visit both refer to the same visiting event
        • (OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W) → (*) nr.w ⸗f m ꞽb ⸗śn His fear (is) in their hearts — both refer to the same fearing event.
        • Ο Γιάννης έκανε μία παρουσίαση O Yanis ekane mia parusiasi John made a presentation John's presentation --> both refer to the same presenting event
          Η Μαρία έδωσε μία υπόσχεση I Maria edose mia iposchesi Maria gave a promise Maria promised Η υπόσχεση της Μαρίας --> both refer to the same promising event
        • Paul had a walk Paul's walk — both refer to the same walking event
          I paid him a visit my visit to him — both refer to the same visiting event
          Hester gave birth to Pearl Pearl's birth to Hester — both refer to the same birthing event (note that the key criterion is that Hester, the subject of the verb, is a (prepositional) dependent of birth in the paraphrase)
          The party gave priority to senior members → the priority of senior members for the party — both refer to the same prioritization event
        • Pedro dio un paseo Pedro gave a walk Pedro took a walk el paseo de Pedro Pedro's walk — both refer to the same walking event
          El capitán da la orden de partir The captain gives the order to leave The captain orders to leave la orden del capitán de partir The captain's order to leave
        • Pellok bisita egin zidan → Pelloren bisita -- both refer to the same visiting event
        • Paul a fait une enquête Paul made an inquiry L'enquête de Paul Paul's inquiry
          Paul procède à une perquisition Paul makes a search La perquisition de/par Paul the search of/by Paul
          Le général donne l'ordre de partir The general gives the order to leave The general orders to leave l'ordre du général de partir The general's order to leave
          Les soldats reçoivent l'ordre de partir The soldiers receive the order to leave The soldiers are ordered to leave l'ordre aux soldats de partir The order to the soldiers to leave
          Jean souffre de troubles psychiques John suffers from psychic troubles Les troubles psychiques de Jean John's psychic troubles
          Jean présente une hypersensibilité John presents a hypersensibility John has a hypersensibility l'hypersensibilité de Jean John's hypersensibility
          Paul reçoit des menaces de (la part de) Pierre Paul receives threats from (the part of) Peter Paul is threatened by Peter les menaces de Pierre à Paul Peter's threats to Paul
          Ce médicament présente un risque This medicine presents a risk This medicine poses a risk le risque de ce médicament this medicine's risk
          Ce fait attire l'attention de la justice This fact attracts the attention of the justice l'attention de la justice pour/sur ce fait the attention of the justice on/about this fact
        • Κῦρος ἐξέτασιν ποιεῖται Kuros exetasin poieitai Cyrus inspection.ACC do.3SG Cyrus inspected → ἐξέτασιν (τοῦ Κύρου) refers to the same event
        • Istraživač je donio zaključak The researcher made a conclusion njegov zaključak his conclusion both refer to the same event
        • Paolo ha fatto una conquista Paul made a conquest la conquista di Paolo
          Il generale dà l' ordine di partire. The general gives the order to leave The general orders to leave L'ordine di/da parte del generale di partire
          Paolo riceve delle minacce da (parte di) Piero le minacce di Piero a Paolo
        • 聴衆が彼を高く評価する audience.nom he.acc highly evaluation.make The audience highly praised him 聴衆の彼の高い評価 the high evaluation of him by the audience
          子が親に愛情を持つ child.nom parent.dat affection.acc have The child has affection for his parent(s) 子の親への愛情 child.gen parent.dir.gen affection
        • Paul heeft een toespraak gehouden Paul has given a speech Paul's toespraak both refer to the same speech event
        • Obecni oddali hołd poległym The present gave-back tribute to the fallen The audience paid tribute to the fallen hołd obecnych the tribute of the audience
          Jan miał na myśli Marię Jan had on thought Maria Jan meant Maria myśl Jana Jan's thought
          Jan otrzymał wymówienie Jan received a dismissal wymówienie dla Jana dismissal for Jan
          Inwestycja przynosi zyski the investment brings profit zyski z inwestycji profit from the investment
        • João cometeu um deslize o deslize do João — both refer to the same event
          O jogador cobrou um pênalti the player charged a penalty kick the player took a penalty kick o pênalti do jogador the player's penalty kick — both refer to the same event
          João tem consciência do perigo John has conscience of the danger John is aware of the danger a consciência do João sobre o perigo John's awareness of the danger — both refer to the same state
          João recebeu a remuneração John received the remuneration a remuneração do João John's remuneration — both refer to the same event
          O paciente recebeu a visita dos familiares The patient received the visit of the relatives a visita dos familiares ao paciente the visit of the relatives to the patient — both refer to the same event
          João apresenta lesões John presents lesions as lesões do João John's lesions — both refer to the same state
        • Paul a făcut o plimbare Paul had a walk plimbarea lui Paul Paul's walk — both refer to the same walking event
          i-am făcut o vizită I paid him a visit vizita mea — both refer to the same visiting event
        • imeti dvome to have doubts to doubt imeti have adds no meaning to dvomi doubts besides that of having a property
          delati razlike to make differences to differentiate delati in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to event
        • Професор држи предавање Profesor drži predavanje The professor is holding a lecture професорово предавање profesorovo predavanje The professor's lecture
          Овај лек представља ризик Ovaj lek predstavlja rizik this drug presents a risk this drug poses a risk ризик од овог лека rizik od ovog leka risk of this drug this drug's risk
      • it is not an LVC
        • في عام 2001 النور رأى أفاد التقرير بأن برنامج الصحة The report states that the Health Programme saw the light in 2001 The report states that the Health Programme began with its current components in 2001 نور برنامج الصحة# the light of health program — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP ( نور برنامج الصحة the light of health program ) fails to refer to the original event ( رأى برنامج الصحة النور ) the health program saw the light ( started )
        • Иван хвърли поглед на вестника Ivan threw a glance at the newspaper #погледът на Иван върху вестника — different semantics; and requires a different preposition
        • Paul hat einen guten Eindruck gemacht Paul has made a good impression #Paul's Eindruck auf seine Freunde Paul's impression on his friends has a different semantics
        • (OEG) 𓂧𓈖 𓃹𓈖𓇋𓋴 𓌴𓐙𓂝𓏏 (w)ṭ.n Wnꞽś mꜣꜥ.t Unas set Right Unas set Right (PT 265c, W) → (*) mꜣꜥ.t Wnꞽś 'Unas's Right' fails to refer to the original event (Unas set Right).
        • ο Παύλος πήρε νέα από τον αδερφό του O Pavlos pire nea apo ton aδerfo tu The Paul take.3PST news from his brother → #Τα νέα του Παύλου από τον αδερφό του Paul's news from his brother -- one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (τα νέα του Παύλου) fails to refer to the original event (Ο Παύλος πήρε νέα)
        • Paul got news from his brother #Paul's news from his brother — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Paul's news) fails to refer to the original event (Paul got news)
        • Juan recibió la noticia de su hermano Juan got the news from his brother #La noticia de Juan — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (la noticia de Juan) fails to refer to the original event (Juan recibió una noticia)
        • Hizlariak interesa piztu zuen Speaker interest switched-on The speaker awakened interest #Hizlariaren interesa, #the speaker's interest -- different semantics
        • Son comportement porte une atteinte grave à l'honneur des soldats His behaviour seriously jeopardises the soldiers' honour #l'atteinte de son comportement the jeopardy of his behaviour
        • ἡ γυνὴ πίστιν ἔλαβε hē gunē pistin elabe the woman assurance get.AOR.3SG the woman got an assurance → πίστις τῆς γυναικός ‘the woman’s assurance’ fails to refer to the original event (the woman got an assurance)
        • Petar je dobio poruku od direktora Petar received a message from his boss #Petar's message from his boss — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Petar's message) fails to refer to the original event (Petar received a message)
        • Paul kreeg nieuws van zijn broer Paul got news from his brother #Pauls nieuws van zijn broer — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Pauls nieuws) fails to refer to the original event (Paul kreeg nieuws)
        • Michael Phelps pobił rekord sprzed 2 tysięcy lat Michael Phelps broke the record from 2 thousand years ago → #Michael Phelps' record
          Ulica nosi imię sławnego poety The street carries the forename of a famous poet The street carries the name of a famous poet. imię ulicy the forename of the street
          Adam jest tego samego zdania Adam is of the same opinion Adam has the same opinion #zdanie Adama Adam's opinion refers to the contents of his opinion, not to the fact of having an opinion
        • O jogador cobrou uma falta the player charged a foul the player took a free kick a falta do jogador the player's foul — the focus changes from taking a free kick to being one of the parts involved in a foul (it's a VID)
          O jogador provocou uma lesão the player provoked a lesion a lesão do jogador the player's lesion — In the reduced NP, the focus changes from hurting somebody else to getting hurt
          O músico apresenta suas composições the musician presents his compositions as composições do músico the musician's compositions — the reduced NP does not keep the sense of presenting, it does not refer to the same event as the verbal construction
        • Paul a făcut o impresie bună Paul made a good impression #Impresia lui Paul despre soția sa Paul's impression on his wife — different semantics
        • to začeti predavanje to begin a lecture začeti to begin adds an aspectual meaning to the noun
        • Бранко је оборио рекорд у трци на 100 метара Branko je oborio rekord u trci na 100 metara Branko broke the record in 100m race → #Бранков рекорд #Brankov rekord

      This test has a simple formulation but its application has some important subtleties which are central to our definition of the LVC.full category. The goal of this test is to keep only constructions in which the predicative noun is an event or state, excluding "gray-zone" predicates.

      First, if it is not possible to build an acceptable NP where the verb v's subject s becomes a dependent of the noun n, e.g. using any preposition, postposition and/or case marker, this means that the verb is not light, and the construction cannot be annotated as LVC.full. This may remove constructions in which there is control, that is, both the noun and the verb share the same subject. However, control is not sufficient to characterize an LVC.full. In the examples below, LVC.4 fails: the verb is not completely light, and you cannot annotate the construction as LVC.full, even if it intuitively resembles an LVC.full due to control:

      • العمل قرار أحمد أخذ Ahmed made a decision to work قرار أحمد بالعمل the decision of Ahmed for work is unacceptable
      • Paul a l'air de dormir Paul has the air of to-sleep Paul seems to be sleeping *l'air de dormir de Paul is unacceptable
        Paul a eu l'occasion de dormir Paul has had the opportunity to sleep Paul had the opportunity to sleep *l'occasion de Paul de dormir is unacceptable
      • Zdravnik je postavil diagnozo The doctor made a diagnosis njegova diagnoza His diagnosis both refer to the same event
        Politik jedal napoved The politician made a forecast njegova napoved his forecast both refer to the same event

      Second, the fact that the NP is acceptable does not suffice to characterise an LVC.full. Furthermore, the NP version in which the verb was omitted, if acceptable, must evoke the same event or state as the LVC. Here are some tricky examples and some recommendations about how to interpret them:

      • جديدة اجراءت الشركة أخذت the company took new procedures → the NP الاجراءت الجديدة new procedures is ok, the "الاجراءت " "procedures " seem to refer to new procedures, so it is ok to annotate as LVC.full
      • Имам по-голям брат I have an elder brother моят брат my brother refers to one member of the relation, and not to the state of brotherhood between both actants
        отправих покана към приятелите си I sent an invitation to my friends покана invitation can be interpreted both as the act of inviting and as its contents; for the first reason we count this candidate as LVC.full
      • Η Μαρία έχει έναν αδελφό I Maria echi enan aδelfo Maria have.3SG a brother Ο αδελφός της Μαρίας is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
        Η Μαρία έστειλε ένα γράμμα Maria send.03.SG a letter Το γράμμα της Μαρίας refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
        Η Μαρία έχει την άποψη i maria echi tin apopsi Maria has the opinion Maria believes and more generally, cases of έχω + a noun referring to the state of having a mental content (άποψη, γνώμη, πεποίθηση) → η άποψη της Μαρίας is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
        Η Μαρία έδωσε την υπόσχεση i maria eδose tin iposchesi the maria give.3.PST the promise Maria promised and more generally, cases of δίνω + a noun referring to a speech act (υπόσχεση, διαταγή, απάντηση, κατάθεση) → Η υπόσχεση της Μαρίας refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
        Η Μαρία πήρε μία απόφαση I maria pire mia apofasi The Maria take.03.PR a decision Maria decided απόφαση can refer to the deciding event (μία δύσκολη απόφαση) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
      • Mary has a brother Mary's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
        Mary sent a letter Mary's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
        Mary has an opinion and more generally, cases of have + a noun referring to the state of having a mental content (opinion, belief) → Mary's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
        Mary made a speech and more generally, cases of make + a noun referring to a speech act → Mary's speech refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
        Mary made a decision decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
      • María tiene un hermano María has a brother el hermano de María María's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
        María envió una carta María sent a letter La carta de María María's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
        María dio un discurso María made a speech and more generally, cases of dar + a noun referring to a speech act → el discurso de María refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
        María tomó una decisión María made a decision decisión decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
      • la compagnie a pris des mesures d'économie the company took some measures of saving the company took cost-saving measures → the NP les mesures d'économie de la compagnie is ok, the semantic equivalence is difficult to judge, the "measures" seem to refer to cost-saving actions, so it is ok to annotate as LVC.full
      • εἶχε τὴν ἀδελφὴν Σιτάλκης eikhe tēn adelphēn Sitalkēs have.3SG the sister.ACC Sitalkes Sitalkes had a sister → ἀδελφὴν is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
        οὐκ ἂν ἐπιστολὴν ἔπεμπον ouk an epistolēn epempon not PRT letter.ACC send.3pl they would not have sent a letter → ἐπιστολήν refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
        τὴν γὰρ γνώμην εἶχε tēn gnо̄mēn eikhe the thus opinion have.3SG he thus held the opinion and more generally, cases of have + a noun referring to the state of having a mental content → γνώμην is ambiguous between the fact that he has an opinion and the content of his opinion. We recommend that these cases should be annotated as LVC.full
        ὁ δὲ Σιτάλκης πρός τε τὸν Περδίκκαν λόγους ἐποιεῖτο ho de Sitalkēs pros te ton Perdikkan logous epoieito the Sitalkes to also the Perdikkas speech.ACC do.3SG Sitalkes spoke to Perdikkas and more generally, cases of make + a noun referring to a speech act → λόγους refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
      • Marie neemt een beslissing decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
      • mam starszego brata I have an elder brother mój brat my brother refers to one member of the relation, and not to the state of brotherhood between both actants
        Maria wysłała wiadomość Maria sent a message wiadomość Marii Maria's message refers to the contents of the message sent by Maria, rather than to the sending event itself
        Maria jest zdania, że Mary has the opinion that... zdanie Marii Mary's opinion refers to the content of the opinion, and not to the state of having an opinion
        miał na celu awans He had promotion on the aim His aim was a promotion jego cel refers to the aim itself, and not to the state of having an aim
        ta partia w wyborach miała większość this party had a majority in the elections #większość tej partii the majority of the party provokes a considerable shift in meaning
        złożył zeznania na policji he gave testimony at the police station jego zeznania can be interpreted both as the act of testimony and as its contents; for the first reason we count this candidate as LVC.full
      • Mojca jedala Tini priložnost Mojca gave Tina an opportunity #Mojčina priložnost Mojca's opportunity has a different meaning; if the verb is removed, the original meaning is lost, so the verb is not light.
      • Марија је послала поруку Marija je poslala poruku Marija sent a message Маријина порука Marijina poruka Marija's message refers to the contents of the message sent by Marija, rather than to the sending event itself

      Finally, some nouns, especially nominalisations, are ambiguous between events and their participants. For instance, a construction may be an event (the construction of the bridge took 2 years) or its result (this bridge is a spectacular construction). In that case, if the verbless NP can refer to the event, then you should prefer this reading over the "participant" interpretation. For example, in John made a construction, you may ask if John's construction refers to the construction event or to its result. In this case, it can refer to the event, so it should be annotated as LVC.full.
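
      The three remarks above (acceptability of the verbless NP, reference to the same event or state, and preference for the event reading of ambiguous nouns) can be summarised in a minimal sketch. It is only an illustration: the record ReductionJudgement and the function passes_lvc4 are hypothetical names, and the boolean fields stand for judgements that the annotator makes by hand.

```python
from dataclasses import dataclass


@dataclass
class ReductionJudgement:
    """Hypothetical record of an annotator's answers for one v+n candidate."""
    # Can v's subject become a dependent of n in an NP that contains no verb?
    np_is_acceptable: bool
    # Can that verbless NP evoke the same event or state as v+n?
    np_can_refer_to_event: bool
    # Can it (also) refer to a participant, e.g. a result or a content?
    np_can_refer_to_participant: bool = False


def passes_lvc4(j: ReductionJudgement) -> bool:
    """Sketch of test LVC.4 (verb reduction) as described in this section."""
    # First remark: without an acceptable verbless NP, the verb is not light.
    if not j.np_is_acceptable:
        return False
    # Second remark: an acceptable NP must evoke the same event or state.
    # Third remark: for ambiguous nouns the event reading is preferred,
    # so the participant reading never blocks a passing result.
    return j.np_can_refer_to_event


# Judgements taken from the English examples above.
pauls_walk = ReductionJudgement(True, True)            # Paul had a walk
marys_letter = ReductionJudgement(True, False, True)   # Mary sent a letter
marys_decision = ReductionJudgement(True, True, True)  # Mary made a decision
```

      With these judgements, passes_lvc4 returns True for pauls_walk and marys_decision, and False for marys_letter.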

      Test LVC.5 - [V-SUBJ-N-CAUSE] Verb's subject is noun's cause

      Is the subject of the verb expressing the cause of the predicate expressed by the noun? In other words, does the verb bring an additional participant to the scene, representing the source or cause of the event or state referred to by the noun?

      • annotate as LVC.cause
        • حقوق أعطى to give rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
        • Иван даде възможност на Мария да представи картините си Ivan gave Maria the opportunity to present her paintings → Ivan is not a semantic argument of възможност opportunity but he is the cause of the opportunity
        • (OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W) → The subject of the verb is the cause of the event referred to by the predicative noun.
        • δίνω ικανοποίηση δino ikanopiisi give satisfaction to satisfy → the subject of the verb δίνω is the cause for the emotion denoted by the predicative noun ικανοποίηση and experienced by its complement
        • to grant rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
          to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
          the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provoke, which is a prototypical causative verb. Here, the subject is not the agent of destruction, but its cause. Notice that if the sentence was the explosion provoked the destruction of the building, then the construction would be an LVC.full
          residents seek to build consensus on the development of the territory → the semantic argument of consensus is the topic on which everybody agrees, the subject of build consensus expresses an external participant responsible for the consensus to exist.
        • otorgar derechos to grant rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
          dar dolor de cabeza → X has a headache, the cause of the headache, indicated as the subject of dar is not a semantic argument
          la nueva ley provocó la destrucción del edificio the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provocar to provoke, which is a prototypical causative verb. Here, the subject is not the agent of destrucción destruction, but its cause. Notice that if the sentence was la explosión provocó la destrucción del edificio the explosion provoked the destruction of the building, then the construction would be an LVC.full
        • τιμωρίαν ποιέω timо̄rian poieо̄ punishment.ACC do.1SG I inflict punishment → the subject of the verb is the cause of the event referred to by the noun
        • zadati glavobolju to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
        • 質の高い演奏が彼に聴衆の高い評価をもたらした(こと) quality.gen high performance.nom he.dat audience.gen high evaluation.acc brought (the fact) His high-quality play earned him high praise from the audience → The subject is the cause of the 'high praise' from the audience
        • hoofdpijn geven → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
        • Marek dał mi prawo wyboru Marek gave me the right to choose → Marek is not a semantic argument of prawo right but he is the cause of the right
          dać podstawy prawne to give legal foundation
          nakładać na kogoś powinność to put a duty on sb.
          narazić kogoś na straty to expose someone to losses
          stawiać komuś cel to set an aim to someone
          ślady krwi wzbudziły podejrzenia policji the traces of blood raised suspicion to the police
        • Bombardamentul a provocat moartea multor civili. The bombing provoked the death of many civilians. Many civilians (mulți civili) died and their death (moarte) was provoked by the bombing (bombardamentul)
        • Борко је Марији задао бриге Borko je Mariji zadao brige Borko gave to Marija worries Borko worried Marija → Marija has worries, the cause of the worries, indicated as the subject of задао zadao give is not a semantic argument of бриге brige worries
      • it is not an LVC
        • إنطباع أعطى → the subject of أعطى to give is not what is causing إنطباع the impression
        • Този инцидент подрони авторитета на кандидата This incident undermined the authority of the candidate → Инцидентът incident is neither a semantic argument of the authority nor its cause
        • (OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire. (PT 376b, W) → The subject of the passive verb form is not the cause of the event.
        • παίρνω απάντηση perno apantisi to take an answer to receive an answer → the subject of παίρνω is not what is causing a reply
        • to relieve a headache → the subject of relieve is not what is causing a headache
          to give birth → tricky case, since the subject of give actually is a semantic argument of birth, so it cannot be its cause. This construction must be annotated as VID (it does not pass test LVC.4 either).
          excessive heat provokes fire → even though provoke prototypically expresses a cause, in this case fire is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
        • calmar un dolor de cabeza to relieve a headache → the subject of calmar to relieve is not what is causing a headache
          dar a luz to give birth → tricky case, since the subject of dar to give actually is a semantic argument of a luz, so it cannot be its cause. This construction must be annotated as VID (it does not pass test LVC.4 either).
          un calor excesivo provoca incendios excessive heat provokes fires → even though provocar prototypically expresses a cause, in this case incendios is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
        • συγγνώμης τυγχάνει suggnо̄mēs tugkhanei pardon.GEN get.3SG he gets pardoned → the subject of the verb is not the cause of the event referred to by the noun
        • de hoofdpijn verlichten → the subject of verlichten is not what is causing a headache
        • Incydent ten podważył zaufanie wyborców do kandydata This incident undermined the electorate's confidence in the candidate → Incydent incident is neither a semantic argument of the confidence nor its cause (it is the opposite of the cause)
          komisja przeprowadziła wybory the committee carried out the vote → komisja committee is neither a semantic argument of wybory vote nor its cause
          mocny zapach uśpił czujność psów the strong scent lulled the vigilance of the dogs → the scent is the opposite of the cause of vigilance
        • căldura excesivă provoacă incendii → even though provoca provoke prototypically expresses a cause, in this case incendiu fire is not a predicate and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
        • Marija ima brata Marija has a brother Marijin brat Marija's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
          Marija je poslala pismo Marija sent a letter Marijino pismo Marija's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
          Marija ima mnenje Marija has an opinion and more generally, cases of imeti to have + a noun referring to the state of having a mental content (mnenje, predstava, dvom opinion, idea, doubt ) → Marijino mnenje Marija's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
          Marija je postavila vprašanje/trditev Marija posed a question/statement and more generally, cases of postaviti make + a noun referring to a speech act → Marijino vprašanje Marija's question refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
        • Борко ће Марију ослободити брига Borko će Mariju osloboditi briga Borko will free Marija of her worries. → Borko, the subject of ослободити osloboditi relieve is not what is causing бриге brige worries

      Constructions annotated as LVC.cause involve:

      1. verbs that are typically used to express the cause of predicative nouns in general (e.g. cause, provoke), or
      2. verbs that are only used to express the cause of particular predicative nouns (e.g. grant in to grant a right).

      When the construction involves a typically causative verb (e.g. cause, provoke), it might seem counter-intuitive to annotate it as a VMWE because it looks perfectly regular and presents no VMWE idiosyncrasy. However, it turned out to be difficult to distinguish idiosyncratic from regular LVC.cause, so both should be annotated, as for LVC.full. In other words, LVC.cause constructions are understood as complex predicates with a causal support verb and are annotated regardless of their compositionality, even though some of them are fully compositional.

      Typically causative verbs (e.g. cause, provoke) can sometimes be light. In this case, according to the LVC decision tree, LVC.full has priority over LVC.cause. For instance, the announcement provoked an unexpected reaction should be annotated as LVC.full and not LVC.cause, although provoke is a typically causative verb. Indeed, reaction has two arguments (reaction of X to Y), one of which is the subject of the verb (test LVC.2 passes). In other words, typically causative verbs may be used in either LVC.full or LVC.cause, depending upon whether the cause subject of the verb is a normal, canonical argument to the predicative noun (LVC.full) or an "external" non-canonical cause (LVC.cause).
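
      The priority of LVC.full over LVC.cause can be illustrated with a minimal sketch. It rests on assumptions that are not spelled out in this section: the candidate has already passed the earlier tests (LVC.0 and LVC.1), the four boolean arguments are the annotator's answers to tests LVC.2 to LVC.5, and the branching order shown is one reading of the LVC decision tree. The function name lvc_label is hypothetical.

```python
from typing import Optional


def lvc_label(
    subject_is_noun_argument: bool,  # LVC.2: v's subject is a semantic argument of n
    verb_is_light: bool,             # LVC.3: the verb has light semantics
    verbless_np_same_event: bool,    # LVC.4: the verbless NP evokes the same event/state
    subject_is_cause: bool,          # LVC.5: v's subject is the cause of n
) -> Optional[str]:
    """Sketch of the last steps of the LVC decision tree (assumed ordering)."""
    if subject_is_noun_argument:
        # The subject is a canonical argument of the noun, so only LVC.full
        # is considered, even when the verb is typically causative.
        if verb_is_light and verbless_np_same_event:
            return "LVC.full"
        return None  # not an LVC; other VMWE categories may still apply
    # The subject is external to the noun: it may be a non-canonical cause.
    if subject_is_cause:
        return "LVC.cause"
    return None


# "the announcement provoked an unexpected reaction": reaction of X to Y
reaction = lvc_label(True, True, True, True)
# "to grant rights": the granter is not a semantic argument of rights
rights = lvc_label(False, False, False, True)
# "to relieve a headache": the subject is neither an argument nor the cause
relief = lvc_label(False, False, False, False)
```

      Here reaction is "LVC.full", rights is "LVC.cause" and relief is None.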

      Some verbs could be considered causative, but their interpretation goes beyond purely indicating the cause of the event/state. Therefore, you should NOT annotate as LVC.cause constructions involving:

      • verbs which encode a manner of causation:
        • to call a meeting entails communication to schedule the meeting
          to hold a meeting entails leadership
          to organize classes entails preparation
        • συνήγαγεν ἐκκλησίαν sunēgagen ekklēsian lead.together.3SG meeting.ACC he held a meeting entails leadership
      • verbs which encode modality:
        • to allow dialogue entails permission
          to foster dialogue entails assistance
          to require dialogue entails necessity
      • aspectual verbs whose subject is a semantic argument of the noun:
        • αρχίσαμε τη συζήτηση archisame ti syzitisi we started the conversation
          τελειώσαμε τη συζήτηση teliosame ti sizitisi finished.01.PST the conversation We finished the conversation
        • we started the meeting
          we ended the meeting
          we continued the meeting
        • ἄρχειν τοῦ λόγου arkhein tou logou start the speech to begin speaking
        • we begonnen de vergadering we started the meeting
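
      The exclusions above, together with the priority of LVC.full over LVC.cause, can be sketched as a small check. This is only an illustrative simplification, not part of the official decision tree: the enum VerbContribution, the function is_lvc_cause_candidate and its boolean parameters are hypothetical names, and the sketch assumes the annotator has already answered the underlying linguistic questions by hand.

```python
from enum import Enum, auto


class VerbContribution(Enum):
    """Hypothetical labels for what the verb adds beyond indicating a cause."""
    NONE = auto()      # the verb only indicates the cause of the event/state
    MANNER = auto()    # e.g. call / hold a meeting, organize classes
    MODALITY = auto()  # e.g. allow / foster / require dialogue
    ASPECT = auto()    # e.g. start / end / continue the meeting


def is_lvc_cause_candidate(subject_is_noun_argument: bool,
                           subject_is_external_cause: bool,
                           contribution: VerbContribution) -> bool:
    """Return True only if the candidate may still be considered for LVC.cause.

    Assumption: the three inputs are manual judgments made by the annotator.
    """
    # If the subject is a canonical argument of the noun, LVC.cause does not
    # apply: LVC.full has priority, and aspectual verbs are excluded.
    if subject_is_noun_argument:
        return False
    # Verbs encoding manner of causation, modality or aspect are excluded.
    if contribution is not VerbContribution.NONE:
        return False
    # Otherwise the subject must be an "external" cause of the event/state.
    return subject_is_external_cause
```

      For instance, the announcement provoked an unexpected reaction corresponds to is_lvc_cause_candidate(True, False, VerbContribution.NONE), which returns False because LVC.full has priority, and to allow dialogue is rejected because the verb encodes modality.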

      Problematic cases and remarks

      Syntactic variants

      The (single or compound) noun n functions as a regular syntactic dependent, so LVCs exhibit regular syntactic variants.

      • قرار أخذ make a decision المدير أخذه الذي القرار the decision that was made by the director
      • взема решение → решението, което президентът взе the decision that the president made
      • eine Entscheidung treffen → die Entscheidung die der Direktor zu treffen hatte.
      • παίρνω μία απόφαση → η απόφαση που πρέπει κάποιος να πάρει. perno mia apofasi → i apofasi pu prepi kapios na pari take a decision → the decision one has to take to make a decision, the decision I have to make
      • make a decision → the decision that the director has to make.
      • tomar una decisión → la decisión tomada por la directora.
      • erabaki bat hartu decision one take to make a decision→ zuzendariak hartutako erabakia director taken decision the decision (which was) made by the director
      • prendre une décision → la décision prise par la directrice.
      • δόξαν ἣν ἔνιοι ἔχουσι περὶ τῆς Νικοφήμου οὐσίας doxan hēn enioi ekhousi peri tēs Nikophēmou ousias opinion which some have.3PL about the Nicophemos’ property the opinion which some hold about Nicophemus' property is a syntactic variant
        δόξαν ἔχουσι doxan ekhousi opinion.ACC have.3PL they hold an opinion is the canonical form
      • donijeti odluku to make a decision → odluka koju je morao donijeti direktor the decision that the director had to make
      • prendere una decisione → la decisione che il direttore ha dovuto prendere.
      • een beslissing nemen → de beslissing die de directeur moet nemen.
      • wziąć udział to take participation.ACC to take part → wzięcie udziału taking.GER participation.GEN taking part, biorący udział taking.PART participation.ACC taking part
      • tomar banho take shower → o banho que eu tomei estava bom the shower which I took was good
      • a lua o decizie to make a decision → decizia pe care directorul trebuie să o ia the decision that the director has to make.
      • dati ime nekomu to give (somebody) a name to name (somebody) → the object receives a name, and this action implies that, as a result, he/she is named. Therefore, the person who gives a name causes something to be named. The subject of the verb is not its semantic argument.
        narediti konec nečemu to make an end (to something) to end (something) → the result of this action is that something is finished, which is caused by the subject of narediti to make
      • задати некоме бриге zadati nekome brige to give worries to sb. to worry sb. бриге које је Борко задао Марији brige koje je Borko zadao Mariji The worries Borko gave to Marija

      All LVC tests should be applied to a neutral form. If the neutral form is not totally syntactically unmarked (for instance, it must be in the passive voice), this is an indication that the target construction might not be an LVC, but a verbal idiom instead.

      Selection of the verb

      In many cases of LVCs, it can be said that there is some degree of selection of the verb by the noun.

      • جولة ب قام make a walk vs سباق ب قام make a race
      • вземам решение to make a decision vs *вземам отговорност to take responsibility
        имам право to be right vs *притежавам право
      • eine Entscheidung treffen a decision meet make a decision vs. *eine Entscheidung machen a decision make vs. *einen Beschluss treffen a resolution meet
      • κάνω διάλειμμα vs. #παίρνω διάλειμμα
        παίρνω απόφαση vs.#κάνω απόφαση
      • have a walk vs *have a race
        run a race vs *run a walk
      • tomar una decisión take a decision make a decision vs *dar una decisión give a decision but darse/tomar una ducha give.self/take a shower
      • pauso eman step give to take a step vs. ?pauso egin step do
        bisita egin visit do to pay a visit vs. bisita eman visit give
      • faire une marche make a walk take a walk vs *procéder à une promenade perform a walk but faire/procéder à une enquête make/perform an inquiry
      • χάριν δίδωμι kharin didо̄mi gratitude.ACC give.1SG I show gratitude vs. #χάριν ποιέομαι
      • postaviti pitanje to put a question to pose a question vs *postaviti odgovor
      • prendere una decisione take a decision make a decision vs. *fare una decisione make a decision vs. *prendere una conclusione take a conclusion
      • een wandeling maken a walk make to take a walk vs. *een race maken a race make
      • wziąć udział to take participation vs. *pobrać udział
        mieć rację to have right to be right vs. *posiadać rację to possess right
      • fazer uma prece to make a prayer vs. *dar uma prece to give a prayer but fazer/dar uma caminhada to make/give a walk
      • a da divorț to give divorce to divorce vs. *a oferi divorț
      • dati nasvet to give an advice → the subject of dati give cannot cause an advice
      • имати право imati pravo to have right to be right vs. *поседовати право *posedovati pravo to possess right

      Yet some regularities exist. For example, large classes of nouns function with have (e.g. +property) or commit (+negative achievement). Therefore, we chose not to retain the selection of the verb as a criterion for LVC categorization. Instead, the decision tree should be applied to decide whether a candidate should be annotated as LVC.

      Scope of annotation vs. literature on LVCs

      Many authors distinguish support verbs from light verbs, still others differentiate between true light verbs and vague action verbs.

      On the one hand, we take a narrower scope than what is usually considered in the literature by ignoring aspectual support verbs (except when aspect is morphological). We believe that aspectual verbs do contribute an additional (change of state) meaning to the expression, and most of the time they are completely productive, not forming interesting VMWEs. For instance, for the predicative noun walk, we will consider the light verb to have, but not the aspectual verbs to start, to pursue, to stop a walk. Thus, to have a walk is an LVC.full. Note that for some nouns such as bloom, which are in themselves inchoative, we do consider to come into bloom as LVC.full, as both the verb and the noun are inchoative, so the verb does not add any semantics to the noun.

      On the other hand we take a broader scope than what is usually considered in the literature by taking in cases in which the verb has light semantics per se (it only bears morphology, such as the tense and mood, in any case), which hence cannot be described as "bleached" as is usually said of support verbs. For instance, whereas to pay does not have its usual meaning in to pay a visit, it cannot really be said that commit does not have one of its meanings in commit a crime (note that commit can be used with any negatively charged achievement noun, e.g. suicide, crime, fraud, felony...). Nonetheless, we annotate to commit a crime as LVC.full since it passes all tests.

      Verb and adjective paraphrase

      One test often used in the literature is the existence of a morphologically related verb or adjective that means the same as the LVC. For instance, to make a visit is equivalent to to visit, to have an illness is equivalent to to be ill. Note however that it is neither sufficient nor compulsory:

      • some LVCs have no derivationally-related equivalents, such as to have a flu, to have faith and to commit a crime;
      • some constructions that are not LVCs do have a derivationally-related equivalent such as to write an email and to email;
      • some LVCs have derivationally-related equivalents that do not mean the same as the LVC, such as to make a face and to face, or that have different argumental structure from the LVC, such as to have a problem and to be problematic.

      Nonetheless, it might be useful to reason about the derivationally-related equivalents to decide whether a noun is predicative in test LVC.1. Therefore, here are some questions that might help in deciding about the predicative nature of the noun in the LVC candidate:

      Verb paraphrase: Is the abstract noun derivationally related to a verb with the same semantics? Then, there is probably a semantic argument, which coincides with the subject of the verb, so test LVC.1 passes:

      • القرار أحمد أخذ Ahmed made a decision = أحمد قرر Ahmed decided
      • вземам решение to make a decision = решавам to decide
        правя грешка to make a mistake = греша/сгрешавам to make a mistake
      • ο Γιάννης παίρνει μία απόφαση John makes a decision = ο Γιάννης αποφασίζει John decides
        ο Γιάννης κάνει ένα ταξίδι John makes a trip = o Γιάννης ταξιδεύει
        ο Γιάννης έχει θάρρος John has courage = ο Γιάννης είναι θαρραλέος John is courageous → and, more generally, characteristics and attributes
        ο Γιάννης έχει πείνα/δίψα John has hunger/thirst = ο Γιάννης πεινάει/διψάει John is hungry/thirsty → and, more generally, physical sensations
        ο Γιάννης έχει πάθος/φόβο/θυμό John has passion/fear/anger = ο Γιάννης παθιάζεται/φοβάται/θυμώνει John is passionate/afraid/angry → and, more generally, feelings, emotions, states
      • John makes a decision = John decides
        John has a walk = John walks
      • Juan toma una decisión Juan makes a decision = Juan decide Juan decides
        Juan da un paseo Juan takes a walk = Juan pasea Juan walks
      • Jonek erabakia hartu du = Jonek erabaki du John decision-the taken has = John decided has John has made a decision = John has decided
      • πορείαν ποιέομαι poreian poieomai march.ACC do.1SG I march = πορεύομαι
      • Ivan donosi odluku Ivan takes decision = Ivan odlučuje Ivan decides
        Janica je odnijela pobjedu Janica carried away a win = Janica je pobijedila Janica won
      • John neemt een beslissing John makes a decision = John beslist John decides
      • Jan podejmuje decyzję John takes decision = Jan decyduje John decides
        Ewa odniosła zwycięstwo Eva carried away a victory = Ewa zwyciężyła Eva won
      • Ion ia o decizie John makes a decision = Ion decide John decides
      • postaviti vprašanje to pose a question → vprašanje, ki ga je moral postaviti the question that he had to pose
      • Марко је донео одлуку Marko je doneo odluku Marko brought a decision Marko made a decision = Марко је одлучио Marko je odlučio Marko decided
        Марко је узео учешће Marko je uzeo učešće Marko took participation = Марко је учествовао Marko je učestvovao Marko participated

      Adjective paraphrase: Is the abstract noun derivationally related to an adjective with the same semantics? Then, there is probably a semantic argument, which coincides with the noun that is modified by the adjective, so test LVC.1 passes.

      • شجاعة أحمد ال يملك Ahmed has the courage = شجاع أحمد Ahmed is courageous
      • имам смелост to have courage = съм смел to be courageous
        нямам търпение to not have patience = съм нетърпелив to be impatient
        нося отговорност to carry responsibility = съм отговорен to be responsible
      • ο Γιάννης έχει θάρρος = ο Γιάννης είναι θαρραλέος o Yanis echi θaros = o Yanis ine θaraleos → and, more generally, characteristics and attributes
        Ο Γιάννης έχει δύναμη = Ο Γιάννης είναι δυνατός O Γianis echi δinami = O Γianis ine δinatos → and, more generally, characteristics and attributes
      • John has courage = John is courageous → and, more generally, characteristics and attributes
        John has hunger/thirst = John is hungry/thirsty → and, more generally, physical sensations
        John has passion/fear/anger = John is passionate/afraid/angry → and, more generally, feelings and emotions
        John has problems/difficulties = Something is problematic/difficult for John → and, more generally, states
      • Juan tiene miedo Juan has fear = Juan es miedoso Juan is easily scared → and, more generally, characteristics and attributes
        Juan tiene hambre Juan has hunger = Juan está hambriento Juan is hungry → and, more generally, physical sensations
      • Anek itxaropena du = Ane itxaropentsu dago Ane hope has = Ane hopeful is Ane has hope = Ane is hopeful → and, more generally, characteristics and attributes
        Anek = Ane gosetuta Ane hunger has = Ane hungry is Ane has hunger = Ane is hungry → and, more generally, physical sensations
      • νοῦν ἔχω noun ekhо̄ sense.ACC have.1SG I am sensible = ἔννοος
      • imati strpljenja to have patience = biti strpljiv to be patient
        nositi odgovornost to carry responsibility = biti odgovoran to be responsible
      • John heeft moed John has courage = John is moedig → and, more generally, characteristics and attributes
      • mieć odwagę to have courage = być odważnym to be courageous
        mieć straty to have losses = być stratnym to have lost sth
        mieć sens to have a sense to make sense = być sensownym to be reasonable
      • avea curaj to have courage = fi curajos to be courageous
      • имати храбрости imati hrabrosti to have courage = бити храбар biti hrabar to be courageous

      Synonym verb/adjective paraphrase: Does the abstract noun have a synonym/hypernym derivationally related to a verb or adjective with the same semantics? Then, the questions above can be applied to the synonym verb/adjective.

      • Иван и Мария постигнаха консенсус Ivan and Maria reached a consensus = Ivan and Maria agreed consensus has no corresponding verb or adjective, but agreement is a synonym
      • έχω τη γνώμη echo ti gnomi I have the opinion I think = πιστεύω γνώμη has no corresponding verb or adjective, but πίστη, άποψη are synonyms
      • John and Mary reach a consensus = John and Mary agree consensus has no corresponding verb or adjective, but agreement is a synonym
        John has a chance to do something = John is likely to do something chance has no corresponding verb or adjective, but likelihood is a synonym
      • Anek min eman dio Joni = Anek Jon mindu du Ane pain given has to-Jon = Ane Jon hurt has Ane has hurt Jon
      • Radnici i uprava postigli su konsenzus workers and management reached consensus = Radnici i uprava su se dogovorili workers and management agreed konsenzus consensus has no corresponding verb or adjective, but dogovor agreement is a synonym
      • mieć 190 cm wzrostu to have 190 cm of height to be 190 cm tall = mierzyć 190 cm to measure 190 cm to be 190 cm tall
        dokonać inwazji to perform an invasion = wtargnąć to invade
      • da voie=permite
      • Маја има шансе да нешто уради Maja ima šanse da nešto uradi = Маја може нешто да уради Maja može nešto da uradi шанса šansa has no corresponding verb or adjective, but моћи/могућност moći/mogućnost is a synonym

      The existence of a related verb is not a definitive test, but a hint that the noun is probably predicative. Since determining whether a noun is predicative is tricky, we advise language teams to provide additional documentation and examples for borderline cases.

      Checking if the subject is an argument with syntactic tests

      The previous version of the guidelines had a syntactic test which you can still use to verify if the verb's subject is an argument of the noun. However, this test was considered hard to apply in the previous guidelines, and is not mandatory anymore.

      The syntactic test consists in trying to add the semantic argument as a complement of the noun in the presence of the verb. In other words, does the noun n, in the presence of v, prohibit at least one syntactic argument a which it normally licensed in the absence of v?

      An alternative formulation for this test is the following: Let s be the subject of v, and let r be the semantic role that s plays with respect to the noun n. Is it prohibited for r to be realized both by s and by a syntactic argument a of n, except when a is in the whole–part relation with s?
      • الميزانية قرار   الوزير  أخذ   + قرار الحكومة في الميزانية في الميزانية قرارالحكومة أخذ الوزير الوزير the decider cannot be a modifier of decision قرار
      • Петър Стоянов взе решението да подпише договора Petar Stoyanov made the decision to sign the contract + решението на президента да подпише договора *Петър Стоянов взе решението на президента да подпише договора — the noun cannot be modified by the person performing the act/event (which is the subject)
      • Die Königin hat dem Premierminister einen Besuch abgestattet the Queen has paid a visit to the Prime Minister + ein Besuch der Dame beim Premierminister a visit of the Lady to the Prime Minister *Die Königin hat einen Besuch der Dame beim Premierminister abgestattet*The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
        Paul hat eine Entscheidung über das Budget getroffen Paul made a decision on the budget + die Entscheidung des Rates über das Budget the council's decision on the budget *Paul traf die Entscheidung des Rates über das Budget *Paul made the council's decision on the budget — the decision maker cannot modify decision
      • ο πρωθυπουργός έκανε επίσημη επίσκεψη στον Αμερικανό πρόεδρο o proθypurgos ekane episimi episkepsi ston amerikano proedro + η επίσκεψη του πρωθυπουργού στον Αμερικανό πρόεδρο
        ο πρωθυπουργός έκανε επίσημη επίσκεψη του υπουργού στον Αμερικανό πρόεδρο o proθypurgos ekane episimi episkepsi tu ypurgu ston amerikano proedro — the visitor cannot be a modifier of επίσκεψη
      • The Queen paid a visit to the Prime Minister + a visit of the Lady to the Prime Minister *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
        Paul made a decision on the budget + the committee's decision on the budget *Paul made the committee's decision on the budget — the decision maker cannot modify decision
        Paul had a discussion with Mary + Peter's discussion *Paul had Peter's discussion with Mary
        Bjarnson scored a goal + Arnason's goal *Bjarnson scored Arnason's goal but Bjarnson scored the goal of Iceland — the scoring entity can only modify goal in the last case, when they are part of the Iceland team
      • La reina hizo una visita al primer ministro The Queen paid a visit to the prime minister + una visita de la primera dama al primer ministro a visit of the first Lady to the prime minister *La reina hizo una visita de la primera dama al primer ministro The Queen paid a visit of the first lady to the prime minister — the visitor cannot be a modifier of visita
        Pablo tomó una decisión con respecto al presupuesto Pablo made a decision on the budget + la decisión del comité con respecto al presupuesto the committee's decision on the budget *Pablo tomó la decisión del comité con respecto al presupuesto Pablo made the committee's decision on the budget — the decision maker cannot modify decisión
      • Ikasleek arreta jarri zioten irakasleari + lagunen arreta The-students attention put to-the-teacher + friends' attention The students paid attention to the teacher + their friends' attention *Ikasleek lagunen arreta jarri zioten irakasleari The students paid their friends' attention to the teacher - the person paying attention cannot be a modifier of arreta
      • La ministre a rendu une visite aux victimes + la visite de la ministre aux victimes *La ministre a rendu une visite du président aux victimes — the visitor cannot be a modifier of visite
        Bjarnson a marqué un but + le but d'Arnason *Bjarnson a marqué le but d'Arnason but Bjarnson a marqué le but de l'Islande — the scoring entity can only modify but (goal) in the last case, when they are part of the Iceland team
      • Učiteljica je donijela odluku u vezi s izletom The teacher made a decision regarding the excursion + učenikova odluka u vezi s izletom pupil's decision regarding the excursion *učiteljica je donijela učenikovu odluku u vezi s izletom — the decision maker cannot modify decision
      • Il primo ministro ha preso la decisione di dimettersi the Prime Minister decided to resign + le dimissioni del governo the resignation of the government *Il primo ministro ha preso la decisione del governo di dimettersi — the resigner cannot be a modifier of resignation
      • De koningin heeft de premier een bezoek gebracht the Queen has paid a visit to the Prime Minister + een bezoek van de dame aan de premier a visit of the Lady to the Prime Minister *De koningin heeft een bezoek van de dame aan de premier gebracht*The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
      • Paweł złożył rezygnację ze stanowiska dyrektora Paweł submitted a resignation from the position of the director Paweł tendered his resignation from the director position + rezygnacja Piotra *Paweł złożył rezygnację Piotra ze stanowiska dyrektora Paweł tendered Piotr's resignation from the director position - the resignation cannot be modified by the resigning person
        Paweł prowadzi rozmowy *Paweł prowadzi rozmowy Piotra Paweł leads Piotr's talks, Paweł prowadzi rozmowy komisji Paweł leads the talks of the commission - the discussing entity komisja commission can only modify rozmowy talks if Paweł belongs to the commission.
        Jan otrzymał wymówienie Jan received a dismissal + wymówienie dla Pawła dismissal for Paweł *Jan otrzymał wymówienie dla Piotra
      • João está tomando banho John is taking shower + o banho do Pedro Pedro's shower *João está tomando o banho do Pedro — the bath cannot be modified by a bath taker
        Pedro sofreu prejuízo com a compra Pedro suffered financial loss with the purchase + o prejuízo do José José's financial loss *Pedro sofreu o prejuízo do José com a compra — the financial loss cannot be modified by the affected entity
        A Maria fez um aborto Maria made an abortion + o aborto da Joana Joana's abortion #A Maria fez o aborto da Joana — the noun cannot be modified by another patient
        O médico realizou o parto com sucesso The doctor performed the childbirth with success + o parto do Dr. Pedro Dr. Pedro's childbirth *O médico realizou o parto do Dr. Pedro com sucesso — the childbirth could be modified by the mother (patient) but not by another doctor (agent).
      • Paul a dat sfaturi surorii sale Paul gave advice to his sister + sfatul lui Petre Peter's advice *Paul a dat sfatul lui Petre surorii sale Paul gave Peter's advice to his sister - sfatul the advice cannot be modified by its author
      • Aleš si dela skrbi Aleš makes worries Aleš has worries = Aleš je zaskrbljen Aleš is worried → and, more generally, feelings and emotions
      • Борко је водио расправу с Маријом Borko je vodio raspravu s Marijom Borko led a discussion with Marija Borko had a discussion with Marija + Петрова расправа + Petrova rasprava Peter's discussion *Борко је водио Петрову расправу с Маријом *Borko je vodio Petrovu raspravu s Marijom Borko had Peter's discussion with Marija

      The rationale for this test is that a semantic argument of n cannot be realized as n's syntactic dependent when it is already realized as v's syntactic dependent (usually as v's subject). For instance, the noun visit takes two semantic arguments, the visitor and the visited entity, as in the visit of the Queen to the Prime Minister. When used in to pay a visit, the visitor semantic argument is realized as the subject of to pay (The Queen paid a visit to the Prime Minister), and cannot be realized at the same time within the NP headed by visit (*The Queen paid a visit of the Lady to the Prime Minister).

      Note that the syntactic formulation may be tricky to apply. It is sometimes possible to add the semantic argument as a complement of the noun in the presence of the verb, if we change the interpretation of the argument (and thus its thematic role). For instance, even though the construction John took Luke's decision may be acceptable, the interpretation would be comparative (John took a decision that Luke should have taken). Therefore, the test passes since the verb is still connecting a predicate (decision) to its argument (John, the decider).


      Section 5.3

      Verbal idioms (VID)

      Verbal idioms constitute a universal category. A verbal idiom (VID) has at least two lexicalized components including a head verb and at least one of its dependents. The dependent can be of different types. Here are some examples:

      • Subject
        • أسهم ال ارتفعت stock soared
        • броят му се ребрата be counted someone's (possessive pronoun) ribs (someone) to be very thin and skinny
        • ein kleines Vöglein hat mir gezwitschert a little bird told me
        • (OEG) 𓄫 𓄣 𓎡 𓆓𓏏𓇿 ꜣw ꞽb ⸗k č̣.t Your heart (ꞽb) shall-be-long (ꜣw) eternally (č̣.t). You shall be glad eternally. (Borchardt 1907: 80, fig. 55)
        • μου είπε ένα πουλάκι mu ipe ena pulaki me told a little-bird a little bird told me
          κόβει το μάτι μου kovi to mati mu cut the eye my to notice
        • a little bird told someone
        • tu hora ha llegado your time has arrived your time has come
        • ἐὰν θεὸς ἐθέλῃ ean theos ethelē if god want.3SG if possible
        • un uccellino disse a qualcuno
        • 気がつく mind.nom touch notice/realise
          気分が晴れる feeling.nom clear feel better
        • galva kūp the head is steaming to do something with great mental effort
        • boontje komt om zijn loontje he that mischief hatches, mischief catches
        • licho wie devil knows I have no idea
        • a sua hora chegou your time has arrived your time has come
        • a șoptit o păsărică whispered a bird little a little bird told someone
        • srce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something , sekira pade v med komu (someone's) hatchet falls in honey one gets lucky
        • ђаво да некога носи đavo da nekoga nosi may the Devil carry someone to hell with someone
          Бог некога погледао Bog nekoga pogledao God looked at someone to be lucky
          ђаво је умешао прсте đavo je umešao prste the Devil mixed in his fingers an unfavorable outcome
          пао некоме мрак на очи pao nekome mrak na oči darkness fell on someone's eyes to blow a fuse
      • Direct object
        • إجتماع   أحيى revived a meeting to lead a meeting
        • гушна букета hug the bunch of flowers to die
        • er hat den Schuss nicht gehört he did the shoot not hear it takes him a long(er) time to understand sth
        • (OEG) 𓐣𓂝𓏝 𓃹𓈖𓇋𓋴 𓌃𓅱𓏝 𓈖 𓋹𓈖𓐍𓅱 wč̣ꜥ Wnꞽś mṭw n ꜥnḫ.w Unas (Wnꞽś) shall-separate (wč̣ꜥ) the word (mṭw) for (n) the living (ꜥnḫ.w). Unas shall judge the living (PT 273b, W)
        • κάνω σεφτέ kano sefte
          λαμβάνω μέρος take part
          κρατάω τα μπόσικα kratao ta bosika
        • to kick the bucket
        • estirar la pata to stretch the leg kick the bucket
        • δίκην λαμβάνω dikēn lambanо̄ justice.ACC take.1SG I punish
        • tirare le cuoia
        • 空気を読む atmosphere.acc read read the situation
        • atstiept kājas to stretch one's legs to die
        • het ijzer smeden als het heet is iron forge while hot make hay while the sun shines; strike while the iron is hot
        • udać Greka to pretend to be a Greek to pretend not to understand
        • bater as botas to hit the boots to die, abrir mão de algo to open hand (of something) to give up (on something)
        • ustreliti kozla to shoot the goat to say or do something stupid
        • дизати нос dizati nos to raise one's nose to be haughty
          добити ногу dobiti nogu to get a leg to get dumped
          држати банку držati banku to hold a bank to dominate the conversation
      • Circumstantial or adverbial complement
        • الحديد وهو ساخن ضرب hit the iron and it is hot strike while the iron is hot
        • удрям в гръб hit in the back to stab in the back
          правя сам да си говори make (someone) to talk to himself to drive (someone) crazy
        • etwas wie warme Semmeln verkaufen sth. like warm bread rolls to sell sth. fast and easy
        • (OEG) 𓁹 𓂋 𓄣 𓎡 ꞽr (⸗ꞽ) r ꞽb ⸗k (I) (⸗ꞽ) shall-do (ꞽr) according-to (r) your (⸗k) heart (ꞽb). I will do what you want. (Duell 1938: pl. 162)
        • φέρω βαρέως fero vareos bring heavily resent
        • to take something with a pinch of salt, to sell like hotcakes, to strike while the iron is hot, to come off with flying colors
        • coger algo con pinzas to hang something with pegs take something with a pinch of salt
        • εἰς χεῖρας ἐλθεῖν eis kheiras elthein into hand.PL go.INF to surrender
        • prendere qualcosa con le pinze
          battere il ferro finché è caldo
        • 気になる mind.dat become be on one's mind
        • palaist vējā to let go in the wind to waste
        • iets met een korreltje zout nemen to take something with a pinch of salt
        • wiercić komuś dziurę w brzuchu to drill a hole in one's belly to intrusively solicit someone, to insist too much
        • levar em conta to bring in account to take into account
          ir ao ar go to the air to go on air
        • a lua în considerare to bring in account to take into account
        • spati kot ubit to sleep like dead to sleep soundly
        • продаје се као алва prodaje se kao alva to sell like halva to sell well
          ударити на велика звона udariti na velika zvona to bang on big bells to spread the news
          бити као запета пушка biti kao zapeta puška to be like a tense rifle to be ready for action

      It is often challenging to distinguish VIDs from other VMWE categories if only one dependent of the head verb is lexicalized. The VMWE categorization depends on the category of this dependent:

      • Reflexive clitic or particle: the VMWE is either an IRV (reflexive pronoun) or an IVPC (particle), never a VID.
      • Verb with no lexicalized dependent: fine-grained tests need to be applied in order to discriminate between an MVC and a VID. See the section on Structural tests.
      • Extended nominal phrase: fine-grained tests need to be applied in order to discriminate between an LVC and a VID. See the section on Structural tests.
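
      The mapping from the category of the single lexicalized dependent to the possible VMWE categories can be summarised as a lookup. The following is a minimal sketch only: the string keys and the function name candidate_categories are hypothetical labels chosen for illustration, and the fallback to VID reflects the statement in these guidelines that a dependent of any other category always yields a VID.

```python
from typing import Dict, List

# Hypothetical keys naming the category of the single lexicalized dependent.
# Where two categories are listed, the structural tests must decide.
_CANDIDATES: Dict[str, List[str]] = {
    "reflexive_clitic": ["IRV"],
    "particle": ["IVPC"],
    "verb_without_lexicalized_dependent": ["MVC", "VID"],
    "extended_nominal_phrase": ["LVC", "VID"],
}


def candidate_categories(dependent_category: str) -> List[str]:
    """Return the VMWE categories still possible for a one-dependent candidate.

    Any dependent category not listed above (adjectival phrase, relative
    clause, non-reflexive pronoun, ...) leaves VID as the only option.
    """
    return list(_CANDIDATES.get(dependent_category, ["VID"]))
```

      For example, candidate_categories("extended_nominal_phrase") returns both LVC and VID, signalling that the structural tests are needed, whereas candidate_categories("adjectival_phrase") returns VID only.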

      With a dependent of any other category, the VMWE is always a VID, including the following:

      • Adjectival phrase
        • постигам своето to achieve one's own to have it my way
        • schwarz fahren to drive black to take a ride without a ticket
        • κάνω αρπαχτή
          κρατάω πισινή
        • to come clean, to stand firm
        • jugar sucio to play dirty to play dirty
        • uscirne puliti
        • うまくいく good.ly go go well
        • panākt savu to achieve one's own to have it my way
        • zwartrijden to drive black to take a ride without a ticket
        • zrobić swoje to do one's own to do what one is supposed to do
          tykać cudze to touch someone else's to take something that does not belong to you
          dopiąć swego to button up one's own to fulfill one's plans
        • jogar sujo to play dirty
        • a juca murdar to play dirty
        • biti zelen od zavisti to be green with envy
        • бити зелен biti zelen to be green to be a greenhorn/to be inexperienced
      • Verb with lexicalized dependents
        • не мога две думи на кръст да кажа I cannot say two words crossing each other to be unable to speak or express oneself две думи на кръст да кажа is a clause
          правя сам да си говори make someone talk to himself to drive someone crazy сам да си говори is a clause
        • έπεσε να πεθάνει
        • to make ends meet
        • far quadrare i conti
        • de handen ineenslaan hands joined hit join forces
        • састављати крај с крајем sastavljati kraj s krajem to join one end to another to make ends meet крај са крајем kraj sa krajem is a clause
      • Relative clause
        • ще видиш откъде изгрява слънцето you will see where the sun rises from (angrily) you will get what you deserve, you will be punished
        • wissen wo es langgeht to know where things are heading to know on which side one's bread is buttered
        • δεν ξέρω πούν παν τα τέσσερα
        • to know on which side the bread is buttered
        • saber de qué pie cojea to know of which foot (he/she) limps to know someone inside out
        • non sapere da quale parte stare
        • parādīt, kur vēži ziemo to show where crayfish hibernate to cause someone the unpleasantness he deserves
        • de klok horen luiden, maar niet weten waar de klepel hangt to hear the bell ring, but not know where the clapper hangs ≈to not know the details of something
        • wiedzieć, skąd wieje wiatr to know where wind blows from to know on which side your bread is buttered, to know how to take advantage of the situation
        • saber onde pisar know where to-step to know the way to succeed in something
          mostrar com quantos paus se faz uma canoa show with how many sticks one makes a canoe to punish or take revenge
        • a ști cu ce se mănâncă to know with what CL.Refl. eats to know what it is about
        • vedeti koliko je ura to know what time it is to realize the truth
        • знати у ком грму лежи зец znati u kom grmu leži zec to know in which bush the rabbit lies to know what the main problem is
          не знати где је некоме глава ne znati gde je nekome glava not to know where one's head is to be out of one's mind
          дај шта даш daj šta daš give what you give be satisfied with anything that is given to you
      • Non-reflexive pronoun
        • втасахме я we proved it.FEM (as in bread: raise in volume due to yeast) to fall into a difficult situation
        • es gibt it gives there is
        • τα καταφέρνωta kataferno
          την πατάωtin patao
        • to make it
        • l'emporter to take it away to win
        • le ha prese
        • Ej tu galīgi!Go you ultimately! Go to hell!
        • het eens zijn it agreed be to agree
        • Polish does not seem to have this type of VMWEs
        • dá-lhe João! give to him/her, João! show them what you got, João!
        • a o șterge to her delete to fly the coop
          a o întinde to her extend to fly the coop synonymous expressions with the non-anaphoric feminine ACC personal clitic 'o' functioning as an expletive
        • ucvreti jo to escape her to escape something/someone by running
        • n.a.

      Sentential expressions with no open slots, such as proverbs and conventionalized sentences, are included in the scope of VIDs.

      • تجري الرياح بما لا تشتهي السفن Winds blow counter to what ships desire
      • краставите магарета се надушват отдалече the itchy donkeys smell each other from afar alike people are attracted to each other
      • Rom wurde nicht an einem Tag erbaut Rome was not built in a day
        wer A sagt muss auch B sagen who says A must also say B you must finish what you start
      • στο σπίτι του κρεμασμένου δεν μιλάνε για σχοινί in-the house the.GEN hanged-man.GEN not speak.3.PL about rope
      • Rome was not built in a day
        Fortune favors the bold
        The pleasure is mine
        I beg your pardon!
      • Roma no se construyó en un día Rome was not built in a day
        donde dije digo, digo Diego where said.I said, say.I Diego to do or give something and then take it back, to retract oneself
      • συνῄδη οὐδὲν ἐπισταμένῳsunēdē ouden epistamenо̄ know.PLP.1SG nothing know.PTC I know that I know nothing
      • Roma non è stata costruita in un giorno
        La fortuna aiuta gli audaci
        Il piacere è mio
      • Rīga nekad nebūs gatava Riga will never be ready (made)
      • de klok horen luiden, maar niet weten waar de klepel hangt to hear the bell ring, but not know where the clapper hangs ≈to not know the details of something
      • trafiła kosa na kamień met the scythe a stone someone rude/dishonest came across someone else who used similar methods against him/her
      • quem vê cara não vê coração who sees face doesn't see heart a person can lie/omit his/her feelings
      • Urciorul nu merge de multe ori la apă Pitcher-the not goes of many times at water The pitcher goes so often to the well that it is broken at last
      • Počasi se daleč pride more haste less speed
        Po toči zvoniti je prepozno there is no use ringing the bells after hail it is too late
      • нашла крпа закрпу našla krpa zakrpu a rag found a patch to find one's other half
        било па прошло bilo pa prošlo happened and it's done let bygones be bygones
        рекла казала rekla kazala said and told hearsay

      If more than one dependent of the head verb is lexicalized, then the candidate VMWE is always classified as a VID.

      • لسانه القط أكل the cat ate his tongue
      • заравям глава в пясъка to hide head in sand to pretend not to see a problem
      • die Katze aus dem Sack lassen to let the cat out of the bag
      • βάζω λάδι στη φωτιά vazo ladi sti fotia put oil to-the fire make things worse
        κάνω τη ζωή ποδήλατο kano ti zoi poδilato make.1SG the life bicycle to torture
      • to let the cat out of the bag, to cut a long story short, to call it a day
      • hacer de tripas corazón make of intestines heart to pluck up the courage
        dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
        dar gato por liebre to_give cat for hare to rip off, to take for a ride
      • se faire des idées to make SELF ideas to imagine something false
        s'en aller to go SELF from there to leave
        il y a it has there there is
      • 大きくする story.acc big make exaggerate
      • pirkt kaķi maisā to buy a cat in a bag to agree to something without knowing the necessary information
      • een kat in de zak kopen to buy a pig in a poke → two dependents kat and in de zak
      • chować głowę w piasek to hide head in sand to pretend not to see a problem
      • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat
      • a da bir cu fugiții to give tribute with fugitives.the to back away
      • beseda mi je ostala v grlu word got stuck in my throat I am speechless
      • бежати главом без обзира bežati glavom bez obzira to run away mindlessly to bolt
        бежати као ђаво од крста bežati kao đavo od krsta to run away like Satan from a cross to run like a bat out of hell
        забити главу у песак zabiti glavu u pesak to stick your head in the sand to bury your head in the sand
        ићи линијом мањег отпора ići linijom manjeg otpora to go with the line of least resistance to take the path of least resistance
      • att sätta sig upp mot någon to sit oneself up against someone to defy someone
        att dra sitt strå till stacken to draw one's straw to stack.the to contribute (in a small way)
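
      The routing rules stated so far (a single lexicalized dependent versus several) can be summarised as a small lookup. The following is a minimal Python sketch under stated assumptions: the dependent labels (reflexive_clitic, particle, bare_verb, nominal_phrase) and the function name route_candidate are hypothetical and not part of the guidelines, and the sketch only says which category or which further tests apply; it does not replace the structural tests.

```python
from typing import List

# Hypothetical labels for the category of a lexicalized dependent.
# With exactly one such dependent, these two categories never yield a VID.
NEVER_VID = {"reflexive_clitic": "IRV", "particle": "IVPC"}

# With exactly one such dependent, these two categories need fine-grained tests.
NEEDS_TESTS = {
    "bare_verb": "MVC or VID (apply structural tests)",
    "nominal_phrase": "LVC or VID (apply structural tests)",
}


def route_candidate(dependent_categories: List[str]) -> str:
    """Return the VMWE category (or pending decision) for a candidate.

    dependent_categories lists the category of each lexicalized
    dependent of the head verb.
    """
    if not dependent_categories:
        raise ValueError("at least one lexicalized dependent is expected")
    # More than one lexicalized dependent: always a VID.
    if len(dependent_categories) > 1:
        return "VID"
    category = dependent_categories[0]
    if category in NEVER_VID:
        return NEVER_VID[category]
    if category in NEEDS_TESTS:
        return NEEDS_TESTS[category]
    # Any other single dependent (adjectival phrase, relative clause, ...).
    return "VID"


print(route_candidate(["particle"]))
print(route_candidate(["adjectival_phrase"]))
print(route_candidate(["nominal_phrase", "prepositional_phrase"]))
```

      Embedded VMWEs and the special cases discussed below (coordinated verbs, copula constructions) are deliberately left out of this sketch.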

      Cases when there is no single clearly identifiable head verb, because of coordinated verbs or of an irregular syntactic structure, are also covered by the VID category.

      • اصبر تنل be patient you get be patient
      • цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
      • leben und leben lassen live and let live
      • έδωσε πήρε
      • to drink and drive
      • coser y cantar to_sew and to_sing easy as pie, a piece of cake
      • ἠντεβόλει καὶ ἱκετεύεēntebolei kai hiketeue supplicate.3SG and beseech.3SG he begged and beseeched
      • leven en laten leven live and let live
      • pluć i łapać spit and catch to be lazy, to do nothing useful
        coś kogoś ani ziębi, ani grzeje something neither cools nor warms someone someone is indifferent to something
        bądź tak dobry i zrób coś be so good and do something be so good as to do something
      • pintar e bordar paint and knit to abuse
      • a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
        seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
      • živi in pusti živeti live and let live
      • ни лук јео ни лук мирисао ni luk jeo ni luk mirisao neither ate nor smelled an onion to be innocent
        нити смрди нити мирише niti smrdi niti miriše neither stinks nor has a nice scent neither good nor bad
      • n.a.
      • to voice act
        to pretty-print
        to short-circuit
        to tumble dry
      • n.a.
      • court-circuiter to short-circuit
      • n.a. there are no cases of compound hyphenated verbs in RO
      • n.a. there are no cases of compound hyphenated verbs in SL
      • рекла-казала rekla-kazala said and told hearsay

      In case of several lexicalized dependents, special care must be taken to identify and also annotate embedded VMWEs.

      • страхувам се от собствената си сянка to fear SELF from one's own shadow to get easily scared → contains the IRV страхувам се to fear SELF to be afraid
      • einen Plan aufstellen to set up a plan to draw up a plan → contains the VPC aufstellen to set up
      • to let the cat out of the bag → contains the VPC to let out
      • hacerse ilusiones make.self hopes to get your hopes up → contains the IRV hacerse
      • se faire des idées to make SELF ideas to imagine something false → contains the non-VMWEs se faire and faire des idées
      • een plan opstellen to set up a plan to draw up a plan → contains the VPC opstellen to set up
      • bać się własnego cienia to fear SELF one's own shadow to be very timid → contains the IRV bać się to fear SELF to be afraid
      • virar-se nos trinta turn-RCLI in-the thirty to get by → contains the synonymous IRV virar-se to get by ≠ virar to turn/become
      • a da cărțile pe față to give cards.the on face to reveal one's true intentions → contains the ID a da pe față to reveal
        a-și da arama pe față to give his/her copper.the on face to reveal his/her true (evil) nature → this is even more complicated since, besides the ID a da pe față, the IRV has to be annotated as well - a three-level embedding
      • delati se norca iz koga to make RCLI fool of someone to make fun of someone → contains the IRV delati se to make oneself to pretend
      • бојати се сопствене сенке bojati se sopstvene senke to fear SELF one's own shadow to be afraid of one's own shadow → contains the IRV бојати се to fear SELF to be afraid

      Idioms whose head verb is the copula (to be) can pose special challenges because their complements may be (nominal, adjectival, etc.) MWEs themselves. In this task, we consider constructions with a copula to be VMWEs only if the complement does not retain the idiomatic meaning when used without the verb.

      • съм с единия крак в гроба be with one leg in the grave to be close to death → idiom because #с единия крак в гроба with one leg in the grave loses the meaning
        съм на червено be on red to be in debt → non-VMWE because the copula can be omitted, as in в края на месеца винаги оставам на червено at the end of the month I always get into debt
      • sei kein Frosch be no frog be no chicken → idiom because #kein Frosch no frog loses the meaning
      • to be dying for → idiom because #dying loses the meaning of wanting something
        to be somebody → idiom because #somebody loses the meaning of being important or successful
        it is double Dutch to me → non-VMWE because the copula can be omitted, as in he seems to speak double Dutch
      • ser un pelota to be a ball to suck/butter up → idiom because un pelota a ball loses its original meaning
      • οἷον τ`ἦνhoion t’ēn of.what.sort and was.3SG it was possible
      • ??? sprake zijn van there is some talk
      • być jedną nogą na tamtym świecie to be with one leg in the other world to be close to death → idiom because #jedna noga na tamtym świecie one leg in the other world loses the meaning
        być do rzeczy to be to the thing to be relevant → non-VMWE because the copula can be omitted, as in dał parę argumentów całkiem do rzeczy he gave a couple of quite relevant arguments
        być w trakcie (czegoś) to be in the road (of sth) to be doing sth → non-VMWE because the copula can be omitted, as in wyszedł w trakcie zebrania he went out during the meeting
      • ser alguém na vida to be somebody in life to be somebody → idiom because #alguém na vida loses the meaning
        não ser flor que se cheire to not be a flower that one may smell to be an untrustworthy person → idiom because #flor que se cheire loses the meaning
        isso é grego pra mim that's greek to me → non-VMWE because the copula can be omitted, as in você está falando grego
      • a fi ușă de biserică to be door of church to be honest → idiom because #ușă de biserică loses the meaning
        a fi un papă-lapte to be a eat-milk to be a piker → non-VMWE because un papă-lapte preserves the meaning
      • biti trn v peti komu to be a thorn in somebody's heel to be a big problem, obstacle → idiom because #trn v peti loses the meaning
      • бити неко и нешто biti neko i nešto to be someone and something to be a somebody → idiom because #неко somebody and #нешто something lose their meaning of being important or successful
        бити једном ногом у гробу biti jednom nogom u grobu to be with one leg in the grave to be close to death → idiom because #једном ногом у гробу with one leg in the grave loses its meaning
        бити зелен biti zelen to be green to be a greenhorn/to be inexperienced → idiom because #зелен green loses its meaning

      Note that special care must be taken in languages in which the copula omission is a regular or even a compulsory phenomenon (e.g. in Russian). In those cases, language-specific tests are required to distinguish a copula-based idiom from a non-verbal MWE.

      Idioms typically have both a literal and an idiomatic reading. Thus, they are closely connected to the phenomenon of metaphor (see also the section on VMWEs versus metaphors). This often makes them semantically totally non-compositional, i.e. none of their lexicalized components retains any of its original meaning. Some authors argue, though, that partial semantic compositionality can be obtained via decomposability, e.g. to spill the beans is compositional provided that to spill is paraphrased as to reveal and the beans as a secret.

      VID-specific decision tree:

      In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a VID. Note, however, that this tree is to be applied only once the generic decision tree containing the structural tests has referred to it.
      • Apply test VID.1 - [CRAN: Candidate contains cranberry word?]
        • YES ⇒ It is a VID, exit.
        • NO ⇒ Apply test VID.2 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
          • YES ⇒ It is a VID, exit.
          • NO ⇒ Apply test VID.3 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
            • YES ⇒ It is a VID, exit.
            • NO ⇒ Apply test VID.4 - [MORPHSYNT: Regular morphosyntactic change ⇒ unexpected meaning shift?]
              • YES ⇒ It is a VID, exit.
              • NO ⇒ Apply test VID.5 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
                • YES ⇒ It is a VID, exit.
                • NO ⇒ It is not a VID, exit.
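
      The cascade above is a first-YES-wins sequence of five tests. The following is a minimal Python sketch of that control flow only. Each test is in reality an annotator's judgement, so the predicates here are toy stand-ins: the function names, the tiny cranberry-word list and the always-NO placeholder are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# A test is a (name, predicate) pair; the predicate stands in for the
# annotator's YES/NO judgement on a candidate expression.
VidTest = Tuple[str, Callable[[str], bool]]


def first_positive_test(candidate: str, tests: Sequence[VidTest]) -> Optional[str]:
    """Run the tests in order; return the name of the first YES, else None."""
    for name, judge in tests:
        if judge(candidate):
            return name  # a single YES suffices: it is a VID
    return None  # every test answered NO: it is not a VID


# Toy judgement for VID.1 (assumption): a tiny hand-made cranberry-word list.
CRANBERRY_WORDS = {"astray"}


def contains_cranberry_word(candidate: str) -> bool:
    return any(token in CRANBERRY_WORDS for token in candidate.split())


def placeholder_no(candidate: str) -> bool:
    # Placeholder for VID.2 to VID.5, which require a human judgement.
    return False


TESTS = [
    ("VID.1 [CRAN]", contains_cranberry_word),
    ("VID.2 [LEX]", placeholder_no),
    ("VID.3 [MORPH]", placeholder_no),
    ("VID.4 [MORPHSYNT]", placeholder_no),
    ("VID.5 [SYNT]", placeholder_no),
]

print(first_positive_test("to go astray", TESTS))  # VID.1 [CRAN]
print(first_positive_test("to go away", TESTS))    # None
```

      Returning the name of the deciding test, rather than a bare boolean, keeps a record of why a candidate was labelled as a VID.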

      Test VID.1 - [CRAN] - Cranberry word

      Does the candidate expression contain a cranberry word?

      • YES ⇒ it is a VID
        • хващам натясно catch in a tight place to coerce, to pressure натясно is only used in MWEs
          правя на бъзе и коприва to turn into elder and nettle to scold, to tell off бъзе is an old word, very rarely used independently
          вземам предвид, имам предвид to предвид (as adverb) is only used in MWEs
          стоя диван чапраз to stay upright as in Osman council to stay ready to serve чапраз is an old word, very rarely used independently
        • sich um etw. scharen to gather around something scharen is not a stand-alone word
        • μάλλιασε η γλώσσα μου maliase i glosa mu is-full-of-hair-3SG the-SG.NOM tongue-SG.NOM my-SG.GEN.POSS to repeat the same thing again and again μάλλιασε is not a stand-alone word
        • to go astray astray is not a stand-alone word
        • sin decir ni chus ni mus chus is not a stand-alone word without to_say neither chus nor mus without saying a word
          no decir ni chus ni mus chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
          hacer algo a troche y moche troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
        • txintik ere ez esan 'txint' neither no say not even say a word →the word 'txint' is not used out of this expression
        • prendre la poudre d'escampette to escape escampette is not a stand-alone word
        • μηδεμίαν ὤρην ἔχεινmēdemian о̄rēn ekhein no worry have.INF not to be worried
        • mangiare a ufo to eat without paying a ufo is not a stand-alone word
          fare lo gnorri to play dumb gnorri is not a stand-alone word
          scendere in lizza to enter the lists lizza is not a stand-alone word
        • 一矢を報いる one.arrow.ACC repay to retaliate 一矢 is not a stand-alone word
          矢面に立つ arrow.face.LOC stand to face direct attack 矢面 is not a stand-alone word
        • op apegapen liggen be at one's last gasp apegapen is not a stand-alone word
        • odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
          wyjść na jaw to come-out to light to transpire, to become known
        • ir para as cucuias to go wrong cucuias is not a stand-alone word
        • a nu avea habar to have no idea habar is not a stand-alone word
        • biti si kvit to pay up a debt, owe nothing to somebody kvit is not a stand-alone word
        • не би било згорег ne bi bilo zgoreg it wouldn't be for the worse it wouldn't be a bad idea згорег zgoreg is not a stand-alone word
          читати (некоме) вакелу čitati (nekome) vakelu to read somebody a scolding to scold somebody вакела vakela is not a stand-alone word
          имати на претек imati na pretek to have an abundance претек pretek is not a stand-alone word
          не часити ne časiti don't jump the gun часити časiti is not a stand-alone word
        • att komma ihåg to remember ihåg is not a stand-alone word
      • NO ⇒ further tests are required
        • правя на сос правя and сос are stand-alone words
        • sich um etw. herum stellen to stand around something → all words are stand-alone words
        • to go away go and away are stand-alone words
        • ir a la universidad to go to university ir, a, la and universidad are stand-alone words
        • unibertsitatera joan university-to go to go to university →both words are stand-alone
        • andare giù to go down andare and giù are stand-alone words
        • hij gaat weg he goes away gaan and weg are stand-alone words
        • wyznać tajemnicę to reveal a secret wyznać and tajemnica are standalone words
        • ir para a escola to go to school ir, para, a and escola are stand-alone words
        • a nu avea idee to have no idea → all words are stand-alone words
        • biti si v sorodu to be related to each other biti si and sorod are stand-alone words
        • бити квит biti kvit to be even бити biti and квит kvit are stand-alone words
        • att komma på to figure out komma and på are stand-alone words

      Test VID.2 - [LEX] - Lexical inflexibility

      Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

      • YES ⇒ it is a VID
        •  وضعه على الرَّف put it on the shelf  ـ→ وضعه على الطاولة # to put it on table
        • бълвам змии и гущери to spew snakes and lizards#бълвам влечуги (to spew reptiles)
          всяка жаба да си знае гьола every frog to know its own puddle#всяка жаба да си знае локвата
        • die Katze aus dem Sack lassen to let the cat out of the bag#den Hund aus dem Karton lassen #to let the dog out of the box
          eine Entscheidung treffen to meet a decision to make a decision#eine Entscheidung machen/herstellen a decision make/produce #to make/produce a decision
        • (OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me (Urk. I 134, 1) → *mḥ nb (⸗ꞽ) ṭp ⸗f ꞽm (⸗ꞽ) (My) lord filled his head with (me).
        • κάνω την πάπια#κάνω τη χήναkano tin papia --> kano tin china make.1SG the duck play dumb
          φέρω βαρέως#φέρω ελαφρώς
          μπαίνει το νερό στ' αυλάκι#μπαίνει το νερό στο ποτάμι
        • to let the cat out of the bag#to allow the feline out of the container
          to go on*to go upon
          to stand firm/fast*to stand hard/rigid/solid
        • meterse en la boca del lobo to_get_into.self in the mouth from_the wolf venture into the lion's den#meterse en el ojo del gato
          tomar una decisión to_take a decision to make a decision#hacer/coger/producir una decisión to_make/grab/produce a decision #to make/grab/produce a decision
        • erabakia hartu decision take to make a decision →erabakia #sortu/jaso/egin create/receive/do
        • περὶ πολλοῦ ποιέομαιperi pollou poieomai above much.GEN do.1SG I hold in high esteem → o # περὶ καλοῦ ποιέομαι
        • non dire gatto se non ce l'hai nel sacco don't say cat if it is not in the sack don't count on something before it happens#non dire cane se non ce l'hai nel sacco#don't say dog if it is not in the sack
          sputare il rospo spit the toad spit it out#sputare la rana#spit the frog
        • 空気を読む atmosphere.acc read read the situation *大気を読む
          生計を立てる means.of.living.acc stand earn an income 生計を*起こす
        • een kat in de zak kopen to buy a pig in a poke#een hond in de zak kopen #to buy a dog in the bag
          een beslissing nemen to take a decision to make a decision#een beslissing produceren a decision make/produce #to make/produce a decision
        • wiedzieć, co w trawie piszczy to know what in grass squeals to be well informed#wiedzieć, co w trawniku popiskuje
          nie wchodzić w rachubę not to come into count to be out of question#wchodzić w liczenie/rachunek
          wodzić kogoś za nos to lead someone by the nose to cheat on someone#wodzić za nozdrza/ucho/wargi
        • quebrar um galho break a branch to help#danificar um ramo to damage a stem
        • a da cu bâta în baltă to give with bat-the in pond to say sth embarrassing*a da cu bățul în baltă to give with stick-the in pond, *a da cu bâta în lac to give with bat-the in lake
        • imeti mačka to have a cat to have a hangover#imeti psa to have a dog
          iti rakom žvižgat to go whistling to crabs to fail, to die#iti jastogom pet to go singing to the lobsters
        • знати у ком грму лежи зец znati u kom grmu leži zec to know in which bush the rabbit lies to know what the main problem is#знати у ком жбуну лежи кунић #znati u kom žbunu leži kunić to know in which shrub the hare lies
          пустити буву pustiti buvu to let go of the fly to start a rumour/to spread news#пустити вашку #pustiti vašku to let go of the lice
          отети се контроли oteti se kontroli to break away from control to lose control#отети се провери #oteti se proveri to break away from the examination
        • att Plocka russinen ur kakan to pick the raisins out of the cake to choose only the best things#att välja ut nötterna från kakan
      • NO ⇒ further tests are required
        • الطائرة أخذ take the plane أخذ الحافلة take a bus
        • изнасям доклад present a report → изнасям урок/лекция/презентация и т.н.
        • den Bus nehmen to take the bus → den Zug/ das Flugzeug, etc nehmen to take the train/plane/etc
        • παίρνω το λεωφορείοperno to leoforio take the bus to take the bus
        • to take a plane → to take a bus/car/boat, etc.
        • coger el autobús to_take the bus to take the bus → coger el avión/tren, etc. to take the plane/train/etc.
        • autobusa hartu bus take to take the bus → trena/taxia/hegazkina hartu to take a train/taxi/plane
        • prendere il treno to take the train → prendere il bus/aereo/etc to take the bus/plane/etc
        • jqum u joqgħod always moving about
        • de bus nemen to take the bus → de trein, het vliegtuig, enz. nemen to take the train, plane, etc
        • sprawić kłopot to make a trouble → sprawić przykrość/trudność/niedogodność/problem/zawikłanie/nieprzyjemność to make a(n) nuisance/difficulty/inconvenience/problem/complication
        • quebrar um braço to break an arm → quebrar uma perna/costela/falange to break a leg/rib/phalanx
        • a lua o decizie to take a decision to make a decision → a lua o hotărâre to take a decree to make a decision
        • delati težave to make a trouble → delati preglavice/probleme/ to make a(n) nuisance/problem
        • изазвати проблеме izazvati probleme to cause problems → изазвати бриге/невоље izazvati brige/nevolje to cause worries/afflictions
        • att ta bussen to take the bus → att ta tåget/flyget, etc to take the train/plane/etc

      Usual modifications for [LEX] include replacing content words in the candidate by synonyms, hypernyms, hyponyms, antonyms, troponyms, meronyms, and related words in general.
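
      To apply [LEX] systematically, an annotator can list the substitution variants of a candidate before judging each one. The following is a minimal Python sketch, assuming a hand-made table of related words. Both the table RELATED and the function lex_variants are hypothetical, and the sketch only generates variants; deciding whether a variant is ungrammatical or shifts meaning unexpectedly remains a human judgement.

```python
from typing import Dict, List


def lex_variants(tokens: List[str], related: Dict[str, List[str]]) -> List[str]:
    """Replace one token at a time by each of its related words."""
    variants = []
    for i, token in enumerate(tokens):
        for alternative in related.get(token, []):
            variants.append(" ".join(tokens[:i] + [alternative] + tokens[i + 1:]))
    return variants


# Hypothetical related-word table (synonyms, hypernyms, co-hyponyms).
RELATED = {
    "cat": ["feline", "dog"],
    "bag": ["container", "box"],
}

for variant in lex_variants("to let the cat out of the bag".split(), RELATED):
    print(variant)
# to let the feline out of the bag
# to let the dog out of the bag
# to let the cat out of the container
# to let the cat out of the box
```

      Changing a single component per variant makes it easier to see which replacement causes a meaning shift.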

      Test VID.3 - [MORPH] - Morphological inflexibility

      Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

      • YES ⇒ it is a VID
        • أخذ الثور من قرنيه to take the bull by his horns أخذ الثور من قرنه# take the bull by one horn
        • хвърлям око throw an eye to throw a glance#хвърлям очи.PLURAL
          хващам бика.DEF за рогата take the bull by the horns#хващам бик.INDEF за рогата
          не мога да си намеря място cannot find a place for myself to be extremely nervous → only exists in negative form
        • ins Gras beißen to bite into the grass to die#in ein Gras beißen to bite into a grass #in die Gräser beißen to bite into the grasses
          in Kraft treten into force step to come into effect#in Kräfte treten into forces step
        • (OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me. (Urk. I 134, 1) → *mḥ nb (⸗ꞽ) ꞽb.w ⸗f ꞽm (⸗ꞽ) (My) lord filled his hearts with (me).
        • κάνω του αλατιούkano tu alatiu do the salt #κάνω των αλατιών
        • to kick the bucket#to kick the buckets
          to pretty-print*to prettier-print
          to take turns#to take a turn
        • coger el toro por los cuernos to_take the bull by the horns to take the bull by the horns#coger el toro por el cuerno to_take the bull by the horn #to take the bulls by the horns to_take the bulls by the horns #to take the bulls by the horns
          entrar en vigor to_enter in vigor to come into effect#entrar en vigores to_enter in vigors #to come into effects
        • prendre le taureau par les cornes to_take the bull by the horns#prendre le taureau par une corne to_take the bull by a horn
        • περὶ πολλοῦ ποιέομαιperi pollou poieomai above much.GEN do.1SG I hold in high esteem → #περὶ πολλῶν ποιέομαι
        • andare a letto con le galline to go to bed with the hens to go to bed early#andare a letto con la gallina to_go to bed with the hen
          cercare il pelo nell'uovo to look for the hair in the egg to be pedantic #cercare i peli nell'uovo
        • in de gaten houden keep an eye on#in het gat houden
        • budować zamki na lodzie to build castles on ice to rely on unstable foundations#budować zamek na lodzie to build a castle on ice
          mucha kogoś ugryzła a fly bit someone someone is in a bad temper#mucha kogoś ugryzie a fly will bite someone
          wyciągnąć nogi to stretch.PERF legs to die#wyciągać nogi to stretch.IMPERF legs (imperfective aspectual variant prohibited)
        • bater perna hit leg to walk aroundbater a/uma/essas perna/pernas/perninha/pernona to hit the/one/these leg/legs/leg.SMALL/leg.BIG
        • a da colțul to give corner.the to die*a da colţurile to give corners.the
        • klicati jelene to call stags to vomit#klicati jelena to call a stag
        • обећавати куле и градове obećavati kule i gradove to promise towers and towns to promise somebody the moon#обећавати кулу и град obećavati kulu i grad to promise a tower and a town
          бити у свакој чорби мирођија biti u svakoj čorbi mirođija to be the dill in every broth to meddle#бивај у свакој чорби мирођија bivaj u svakoj čorbi mirođija be the dill in every broth
          дође као кец на једанаест dođe kao kec na jedanaest comes as an ace on an eleven an unfavorable outcome#дође као кечеви на једанаест dođe kao kečevi na jedanaest comes as aces on an eleven
        • träda i kraft step in force to come into effect#träda i krafter step into forces
      • NO ⇒ further tests are required
        • لعبة صنع to make a toy صنع ألعاب to make many toys
        • хвърлям топка to throw a ball → хвърлям топка/топката/топки/топките
        • einen Kuchen backen to bake a cake → viele/keine/den Kuchen backen/machen many/no/the cake bake/make
        • κάνω κουλούρια → κάνω νόστιμα κουλούρια
        • to make a cake → to make a/many/those/no cake/cakes
        • mover el brazo to_move the arm to move the arm → mover/agitar/levantar/estirar el brazo/la pierna/las manos/las piernas to_move/shake/raise/stretch the arm/the leg/the hands/the legs to move/shake/raise/stretch the arm/the leg/the hands/the legs
        • ἐπιστολὴν πέμπωepistolēn pempо̄ letter.ACC send.1SG I send a letter → ἐπιστολὰς πέμπω
        • fare un dolce → fare un/molti/dei/quei/nessun dolce/dolci
        • een taart bakken to bake a cake → veel/geen/de taarten bakken/maken many/no/the cakes bake/make
        • kształtować opinię to form an opinion → kształtować opinie to form opinions
        • bater o braço to hit the arm→ bater o/os/um/esse braço/braços/bracinho hit the/the.PL/a/this arm/arms/arm.SMALL
        • a face o prăjitură to make a cake → a face multe/aceste prăjituri to make many/these cakes
        • vzeti taksi to take a cab → ne vzeti nobenega taksija/en taksi/dva taksija to take no/one/two/… cab(s)
        • обећавати улагање obećavati ulaganje to promise an investment → обећавати улагања obećavati ulaganja to promise investments
        • att baka en kaka to bake a cake → att baka flera/den där/några/ingen kaka/kakor to bake several/that/some/no cake(s)

      Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, tense, mood, aspect, etc. - depending on the target language's morphology.

      Test VID.4 - [MORPHSYNT] - Morpho-syntactic inflexibility

      Does a regular morpho-syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

      • YES ⇒ it is a VID
        • یده ب أخذ take with his hand to give a hand يده في أخذ# to take in his hand
        • аз ти давам думата си I give you my word#аз ти давам неговата дума (I give you HIS word)
          аз си продавам душата I sell my soul#аз продавам неговата душа (I sell his soul)
        • Ich werde mein Bestes tun I will my best do I will do my best*Ich werde dein Bestes tun I will do your best
          Ich gebe dir mein Wort I give you my word*Ich gebe dir ihr Wort I give you her word
        • (OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me. (Urk. I 134, 1) → *mḥ nb (⸗ꞽ) ꞽb ⸗k ꞽm (⸗ꞽ) (My) lord filled your heart with (me). The suffix pronoun attached to ꞽb should agree in gender and number with the subject of this MWE.
        • Ο Γιάννης παίζει τα ρέστα του → #Ο Γιάννης παίζει τα ρέστα μας
          Ο Γιάννης έριξε μαύρη πέτρα πίσω του → #Ο Γιάννης έριξε μαύρη πέτρα πίσω μας
        • I will do my best → *I will do your best
          I give you my word for that → #I give you his word for that
          he was pulling my leg → #I was pulling my leg
        • te doy mi palabra to_you give_I my word I give you my word → #te doy su palabra to_you give_I his/her word I give you his/her word
        • il vide son sac he empties his bag he reveals his secret thoughts → #il vide mon sac he empties my bag
        • περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem → #περὶ τοῦ πολλοῦ ποιέομαι
        • Io farò del mio meglio → *Io farò del tuo meglio
          Io ti do la mia parola → #Io ti do la sua parola
        • 腹を立てる stomach.acc raise get angry → *明日、腹を立てよう cf. 明日、旗を立てよう
          帰らぬ人となる return.NEG person become die → *帰らない人となる, *帰らぬ人とな(らない、ろう、る?…)
        • Ik zal mijn best doen I will my best do I will do my best → *Ik zal jouw best doen I will do your best
        • Polish VMWEs do not seem to exhibit this kind of inflexibility
        • ele se suicidou he self.3P.SG suicided → *ele me suicidou
          eu perdi meu tempo I wasted my time → eu perdi teu/seu/nosso tempo English allows this, Portuguese doesn't. We say I made you waste your time instead.
        • Îți dau cuvântul meu CL.DAT give.1SG word.the my I give you my word → #Îți dau cuvântul lui CL.DAT give.1SG word.the his I give you his word
        • Vlečeš me za nos you are pulling my nose you're pulling my leg → *Vlečeš se za nos you're pulling your nose
          Pojdi se solit! to go salt oneself Get lost! → *Pojdi ga solit go salt him
        • дати све од себе dati sve od sebe to give one's all → #дати све од тебе dati sve od tebe to give everything from you
        • Jag gör mitt bästa I do my best I do my best → *Jag gör ditt bästa I do your best
      • further tests are required
        • копая си гроба to dig my grave → копая ти/му/й/им гроба (to dig your/his/her/their grave)
        • er traf seine Entscheidung he made his decision → er traf meine/ihre/unsere/eure Entscheidung he made my/her/our/your decision
        • he did his job → he did my/her/our/your job
        • Ha hecho su trabajo Has_he/she done his/her work He/She has done his/her work → Ha hecho mi/tu/nuestro trabajo Has_he/she done my/your/our work He/She has done my/your/our work
        • ἐπιστολὴν πέμπω epistolēn pempо̄ letter.ACC send.1SG I send a letter → τὴν ἐπιστολὴν πέμπω
        • ha fatto il suo lavoro → ha fatto il mio/tuo/nostro/vostro/loro lavoro
        • hij deed zijn werk → he did my/her/our/your job
        • Polish VMWEs do not seem to exhibit this kind of inflexibility
        • Eu fiz meu trabalho I did my job → Tu/ele/nós fizeste/fez/fizemos meu trabalho You/he/we made my job
        • el își face tema he his does homework.the he does his homework → el îmi/ne/le face tema he my/our/their does homework.the he does my/our/their homework
        • opravil je svojo nalogo he did his jobopravil je mojo/njeno/našo/tvojo nalogo he did my/her/our/your job
        • урадио је свој посао uradio je svojposao he did his jobурадио је мој посао uradio je moj posao he did my job
        • han gör sitt jobb he does his job → han gör mitt/hennes/vårt jobb he does my/her/our job

      Usual modifications for [MORPHSYNT] involve agreement or loss of agreement between some components in the candidate.
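
The outcome of test VID.4 depends only on how the modified variant is judged. The following is a minimal sketch, assuming the marking convention that the examples appear to follow (a leading `*` for an ungrammatical variant, a leading `#` for a variant with an unexpected change in meaning, no mark for an acceptable variant). The function name `morphsynt_outcome` is hypothetical and not part of the guidelines.

```python
def morphsynt_outcome(variant: str) -> str:
    """Map a judged variant to the outcome of test VID.4.

    Assumed convention: '*' = ungrammatical, '#' = unexpected change
    in meaning, no leading mark = regular, acceptable variant.
    """
    mark = variant.lstrip()[:1]
    if mark in ("*", "#"):
        return "it is a VID"
    return "further tests are required"


# Variants taken from the English examples of test VID.4.
for variant in (
    "*I will do your best",
    "#I give you his word for that",
    "he did my/her/our/your job",
):
    print(f"{variant!r}: {morphsynt_outcome(variant)}")
```

The judgement itself (whether a variant deserves `*`, `#` or no mark) remains the annotator's decision; the sketch only records how that judgement feeds into the test.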

      Test VID.5 - [SYNT] - Syntactic inflexibility

      Does a regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

      • it is a VID
        • на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → #продавам краставици на стар краставичар, #краставиците са продадени
          бълвам змии и гущери → #бълвам гущери и змии
        • Noun phrase (NP) or prepositional phrase (PP)
        • (OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me. (Urk. I 134, 1) → ꞽb ⸗f mḥ.w ꞽm (⸗ꞽ) (Urk. I 99, 4) 'His heart was filled with (me)', i.e. 'His trust was earned by me'.
        • speak of the devil the person one is talking about shows up → #he was speaking of the devil
          to go bananas to get crazy → #bananas are gone
          to drink and drive → #drive and drink
          to kick the bucket → #the bucket was kicked
        • coser y cantar to_sew and to_sing easy as pie, a piece of cake → #cantar y coser to sing and to sew
          perder la cabeza to_lose the head to go bananas → #perder las cabezas to_lose the heads
        • περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem → #ποιέομαι περὶ πολλοῦ
        • alzare la cresta to lift the crest become cocky → #la cresta è stata alzata the crest has been lifted
          andare in malora go to ruin go to ruin → #nella malora è andata in ruin was gone
          vivi e lascia vivere live and let live → #lascia vivere e vivi let live and live
        • 満足がいく satisfaction.nom go be satisfied → *満足にいかせる cf. 太郎がいく → 太郎にいかせる
        • kleine bedrijven leggen het loodje small companies lay the lead get the short end of the stick → #het loodje wordt gelegd
        • kogoś krew zalewa blood floods someone someone gets furious → #ktoś jest zalewany przez krew someone is flooded by blood (passive blocked)
          robić bokami to do with-sides to have serious financial problems → #robić swoją robotę bokami to do one's job with sides (regular modification blocked)
          dobrze komuś z oczu patrzy well someone.DAT from eyes looks someone looks like a good person → #uprzejmość dobrze komuś z oczu patrzy kindness well someone.DAT from eyes looks (subject prohibited)
          nie zagrzać miejsca w pracy not to warm a place at work not to stay long in one job → #zagrzać miejsce w pracy to warm a place at work (negation is compulsory)
          zdechł pies! died the dog! it is a lost cause → #pies zdechł the dog died (a regular word order variability is blocked)
          wziąć w łeb to take into head to fail → #wziąć porażkę w łeb to take failure into head (direct object prohibited for the normally transitive verb wziąć to take)
        • pisar na bola step on the ball make a mistake → #a bola na qual ele pisou the ball on which he stepped
        • a da colțul to give corner.the to die → *colțul a fost dat corner.the has been given
        • delati se Francoza to pretend to be French to pretend to be indifferent → *delan Francoz made French
        • коцка је бачена kocka je bačena the die has been thrown the die has been cast → #коцка се бацила kocka se bacila (blocked passive) the die cast itself
          ведрити и облачити vedriti i oblačiti to brighten and to cloud to call the shots → #облачити и ведрити oblačiti i vedriti (regular word order variability is blocked) to cloud and to brighten
          не вредети пишљивог боба ne vredeti pišljivog boba to not be worth a single bean to be worthless → #вредети пишљивог боба vredeti pišljivog boba (negation is compulsory) to be worth a single bean
          носити на души nositi na duši to carry something on one's soul to carry the burden of guilt → #ношење на души nošenje na duši (nominalization blocked) carrying on a soul
        • det knallar och går it trots and walks it is OK/as usual → #det går och knallar
      • It is not a VID, exit
        • продавам неговата кола I sell his car → колата му беше продадена (his car was sold), неговата кола, която тя продаде (his car which she sold), etc.
        • jemandes Auto waschen to wash one's car → ihr Auto wurde gewaschen her car was washed, das Auto, welches sie wusch the car that she washed, Autowaschen car-washing, etc
        • to wash one's car → her car was washed, the car that she washed, car washing, etc.
        • pisar la arena to step on the sand → la arena que pisaste The sand on which you stepped
        • ἐπιστολὴν πέμπω epistolēn pempо̄ letter.ACC send.1SG I send a letter → πέμπω ἐπιστολὴν
        • lavare la macchina → la sua macchina è stata lavata, la macchina che ha lavato, il lavaggio della macchina, etc.
        • iemands auto wassen to wash one's car → haar auto werd gewassen her car was washed, de auto, die zij waste the car that she washed, autowassen car-washing, etc.
        • kształtować opinię to form an opinion → opinia jest kształtowana the opinion is formed
        • pisar na areia to step on the sand → a areia na qual você pisou the sand on which you stepped
          jogar futebol to play football → ?futebol é jogado football is played One may argue that this is a VMWE because passive sounds strange. However, we assume that this sense of jogar does not accept passive. Since this construction is very productive, we do not annotate it as VMWE.
        • a spăla maşina to wash the car → maşina a fost spălată, maşina pe care a spălat-o, spălarea maşinii etc. the car was washed, the car that he/she washed, car washing
        • narediti film to make a movie → Film, narejen po knjigi a movie based on a book
        • написати књигу napisati knjigu to write a book → књига је написана knjiga je napisana the book is written
        • att tvätta bilen to wash one's car → min bil tvättades my car was washed, bilen som hon tvättade the car that she washed, biltvätt car-wash etc.

      Section 5.4

      Inherently reflexive verbs (IRV)

      Reflexive clitics (RCLI) are clitic pronouns that refer to the subject of the verb, like oneself in English. They are very common in many languages and play several semantic roles depending on the context, as detailed below.

      Reflexive verbs (REFLV), sometimes also called pronominal verbs, are formed by a full verb combined with a RCLI, although the clitic does not always have a reflexive meaning. REFLV can be categorized into different classes, some of which should be annotated as verbal MWEs.

      Namely, we will only annotate a REFLV as an inherently reflexive verb (IRV) when (a) it never occurs without the clitic, or (b) the REFLV and non-reflexive versions have clearly different senses or subcategorization frames. Inherently reflexive verbs constitute a quasi-universal category.

      IRVs are difficult to annotate because of various problematic cases. Note in particular that in some languages, e.g. Slavic ones, reflexive clitics inflect, so they should be considered in all their case forms and not only in the most frequent one, the accusative.

      We start by listing the various categories of REFLV before providing tests to decide whether to annotate a given occurrence as IRV.

      • Inherently reflexive ⇒ ANNOTATE as IRV
        • The verb without the RCLI does not exist
          • усмихвам се to smile, страхувам се to be afraid
          • stydět se to be ashamed, divit se to wonder
          • sich schämen to be ashamed, sich wundern to wonder
          • (OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. V 141, 14).
          • suicidarse to suicide, abstenerse to abstain
          • n.a.
          • s'évanouir to faint, se suicider to suicide
          • suicidarsi to suicide, arrabbiarsi to get angry
          • zich schamen to be ashamed, zich vergissen to be mistaken
          • dowiedzieć się to find out, bać się to be afraid
          • queixar-se to complain, abster-se to abstain
          • a se teme to be afraid with obligatory ACC reflexive clitic
            a își însuși to appropriate with obligatory DAT reflexive clitic
          • sramovati se to be ashamed, bati se to be afraid
          • стидети се stideti se to be ashamed,
            бојати се bojati se to be afraid
          • att försova sig to sleep in
            att gifta sig to get married
        • The verb without the RCLI does exist, but has a very different meaning
          • смея ≠ смея се to dare ≠ to smile, намирам ≠ намирам се to find ≠ to be situated
          • sich enthalten ≠ enthalten to abstain ≠ to contain, sich (um etw.) handeln ≠ handeln to be ≠ to handle
          • (OEG) 𓊪𓈙𓈙𓂻𓈖 𓋴 𓅐𓏏 𓎡 𓏌𓏏𓇯 𓁷𓂋 𓎡 pšš.n ś(ꞽ) mw.t ⸗k Nw.t ḥr ⸗k Your (⸗k) mother (mw.t) Nut (Nw.t) spread (pšš.n) herself (ś(ꞽ)) over (ḥr) you (⸗k). Your mother Nut protected you. (PT 638a, T) → pšš means 'spread' without a reflexive pronoun (Wb. I 560).
          • to find oneself in a difficult situation
            to help oneself to the cookies
          • recoger ≠ recogerse to gather ≠ to go home, empeñar ≠ empeñarse to pawn ≠ to insist
          • n.a.
          • s'apercevoir ≠ apercevoir to realize ≠ to see, s'agir ≠ agir to be ≠ to act
          • riferire ≠ riferirsi to report, tell ≠ to refer
          • zich aanstellen ≠ aanstellen to put on airs, to act ≠ to appoint, zich begeven ≠ begeven to proceed ≠ to break down, zich realiseren ≠ realiseren 'to realise (be aware) ≠ to realise (achieve)'
          • znajdować ≠ znajdować się to find ≠ to be, radzić ≠ radzić sobie to advise ≠ to manage
          • encontrar-se ≠ encontrar to be ≠ to meet, referir-se ≠ referir to concern ≠ to refer
          • a se îndura ≠ a îndura to have the heart ≠ to suffer
            a se face ≠ a face to become ≠ to make Even if it is inchoative (Dindelegan 2013: 79), a se face (= to become) is an IRV (it passes Test15)
          • dati se it is possible (to do something) ≠ dati to give, dobiti se to meet ≠ dobiti to get
          • губити ≠ губити се gubiti ≠ gubiti se to lose ≠ to pass out
          • att känna sig ledsen/arg to feel sad/angry ≠ to touch
      • Reciprocal ⇒ NOT ANNOTATED
        • The RCLI has a sense of mutually:
          • целувам се to kiss each other, срещам се to meet each other
          • líbat se to kiss each other, potkávat se to meet each other
          • sich küssen to kiss each other, sich treffen to meet each other
          • besarse to kiss each other, verse to see each other
          • n.a.
          • s'embrasser to kiss each other, se rencontrer to meet each other
          • baciarsi to kiss each other
          • całować się to kiss each other, spotykać się to meet each other
          • cumprimentar-se to greet each other, ver-se to see each other
          • a se saluta to greet each other
          • poljubljati se to kiss each other, srečati se to meet each other
          • пољубити се poljubiti se to kiss,
            срести се sresti se to meet
      • Reflexive ⇒ NOT ANNOTATED
        • The RCLI marks the reflexive or reciprocal construction, that is, the clitic plays the role of self in English
          • мия се to wash oneself, реша се to comb oneself
          • mýt se to wash oneself, drbat se to scratch oneself
          • sich waschen to wash oneself, sich kratzen to scratch oneself
          • (OEG) 𓇋𓅱 𓈖𓐩𓈖 𓇓𓅱 𓃹𓈖𓇋𓋴 ꞽw nč̣.n św Wnꞽś Unas (Wnꞽś) has-protected (nč̣.n) himself (św). Unas has protected himself. (PT 290c, W)
          • mirarse to look at oneself, vestirse to dress oneself
          • n.a.
          • se laver to wash oneself, se parler to talk to oneself
          • lavarsi to wash oneself, vestirsi to dress oneself
          • zich wassen to wash oneself, zich scheren to shave oneself
          • myć się to wash oneself, drapać się po głowie to scratch oneself on the head
          • apressar-se to hurry oneself, vestir-se to dress oneself
          • a se spăla to wash oneself
          • umivati se to wash oneself, praskati se to scratch oneself
          • умивати се umivati se to wash one's face,
            чешати се češati se to scratch oneself
          • att tvätta sig to wash oneself
      • Body part, also called possessive reflexive ⇒ NOT ANNOTATED
        • Specific type of reflexive use in which the direct object is a body part or, more generally, an inalienable part of the subject
          • мия си ръцете wash REFL.POSSESSIVE hands wash one's hands
          • mýt si nohy wash RCLI.DAT the feet wash one's feet
          • sich das Bein brechen RCLI the leg break break one's leg
          • (OEG) 𓂜 𓂻𓅱𓈖 𓇋𓋴 𓃹𓈖𓇋𓋴 𓆓𓋴 𓆑 nꞽ ꞽw.n ꞽś Wnꞽś č̣ś ⸗f Indeed (ꞽś), Unas (Wnꞽś), his (⸗f) body (č̣ś), cannot-come (nꞽ ꞽw.n). Indeed, Unas himself cannot come. (PT 333b, W)
          • rascarse el brazo scratch.RCLI the arm scratch one's arm
          • n.a.
          • se gratter la tête RCLI scratch the head scratch one's head
          • grattarsi la testa RCLI scratch the head scratch one's head
          • myć sobie nogi wash RCLI.DAT the feet wash one's feet
          • impossible, uses possessive instead
          • a-şi rupe mâna RCLI.DAT break arm break one's arm
          • umivati noge wash RCLI.DAT the feet wash one's feet, zlomiti roko RCLI.DAT break arm break one's arm
          • сломити си ногу to break RCLI the foot slomiti si nogu to break one's own leg,
            умити си лице umiti si lice to wash RCLI the face to wash one's own face
      • Middle with preverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
        • The clitic marks a regular syntactic alternation for transitive verbs. Just like in regular passive alternation, the direct object of the transitive version appears as the subject of the REFLV version, and thus the verb agrees with the subject.
        • Differently from inchoative (see below), the subject of the transitive version is absent in the REFLV version but it exists necessarily, though it is underspecified
          • книги се пишат трудно books write.PL RCLI difficult it is difficult to write books
          • die Häuser verkaufen sich gut the houses sell RCLI well the houses sell well
          • las casas se venden bien the houses RCLI sell well the houses sell well
          • n.a.
          • les pots se vendent bien the pots RCLI sell well the pots sell well
          • le case si affittano the houses RCLI rent the houses are rented
          • domy dobrze się sprzedają houses sell.PL RCLI well houses sell well
          • as casas se vendem bem the houses RCLI sell well the houses sell well
          • casele se vând bine houses-the RCLI sell well houses sell well
          • hiše se dobro prodajajo the houses sell RCLI well the houses sell well
          • земља се добро продаје zemlja se dobro prodaje the land RCLI well sell the land's selling well
      • Middle with postverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
        • In some languages, middle alternation with preverbal subject sounds unnatural and middle alternation with postverbal subject is preferred. Depending on the language, the postverbal noun phrase is viewed as a postverbal subject (ES, PL, PT, RO) or as an object which agrees with the unaccusative verb form (IT). Middle alternation with postverbal subject is impossible in FR and DE.
          • трудно се пишат книги difficult RCLI write.PL books it is difficult to write books
          • se alquilan casas RCLI rent houses people rent houses
          • n.a.
          • si affittano case RCLI rent houses people rent houses
          • dobrze sprzedają się te domy well sell RCLI these houses these houses sell well Polish is a relatively free word-order language and a postverbal subject is a regular (even if stylistically marked) alternation.
          • alugam-se casas rent-RCLI houses people rent houses
          • se vând bine apartamentele din blocurile noi RCLI sell well apartments-the from blocks-the new Apartments from new blocks sell well
            se construiesc locuințe noi RCLI build houses new new houses are built
          • nove hiše se gradijo new houses RCLI build new houses are built
          • добро се продаје ова роба dobro se prodaje ova roba well RCLI sell these goods these goods are selling well
      • Impersonal ⇒ NOT ANNOTATED
        • The RCLI marks an impersonal verb alternation possible for various transitivity classes, depending on the language: only transitive verbs (FR), only intransitive verbs with manner adjuncts (DE), preferably intransitive but tolerated for transitive verbs (PT), either transitive or intransitive verbs (IT, ES, RO, PL)
        • There is no noun phrase before the verb (empty subject slot), the presence of the RCLI indicates a verb interpreted with a generic and underspecified subject
        • The verb is in third person singular, even when the object is plural
          • не се вечеря късно not RCLI have dinner late it is not good to have dinner late
          • hier tanzt es sich gut here dances it RCLI well people dance well here
          • se busca a actores RCLI searches to actors people look for actors
            se trabaja mejor aquí RCLI works better here people work better here
          • n.a.
          • il se dit des bêtises it RCLI says silly things people say silly things
          • si lavora troppo RCLI works too much people work too much
            si affitta molte case RCLI rents many houses people rent many houses
          • za dużo się pracuje too much RCLI works people work too much
            bzdury się opowiada nonsense RCLI tells people tell nonsense
          • dorme-se muito sleeps-RCLI much people sleep a lot
            conta-se histórias tells-RCLI stories people tell stories Transitive impersonal is considered wrong by traditional grammar but it is found in corpora.
          • se lucrează până târziu RCLI works until late people work until late transitive verbs can be impersonal in RO only when they are null-object verbs (se lucrează până târziu - *este lucrat până târziu) or when their subject is realized by a clause headed by a complementizer Dindelegan 2013: 174
            se suferă din cauza sărăciei RCLI suffer because of poverty one suffers because of poverty RO impersonal reflexive verbs are mostly intransitive Dindelegan 2013: 173
            se aleargă dimineața RCLI run in the morning people run in the morning
          • govori se/govorijo se neumnosti it says/they say RCLI silly things people say silly things
          • ради се превише radi se previše it works RCLI too much there's too much work being done,
            говоре се глупости govore se gluposti they say RCLI nonsense nonsense is being said
      • Inchoative ⇒ NOT ANNOTATED
        • Similar to middle, but the RCLI marks a less productive syntactic alternation:
          • the direct object of the transitive version appears as subject of the REFLV
          • the subject of the transitive version is not only absent, it is also semantically unclear or nonexistent
            • вратата се отваря the door opens
            • dveře se otvírají the door opens
            • die Tür öffnet sich the door opens
            • la puerta se abrió the door opened
            • n.a.
            • la porte s'est subitement ouverte the door suddenly opened
            • la porta si apre the door opens
            • drzwi się otwierają the door opens
            • o vaso se quebrou the vase broke
            • mașina s-a stricat the car broke down
              ușa s-a deschis the door opened
            • vrata se odpirajo the door opens
            • врата се отварају vrata se otvaraju the doors are opening
            • dörren öppnar sig the door opens
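
The classification above can be summarised as a small lookup structure. This is only an illustrative sketch: the names `ReflvCategory`, `ANNOTATED_AS_IRV` and `is_annotated` are hypothetical, and the category labels simply mirror the list headings.

```python
from enum import Enum


class ReflvCategory(Enum):
    """REFLV categories as listed in this section."""
    INHERENT = "inherently reflexive"
    RECIPROCAL = "reciprocal"
    REFLEXIVE = "reflexive"
    BODY_PART = "body part (possessive reflexive)"
    MIDDLE_PREVERBAL = "middle with preverbal subject"
    MIDDLE_POSTVERBAL = "middle with postverbal subject"
    IMPERSONAL = "impersonal"
    INCHOATIVE = "inchoative"


# Only the inherently reflexive category is annotated as IRV;
# every other category is left unannotated.
ANNOTATED_AS_IRV = {ReflvCategory.INHERENT}


def is_annotated(category: ReflvCategory) -> bool:
    """Return True if occurrences of this category are annotated as IRV."""
    return category in ANNOTATED_AS_IRV


for category in ReflvCategory:
    label = "ANNOTATE as IRV" if is_annotated(category) else "NOT ANNOTATED"
    print(f"{category.value}: {label}")
```

Assigning a given occurrence to one of these categories is the hard part and is what the tests below are for; the table only fixes the consequence of each category.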

      IRV-specific decision tree

      • Apply test IRV.1 - [INHERENT]
        • Annotate as IRV
        • Apply test IRV.2 - [DIFF-SENSE]
          • Annotate as IRV
          • Apply test IRV.3 - [DIFF-SUBCAT]
            • Annotate as IRV
              • verb has no subject ⇒ Apply test IRV.4 - [IMPERS]
                • It is not a VMWE, exit
                • Annotate as IRV
              • verb has a subject ⇒ Apply test IRV.5 - [MIDDLE-INCHO]
                • It is not a VMWE, exit
                • Apply test IRV.6 - [REFL]
                  • It is not a VMWE, exit
                    • subject is SINGULAR ⇒ Apply test IRV.7 - [REFL-MUTUAL]
                      • It is not a VMWE, exit
                      • Annotate as IRV
                    • subject is PLURAL ⇒ Apply test IRV.8 - [RECIPRO]
                      • It is not a VMWE, exit
                      • Annotate as IRV
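
The decision tree can be read as a short cascade of yes/no answers. The sketch below assumes that, under each test, the first listed outcome corresponds to a YES answer and the second to a NO answer; this matches the outcome order given in the descriptions of tests IRV.1 to IRV.5, and is an assumption for IRV.6 to IRV.8. The class `ReflvJudgements` and the function `irv_decision` are hypothetical names, and each boolean stands for a judgement made by the annotator.

```python
from dataclasses import dataclass

IRV = "Annotate as IRV"
NOT_VMWE = "It is not a VMWE, exit"


@dataclass
class ReflvJudgements:
    """Annotator's yes/no answers for one REFLV occurrence."""
    inherent: bool = False              # IRV.1 [INHERENT]
    diff_sense: bool = False            # IRV.2 [DIFF-SENSE]
    diff_subcat: bool = False           # IRV.3 [DIFF-SUBCAT]
    has_subject: bool = True            # branch selector after IRV.3
    impersonal: bool = False            # IRV.4 [IMPERS]
    middle_or_inchoative: bool = False  # IRV.5 [MIDDLE-INCHO]
    reflexive: bool = False             # IRV.6 [REFL]
    subject_is_plural: bool = False     # branch selector after IRV.6
    refl_mutual: bool = False           # IRV.7 [REFL-MUTUAL]
    reciprocal: bool = False            # IRV.8 [RECIPRO]


def irv_decision(j: ReflvJudgements) -> str:
    """Walk the IRV-specific decision tree and return its outcome."""
    # IRV.1 to IRV.3: any YES leads directly to an IRV annotation.
    if j.inherent or j.diff_sense or j.diff_subcat:
        return IRV
    # No subject: only the impersonal test applies.
    if not j.has_subject:
        return NOT_VMWE if j.impersonal else IRV
    # With a subject: middle/inchoative, then plain reflexive.
    if j.middle_or_inchoative or j.reflexive:
        return NOT_VMWE
    # Final test depends on the number of the subject.
    if j.subject_is_plural:
        return NOT_VMWE if j.reciprocal else IRV
    return NOT_VMWE if j.refl_mutual else IRV


# sich schämen: the bare verb does not exist (IRV.1 answers YES).
print(irv_decision(ReflvJudgements(inherent=True)))
# die Tür öffnet sich: inchoative (IRV.5 answers YES).
print(irv_decision(ReflvJudgements(middle_or_inchoative=True)))
```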

      Test IRV.1 - [INHERENT] Inherent clitic

      Does the verb exist only with the RCLI and never occur without it?

      • annotate as IRV
        • страхувам се ⇒ *страхувам to be afraid
          усмихвам се ⇒ *усмихвам to smile
        • sich schämen ⇒ *schämen to be ashamed
          sich wundern ⇒ *wundern to wonder
        • (OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. V 141, 14).
        • suicidarse ⇒ *suicidar to suicide
          abstenerse ⇒ *abstener to abstain
        • n.a.
        • s'évanouir ⇒ *évanouir to faint
          se suicider ⇒ *suicider to suicide
        • suicidarsi ⇒ *suicidare to suicide
        • zich schamen ⇒ *schamen to be ashamed
          zich vergissen ⇒ *vergissen to be mistaken
        • dowiedzieć się ⇒ *dowiedzieć to find out
          bać się ⇒ *bać to be afraid
          wydarzyć się ⇒ *wydarzyć to happen
        • queixar-se ⇒ *queixar to complain
          abster-se ⇒ *abster to abstain
        • a se teme ⇒ *a teme to be afraid
          a își însuși ⇒ *a însuși to appropriate
        • sramovati se ⇒ *sramovati to be ashamed
          čuditi se ⇒ *čuditi to wonder
        • бавити се ⇒ *бавити baviti se ⇒ *baviti to deal with,
          дивити се ⇒ *дивити diviti se ⇒ *diviti to admire
      • next test
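
As a rough aid, test IRV.1 can be approximated by checking whether the bare verb is attested at all. The sketch below assumes a lexicon of verbs attested without the RCLI is available; the set `BARE_VERBS_DE` is a toy, hypothetical example built from the German verbs in this section, and `passes_irv1` is a hypothetical helper name. Absence from a lexicon is only an indication, so the final judgement stays with the annotator.

```python
# Toy lexicon of German verbs attested WITHOUT a reflexive clitic
# (hypothetical; a real lexicon or corpus query would be needed).
BARE_VERBS_DE = {"waschen", "verstehen", "öffnen", "enthalten", "handeln"}


def passes_irv1(bare_verb: str, bare_verb_lexicon: set[str]) -> bool:
    """Return True if the verb is not attested without the RCLI."""
    return bare_verb not in bare_verb_lexicon


# sich schämen: *schämen is not attested, so IRV.1 answers YES.
print(passes_irv1("schämen", BARE_VERBS_DE))
# sich waschen: waschen exists on its own, so go on to the next test.
print(passes_irv1("waschen", BARE_VERBS_DE))
```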

      Test IRV.2 - [DIFF-SENSE] - Different sense

      Given the same verb without the RCLI, are all of its meanings clearly different from the REFLV form?

      • annotate as IRV
        • намирам се ≠ намирам to be situated ≠ to find
          радвам се ≠ радвам to feel happy ≠ to make happy
        • sich verstehen ≠ verstehen to get along well ≠ to understand
        • (OEG) 𓊪𓈙𓈙𓂻𓈖 𓋴 𓅐𓏏 𓎡 𓏌𓏏𓇯 𓁷𓂋 𓎡 pšš.n ś(ꞽ) mw.t ⸗k Nw.t ḥr ⸗k Your (⸗k) mother (mw.t) Nut (Nw.t) spread (pšš.n) herself (ś(ꞽ)) over (ḥr) you (⸗k). Your mother Nut protected you. (PT 638a, T) → pšš means 'spread' without a reflexive pronoun (Wb. I 560).
        • to find oneself in a difficult situation
          to help oneself to the cookies
        • recogerse ≠ recoger to go home ≠ to pick up, to gather
        • n.a.
        • s'apercevoir ≠ apercevoir to realize ≠ to see
          s'agir ≠ agir to be ≠ to act
        • riferirsi ≠ riferire to refer ≠ to report, to tell
        • zich voordoen ≠ voordoen to arise ≠ to show
        • znajdować się ≠ znajdować to be ≠ to find
          sprawdzić się ≠ sprawdzić to prove appropriate ≠ to check
          wybrać się ≠ wybrać to go ≠ to choose
        • encontrar-se ≠ encontrar to be ≠ to meet
          referir-se ≠ referir to concern ≠ to refer
        • a se îndura ≠ a îndura to have the heart to ≠ to suffer
        • razumeti se ≠ razumeti to get along well ≠ to understand
        • знати ≠ знати се znati ≠ znati se to know ≠ to know someone,
          забављати ≠ забављати се zabavljati ≠ zabavljati se to amuse someone else ≠ to amuse oneself to amuse someone ≠ to date someone
      • next test
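
Test IRV.2 quantifies over all senses of the bare verb: a single shared sense is enough for the answer to be NO. A minimal sketch of that logic follows, assuming senses are represented by simple labels chosen by the annotator; the sense inventories and the name `passes_irv2` are hypothetical illustrations, not lexicographic data.

```python
def passes_irv2(bare_verb_senses: set[str], reflv_sense: str) -> bool:
    """Return True if every sense of the bare verb differs from the REFLV sense."""
    return all(sense != reflv_sense for sense in bare_verb_senses)


# s'apercevoir 'to realize' vs apercevoir 'to see': all senses differ.
print(passes_irv2({"to see"}, "to realize"))
# se laver vs laver: the REFLV keeps the 'to wash' sense of the bare verb.
print(passes_irv2({"to wash"}, "to wash"))
```

Deciding whether two senses are "clearly different" is itself a judgement call; the sketch only makes the universal quantification explicit.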

      Test IRV.3 - [DIFF-SUBCAT] - Different subcategorization frame

      Is the subcategorization frame of the simple verb without the RCLI different from the subcategorization frame of the REFLV, except for the addition of a direct or indirect object corresponding to the same syntactic argument as the RCLI in the REFLV version?

      • annotate as IRV
        • X verliert sich in Y ⇔ X verliert Y X loses RCLI in Y ⇔ X loses Y
        • X se olvidó de Y ⇔ X olvidó Y X RCLI forgot of Y ⇔ X forgot Y
        • n.a.
        • X se confesse de Y ⇔ X confesse Y (but *X confesse de Y) X RCLI confesses of Y ⇔ X confesses Y (but not *X confesses of Y)
          X se plaint de Z ⇒ *Y plaint (à) X de Z X RCLI complains of Z ⇒ *Y complains (to) X of Z → the verb without RCLI, plus direct or indirect object, does not subcategorize for the PP with preposition de
          X se refuse à Vinf ⇒ *Y refuse (à) X à Vinf X RCLI refuses to Vinf ⇒ *Y refuses (to) X to Vinf
        • X si è dimenticato di Y ⇔ X ha dimenticato Y X RCLI forgot of Y ⇔ X forgot Y
        • X verwondde zich aan Y ⇔ X verwondde Y X wounded/injured RCLI to Y ⇔ X wounded/injured Y
          X toonde zich ADJ ⇔ X toonde NOUN X showed RCLI ADJ ⇔ X showed NOUN ?? elle se trouve grosse, because se trouver has the same meaning here as trouver
        • X tłumaczy się z Y ⇔ X tłumaczy Y X explains SELF of Y ⇔ X explains Y
          X dziwi się Y.dat ⇔ Y dziwi X ⇔ Z dziwi X Y.inst X surprises SELF Y.dat ⇔ Y surprises X ⇔ Z surprises X Z.inst
        • X se esqueceu de Y ⇔ X esqueceu Y X RCLI forgot of Y ⇔ X forgot Y
        • X se gândeşte la Y ⇔ X gândeşte că Y X RCLI thinks of Y ⇔ X thinks that Y
        • А се објаснио с Б ⇔ А је објаснио Б A se objasnio s B A resolved the issues with B ⇔ A explained something to B
      • next test
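
The comparison in test IRV.3 can be made explicit by treating a subcategorization frame as a set of argument slots. In the sketch below, the REFLV frame is given without the RCLI, and the slot the clitic would correspond to (direct or indirect object) is added back before comparing with the frames of the bare verb. The slot labels (`subj`, `obj`, `pp:de`), the frame inventories and the name `passes_irv3` are hypothetical and serve only as an illustration.

```python
from typing import Iterable


def passes_irv3(
    bare_frames: Iterable[frozenset[str]],
    reflv_frame: frozenset[str],
    rcli_slot: str,
) -> bool:
    """Return True if no frame of the bare verb equals the REFLV frame
    once the argument corresponding to the RCLI is added back."""
    expected = reflv_frame | {rcli_slot}
    return all(frame != expected for frame in bare_frames)


# confesser: X confesse Y; se confesser: X se confesse de Y.
# No bare frame has subject + object + PP[de], so the frames differ.
print(passes_irv3(
    [frozenset({"subj", "obj"})],
    frozenset({"subj", "pp:de"}),
    "obj",
))
# laver: X lave Y; se laver: X se lave. Adding the object back yields
# the bare frame, so the only difference is the RCLI itself.
print(passes_irv3(
    [frozenset({"subj", "obj"})],
    frozenset({"subj"}),
    "obj",
))
```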

      Test IRV.4 - [IMPERS] - Impersonal

      When you replace the RCLI by an underspecified subject such as one or people, does the sentence keep its meaning?

      • do NOT annotate as verbal MWE
        • не се вечеря късно ⇔ хората не вечерят късно not RCLI have dinner late it is not good to have dinner late
        • hier tanzt es sich gut ⇔ hier tanzen die Leute gut people dance well here
        • se duerme mucho ⇔ las personas duermen mucho people sleep a lot
          se busca a actores ⇔ la gente busca a actores people look for actors
        • n.a.
        • il se dit des bêtises ⇔ les personnes disent des bêtises people say silly things
        • si dorme molto ⇔ le persone dormono molto people sleep a lot
          si affitta molte case ⇔ le persone affittano molte case people rent many houses
        • pracuje się za dużo ⇔ ludzie pracują za dużo people work too much
          opowiada się bzdury ⇔ ludzie opowiadają bzdury people tell nonsense
        • dorme-se muito ⇔ as pessoas dormem muito people sleep a lot
          conta-se histórias ⇔ as pessoas contam histórias people tell stories
        • se lucrează până târziu ⇔ lumea lucrează până târziu people work until late
          se aleargă dimineața ⇔ lumea aleargă dimineața people run in the morning
        • govorijo se neumnosti ⇔ ljudje govorijo neumnosti people tell nonsense
        • ради се превише. ⇔ људи раде превише. radi se previše. ⇔ ljudi rade previše. there's too much work being done ⇔ people are working too much.
      • annotate as IRV

      Test IRV.5 - [MIDDLE-INCHO] - Middle or Inchoative

      When you move the subject to the object position, remove the RCLI and add a generic subject (people, somebody), thus building a transitive version, does it imply the REFLV version? In other words, people/somebody V [to] X ⇒ X REFLV?

      • do NOT annotate as verbal MWE
        • някой отваря вратата ⇒ вратата се отваря somebody opens the door ⇒ the door opens
        • man kann die Häuser gut verkaufen ⇒ die Häuser verkaufen sich gut people can sell the houses well ⇒ the houses sell well
          jemand öffnet die Tür ⇒ die Tür öffnet sich somebody opens the door ⇒ the door opens
        • la gente cuenta historias ⇒ se cuentan historias people tell stories ⇒ stories are told
          alguien abrió la puerta ⇒ la puerta se abrió somebody opened the door ⇒ the door opened
        • n.a.
        • on vend bien ce produit ⇒ ce produit se vend bien people sell this product well ⇒ this product sells well
          quelqu'un ouvre la porte ⇒ la porte s'ouvre somebody opens the door ⇒ the door opens
        • qualcuno vende bene questo prodotto ⇒ questo prodotto si vende bene someone sells this product well ⇒ this product sells well
          qualcuno apre la porta ⇒ la porta si apre somebody opens the door ⇒ the door opens
        • ktoś sprzedaje te domy ⇒ te domy się sprzedają somebody sells these houses ⇒ these houses sell well
          ktoś otwiera drzwi ⇒ drzwi się otwierają somebody opens the door ⇒ the door opens
          ktoś nasila skargi ⇒ skargi nasilają się somebody increases complaints ⇒ complaints increase
          ktoś rozgrywa mecz ⇒ mecz rozgrywa się somebody plays a game ⇒ the game plays
        • alguém conta histórias ⇒ contam-se histórias somebody tells stories ⇒ tell.PL-RCLI stories somebody tells stories ⇒ stories are told
          alguém acalmou o menino ⇒ o menino se acalmou somebody calmed the boy ⇒ the boy RCLI calmed somebody calmed the boy down ⇒ the boy calmed down
          o juiz casou João com Maria ⇒ João se casou com Maria the judge married João with Maria ⇒ João RCLI married with Maria the judge married João with Maria ⇒ João got married to Maria
          o juiz casou Maria e João ⇒ Maria e João se casaram the judge married Maria and João ⇒ Maria and João RCLI married the judge married Maria and João ⇒ Maria and João got married
          alguém lembrou João do meu aniversário ⇒ João se lembrou do meu aniversário somebody reminded João of my birthday ⇒ João RCLI reminded of my birthday somebody reminded João of my birthday ⇒ João remembered my birthday
        • cineva spune glume ⇒ se spun glume somebody tells jokes ⇒ jokes are told
          cineva a deschis ușa ⇒ ușa s-a deschis somebody opened the door ⇒ the door opened
        • nekdo pripoveduje šale ⇒ šale se pripovedujejo somebody tells jokes ⇒ jokes are told
          nekdo je odprl vrata ⇒ vrata so se odprla somebody opened the door ⇒ the door opened
        • неко је отварао врата ⇒ врата се отварају neko je otvarao vrata ⇒ vrata se otvaraju someone was opening the doors ⇒ the doors were being opened
          неко шири гласине ⇒ гласине се шире neko širi glasine ⇒ glasine se šire someone's spreading the rumors ⇒ the rumors are being spread
      • next test

      Test IRV.6 - [REFL] - Reflexive

      When you replace the RCLI by oneself only or to oneself only, does it imply the REFLV version? In other words, X V [to] himself only ⇒ X REFLV?

      • do NOT annotate as verbal MWE
        • Павел лекува себе си ⇒ Павел се лекува Pavel heals himself
        • Paul kratzt nur sich selbst ⇒ Paul kratzt sich Paul scratches himself
        • Paul washes only himself ⇒ Paul washes himself
        • Pablo se lava a sí mismo ⇒ Pablo se lava Paul washes himself
        • n.a.
        • Paul ne soigne que lui-même ⇒ Paul se soigne Paul heals himself
          Paul ne parle qu'à lui-même ⇒ Paul se parle Paul talks to himself
        • Paolo cura solo se stesso ⇒ Paolo si cura Paul heals himself
          Paolo parla solo a se stesso ⇒ Paolo si parla Paul talks to himself
        • Paul wast alleen zichzelf ⇒ Paul wast zich(zelf) Paul washes himself
        • Paweł leczy tylko siebie ⇒ Paweł leczy się Paul heals himself
          Paweł bogaci tylko siebie ⇒ Paweł bogaci się Paul enriches himself Paul gets rich
          Paweł myje tylko siebie ⇒ Paweł myje się Paul washes himself
        • Paulo só lava a si mesmo ⇒ Paulo se lava Paul washes himself
        • Paul se spală doar pe sine ⇒ Paul se spală. Paul washes himself
        • Pavel praska sam sebe ⇒ Pavel se praska Paul scratches himself
        • Марко лечи сам себе ⇒ Марко се лечи Marko leči sam sebe ⇒ Marko se leči Marko is treating himself ⇒ Marko is getting treated
      • next test

      Test IRV.7 - [REFL-MUTUAL] - Reflexive-mutual

      Is a reciprocal version possible? Namely: Is it acceptable to replace the singular subject by a plural and add each other to the REFLV form without changing the REFLV's meaning?

      • do NOT annotate as verbal MWE. The test applies only if test 15 has failed. For example, for "X se marie" 'X gets married' in French, it is odd though possible to say 'X and Y marry each other', but this does not mean 'X gets married', because it is only possible if X and Y are marriage officiants.
        • Павел се мие ⇔ те се мият един друг they wash each other
        • Paul wäscht sich ⇔ Sie waschen sich gegenseitig / einander they wash each other
        • Pablo se lava ⇔ ellos se lavan mutuamente / los unos a los otros they wash each other
        • n.a.
        • Paul se lave ⇔ ils se lavent mutuellement / les uns les autres they wash each other
        • Paolo si lava ⇔ essi si lavano reciprocamente / l'un l'altro they wash each other
        • Paul wast zich ⇔ Zij wassen elkaar they wash each other
        • Paweł się myje ⇔ oni myją się nawzajem they wash each other
        • Paulo se lava ⇔ eles se lavam mutuamente / uns aos outros they wash each other
        • el se spală ⇔ ei se spală unul pe altul they wash each other
        • Pavel se umiva ⇔ umivajo drug drugega they wash each other
        • Марко се забавља ⇔ они један другог забављају Marko se zabavlja ⇔ oni jedan drugog zabavljaju Marko is amusing himself ⇔ they are amusing one another
      • annotate as IRV

      Test IRV.8 - [RECIPRO] - Reciprocal

      Is it possible to remove the RCLI and replace the coordinated subject (A and B) or plural subject (A.PL) by a singular subject (A or A.PL) and a singular object, often introduced by to/with (B or A.PL), without changing the REFLV's meaning? That is:

      • Coordinated subject: A and B PronV ⇔ A V [to/with] B and B V [to/with] A?
      • Plural subject: A.PL PronV ⇔ A.PL V [to/with] A.PL?
      • do NOT annotate as verbal MWE
        • Павел и Елена се целуват ⇔ Павел целува Елена и Елена целува Павел Pavel and Elena kiss
        • Paul und Anna umarmen sich ⇔ Paul umarmt Anna and Anna umarmt Paul Paul and Anna hug each other
          die Affen kratzen sich ⇔ die Affen kratzen die Affen the monkeys scratch each other
        • Pablo y Ana se abrazan ⇔ Pablo abraza a Ana and Ana abraza a Pablo Paul and Ann hug each other
          los niños se abrazan ⇔ los niños abrazan a los niños the children hug each other
        • n.a.
        • Paul et Anne s'embrassent ⇔ Paul embrasse Anne and Anne embrasse Paul Paul and Ann kiss
          les jours se suivent ⇔ les jours suivent les jours the days follow each other
        • Giovanni e Anna si baciano ⇔ Giovanni bacia Anna and Anna bacia Giovanni John and Ann kiss
          i giorni si seguono ⇔ i giorni seguono i giorni the days follow each other
        • Paweł i Elena całują się ⇔ Paweł całuje Elenę i Elena całuje Pawła, Paweł i Elena całują się nawzajem Paweł kisses Elena and Elena kisses Paweł, Paweł and Elena kiss
        • João e Ana se beijam ⇔ João beija Ana and Ana beija João John and Ann kiss
          os presos se agridem ⇔ os presos agridem os presos the prisoners aggress each other
        • Ion şi George se salută ⇔ Ion îl salută pe George and George îl salută pe Ion Ion and George greet each other
          participanții se salută ⇔ participanții îi salută pe participanți the participants greet each other
        • Pavel in Ana se objemata ⇔ Pavel objema Ano in Ana objema Pavla Paul and Anna hug each other
        • М и Н су се пољубили ⇔ М је пољубио Н и Н је пољубила М M i N su se poljubili ⇔ M je poljubio N i N je poljubila M M and N kissed ⇔ M kissed N and N kissed M
      • annotate as IRV

      Problematic cases and remarks

      Polysemy

      Keep in mind that both simple and reflexive verbs can have several senses. In test 15, we ask that ALL senses you can think of are different from the REFLV form in the given context. For example, the French verb trouver can mean to find something, to have an opinion about something, to discover something, etc. But it has a totally different and unrelated meaning of to be (located at) in the sentence L'église se trouve à Paris the church is located in Paris. It should thus be annotated as an MWE. As the REFLV is itself polysemous, it should NOT be annotated as IRV in sentences like Elle se trouve grosse she finds herself fat, where it means to have an opinion about (herself), equivalent to the non-reflexive version.

      Clitics position and concatenation

      In some languages the clitics are joined to the verb, sometimes with a hyphen but not always. When there is no hyphen, the REFLV will probably be tokenized as a single token in the corpus.

      • In French, orthography and pronunciation rules require the clitic to be concatenated with the verb and its last vowel to be replaced by an apostrophe (elision):
        • s'abstenir to abstain
      • In Spanish and Italian, the clitic can appear concatenated after the verb in some verbal forms (e.g. infinitives, gerunds):
        • enamorarse to fall in love
        • alzarsi to get up
      • In Portuguese, postposed clitics (enclisis) are always attached with a hyphen, but in the conditional tense the clitic appears in the middle of the verb (mesoclisis), separating the root from the suffix:
        • queixar-se-ia would complain
      • In Romanian the clitic and the verb are either separate or have a hyphen between them:
        • se aude un clopot RCLI hears a bell a bell is heard
          s-aude un clopot RCLI-hears a bell a bell is heard

      The current annotation format allows annotating a single token as an MWE if it is a multiword token. Therefore, a REFLV written as a single token should still be annotated as an MWE.

      Overlap VID - IRV

      Some idiomatic constructions include reflexive clitics. Two cases are possible:

      • If a syntactically comparable literal construction is impossible or the REFLV would not be annotated in syntactically comparable literal constructions, annotate only the VID:
        • пилците се броят наесен chicken REFL are counted in the autumn the true results can be seen only at the end кокошките се броят the hens REFL counted
        • sich über etwas im Klaren sein dass S RCLI about s.th. in.the clear be to be aware of s.th./that S ⇒ *sich in N sein, dass for any noun N
        • darse cuenta de to realize ⇒ *darse N de for any noun N
          meterse en líos to get in trouble REFLV not annotated in literal equivalents like meterse en una tienda to get in a store
        • n.a.
        • se rendre compte de to realize ⇒ *se rendre N de for any noun N
          s'arracher les cheveux RCLI tear the hair to worry REFLV not annotated in literal equivalents like s'arracher un ongle to tear one's nail
        • rendersi conto di to realize ⇒ *si rende N di for any noun N
          si strappa i capelli RCLI tear the hair to worry REFLV not annotated in literal equivalents like strapparsi un'unghia to tear one's nail
        • zich uit de voeten maken RCLI out of the feet make to get out of the way ⇒ *zich uit de N maken for any noun N
          zich in de kijker spelen RCLI in the field-glass play to attract attention with one's skills ⇒ *zich in de N spelen for any noun N
        • zdawać sobie sprawę z to realize ⇒ *zdawać sobie N z for any noun N
        • dar-se mal to fail dar-se ADV intransitive is acceptable only for antonym bem well
          meter-se numa fria to get-RCLI in a cold to get in trouble REFLV not annotated in literal equivalent like meter-se numa cabine to get into a cabin
        • a-și smulge părul din cap
        • puliti si lase tear RCLI the hair to worry REFLV not annotated in literal equivalents like puliti si obrvi to pluck one's eyebrows
        • китити се туђим перјем kititi se tuđim perjem decorate RCLI someone else's feathers steal someone's thunder; take credit for someone else's accomplishments
      • If the REFLV would be annotated as IRV in syntactically comparable literal constructions, annotate both the IRV and the VID as embedded MWEs (rare):
        • смея се през сълзи laugh REFL through tears to laugh bitterly
        • n.a.
        • rozlatywać się w proch scatter itself into dust disappear
        • virar-se nos trinta turn-RCLI in-the thirty contains virar-se to get by ≠ virar to turn/become
        • a i se face rău to CL.DAT RCLI.ACC make ill to feel sick this is a case when both a non-reflexive, dative clitic and a RCLI.ACC appear in the structure; the REFLV is annotated as IRV; both the IRV and the ID are annotated as embedded MWEs; note that the non-reflexive clitic is also considered as part of a VID (6.4_R)
          a se duce pe apa sâmbetei RCLI go on water-the Saturday-of to get lost the REFLV is annotated in literal equivalent a se duce pe apa Bistriței he goes on the river Bistriţa; there is a notable difference in meaning between the non-REFLV a duce to take and the REFLV a se duce to go
        • režati se kot pečen maček to laugh RCLI like a baked tomcat to laugh loudly režati se is IRV
        • смејати се као луд smejati se kao lud to laugh like crazy
      Overlap LVC - IRV

      It is rare, although possible, to find light verb constructions in which a reflexive clitic changes the original meaning significantly, thus characterizing an IRV:

      • Fragen stellen to ask questions vs. sich Fragen stellen to doubt/hesitate
      • hacer preguntas to ask questions vs. hacerse preguntas to doubt/hesitate
      • n.a.
      • poser des questions to ask questions vs. se poser des questions to doubt/hesitate
      • [No example yet]
      • no examples found for RO

      In this case, the whole construction, including the verb, the noun and the reflexive clitic, must be annotated as VID, since there are two syntactic arguments:

      • sich Fragen stellen to doubt/hesitate
      • hacerse preguntas to doubt/hesitate
      • n.a.
      • se poser des questions
      • no examples found for RO

      Notice that annotating only the verb and the RCLI as IRV would be wrong, since it will have a completely different meaning without the noun, sometimes even coinciding with another IRV:

      • sich stellen to surrender
      • hacerse to get used to
      • n.a.
      • se poser to sit/lay down
      Dative clitics and double clitics

      In some languages, e.g. Polish, clitics inflect for case. Most cases of IRV seem to be restricted to the accusative case:

      • страхувам се to be afraid
      • bát se to be afraid
      • n.a.
      • n.a.
      • bać się to be afraid
      • a se sinchisi to RCLI.ACC care to care
        a se sfii to RCLI.ACC be.shy to be shy
        a se căi to RCLI.ACC repent to repent
      • bati se to be afraid
      • бојати се bojati se to be afraid

      However, other cases can appear in IRV:

      • отивам си to go oneself.DAT to go away
      • poradit si to advise oneself.DAT to manage
      • n.a.
      • n.a.
      • radzić sobie to advise oneself.DAT to manage
      • a-și însuși to-RCLI.DAT appropriate to appropriate - with a Dative clitic
        a-și apropria to-RCLI.DAT appropriate to appropriate - with a Dative clitic
      • drzniti si to dare oneself.DAT to dare

      Some expressions can have double clitics. Only the first two words belong to the IRV:

      • надсмивам се над себе си to laugh RCLI.ACC at RCLI.DAT to laugh at myself
      • n.a.
      • n.a.
      • przyglądać się sobie to observe RCLI.ACC RCLI.DAT to observe each other
        radzić sobie z sobą to advise RCLI.DAT with RCLI.INST to manage with oneself
      • n.a.
      • nasmehniti se sebi to smile at oneself
      • подсмевати се сам себи podsmevati se sam sebi to make fun of oneself
      Non-reflexive clitics

      This category does not cover other types of pronouns and clitics. They are covered by regular VID tests and should be annotated as such. Examples of constructions that should be annotated as VID rather than IRV include:

      • es gibt it gives there is
      • n.a.
      • l'emporter to take it away to win
        s'en aller to self from-it go to leave
        en avoir marre to have from-it enough to be fed up
        il y avoir it at-it have to exist
      • prender-ci to take to-it to make the right choice
        prender-le to take it to be beaten
      • dá-lhe João! give to-him/her, João! show them what you got, João!
      • a-i arde to CL.DAT burn to have a desire
        a o lua pe jos to take CL.ACC on foot to walk; according to the current guidelines, such examples pass the ID tests (see also 6.3_B5); both have literal correspondents that are not characterized by an obligatory non-reflexive clitic: a arde to burn and a lua to take
        a-i repugna to CL.DAT loathe to loathe
        a-i prii to CL.DAT to be favourable to sb.
      • ucvreti jo to escape her to escape something/someone by running
      • мрзи ме/мрзи те/мрзи га/... mrzi me/mrzi te/mrzi ga/... to bother me/to bother you/to bother him/... I cannot be bothered/you cannot be bothered/he cannot be bothered/...

      Section 5.5

      Idiomatic verb-particle constructions (IVPCs)

      In the previous versions of the guidelines, this category was called VPC (verb-particle construction).

      Idiomatic verb-particle constructions (IVPCs), sometimes called (idiomatic) phrasal verbs or phrasal-prepositional verbs, like

      • n.a.
      • um|fahren over|drive to run over, mit|kommen with|come to join, vor|bereiten before|prepare to prepare
      • to put off, to blow up, to do in
      • n.a.
      • n.a.
      • buttare giù throw down to swallow
      • voor|bereiden before|prepare to prepare
      • n.a.
      • n.a.
      • n.a.

      constitute another quasi-universal category. They have the following general characteristics:

      1. They are formed by a lexicalized head verb v and a lexicalized particle p dependent on v.
      2. The meaning of the IVPC is fully or partly non-compositional.
        • In fully non-compositional IVPC (IVPC.full) the change in the meaning of v goes significantly beyond adding the meaning of p:
          • n.a.
          • die Fische sind eingegangen the fish went in the fish died
          • to do in to kill, destroy, cheat or harm severely
          • n.a.
          • rondkomen round-come to make ends meet
          • n.a.
          • n.a.
        • In semi-non-compositional IVPCs (IVPC.semi), p adds a partly predictable but non-spatial meaning to v
          • n.a.
          • to eat up to eat completely
          • n.a.
          • opeten to eat completely
          • n.a.
          • n.a.

      IVPCs are pervasive in English, German, Swedish, Hungarian and possibly some other languages but irrelevant to or infrequent in Romance and Slavic languages or in Farsi and Greek for instance.

      In some Germanic languages and also in Hungarian, verb-particle constructions can be spelled either as one (multiword) token or separated. Both types of occurrences are to be annotated:

      • n.a.
      • Die Kinder sollen in der Schule aufpassen The children must pay attention at school
        Herr Müller, passen Sie auf! Mr. Müller, be careful
      • n.a.
      • n.a.
      • Ongelukken komen voor Accidents happen
        Ongelukken kunnen voorkomen Accidents can happen
      • n.a.
      • n.a.
      • n.a.
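      As a rough illustration of annotating both spellings, the sketch below looks for a verb-particle candidate either as one joined token or as a verb followed later by its separated particle. It assumes hand-supplied (form, lemma) pairs and a hypothetical helper name, find_vpc_candidates; it is not part of any PARSEME tool. Every hit is only a candidate: the particle may still turn out to be a preposition, and the combination may be compositional.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, lemma); lemmas are assumed to be given


def find_vpc_candidates(tokens: List[Token], verb_lemma: str,
                        particle: str) -> List[Tuple[int, ...]]:
    """Return token index tuples for joined and split occurrences.

    A joined occurrence is one token whose lemma is particle + verb_lemma.
    A split occurrence is the verb followed, later in the same sentence,
    by a token whose lemma equals the particle.
    """
    joined_lemma = particle + verb_lemma
    hits: List[Tuple[int, ...]] = []
    for i, (_form, lemma) in enumerate(tokens):
        if lemma == joined_lemma:
            hits.append((i,))
        elif lemma == verb_lemma:
            # Take the first matching particle to the right of the verb.
            for j in range(i + 1, len(tokens)):
                if tokens[j][1] == particle:
                    hits.append((i, j))
                    break
    return hits


joined = [("Die", "der"), ("Kinder", "Kind"), ("sollen", "sollen"),
          ("in", "in"), ("der", "der"), ("Schule", "Schule"),
          ("aufpassen", "aufpassen")]
split = [("Herr", "Herr"), ("Müller", "Müller"), (",", ","),
         ("passen", "passen"), ("Sie", "Sie"), ("auf", "auf"), ("!", "!")]

print(find_vpc_candidates(joined, "passen", "auf"))
print(find_vpc_candidates(split, "passen", "auf"))
```

      Both German sentences from the examples above yield one candidate each: a single-token span for aufpassen and a two-token span for passen ... auf.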

      The first challenge in identifying an IVPC is to properly distinguish the particle from a possibly homographic preposition, e.g.:

      • n.a.
      • to look up the number vs to look up the chimney
      • n.a.
      • n.a.
      • ???
      • n.a.
      • n.a.
      • n.a.

      or a verbal prefix:

      • n.a.
      • um- in um|fahren vs umfahren
      • n.a.
      • n.a.
      • voor- in voor|komen to occur vs voorkomen to prevent
      • n.a.
      • n.a.
      • n.a.

      Namely, a particle, contrary to a preposition, cannot govern a complement. This can be tested depending on the verb's subcategorization frame:

      • For intransitive verbs, the particle can occur without an NP. The fact that there is no NP that could be governed by the particle to form a PP shows that it is a particle rather than a preposition.
      • For transitive verbs, the particle can occur either before or after the direct object. The fact that it is mobile and can go before or after the NP shows that it is a particle rather than a preposition
      • n.a.
      • intransitive: The airplane took off
        transitive: The fire did in the whole block or The fire did it in
      • n.a.
      • n.a.
      • ???intransitive: Ongelukken komen voor
        ???transitive Hans is zijn moeder aan het opbellen or Hans is zijn moeder op aan het bellen
      • n.a.
      • n.a.
      • n.a.

      Prefixes, contrary to particles, can never be spelled separately from the verb, nor can the past participle of prefixed verbs be formed with the infix -ge-:

      • n.a.
      • *er fuhr den See um
        *er hat den See umgefahren, instead: er hat den See umfahren he drove around the lake but: er hat das Schild umgefahren he ran over the sign
      • n.a.
      • n.a.
      • aanbidden to worship *aangebeden
      • n.a.
      • n.a.
      • n.a.

      See the language-specific tests for more details on distinguishing particles from prepositions and verbal prefixes.

      Note that in this shared task we do not account for compositional verb-particle combinations, i.e. those whose meaning can be deduced from the meaning of the particle and of the verb:

      • n.a.
      • er legt das Buch ab he puts down the book, er kommt ins Haus rein he comes into the house he enters the house
      • to lie down, You may go in now
      • n.a.
      • n.a.
      • hij legt het boek neer he puts down the book, hij komt het huis binnen he comes into the house he enters the house
      • n.a.
      • n.a.
      • n.a.

      Some combinations may have both compositional and non-compositional meanings depending on the context and only the latter should be annotated:

      • n.a.
      • ein Schild aufstellen to put up a sign vs. einen Plan aufstellen to draw up a plan
      • to put up a flag vs. to put up a friend for the night
      • n.a.
      • n.a.
      • apparatuur opstellen to put up equipment vs. een rooster opstellen to draw up a roster
      • n.a.
      • n.a.
      • n.a.

      The following decision tree should be applied to decide whether a candidate should be annotated as an IVPC or not.

      IVPC-specific decision tree:

      • Apply test IVPC.1 - [PART-REDUC: Can the verb without the particle refer to the same event?]
        • It is an IVPC.full.
        • Apply test IVPC.2 - [PART-SPATIAL: Is the particle spatial?]
          • It is not an IVPC, exit
          • Apply test IVPC.3 - [PART-SPATIAL-LIT: Is the particle spatial in a literal reading?]
            • It is an IVPC.semi
            • It is not an IVPC, exit
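      The decision tree above can be sketched as a small function over an annotator's three yes/no judgements. The tree as shown carries no yes/no labels, so the branch polarity below is an assumption inferred from the examples listed under each test (to do in, to stand up, to mix ideas together, to eat the cookies up). The names IVPCJudgements and classify_ivpc are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IVPCJudgements:
    """An annotator's answers to the three IVPC tests (hypothetical container)."""
    verb_alone_same_event: bool       # IVPC.1 [PART-REDUC]
    particle_spatial: bool            # IVPC.2 [PART-SPATIAL]
    spatial_in_literal_reading: bool  # IVPC.3 [PART-SPATIAL-LIT]


def classify_ivpc(j: IVPCJudgements) -> Optional[str]:
    """Return 'IVPC.full', 'IVPC.semi', or None when it is not an IVPC."""
    # IVPC.1: if the verb alone cannot refer to the same event, stop here.
    if not j.verb_alone_same_event:
        return "IVPC.full"
    # IVPC.2: a particle expressing direction or position is compositional.
    if j.particle_spatial:
        return None
    # IVPC.3 (assumed polarity): a spatial literal counterpart means no IVPC.
    if j.spatial_in_literal_reading:
        return None
    return "IVPC.semi"


print(classify_ivpc(IVPCJudgements(False, False, False)))  # to do in
print(classify_ivpc(IVPCJudgements(True, True, True)))     # to stand up
print(classify_ivpc(IVPCJudgements(True, False, True)))    # to mix ideas together
print(classify_ivpc(IVPCJudgements(True, False, False)))   # to eat the cookies up
```

      The judgements themselves remain the annotator's; the function only fixes the order in which they are consulted.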

      Test IVPC.1 - [PART-REDUC] - Verb without the particle refers to the same event/state

      Can a sentence without the particle refer to the same event/state as the sentence with the particle? Special care must be taken when the same construction might or might not be a valid IVPC depending on its context.

      • It is an IVPC.full.
        • n.a.
        • Der Lehrling fängt ein Praktikum an the apprentice catches an internship on the apprentice begins an internship does not imply #Der Lehrling fängt ein Praktikum the apprentice catches an internship
          Die Bäuerin hat sich wieder eingefangen the farmer’s wife has herself again caught the farmer’s wife has calmed down again does not imply #Die Bäuerin hat sich wieder gefangen the farmer’s wife has caught herself again
          Der Schüler legt die Prüfung ab the pupil lays the exam off the pupil takes the exam does not imply #der Schüler legt die Prüfung the pupil lays the exam
          Das Schiff legt vom Hafen ab the boat lays from the harbor off the ship leaves the harbor does not imply #das Schiff legt vom Hafen the boat lays from the harbor
        • to do somebody in to kill sb does not imply #to do somebody
          to check in upon arrival does not imply #to check upon arrival
        • n.a.
        • n.a.
        • A meccs után csak az edző nem rúgott be Only the coach did not get drunk after the match A meccs után az edző berúgott The coach got drunk after the match does not imply #Az edző rúgott the coach kicked
          Nem jött be ez a koktél nekem I didn’t like this cocktail Bejött ez a koktél nekem I liked this cocktail does not imply #Jött ez a koktél nekem this cocktail bumped into me
        • De leerling legt een examen af the pupil lays the exam off the pupil takes the exam does not imply #de leerling legt een examen the pupil lays an exam
        • n.a.
        • n.a.
        • n.a.
      • Go to the next test.
        • n.a.
        • Der Bauer fängt die Hühner ein the farmer catches the chickens in the farmer catches the chickens implies der Bauer fängt die Hühner the farmer catches the chickens
          Der Lehrer legt das Buch auf dem Tisch ab the teacher lays the book on the table apart the teacher puts the book away on the table implies Der Lehrer legt das Buch auf den Tisch the teacher puts the book on the table
          Der Lehrer legt den Mantel ab the teacher lays the coat off the teacher takes off his coat implies Der Lehrer legt den Mantel the teacher puts the coat
        • to look up into the sky implies to look into the sky
          to eat up the cookies implies to eat the cookies
        • n.a.
        • n.a.
        • A csatár nem rúgta be a helyzetét The forward missed its chance to score a goal A csatár berúgta a helyzetét implies A csatár rúgott The forward kicked
          Nem jött be a szobába He did not come into the room Bejött a szobába he entered the room implies Jött a szobába he came into the room
        • de koekjes opeten to eat up the cookies implies de koekjes eten
        • n.a.
        • n.a.
        • n.a.

      Test IVPC.2 - [PART-SPATIAL] - Spatial particle

      Is the particle spatial in the context of the verb, i.e. does it express direction or position?

      • It is not an IVPC, exit.
        • n.a.
        • to stand up
          to give something back
          to stay up tonight
          You may go in now
          to mix ingredients together
        • n.a.
        • opstaan to stand up
          aankijken look at
          iets optillen to lift something up
          slijm ophoesten cough up phlegm
        • n.a.
        • n.a.
      • Go to the next test
        • n.a.
        • to eat the cookies up
          to mix ideas together
        • n.a.
        • de koekjes opeten to eat up the cookies
        • n.a.
        • n.a.

      Test IVPC.3 - [PART-SPATIAL-LIT] - Spatial particle in a literal reading

      Does the IVPC candidate have a literal counterpart in which the particle is spatial, i.e. expresses direction or position?

      • It is not an IVPC, exit.
        • n.a.
        • to mix ideas together
        • n.a.
        • n.a.
        • n.a.
      • It is an IVPC.semi.
        • n.a.
        • to eat the cookies up
        • n.a.
        • de koekjes opeten to eat up the cookies
        • n.a.
        • n.a.

      Section 5.6

      Multi-verb constructions (MVC)

      Multi-verb constructions (MVC) constitute a quasi-universal category. They are VMWEs composed of a sequence of two adjacent verbs (in a language-dependent order), a functionally governing verb V-gov (also called a vector verb) and a functionally dependent verb V-dep (also called a pole/polar verb), which have the following characteristics:

      1. They usually have the same subject.
      2. They usually denote actions that are closely connected and may be seen as part of the same event.
      3. They function together as a single predicate.
      4. They are unaccompanied by any explicit coordination, subordination, or dependency marker.
      5. They only have a single tense, aspect and polarity value.
      6. They may be idiomatic or indicate successions of events.
      7. The V-gov (vector) verb is semantically delexicalized and the V-dep (polar) verb contains the core meaning of the whole. Note that V-dep might be seen as the head and V-gov as the dependent, in dependency frameworks such as Universal Dependencies, where the principle of the primacy of content words is applied.

      The behavior of MVCs is very heterogeneous across languages. Therefore, most tests for the detection of MVCs are language specific. The current tests were designed for Indonesian, Hindi, Japanese and Chinese. The generalization of these tests cross-lingually is planned as future work.

      MVC-specific decision tree for Hindi

      • Apply Test MVC.1.BASE - [MVC-STRUCT-BASE: V-dep is non finite and V-gov bears inflection?]
        • It is not a VMWE, exit
        • Apply Test MVC.3.KAR - [INS-REDIRECT-KAR: kar or ke appears just after V-dep?]
          • Apply Test MVC.6 - [MANNER: V-gov indicates the manner/means/direction of V-dep?]
            • It is a manner serial verb, not a VMWE, exit
            • Apply Test MVC.7 - [REASON: V-gov indicates the reason for V-dep?]
              • It is a reason serial verb, not a VMWE, exit
              • Apply Test MVC.8 - [SEQ: V-gov and V-dep bound by temporal sequence?]
                • It is a temporal sequence serial verb, not a VMWE, exit
                • Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
                  • It is a serial verb expressing simultaneous actions, not a VMWE, exit
                  • Continue to the next test
          • Apply Test MVC.10 - [LIGHT: V-gov in the closed list of light verbs?]
            • Annotate as MVC
            • Apply Test MVC.13 - [V-LEX: V-dep refers to the same event/state as V-gov+V-dep?]
              • It is not a VMWE, exit
              • Annotate as an MVC

      MVC-specific decision tree for Chinese

      • Apply Test MVC.2.ASPECT - [INS-DISCARD-ASP: V-gov can take an aspect marker –le or –guo?]
        • It is not an MVC, exit
        • Apply Test MVC.5 - [MODAL: V-gov is a modal or an auxiliary verb?]
          • It is not an MVC, exit
          • Apply Test MVC.6 - [MANNER: V-gov indicates the manner/means/direction of V-dep (or vice versa)?]
            • It is not an MVC, exit
            • Apply Test MVC.7 - [REASON: V-gov indicates the reason for V-dep (or vice versa)?]
              • It is not an MVC, exit
              • Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
                • It is not an MVC, exit
                • Apply Test MVC.4 - [SHARE-ARGS: V-gov and V-dep share arguments?]
                  • Annotate as an MVC
                  • It is not an MVC, exit
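      The Chinese tree reads as a chain of discarding tests followed by one confirming test, which can be sketched as follows. The sketch assumes that a "yes" answer to any of the first five tests discards the candidate and that only a "yes" to MVC.4 leads to annotation; this polarity is inferred from the layout above and from the MVC.2.ASPECT examples, not stated explicitly. The function name and the answers dictionary are hypothetical.

```python
from typing import Dict, Tuple

# Tests in tree order; a "yes" to any of them is assumed to discard the candidate.
DISCARD_TESTS = ["MVC.2.ASPECT", "MVC.5", "MVC.6", "MVC.7", "MVC.9"]
CONFIRM_TEST = "MVC.4"  # shared arguments


def classify_chinese_mvc(answers: Dict[str, bool]) -> Tuple[bool, str]:
    """Return (is_mvc, id of the test that decided the outcome)."""
    for test_id in DISCARD_TESTS:
        if answers[test_id]:
            return (False, test_id)
    return (answers[CONFIRM_TEST], CONFIRM_TEST)


# An aspect marker can be inserted: discarded at the first test.
aspect_ok = {"MVC.2.ASPECT": True, "MVC.5": False, "MVC.6": False,
             "MVC.7": False, "MVC.9": False, "MVC.4": True}
# No discarding test applies and the verbs share arguments.
survivor = {"MVC.2.ASPECT": False, "MVC.5": False, "MVC.6": False,
            "MVC.7": False, "MVC.9": False, "MVC.4": True}

print(classify_chinese_mvc(aspect_ok))
print(classify_chinese_mvc(survivor))
```

      Returning the deciding test alongside the verdict makes it easier to trace disagreements between annotators back to a single judgement.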

      MVC-specific decision tree for Indonesian and Japanese

        TODO (in the meantime, follow the tests one by one)

      MVC-specific decision tree for any other language

      • Apply directly Test MVC.13 - [COMP: V-dep refers to the same event/state as V-gov+V-dep?]
        • It is not a VMWE, exit
        • Annotate as an MVC

      Test MVC.1 - [MVC-STRUCT] MVC-like structure

      Does the candidate respect the necessary structural (language-dependent) requirements for an MVC?

      Hindi

      Test MVC.1.BASE [MVC-STRUCT-BASE]: Is V-dep non finite and does V-gov carry the tense, aspect and agreement inflections?

      • continue to the next test
        • n.a.
        • n.a.
        • n.a.
      • it is not an MVC
        • n.a.
        • n.a.
        • n.a.

      Japanese

      Test MVC.1.IMORPH: Does the first verb (V-dep) contain the i-morph suffix?

      • continue to the next test
        • n.a.
        • n.a.
        • 焼きつい yakitsui sear into one's mind → the first verb 焼き yaki burn is inflected in the i-morph ending
        • n.a.
        • n.a.
      • it is not an MVC
        • n.a.
        • n.a.
        • n.a.

      Any other language

      Go to the next test

      Test MVC.2 - [INS-DISCARD] Insertion which discards

      Does the candidate sequence appear, or could it appear, with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that this candidate is a regular combination and should be discarded?

      Chinese

      Test MVC.2.ASPECT - [INS-DISCARD-ASP]: Can the aspect marker -le perfective or -guo experiential be inserted between V-gov and V-dep (or the opposite)?

      • it is NOT an MVC
        • I看出来 kànchūlái figure out→ 我看wǒkàn I seele aspect marker出来 chūlái exit→ The insertion of the aspect markerle aspect markeris grammatically sound
      • continue to next test
        • I 听说 tīngshuō heard → *我听 wǒtīng I heard le aspect marker shuō say → The insertion of the aspect marker le leads to ungrammaticality in the phrase

      Indonesian

      Test MVC.2.PRON - [INS-DISCARD-PRON]: Can a pronoun like dia he/she be inserted between the first [AS: between V-gov and V-dep or the opposite?] and second verb?

      • it is NOT an MVC
        • n.a.
        • n.a.
        • n.a.
      • continue to next test
        • n.a.
        • n.a.
        • n.a.

      Test MVC.2.CLAUSE - [INS-DISCARD-CLAUSE]: Can a that-clause like bahwa that, or a whether-clause like apakah whether be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?], where the first verb [AS: V-gov?] is a saying verb like mengatakan say or an asking verb like menanyakan ask?

      • it is NOT an MVC
        • n.a.
        • n.a.
        • n.a.
      • continue to next test
        • n.a.
        • n.a.
        • n.a.

      Test MVC.2.PURPOSE - [INS-DISCARD-PURP]: Can untuk for/to be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?]?

      • it is a purpose serial verb, not an MVC
        • n.a.
        • Saya I bersiap pergi get ready to go = Saya I bersiap untuk pergi get ready for the purpose of going → The insertion of untuk for/to is grammatically sound and does not change the meaning of the sentence. Although it is possible to insert untuk for/to between the first and second verb, it is usually unnecessary and omitted.
        • n.a.
        • n.a.
      • continue to next test

      Japanese

      Test MVC.2.HONOR - [INS-DISCARD-HONOR]: Is the first verb [AS: V-gov or V-dep?] preceded by the honorific particle お o and is the second verb する/できる suru/dekiru?

      • it is NOT an MVC, but an honorific construction.
        • n.a.
        • n.a.
        • お-話し-する o-hanasi-suru I humbly talk
        • n.a.
        • n.a.
      • continue to next test

      Any other language

      Go to the next test

      Test MVC.3 - [INS-REDIRECT] Insertion which redirects

      Does the candidate sequence appear with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that a particular test should be applied next?

      Hindi

      Test MVC.3.KAR - [INS-REDIRECT-KAR]: Does conjunctive participle kar or ke appear attached to or immediately after V-dep?

      Any other language

      Go to the next test

      Test MVC.4 - [SHARE-ARGS] Shared arguments

      Do V-gov and V-dep share arguments?

      • it is an MVC
        • n.a.
      • it is not an MVC
        • n.a.
      • Test MVC.5 - [MODAL] Modal or auxiliary verb

        Chinese

        Is V-gov a modal or an auxiliary verb?

        • it is NOT an MVC
          • n.a.
          • n.a.
          • 可以 kéyǐ can, 可能 kěnéng might, 会 huì will, 必须 bìxū must, 需要 xūyào need to, 要 yào want to, 能 néng able to, 应该 yīng gāi should
        • continue to next test
          • n.a.
          • n.a.

        Any other language

        Go to the next test

        Test MVC.6 - [MANNER] Manner verb

        Chinese, Hindi, Indonesian, Japanese

        Does V-gov indicate the manner or means (and possibly a direction) of the action expressed by V-dep (in Chinese: or vice versa)?

        • it is a manner serial verb, not an MVC
          • n.a.
          • n.a.
          • us-ne ciikh-kar mujh-e bulaa-yaa He-erg yell-ConjPpl I-dative call-perf he called me by screaming
          • pulang melalui return-home pass-through go home by passing through (a place)
          • 投げ込み nage komi throw go in throw into
            なぐり殺し naguri korosi punch kill kill by punching
          • 走进来 zǒu jìnlái walk enter walk into (a place)
        • continue to next test

        Any other language

        Go to the next test

        Test MVC.7 - [REASON] Reason verb

        Hindi and Chinese

        Does V-gov indicate the reason for the action expressed by V-dep (in Chinese: or vice versa)?

        • it is a reason serial verb, not an MVC
          • n.a.
          • n.a.
          • vo melaa jaa-kar khush hu-aa he fair go-ConjPpl happy become-perf he got happy having gone to the fair
          • n.a.
        • continue to next test

        Any other language

        Go to the next test

        Test MVC.8 - [SEQ] Temporal sequence

        Hindi, Indonesian, Japanese

        Are the verbs bound by a temporal sequence?

        • it is a sequential serial verb, not an MVC
          • n.a.
          • n.a.
          • us-ne gilaas banaa-kar bec-aa he-erg glass make-ConjPpl sell-perf having made the glass, he sold it
          • bersiap pergi prepare go prepare in order to go (somewhere) → the first verb must happen before the second verb happens, otherwise the sentence will not make sense.
          • 夫人が最初に fujin ga saisho ni the wife first 叩き起こさ tataki okosa hit to awaken re verb suffix != #夫人が最初に fujin ga saisho ni the wife first 起き叩さ tataki okosa hit to awaken re verb suffix → The two verbs 叩き tataki hit and 起こさ okosa awaken are bound by temporal sequence, such that if the order is switched, the sentence does not make sense.
          • n.a.
        • continue to next test

        Any other language

        Go to the next test

        Test MVC.9 - [SIMULT] Simultaneous actions

        Do the verbs indicate rapid and simultaneous actions (without resorting to a coordination conjunction)?

        • it is a serial verb expressing simultaneous actions, not an MVC
          • n.a.
          • n.a.
          • berlari menuju run head-towards run and go towards
          • n.a.
        • continue to next test

        Test MVC.10 - [LIGHT] Light verb

        Hindi

        Does V-gov belong to a closed list of light verbs: aa come, baiTh sit, chal go, chuk finish, choR leave, Daal throw, de give, ja go, jataa declare, khaa eat, lagaa put, le take, maar hit, paa get/obtain, paRh fall, rakh keep, uTh rise?

        • it is a (light) MVC
          • n.a.
          • n.a.
        • continue to next test
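Because this test relies on a closed list, it can be checked mechanically once the lemma of V-gov is known. The following is a minimal sketch, assuming that lemmas are given in the same romanization as in the list above; the names `HINDI_LIGHT_VERBS` and `is_hindi_light_verb` are illustrative and not part of the guidelines or of any PARSEME tool.

```python
# Closed list of Hindi light verbs copied from Test MVC.10
# (romanized lemma -> English gloss).
HINDI_LIGHT_VERBS = {
    "aa": "come", "baiTh": "sit", "chal": "go", "chuk": "finish",
    "choR": "leave", "Daal": "throw", "de": "give", "ja": "go",
    "jataa": "declare", "khaa": "eat", "lagaa": "put", "le": "take",
    "maar": "hit", "paa": "get/obtain", "paRh": "fall", "rakh": "keep",
    "uTh": "rise",
}


def is_hindi_light_verb(v_gov_lemma: str) -> bool:
    """Return True if the lemma of V-gov is on the closed list of Test MVC.10."""
    # The lookup is case-sensitive so that spellings stay exactly as listed.
    return v_gov_lemma in HINDI_LIGHT_VERBS


print(is_hindi_light_verb("de"))     # "de" (give) is on the list
print(is_hindi_light_verb("bulaa"))  # "bulaa" (call) is not on the list
```

A positive answer only settles this test; the earlier tests in the language-specific decision tree still have to be applied first.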

        Any other language

        Go to the next test

        Test MVC.11 - [PREP-LIKE] Preposition-like verb

        Chinese

        [Hongzhi Xu: this test is not very clear and is only specific to one particular MVC (it should probably be deleted in future editions)] Is the second verb in the candidate [AS: V-gov or V-dep?] a preposition-like verb like chéng become?

        • it is a preposition-like MVC
          • n.a.
          • n.a.
          • n.a.
          • 排列成 páiliè chéng arrange become arrange into (something)
        • continue to next test

        Any other language

        Go to the next test

        Test MVC.12 - [NOUN-LIKE] Noun-like verb

        Japanese

        Are any of the components [AS: V-gov or V-dep?] in the candidate noun-like arguments?

        • it is a deverbalized V1/V2 MVC
          • n.a.
          • (JA) 響き渡る hibiki wataru echo spread-widely reverberate → The first verb is a noun-like argument of the second verb [deverbalized V2]
            聞き違え kiki chigae listen be-different mishear/misunderstand → The second verb is a noun-like argument of the first verb [deverbalized V1]
          • n.a.
          • n.a.
        • continue to next test

        Any other language

        Go to the next test

        Test MVC.13 - [V-LEX] Lexical inflexibility

        Does a regular replacement of V-dep by a related verb taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

        • it is an MVC
          • it will make do → #it will make build/solve/construct
          • hello quiere decir "hola" en inglés "hello" want say "hola" in English "hello" means "hola" in English → #"hello" quiere comunicar/gritar "hola" en inglés "hello" want communicate/shout "hola" in English
          • j'ai laissé tomber la présentation I have let fall the presentation I gave up on the presentation → #j'ai laissé commencer/lancer/interrompre la présentation I have let start/launch/interrupt the presentation
            ce mot veut dire autre chose this word wants say other thing this word means something else → #ce mot veut chuchoter/communiquer/crier autre chose this word wants whisper/communicate/scream another thing
          • ik heb ze leren kennen I got to know them → #ik heb ze leren lezen/schrijven
          • dał jej popalić he let her smoke he made things hard for her → #dał jej popić/podymić/pofajczyć he let her drink/smoke
          • n.a.
        • it is not an MVC
          • it will make me think → it will make me build/solve/construct
          • quiero leer tu tesis want.I read your thesis I want to read your thesis → quiero adquirir/descargar/imprimir tu tesis want.I acquire/download/print your thesis I want to get/download/print your thesis
          • je l'ai laissé finir la présentation I him have let finish the presentation I let him finish the presentation → je l'ai laissé commencer/lancer/interrompre la présentation I him have let start/launch/interrupt the presentation
            ce garçon veut dire autre chose this boy wants say other thing this boy wants to say something else → ce garçon veut chuchoter/communiquer/crier autre chose this boy wants whisper/communicate/scream another thing
          • ik heb mijn trui laten wassen I had my sweater washed → ik heb mijn trui laten strijken/verven/maken I had my sweater ironed/dyed/repaired
          • dał jej pospać he let her sleep → dał jej odpocząć/poleżeć he let her rest/lie down
          • n.a.
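The replacement step of this test can be prepared mechanically, although the judgment itself remains with the annotator. The sketch below only builds the variant sentences to be judged; the function name `substitution_variants` and its parameters are hypothetical, and the list of replacement verbs is assumed to be chosen by hand from a suitable semantic class.

```python
from typing import List, Sequence


def substitution_variants(tokens: Sequence[str], v_dep_index: int,
                          replacements: Sequence[str]) -> List[str]:
    """Build one variant sentence per replacement verb, swapping only V-dep."""
    variants = []
    for verb in replacements:
        new_tokens = list(tokens)        # copy, so the original stays intact
        new_tokens[v_dep_index] = verb   # replace V-dep and nothing else
        variants.append(" ".join(new_tokens))
    return variants


# Variants for "it will make do", where V-dep is the token at position 3.
for sentence in substitution_variants(["it", "will", "make", "do"], 3,
                                      ["build", "solve", "construct"]):
    print(sentence)
```

The annotator then decides, for each printed variant, whether it is ungrammatical or shows an unexpected change in meaning.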

      Section 5.7

      Inherently adpositional verbs (IAVs)

      Inherently adpositional verb (IAV) is a special, optional and experimental category. It corresponds to the IPrepV category of the first pilot annotations and to what is sometimes called prepositional verbs in English. An IAV consists of a verb or VMWE and an idiomatically selected preposition or postposition that is either always required or, if absent, changes the meaning of the verb or VMWE significantly. Language teams who decide to annotate IAVs should do so after annotating other categories (step 4 of the annotation process), since overlaps with other categories can be quite frequent, as detailed below. Language teams are not required to use this category.

      Our definition of inherently adpositional verbs is a generalization (applying to many languages) of the annotation guidelines of the English STREUSLE corpus, which define guidelines for annotating prepositional verbs.

      IAVs are verb+adposition combinations in which:

      • the dependents of the adposition are not lexicalized
        • разчитам на някого/нещо to rely on somebody/something is annotated as IAV because the object is not lexicalised,
          but in the ID
          вземам на мушка някого/нещо take on target to criticise somebody/something heavily cannot be annotated as IAV because мушка is also lexicalized in the ID
        • to stand for something is annotated as IAV because the object is not lexicalized,
          but in the ID to take something for granted, to take for cannot be annotated as IAV because granted is also lexicalized in the ID
        • entender de algo understand of something to know about something is annotated as IAV because the object is not lexicalised, whereas entender algo would not be any type of VMWE.
        • n.a.
        • pristati na kaj to land on (something) to agree (with something) is annotated as IAV because the object is not lexicalized,
          but in the ID
          ostati na trdnih tleh to remain on solid ground to remain realistic, ostati na to remain on cannot be annotated as IAV because trdnih tleh solid ground is also lexicalized in the ID
      • the adposition is integral, that is, "it cannot be omitted without markedly altering the meaning of the verb"
        • رغب في want to he has a desire to do something رغب can occur without the preposition في in, but it will never have a sense of رغب في
        • считам за to take for *считам can never occur without the preposition за
          разчитам на to rely on разчитам can occur without the preposition, but it will never have a sense of to depend/rely on
        • to rely on *to rely can never occur without the preposition on
          to count on to count can occur without the preposition, but it will never have a sense of to depend/rely on
        • entender de understand of something to know about something entender to understand can occur without the preposition, but it will never have a sense of to be an expert about something
          contar con count with to rely on contar to count can occur without the preposition, but it will never have a sense of to rely on.
        • n.a.
        • grenzen aan *grenzen can never occur without the preposition aan
          behoren tot behoren can occur without the preposition, but it will never have a sense of behoren tot
        • temeljiti na to be based on *temeljiti can never occur without the preposition na
          biti za to be for to agree with or support (something or someone) biti to be can occur without the preposition, but it will never have a sense of to agree with or to support

      Note that idiomatic adpositional valency, in which the adposition opens a slot for a complement, should not be mistaken for idiomatic verb-particle constructions. Tests distinguishing particles from prepositions can be used to disambiguate these categories.

      • to wake up somebody cannot be annotated as IAV because up is a particle, and not a preposition.
        Particles can occur after the object:
        to wake somebody up but prepositions cannot *to come a new restaurant across
      • n.a.
      • n.a.
      • zet de radio aan cannot be annotated as IAV because aan is a particle, and not a preposition.
      • n.a.

      Not only single verbs but also VMWEs may be inherently adpositional. This is why IAV annotation needs to be the last step, after all other VMWEs in a sentence have been identified and categorized. In case of overlap between another category and IAV, the whole VMWE annotation needs to be repeated with the addition of the lexicalized adposition, and the whole is annotated as an IAV.

      • to put up with bears 2 annotations:
        1.
        to put up is annotated as VPC
        2. the whole sequence
        to put up with is annotated as IAV
      • atenerse a abide.self to to abide by bears 2 annotations:
        1.
        atenerse is annotated as IRV
        2. the whole sequence
        atenerse a is annotated as IAV
      • n.a.
      • ubadati se z to deal RCLI with to deal with bears 2 annotations:
        1.
        ubadati se to deal RCLI is annotated as IRV, since the verb without the RCLI does not exist
        2. the whole sequence
        ubadati se z to deal RCLI with is annotated as IAV, since the verb also does not exist without the preposition
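The double annotation described above can be pictured as two overlapping token sets, where the IAV repeats the tokens of the earlier VMWE and adds the lexicalized adposition. The sketch below is only an illustration: the `Annotation` class and the `add_iav` helper are hypothetical names, and the token positions are a simplification rather than the corpus file format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Annotation:
    category: str            # e.g. "VPC", "IRV", "IAV"
    tokens: FrozenSet[int]   # 0-based positions of the lexicalized tokens


def add_iav(existing: List[Annotation], base: Annotation,
            adposition: int) -> List[Annotation]:
    """Keep `base` and add an IAV that repeats its tokens plus the adposition."""
    if base not in existing:
        # IAV is the last step: the overlapping VMWE must already be annotated.
        raise ValueError("annotate the other categories before IAV")
    iav = Annotation("IAV", base.tokens | {adposition})
    return existing + [iav]


# "to put up with": to(0) put(1) up(2) with(3)
vpc = Annotation("VPC", frozenset({1, 2}))
annotations = add_iav([vpc], vpc, adposition=3)
for a in annotations:
    print(a.category, sorted(a.tokens))
```

Both annotations are kept, so the expression bears two annotations, as in the examples above.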

      Test IAV.1 - [CIRCUM-QUEST] Circumstantial question with no adposition

      This is an adaptation of STREUSLE's guideline on prepositional verbs by Nathan Schneider and Meredith Green.

      In response to a declarative sentence with the verb+adposition combination, is there a natural way to query the circumstances of the verbal event using the verb, but not the adposition?

      • it is not an IAV
        • - I care about the environment.
          - Why do you care?
          to care about is not annotated as IAV
        • - Me preocupo por su salud. me worry.I for his/her health I'm worried about his/her health
          - ¿Por qué te preocupas? why you worry.you? Why are you worried?

          preocuparse por is not annotated as IAV
        • n.a.
        • - Lahko se zanesem na pomoč svojih prijateljev. I can rely on my friends' help
          - Se lahko zaneseš, da ti bo kdo pomagal? Can you rely that someone will help you? Can you rely on that someone will help you?

          zanesti se to rely on is not annotated as IAV
      • annotate as an IAV
        • - I came across a nice restaurant downtown.
          - #When did you come?
          to come across is annotated as IAV
        • - Ana entiende de música clásica. Ana understands of music classic Ana knows about classical music
          - #¿Desde cuándo entiende? Since when understands.she? Since when does she know?
          entender de is annotated as IAV
        • n.a.
        • - Gre za enakovrednost. It goes about equality. It is about equality.
          - #Kaj gre? #What goes?
          gre za is annotated as IAV

      Section 6

      Tests for nominal MWEs (NMWEs)

      If the DIST test has allowed us to decide that the MWE candidate has a nominal distribution, the status of this candidate (as NID, PronID, NV or non-MWE) is to be checked by the decision diagram below. This diagram has a unique entry point and the tests should be applied in the defined order. Each test is clickable and explained with examples in the sections below.

      The role of the first 3 tests, NMWE.1, NMWE.2 and NMWE.3 is to eliminate a candidate if it is a named entity (or a definite description).

      The tests below are ordered from more specific ones to more generic ones. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions. In practice, however, it turns out that specific tests are often not applicable to some NMWE classes, and more generic tests (e.g. LEX) are required. As a consequence, generic tests, appearing towards the end of the list, may end up being used quite frequently.

      Decision tree for nominal MWE candidates

      • Apply test NMWE.1 - [SPECIF-REF: Candidate refers to a specific entity?]
        • Apply test NMWE.2 - [NAMING-CONV: Naming convention applies to the whole class?]
          • Apply test NMWE.3 - [SEM-TYPE: Person, organization, location, product or event?]
            • It is a proper name or a definite description, not an MWE, exit
            • It is not a proper name, continue to test NMWE.4
          • It is not a proper name, continue to test NMWE.4
        • It is not a proper name, continue to test NMWE.4
      • Apply test NMWE.4 - [DEVERBAL: Candidate derives from a VMWE?]
        • It is an NV.VID, NV.LVC.full, etc., depending on the outcome of the VMWE tests, exit.
        • Apply test NMWE.5 - [PRON: Candidate on the list of MWE pronouns?]
          • It is a PronID, exit.
          • Apply test NMWE.6 - [CRAN: Candidate contains a cranberry word?]
            • It is an NID, exit.
            • Apply test NMWE.7 - [IRREG-STRUCT: Irregular syntactic structure?]
              • It is an NID, exit.
              • Apply test NMWE.8 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
                • It is an NID, exit.
                • Apply test NMWE.9 - [MODIF: Modification of a component prohibited?]
                  • It is an NID, exit.
                  • Apply test NMWE.10 - [COORD: Coordination prohibited?]
                    • It is an NID, exit.
                    • Apply test NMWE.11 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
                      • It is an NID, exit.
                      • Apply test NMWE.12 - [HEAD: Semantic head is hypernym?]
                        • It is an NID, exit.
                        • Apply test NMWE.13 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
                          • It is an NID, exit.
                          • It is not an MWE, exit
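The decision tree above can be read as a small procedure. The sketch below is one possible encoding, assuming that the annotator's yes/no answers are collected in a dictionary keyed by the test mnemonics. The names `classify_nominal_candidate`, `answers` and `vmwe_category` are illustrative assumptions, not part of the guidelines or of any PARSEME tool. The branching of NMWE.2 follows the description of that test: a naming convention that applies to the whole class means the candidate is not a proper name.

```python
from typing import Mapping, Optional

# Tests NMWE.6 to NMWE.13, in the order given by the decision tree.
# A "yes" answer to any of them yields an NID.
NID_TESTS = ["CRAN", "IRREG-STRUCT", "MORPH", "MODIF",
             "COORD", "SYNT", "HEAD", "LEX"]


def classify_nominal_candidate(answers: Mapping[str, bool],
                               vmwe_category: Optional[str] = None) -> str:
    """Return the outcome of the nominal decision tree for one candidate.

    `answers` maps a test mnemonic (e.g. "SPECIF-REF") to a yes/no judgment;
    a missing key is treated as "no" (an assumption of this sketch).
    `vmwe_category` is the category found by the VMWE tests (e.g. "VID"),
    needed only when DEVERBAL is answered with yes.
    """
    def yes(test: str) -> bool:
        return bool(answers.get(test, False))

    # NMWE.1 to NMWE.3: filter out proper names and definite descriptions.
    if yes("SPECIF-REF") and not yes("NAMING-CONV") and yes("SEM-TYPE"):
        return "not an MWE (proper name or definite description)"

    # NMWE.4: nominalization of a VMWE.
    if yes("DEVERBAL"):
        return "NV." + (vmwe_category or "?")
    # NMWE.5: list of MWE pronouns.
    if yes("PRON"):
        return "PronID"
    # NMWE.6 to NMWE.13: the first positive test decides.
    for test in NID_TESTS:
        if yes(test):
            return "NID"
    return "not an MWE"


# Specific reference, no class-wide naming convention, semantic type PERSON
# (the pattern of the John Smith examples in tests NMWE.1 and NMWE.2).
print(classify_nominal_candidate(
    {"SPECIF-REF": True, "NAMING-CONV": False, "SEM-TYPE": True}))
```

Keeping the tests in an ordered list mirrors the priority of specific tests over generic ones: the loop stops at the first positive answer.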

      Test NMWE.1 - [SPECIF-REF] - Specific reference

      In the given context, does the candidate refer to one or more specific entities, rather than being used generically?

      • It might be a proper name, go to test NMWE.2
        • (OEG) 𓋾𓈎𓄿𓄣𓃀 Ḥḳꜣ-ꞽb The ruler (Ḥḳꜣ) of the heart (ꞽb) Heqaib (Urk. I 132, 4) → A proper name.
        • John Smith showed up unexpectedly - John Smith refers here to a single specific person
          Many Johns Smiths live in London Johns Smiths refers to several specific persons
          He used the cold weapon hidden under his coat cold weapon refers to a specific weapon
          The two cold weapons were found at the place of the crime cold weapon refers to several specific weapons
          The theory of relativity was proposed by Einstein → there is only one theory of relativity, so it must be single and specific
          the UN Secretary-General visited Greece → at the moment of writing there is only one UN Secretary-General (so he/she must be single and specific)
          Universal Dependencies is a collection of treebanks - Universal Dependencies refers to a single specific collection of treebanks and there is only one such collection
          I ate a cold lunch - cold lunch refers to a specific meal
        • René Descartes est un philosophe français R. D. is a French philosopher René Descartes refers to a single specific person
          Le (café) Descartes the (café) Descartes le (café) Descartes refers to a specific place
          Il cachait une/l' arme blanche sous le manteau He was hiding a/the cold weapon under his coat arme blanche has a(n) (in)definite specific reference
          Le Secrétaire général de l'ONU est en visite officielle en Grèce The secretary general of the UN is in visit official in Greece The UN Secretary-General is officially visiting Greece → 'Secrétaire général' de l'ONU is specific at the moment of writing
        • Ξενοφῶν Ἀθηναῖος Xenophōn Athēnaios Xenophon, the Athenian Xenophon.NOM.sg.m Athenian.NOM.sg.m
        • Il Mar Nero si trova tra Europa e Asia. The Black sea is between Europe and Asia Black Sea Mar Nero refers to a specific geographical entity
          Il Segretario Generale dell’ONU ha rilasciato la dichiarazione. Segretario Generale refers to a specific person
          La teoria della relatività fu formulata da Einstein. teoria della relatività refers to a specific theory
        • Maja Kowalska nie ma tu konta Maja Kowalska has no account here - Maja Kowalska refers here to a single specific person
          Dwie Maje Kowalskie mają tu konta Two Majas Kowalska have accounts here - Maje Kowalskie refers to two specific persons
          Posłużył się białą bronią przyniesioną w torbie He used the white weapon brought in his bag He used the cold weapon brought in his bag biała broń refers to a specific weapon
          W pobliżu znaleziono kilka białych broni Nearby several white weapons were found Nearby several cold weapons were found białe bronie refers to several specific weapons
          paradox Banacha i Tarskiego został opisany w 1924 roku the Banach-Tarski paradox was described in 1924 → there is only one Banach-Tarski paradox (so it must be single and specific)
          Sekretarz stanu w Ministerstwie Cyfryzacji the Secretary of State at the Ministry of Digitalization → at the moment of writing there is only one such secretary
          Anonimowi Alkoholicy spotykają się w czwartki Anonymous Alcoholics meet on Thursdays - Anonimowi Alkoholicy Anonymous Alcoholics refers to a single specific organization
          Zjadłam zimny obiad I ate a cold lunch - zimny obiad refers to a specific meal
        • Виктор Иго је француски писац Viktor Igo je francuski pisac Victor Hugo is a French writer Виктор Иго Viktor Igo refers to a single specific person
          Провалник је био наоружан хладним оружјем Provalnik je bio naoružan hladnim oružjem The burglar was armed with a cold weapon хладно оружје hladno oružje has a(n) (in)definite specific reference
          Генерални секретар УН Generalni sekretar UN The UN Secretary-General Генерални секретар УН Generalni sekretar UN is specific at the moment of writing
      • It is not a proper name, continue to test NMWE.4
        • (OEG) 𓇓𓏏 𓆤𓏏 nsw - bꞽtꞽ The king of Upper Egypt (𓇓𓏏) and Lower Egypt (𓆤𓏏). The king of Egypt (PT 776a, P) → For the meaning of nsw-bꞽtꞽ see Schenkel, Das Wort für 'König' (von Oberägypten), 1986.
        • A cold weapon is a weapon that does not involve fire or explosions cold weapon is used generically, i.e. refers to all instances of a class
          Cold weapons are prohibited on a plane cold weapons is used generically, i.e. refers to the whole class
          I avoid cold lunches - cold lunches is used generically, i.e. refers to all instances of the class
          The UN Secretary-General is the chief administrative officer of the United Nations UN Secretary-General is used generically, i.e. refers to the whole class
        • Une arme blanche est accessible A bladed weapon is accessible arme blanche has a generic and non-specific interpretation.
          J'évite de porter une chemise blanche I avoid wearing a white shirt chemise blanche does not refer to a specific occurrence
        • μεγὰς βασιλεύς megas basileus the great king the king of Persia great.NOM.sg.m king.NOM.sg.m
        • Le carte di credito sono un mezzo di pagamento. carte di credito is used generically, i.e. refers to the whole class
        • Biała broń na ogół służy do walki wręcz White weapon is usually used in hand-to-hand combat Cold weapon is usually used in hand-to-hand combat biała broń white weapon cold weapon is used generically, i.e. refers to the whole class
          Białe bronie są zabronione na pokładzie White weapons are forbidden onboard Cold weapons are forbidden onboard białe bronie white weapons cold weapons is used generically, i.e. refers to the whole class
          dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy - gospodarka rynkowa market economy is used generically
          Nie lubię zimnych obiadów I don't like cold lunches - zimne obiady cold lunches refers to all instances of a class

      Test NMWE.2 - [NAMING-CONV] - Concept naming convention

      Does the naming convention between the candidate c and an entity e refer to all instances of a whole semantic class? In other words, can c refer to another entity e' based on the properties of e', with no need of an extra naming convention?

      • It is not a proper name, go to test NMWE.4
        • (OEG) 𓇓𓏏 𓆤𓏏 nsw - bꞽtꞽ The king of Upper Egypt (𓇓𓏏) and Lower Egypt (𓆤𓏏). The king of Egypt (PT 776a, P) → Different people were nsw-bꞽtꞽ during the history of Egypt. For the meaning of nsw-bꞽtꞽ see Schenkel, Das Wort für 'König' (von Oberägypten), 1986.
        • He used the cold weapon hidden under his coat → if another entity e' occurs which has the same properties as the one in this sentence (it is a weapon that does not use explosives or fire), e' can be called cold weapon with no need of an extra naming convention
          The two cold weapons were found at the place of the crime → if another entity e' occurs which has the same properties as the ones in this sentence (it is a weapon that does not use explosives or fire), e' can be called cold weapon with no need of an extra naming convention
          the UN Secretary-General visited Greece → at a different moment in time, there can be another person e' playing the same role, so she/he can be called UN Secretary-General with no need for an extra naming convention
          I ate a cold lunch → if another entity e' occurs which has the same properties as the one in this sentence (it is a lunch which is cold), e' can be called cold lunch with no need of an extra naming convention
        • Il a utilisé l'arme blanche cachée sous le manteau He used the cold weapon hidden under his coat → if another entity e' occurs which has the same properties as the one in this sentence, e' can also be called arme blanche with no need of an extra naming convention
          Le Secrétaire général de l'ONU a un mandat de 5 ans The UN Secretary-General has a five-year term → any e' may be designated by c with no extra conventions, as long as it occupies the function c
        • μεγὰς βασιλεύς megas basileus the great king the king of Persia great.NOM.sg.m king.NOM.sg.m
        • La strada statale era bloccata per lavori.
          Ha pagato con una carta di credito.
        • Posłużył się białą bronią przyniesioną w torbie He used the white weapon brought in his bag He used the cold weapon brought in his bag → if another entity e' occurs which has the same properties as the one in this sentence (it is a weapon that does not use explosives or fire), e' can be called biała broń white weapon cold weapon with no need of an extra naming convention
          W pobliżu znaleziono kilka białych broni Nearby several white weapons were found Nearby several cold weapons were found → as above
          Sekretarz stanu w Ministerstwie Cyfryzacji the Secretary of State at the Ministry of Digitalization → at a different moment in time, there can be another person e' playing the same role, so she/he can be called Sekretarz stanu w Ministerstwie Cyfryzacji the Secretary of State at the Ministry of Digitalization with no need for an extra naming convention
          Zjadłam zimny obiad I ate a cold lunch → if another entity e' occurs which has the same properties (it is a lunch which is cold), e' can be called zimny obiad cold lunch with no need of an extra naming convention
        • Клинички центар Klinički centar Clinical center → any e' may be designated by c with no extra conventions, as long as it performs the function c
          Председник Републике Александар Вучић је изјавио... Predsednik Republike Aleksandar Vučić je izjavio... President of the Republic Aleksandar Vučić said... → at a different moment in time, there can be another person e' playing the same role, so she/he can be called Predsednik Republike President of the Republic with no need for an extra naming convention
      • It could be a proper name, continue to test NMWE.3. Note that the answer might be no in two cases:
        • There is no other e' in the concept denoted by the candidate
          • (OEG) 𓇓𓏏𓉐 pr-nsw king's (nsw) house (pr) The palace, the royal administration (Urk. I 251, 1) → there is no other e' which could be called pr-nsw.
          • The theory of relativity was proposed by Einstein → there is no other e' which could be called theory of relativity
            Universal Dependencies is a collection of treebanks - there is no other e' which could be called Universal Dependencies, since the name refers to a single specific collection of treebanks and there is only one such collection
          • Il Segretario Generale dell’ONU dirige l'apparato burocratico.
          • paradox Banacha i Tarskiego został opisany w 1924 roku the Banach-Tarski paradox was described in 1924 → there is no other e' which could be called paradox Banacha i Tarskiego
            Anonimowi Alkoholicy spotykają się w czwartki Anonymous Alcoholics meet on Thursdays → there is no other e' which could be called Anonimowi Alkoholicy Anonymous Alcoholics
        • There could be another e' in the same class as e but the naming convention does not apply to it
          • (OEG) 𓋾𓈎𓄿𓄣𓃀 Ḥḳꜣ-ꞽb The ruler (Ḥḳꜣ) of the heart (ꞽb) Heqaib (Urk. I 132, 4) → A proper name.
          • John Smith showed up unexpectedly - given another person e', we are not able to deduce that his name is John Smith based just on the properties of e'; we need to be aware of an extra naming convention which assigns e' his name
            Many Johns Smiths live in London → as above
          • Άρειος πάγος Areios pagos district devoted to Ares Areopagus devoted.to.Ares.NOM.sg.m district.NOM.sg.m
          • Molti Mario Rossi vivono a Milano. Mario Rossi is a proper noun and there is no naming convention that is applicable.
          • Maja Kowalska nie ma tu konta Maja Kowalska has no account here - given another person e', we are not able to deduce that her name is Maja Kowalska based just on the properties of e'; we need to be aware of an extra naming convention which assigns e' her name
            Dwie Maje Kowalskie mają tu konta Two Majas Kowalska have accounts here → as above

      Test NMWE.3 - [SEM-TYPE] - Semantic type

      Is the entity e referred to by the candidate c a PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT or EVENT?

      • The candidate is a proper name or a definite description, not an MWE, exit.
        • (OEG) 𓇓𓏏𓉐 pr-nsw king's (nsw) house (pr) The palace, the royal administration (Urk. I 251, 1) → location.
        • theory of relativity → a theory is a human product
          Universal Dependencies → a treebank collection is a human product
          Einstein's mother → definite description
          Black Sea → location
        • René Descartes → a PERSON
          l'Organisation des nations unies → an ORGANISATION
          Charente-Maritime → a LOCATION
          le Petit Robert → a HUMAN PRODUCT
          la Nuit Blanche → an EVENT
        • Άρειος πάγος Areios pagos district devoted to Ares Areopagus devoted.to.Ares.NOM.sg.m district.NOM.sg.m
          Ξενοφῶν Ἀθηναῖος Xenophōn Athēnaios Xenophon, the Athenian Xenophon.NOM.sg.m Athenian.NOM.sg.m
        • Mar Nero → location
          Mario Rossi → person
          Organizzazione delle Nazioni Unite → organisation
          Dizionario Treccani → human product
        • Anonimowi Alkoholicy spotykają się w czwartki Anonymous Alcoholics meet on Thursdays → organization
          Hołd pruski 1525 Prussian Tribute 1525 → event
          Morze Martwe Dead Sea → location
          Zygmunt III Waza Sigismund III Vasa → person
        • Закон акције и реакције Zakon akcije i reakcije The Law of Action and Reaction
      • It is not a proper name, continue to test NMWE.4
        • (OEG) 𓅓𓎕 𓄣 𓈖 𓇓𓏏𓈖 mḥ ꞽb n(.ꞽ) nsw the-one-who-fills (mḥ) the heart (ꞽb) of (n(.ꞽ)) the king (nsw) The king's confidant (Urk. I 190, 11) → an office or an epithet is not a proper name.
        • quantum physics → a domain of knowledge is not a human product
          Alzheimer's disease → a disease is neither a human product nor an event
        • sécurité routière
        • linguistica computazionale → a domain of knowledge
          demenza senile → a disease is not a human product
        • paradox Banacha i Tarskiego został opisany w 1924 roku the Banach-Tarski paradox was described in 1924 → a paradox is not a human product

      Test NMWE.4 - [DEVERBAL] - Deverbal NMWE

      Does the candidate contain a deverbal noun and can the candidate be rephrased (in the given context) using a verbal expression which passes the VMWE tests?

      • It is a deverbal nominal MWE (NV), with the corresponding VMWE subcategory, e.g. NV.VID, NV.LVC.full, etc.
        • (OEG) 𓅓𓎕 𓄣 𓈖 𓇓𓏏𓈖 mḥ ꞽb n(.ꞽ) nsw the-one-who-fills (mḥ) the heart (ꞽb) of (n(.ꞽ)) the king (nsw) The king's confidant (Urk. I 190, 11) => mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) '(My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ)' 'My lord trusted me' → It is an NV.LVC.full.
        • He is a quick decision maker => He makes decisions quickly - make decision is an LVC.full, so decision maker is an NV.LVC.full
        • Sa prise en compte de mes remarques => Il prend en compte mes remarques - prend en compte is a VID, so prise en compte is an NV.VID
          Elle est preneuse de notes pour sa camarade => Elle prend des notes pour sa camarade - prend des notes is an LVC.full, so preneuse de notes is an NV.LVC.full
          La déclaration de guerre est autorisée par le Parlement The declaration of war is authorized by Parliament déclarer la guerre à NP is a VID. déclaration de guerre (à NP) is an NV.VID, argument of the verb autoriser.
        • τῇ προσέξει τοῦ νοῦ tē prosexei tou nou the paying of attention the.DAT.sg.f paying.DAT.sg.f the.GEN.sg.m attention.GEN.sg.m
        • È proprio un guasta feste, non si diverte mai. party spoiler buzzkill guastare le feste is a VID. guasta feste is an NV.VID
          La presa in considerazione dell'evento è stata importante
        • wykonawca prac podał termin => Ten, kto wykonuje prace podał termin - wykonuje prace is an LVC.full, so wykonawca prac is an NV.LVC.full
          była to zabawa jego kosztem => bawili się jego kosztem - bawili się jego kosztem is a VID, so zabawa jego kosztem is an NV.VID
          rzut oka na tekst => rzuciłam okiem na tekst - rzuciłam okiem is a VID, so rzut oka is an NV.VID
        • хватање прикључка hvatanje priključka catching connection catching up хватати прикључак hvatati priključak to catch a connection to catch up is a VID, so хватање прикључка is an NV.VID
      • Continue to the next test
        • this mountain is a widow maker this mountain is very dangerous - make a widow loses the idiomatic reading, so widow maker is not an NV
        • la mise en bière putting into beer putting the body of a dead person into the coffin - mettre en bière loses the idiomatic reading, so mise en bière is not an NV
        • Il porta voce ha letto la dichiarazione. voice carrier spokesperson - portare la voce loses the idiomatic reading, so porta voce is not an NV
        • dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy => gospodarować rynkowo to manage market-wise is not a verbal MWE, so gospodarka rynkowa market economy is not an NV (but it is an NMWE)
          zrobić coś za Bóg zapłać to do something for God-pay to do something for free => zrobić coś licząc, że Bóg za to zapłaci to do something counting on God to pay it back - Bóg zapłaci God will pay is not a verbal MWE, so Bóg zapłać God-pay is not an NV (but it is an NMWE)
          był działaczem ruchu robotniczego he was an activist in a workers' movement => działał w ruchu robotniczym he acted in a workers' movement is not a VMWE, so działacz ruchu robotniczego activist in a workers' movement is not an NV

      Test NMWE.5 - [PRON] - Pronoun

      Does the candidate occur on the closed list of MWE pronouns or should the list be extended with this candidate? Such lists need to be established for each language separately. Care should be taken about distinguishing PronIDs from DetIDs.

      • It is a pronominal idiom (PronID)
        • (OEG) 𓅱𓌡𓏤 𓊪𓈖 𓇋𓅓 𓎡 wꜥ pn ꞽm(.ꞽ) ⸗k this (pn) one (wꜥ) who-is-in (ꞽm(.ꞽ)) you (⸗k). This one who is in you. (PT 254a)
        • I saw just a few
          I expect no one to come
          we love each other
        • je ne suis pas capable de manger quoi que ce soit I am not able to eat what that this be I cannot eat anything
          Je n'ai vu qui que ce soit I not have seen whoever it be.SUBJV.3.SG I didn't see anyone (PronID) → 'qui que ce soit' is a pronominal idiom.
        • ὅ τι ho ti which anything that which.NOM.sg.n anything.NOM.sg.n
        • Non mi aspetto di fare niente di niente oggi.
        • powtarzał ciągle to samo he repeated always this the same he repeated always the same
          zawiniłam samej sobie I.am.guilty alone myself I'm guilty myself
      • Continue to the next test
        • I saw just a few examples - here a few is a determiner, not a pronoun
          there is no one right way to tell the story - no one is not a pronoun here but two determiners
        • J'achète n'importe quel livre I buy whichever book I buy any book n'importe quel is a DetID
        • Un libro vale l'altro.
        • powtarzał ciągle to samo zdanie he repeated always the same sentence - to samo is not a pronoun but a determiner
          to samo się rozwiąże this alone itself will solve this will solve itself - to samo is not a complex pronoun but a simple pronoun to and an adjective samo
          dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy - gospodarka rynkowa market economy is not a pronoun but a nominal phrase

      Test NMWE.6 - [CRAN] - Cranberry word

      Does the candidate expression contain a cranberry word?

      • it is a nominal idiom (NID)
        • cranberry cran is not a standalone word (= 'cranberry' word)
          status quo → foreign words like 'status' and 'quo' are considered cranberry words
          kith and kin friends and relations → 'kith' is not a standalone word
          helter-skelter tall tower at a fun-fair → 'helter' and 'skelter' do not exist alone outside this expression
          riff-raff ill-behaved people → 'raff' does not exist alone outside this expression
          cha-cha(-cha) ballroom dance performed with small steps and swaying hip movements → 'cha' does not exist standalone
        • casus belli → foreign words like 'casus' and 'belli' are considered cranberry words
          méli-mélo confused mixture → 'méli' and 'mélo' are not stand-alone words
          frou-frou rustling → 'frou' does not exist outside of this compound
          loup-garou werewolf → 'garou' is not a stand-alone word
          pont-levis drawbridge → 'levis' is not a stand-alone word
          cha-cha-cha ballroom dance performed with small steps and swaying hip movements → 'cha' is not a stand-alone word
          bric-à-brac bric-à-brac → 'brac' is not a standalone word (cf. de bric et de broc (AdvID))
        • di riffa e di raffa → 'riffa' and 'raffa' do not exist alone outside this expression
          casus belli → foreign words like 'casus' and 'belli' are considered cranberry words
          a iosa → 'iosa' does not exist outside this expression
          a sbafo → 'sbafo' does not exist outside this expression
          tran tran → 'tran' is not a stand-alone word
        • przodkowie z gatunku homo erectus ancestor from species homo erectus ancestors from the homo erectus species - 'erectus' is not a standalone word in Polish
          dziś wydaje się to jeszcze science fiction today it still looks like science fiction - 'science' and 'fiction' are not stand-alone words in Polish
          odnośnie mass mediów concerning mass media - 'mass' is not a standalone word in Polish
      • Continue to the next test
        • eager beaver both ‘eager’ and ‘beaver’ are stand-alone words
        • café-tabac café and tobacco shop both 'café' and 'tabac' are stand-alone words
        • È un cane sciolto, non risponde a nessuno. lone dog
        • dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy - both 'gospodarka' economy and 'rynkowa' market-related are standalone words

      Test NMWE.7 - [IRREG-STRUCT] - Irregular syntactic structure

      Does the candidate have an internal syntactic structure which is irregular for its distribution, i.e. does it lack the structure of a nominal?

      • It is a nominal idiom (NID)
        • (OEG) 𓇓𓏏 𓆤𓏏 nsw - bꞽtꞽ The king of Upper Egypt (𓇓𓏏) and Lower Egypt (𓆤𓏏). The king of Egypt (PT 776a, P) → The juxtaposition of two nouns with a single meaning is unusual in Old Egyptian. For the meaning of nsw-bꞽtꞽ see Schenkel, Das Wort für 'König' (von Oberägypten), 1986.
        • secretary(-)general N(-)Adj
          double-bind dilemma Adj-V
          a hold-up V-Adv
          fast day V-N
          love-hate relationship V-V N
          round about V-Adv
        • (une) louise-bonne Louise-good Louise-bonne pear N.proper-Adj
          (un) porte-manteau support-coat coat-rack V-N
          monte-charge raise load goods lift V-N
          (un) franc-parler frank-talk frankness Adj-V
          (un) à-coup at-strike juddering Preposition-N
        • un franco tiratore
          il caro prezzi
          rapporto amore-odio
      • Continue to the next test
        • (OEG) 𓌴𓐙 𓐍𓂋𓊤 𓆑 mꜣꜥ-ḫrw ⸗f The rightness (mꜣꜥ) of his (⸗f) voice (ḫrw). His justification (PT 316d, W) → Both nouns are used in the direct genitive.
        • general secretary
        • segretario generale fishing tour
        • dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy - gospodarka rynkowa market economy has a regular nominal syntactic structure N-Adj

      Test NMWE.8 - [MORPH] - Morphological inflexibility

      Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

      • It is a nominal idiom (NID)
        • (OEG) 𓌴𓐙 𓐍𓂋𓊤 𓆑 mꜣꜥ-ḫrw ⸗f The rightness (mꜣꜥ) of his (⸗f) voice (ḫrw). His justification (PT 316d, W) → mꜣꜥ-ḫrw is usually used in singular (Wb. II 16); *mꜣꜥ.w-ḫrw.w is unattested and certainly ungrammatical.
        • bits and pieces #bit and piece
          (the) grass roots #grass root
          She invested in real estate She invested in *real estates
        • (du) pain perdu bread lost French toast *un/des pain(s) perdu(s)
          (des) vacances d'hiver vacations of winter winter vacation *vacance d'hiver
          (une) respiration mécaniquement assistée respiration mechanically assisted mechanical ventilation *respirations mécaniquement assistées
        • un lieto fine *lieti fine
          la tavola rotonda *tavole rotonde
        • dom dziecka house of a child orphanage #dom dzieci
      • Continue to the next test
        • (OEG) 𓇋𓐍𓅓 𓋴𓎡 ꞽ:ḫm-śk(.w) The-one-who-cannot (ꞽ:ḫm) perish (śk(.w)) The circumpolar star (PT 148a, W) → ꞽ:ḫm(.w)-śk(.w) (PT 149c, W) "those who cannot perish" i.e. "the circumpolar stars".
        • light year light years
        • année-lumière années-lumière
        • ἔργῳ καὶ λόγῳ ergō kai logō in word and deed deed.DAT.sg.n and word.DAT.sg.m → pluralisation is possible
        • anno luce → anni luce
        • w nowoczesnej gospodarce rynkowej in a modern market economy we wszystkich nowoczesnych gospodarkach rynkowych in all modern market economies

      Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.

      Test NMWE.9 - [MODIF] - Prohibited modification

      Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?

      • It is a nominal idiom (NID)
        • (OEG) 𓇋𓐍𓅓 𓋴𓎡 ꞽ:ḫm-śk(.w) The-one-who-cannot (ꞽ:ḫm) perish (śk(.w)) The circumpolar star (PT 148a, W) → A modification of ꞽ:ḫm in this NMWE is unattested and certainly ungrammatical (Wb. I 125, 14).
        • cold war #very cold war
          (a) state-of-the-art #mental state-of-the-art, #state-of-the-fine-art
          starting blocks #starting to run blocks
          rowing machine *rowing slowly machine
          runner bean #slow runner bean
        • (un) fait divers fact diverse various news items *un fait tout à fait divers
          (un) livre d'or book of gold guestbook *un livre de mon frère d'or, *un livre de cet or
          (une) table ronde table round round-table discussion #une table très ronde
          (une) lettre recommandée letter recommended registered letter #une lettre recommandée par mon voisin
        • ἔργῳ καὶ λόγῳ ergō kai logō in word and deed deed.DAT.sg.n and word.DAT.sg.m
        • la Guerra Fredda *una guerra molto fredda
          lo stato dell'arte *lo stato della vera arte
        • służby bezpieczeństwa services of security security services - #służby powszechnego/naszego/całkowitego bezpieczeństwa
          środki masowego przekazu means of mass transfer mass media - #środki bardzo masowego oficjalnego przekazu
      • Continue to the next test
        • (OEG)
        • ich efektem była gospodarka rynkowa they resulted in a market economy ich efektem była gospodarka całkowicie rynkowa they resulted in a true market economy

      Test NMWE.10 - [COORD] - Prohibited coordination

      Does coordination of the candidate with another candidate of the same head lead to ungrammaticality or to an unexpected change in meaning?

      • It is a nominal idiom (NID)
        • blackberry *blue and blackberry
          foul line *foul and side lines
          a can of worms *a can of worms and tuna
        • un rat de bibliothèque rat of library bookworm *rat(s) de bibliothèque et d'hôtel
          un esprit critique spirit critical critical mind #un esprit critique et frappeur
          un pot à épices jar of spices spice jar *pot à épices et à lait
        • un topo da biblioteca *un topo da biblioteca e da libreria
      • Continue to the next test
        • forward roll forward and backward roll
        • navire de guerre ship of war warship navire de guerre et de commerce
          pot à eau jug at water water jug pot à eau et à lait
        • nave da guerra → nave da guerra e da commercio
          porta interna → porta interna ed esterna
        • podstawowe prawa gospodarki rynkowej basic laws of the market economy podstawowe prawa gospodarki rynkowej i nierynkowej basic laws of the market and non-market economy, mutanci gospodarki rynkowej i leninowsko-maoistowskiej mutants of the market and lenino-maoist economy

      Test NMWE.11 - [SYNT] - Syntactic inflexibility

      Does another regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

      • It is a nominal idiom (NID)
        • (OEG) 𓇋𓐍𓅓 𓋴𓎡 ꞽ:ḫm-śk(.w) The-one-who-cannot (ꞽ:ḫm) perish (śk(.w)) The circumpolar star (PT 148a, W) → [ꜥ.wt]{t} ⸗f ꞽ:ḫm.t-śk(.w) (PT 530b, T) 'limbs which cannot perish' i.e. 'the imperishable limbs'. If ꞽ:ḫm-śk(.w) is used as an adjective following a noun, it literally means 'imperishable'.
        • hot dog #a dog that is hot
          a dog’s breakfast a mess #breakfast of dog
          hard shoulder emergency lane #a shoulder that is hard
        • le cuir chevelu leather hairy scalp #le cuir qui est chevelu
          les sciences naturelles natural sciences #les sciences qui sont naturelles
        • un capro espiatorio → un capro che espia
          il cuoio capelluto → il cuoio che è capelluto
        • rzut wolny free throw free kick - #wolny rzut free throw
          stan wojenny war state martial law - #stan wojny state of war
      • Continue to the next test
        • gospodarka rynkowa i socjalne państwo market economy and a social state gospodarka rynku i socjalne państwo market economy and a social state, rynkowa gospodarka i socjalne państwo market economy and a social state

      Test NMWE.12 - [HEAD] - Semantic head

      Is the semantic head h of the candidate c its hypernym, which can be reformulated by "is c a type of h"? Note that sometimes the syntactic and semantic heads do not coincide.

      • It is a nominal idiom (NID)
        • (OEG) 𓍛 𓊪𓋴𓆓𓏏𓅆 ḥm-pśč̣.t The majesty (ḥm) of the light (pśč̣.t) Pelican (PT 226a, W) → It is not a majesty, but a pelican.
        • white elephant → It is not a type of elephant, it is a valuable possession
          red herring → It is not a type of sea fish, but it suggests an idea of a misleading clue
          a square peg (in a round hole) someone who does not fit in → It is not a peg but a person
        • una falsa pista → it is not a type of 'pista'
          una testa calda → it is not a type of "testa"
        • wymiar sprawiedliwości measurement of justice justice - it is not a type of measurement but an institution
          osobowość prawna legal personality legal person - it is not a type of personality
          manna z nieba manna from heaven miracle - it is not a type of manna
      • Continue to the next test
        • student teacher teacher-in-training → a student teacher is both a student and a teacher
          a bunch of flowers → these are flowers (here the semantic head 'flowers' is different from the syntactic head 'bunch')
        • cordon bleu cord blue master chef → It does not refer to an object but to a person
          un nuage de lait cloud of milk a dash of milk → It does not refer to a type of cloud but to a small quantity (of milk)
          moulin à paroles mill at words blabbermouth → It does not refer to a type of mill but to a person
        • mazzo di fiori
          studente lavoratore
        • milowy krok one-mile step important event - it is a kind of a step (in the metaphorical sense)
          gospodarka rynkowa market economy - it is a type of economy
          ruch oporu movement of resistance resistance movement - it is a type of a movement

      Test NMWE.13 - [LEX] - Lexical inflexibility

      Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

      • It is a nominal idiom (NID)
        • (OEG) 𓇋𓐍𓅓 𓋴𓎡 ꞽ:ḫm-śk(.w) The-one-who-cannot (ꞽ:ḫm) perish (śk(.w)) The circumpolar star (PT 148a, W) → ꞽ:ḫm(.w)-ꞽw.t (PT 367a, W) 'those who don't know the reeds'. The meaning of this expression is not a "circumpolar star".
        • value judgment a personal evaluation of value #importance judgment, #cost/price judgement but cost assessment, price evaluation
          chain reaction #chain change(s)
          deep water #profound water
          vicious circle vicious cycle but #vicious sphere/round/ring...
          vanity case vanity box but #arrogance/narcissism/self-admiration box/case
          boarding pass boarding card but #boarding ticket/voucher/document/...
        • globe-trotteur *sphère-trotteur
          tête de lard head of lard stubborn *tête de graisse, *chef de lard
          peine perdue effort lost fruitless effort *peine égarée
          mauvaise/méchante langue bad mouth #bonne/gentille langue
        • ἔργῳ καὶ λόγῳ ergō kai logō in word and deed deed.DAT.sg.n and word.DAT.sg.m → cannot replace e.g. with πρᾶγμα pragma and ἔπος epos
        • reazione a catena
          circolo vizioso
        • pole karne penalty field penalty area - #pole karania field of punishing, #obszar karny penalty area
          milowy krok one-mile step important event - #kilometrowy krok one-kilometer step
          gospodarka rynkowa market economy - #gospodarka handlowa/komercyjna/targowa economy of trade/commerce/market
        • ванредна седница vanredna sednica extraordinary session #нередовна седница, #непланирана седница
      • It is not an MWE, exit
        • vicious person/dog/attack...
          personal/professional... judgement
          deep anxiety/love/conversation...
        • grande/vive/profonde...peine deep sorrow/intense grief
          mauvaise odeur/habitude/surprise... bad smell/habit/surprise
          méchant garçon/professeur/marchand... mean boy/teacher/merchant
        • giudizio personale
        • ruch oporu movement of opposition resistance movement - #ruch/organizacja/ugrupowanie sprzeciwu/opozycji/opozycyjny/protestu/protestacyjny/kontestacji
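
      The order in which tests NMWE.3 to NMWE.13 are applied can be sketched as a simple cascade. The snippet below is only an illustration and not part of the guidelines: the names NOMINAL_TESTS and classify_nominal are hypothetical, and so is the convention that True means "the first-listed outcome of this test applies" (for HEAD, the examples suggest this is the case where the semantic head is not a hypernym of the candidate). Tests NMWE.1 and NMWE.2 are assumed to have been applied beforehand.

```python
from typing import Mapping, Optional

# (test name, label reached when the first-listed outcome of the test applies).
# The order follows tests NMWE.3 to NMWE.13 as presented in this section.
NOMINAL_TESTS = (
    ("SEM-TYPE", "not an MWE"),   # proper name or definite description
    ("DEVERBAL", "NV"),           # completed with the VMWE subcategory
    ("PRON", "PronID"),
    ("CRAN", "NID"),
    ("IRREG-STRUCT", "NID"),
    ("MORPH", "NID"),
    ("MODIF", "NID"),
    ("COORD", "NID"),
    ("SYNT", "NID"),
    ("HEAD", "NID"),
    ("LEX", "NID"),
)


def classify_nominal(
    first_outcome: Mapping[str, bool],
    vmwe_subcategory: Optional[str] = None,
) -> str:
    """Return the label of the first test whose first-listed outcome applies.

    first_outcome: hypothetical mapping from test name to the annotator's decision.
    vmwe_subcategory: e.g. "VID" or "LVC.full"; required when DEVERBAL applies.
    """
    for test, label in NOMINAL_TESTS:
        if first_outcome.get(test, False):
            if label == "NV":
                if vmwe_subcategory is None:
                    raise ValueError("DEVERBAL applies but no VMWE subcategory was given")
                return f"NV.{vmwe_subcategory}"
            return label
    # No test fired, including LEX: the candidate is not an MWE.
    return "not an MWE"
```

      For instance, under these assumptions classify_nominal({"DEVERBAL": True}, "LVC.full") would label decision maker as NV.LVC.full, while a candidate for which only MODIF applies (such as cold war) would come out as NID.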

      Section 7

      Tests for adjectival and adverbial MWEs (AMWEs)

      If the DIST test has allowed us to decide that the MWE candidate has an adjectival or an adverbial distribution, the status of this candidate (as an AdjID, AdvID, AV or non-MWE) is to be checked by the decision diagram below. This diagram has a unique entry point and the tests should be applied in the defined order. Each test is explained with examples in the sections below.

      As with nominal MWEs, the tests below are ordered from more specific ones to more generic ones. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions.

      Decision tree for adjectival and adverbial MWE candidates

      In this tree, a single YES to one of the tests is sufficient to decide that a candidate is an AMWE.
      • Apply test AMWE.1 - [DEVERBAL: Candidate derives from a VMWE?]
        • It is an AV.VID, AV.LVC.full, etc., depending on the outcome of the VMWE tests, exit.
        • Apply test AMWE.2 - [CRAN: Candidate contains a cranberry word?]
          • It is an AdjID or an AdvID, exit.
          • Apply test AMWE.3 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
            • It is an AdjID or an AdvID, exit.
            • Apply test AMWE.4 - [IRREG-STRUCT: Irregular syntactic structure?]
              • It is an AdjID or an AdvID, exit.
              • Apply test AMWE.5 - [MODIF: Modification of a component prohibited?]
                • It is an AdjID or an AdvID, exit.
                • Apply test AMWE.6 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
                  • It is an AdjID or an AdvID, exit.
                  • It is not an MWE, exit
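
      The decision tree above can be read as a first-YES-wins cascade. The following is a minimal sketch under stated assumptions, not an official implementation: the function name classify_amwe, the answers mapping (test name to the annotator's YES/NO answer) and the distribution strings are all hypothetical names introduced for illustration.

```python
from typing import Mapping, Optional

# Generic tests AMWE.2 to AMWE.6, in the order given by the decision tree.
GENERIC_TESTS = ("CRAN", "MORPH", "IRREG-STRUCT", "MODIF", "LEX")


def classify_amwe(
    answers: Mapping[str, bool],
    distribution: str,
    vmwe_subcategory: Optional[str] = None,
) -> Optional[str]:
    """Return the AMWE label, or None if the candidate is not an MWE.

    answers: hypothetical mapping from test name to True (YES) or False (NO).
    distribution: "adjectival" or "adverbial", as decided by the DIST test.
    vmwe_subcategory: e.g. "VID" or "LVC.full"; required when DEVERBAL is YES.
    """
    if distribution not in ("adjectival", "adverbial"):
        raise ValueError("distribution must be 'adjectival' or 'adverbial'")

    # AMWE.1: a deverbal candidate takes the subcategory of the underlying VMWE.
    if answers.get("DEVERBAL", False):
        if vmwe_subcategory is None:
            raise ValueError("DEVERBAL is YES but no VMWE subcategory was given")
        return f"AV.{vmwe_subcategory}"

    # AMWE.2 to AMWE.6: a single YES is sufficient; the label depends on distribution.
    for test in GENERIC_TESTS:
        if answers.get(test, False):
            return "AdjID" if distribution == "adjectival" else "AdvID"

    # All tests answered NO: not an MWE.
    return None
```

      With these assumptions, classify_amwe({"CRAN": True}, "adverbial") yields AdvID (as for to and fro), and classify_amwe({"DEVERBAL": True}, "adjectival", "LVC.full") yields AV.LVC.full (as for decision-making).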

      Test AMWE.1 - [DEVERBAL] - Deverbal AMWE

      Does the candidate contain a deverbal adjective or adverb and can the candidate be rephrased (in the given context) using a verbal expression which passes the VMWE tests?

      • It is a deverbal adjectival or adverbial MWE (AV), with the corresponding VMWE subcategory, e.g. AV.VID, AV.VPC.full, etc.
        • (OEG) 𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹 𓋴𓆓𓄔𓏏𓅓 𓌃𓅱𓂧 𓇋𓍘𓅱 pśč̣.(w)t śč̣m.t mṭw ꞽtꞽ.w The Enneads (pśč̣.(w)t) which-hear (śč̣m.t) the word (mṭw) of the monarch (ꞽtꞽ.w). The Enneads which interrogate the monarch (PT 511c, W) → śč̣m.n ⸗f mṭw śr ꞽś (PT 347b, T) 'he heard the word as an official' i.e. 'he interrogated as an official'; śč̣m.n ⸗f mṭw is a VID.
        • a decision-making process (AV.LVC.full) make a decision is an LVC.full
          a plan brought to fruition over a decade (AV.LVC.cause) bring to fruition is an LVC.cause. We will bring the plan to fruition
          a time-killing activity (AV.VID) kill time is a VID We killed time watching a movie
          made-up stories (AV.VPC.full) make up is a VPC.full. They completely made up these stories.
        • Il était caustique et emporte-pièce take-piece He was caustic and no-nonsense (AV.VID) emporter la pièce is a VID
          Ce plat est très arrache-gueule tearing.up-mouth This dish burns the mouth (AV.VID) arracher la gueule is a VID
          Un exercice casse-gueule break-face a risky exercise (AV.VID) se casser la gueule is a VID
        • (τὸν) λόγον ποιούμενος (ton) logon poioumenos making a comment speaking (the.ACC.sg.m) word.ACC.sg.m do.PRS.PTCP.MID.NOM.sg.m
      • Further tests are required
        • reinforcements brought to the front lines bring to the front lines is not a VID/LVC, so this is not an AMWE
        • une formule attrape-curieux catch-curious a catchy phrase attraper les curieux is not an MWE
        • διαφερόντως τῶν ἄλλων diapherontōs tōn allōn differing from others differ.ADV the.GEN.pl.m other.GEN.pl.m
        • być może to prawda be may it is true maybe this is true => to może być prawda this can be true - może być can be is not a verbal MWE, so być może maybe is not an AV (but it is an AMWE)
          daleko idące uogólnienia far going generalisations far reaching generalisation => te uogólnienia idą daleko these generalisations go far - iść daleko to go far is not a verbal MWE, so daleko idący far going far reaching is not an AV (but it is an AMWE)

      Test AMWE.2 - [CRAN] - Cranberry word

      Does the candidate expression contain a cranberry word?

      • It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution.
        • spick and span very clean and tidy (AdjID) spick is not a standalone word in English
          The boat rocked to and fro on the waves back and forth (AdvID) fro is not a standalone word in English
          She was in fine fettle in good condition (AdjID) fettle is not a standalone word in English
          He drove off in high dudgeon angrily (AdvID) dudgeon is not used outside this idiom
          He was hale and hearty healthy and strong (AdjID) hale is not used outside this idiom
        • se laisser aller à vau-l'eau downhill (AdvID) vau is not a standalone word in French
          Une famille de bon aloi of good sterling A family of sterling reputation (AdjID) aloi is not used outside this expression
          boire à tire-larigot drink to excess (AdvID) larigot is not used standalone in French
          manger à la bonne franquette eat without any fuss (AdvID) franquette is not used standalone in French
          construire un abri de bric et de broc construct a shelter from a hodgepodge of objects (AdvID) → 'broc' is not a standalone word
        • μετ’ ἡμέρην met’ hēmerēn during the day in broad daylight during day.ACC.sg.f → μετά meta + accusative ‘during’ (in classical Greek μετά meta + accusative = ‘after’)
        • Po prostu zamknęliśmy się we własnym domu Simply we locked ourselves in our own house - 'prostu' is not a standalone word
        • преко јего preko jego beyond jego beyond all measure
      • Further tests are required
        • w celu manipulacji in the aim of manipulation in order to manipulate - both 'w' and 'cel' are standalone words

      Test AMWE.3 - [MORPH] - Morphological inflexibility

      Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

      • It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
        • (OEG) 𓆓𓏏𓇿 𓂋 𓈖𓅘𓎛𓎛 č̣.t r nḥḥ for the linear-eternity (č̣.t) to (r) the circular-eternity (nḥḥ). for ever and ever (PT 414c, W) → * č̣.t r nḥḥ.
        • up in arms very angry (AdjID) #up the arm
          by heart learn something in such a way that you can say it from memory (AdvID) #by hearts
          by no means not at all (AdvID) #by no mean
          from time to time sometimes but not often (AdvID) #from times to times
          hot under the collar embarrassed or angry about something (AdjID) #hot under the collars
          larger than life more interesting, obvious than usual (AdjID) #larger than lives
          down to earth practical (AdjID) #down to earths
          By the way, have you decided yet? (AdvID) *by the ways, have you decided yet?
        • des usines à l'abandon at the abandonment abandoned factories (AdjID) #aux/*à abandons
          Elle vient ici à titre exceptionnel at title exceptional She exceptionally comes here (AdvID) #aux/*à titres exceptionnels
          Il pleut. En conséquence, on ne sort pas. in consequence It rains. As a result, we'll not go out (AdvID) #en conséquences
        • μετ’ ἡμέρην met’ hēmerēn during the day in broad daylight during day.ACC.sg.f → #μετ’ ἡμέρας met’ hēmeras
        • masz to jak w banku you have it as in a bank it is guaranteed for you (AdjvID) #jak w bankach as in banks
          z powrotem with return back - #z powrotami
        • брз на језику brz na jeziku quick on the tongue quick-tongued #брзи на језицима brzi na jezicima quick on tongues
      • Further tests are required
        • w celu manipulacji in the aim of manipulation in order to manipulate - w celach manipulacji
          daleko idące uogólnienia far going generalisations far reaching generalisation - daleko idące uogólnienie far going generalisation far reaching generalisation, daleko idąca zmiana far going change far reaching change

      Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.

      Test AMWE.4 - [IRREG-STRUCT] - Irregular syntactic structure

      Does the candidate have an irregular internal syntactic structure, i.e. the language's regular grammar rules do not allow a phrase with this structure?

      • It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
        • (OEG) 𓆓𓏏𓇿 𓂋 𓈖𓅘𓎛𓎛 č̣.t r nḥḥ for the linear-eternity (č̣.t) to (r) the circular-eternity (nḥḥ). for ever and ever (PT 414c, W) → The preposition ꞽr/r is usually used with a noun as an adverbial phrase of the verb, e.g. ꞽ:šm ⸗k ꞽr P(ꞽ) (PT 624b) “You shall go to Pe”. It is unusual for the preposition ꞽr/r to be used with a coordinating function to link one noun with another, as in č̣.t r nḥḥ.
        • (all) of a sudden suddenly (AdvID) → *(Adv) P Det Adj (=sudden)
          one of a kind unique (AdjID) *of a kind in the sense of of a unique kind
          four-in-hand knot a method of tying a necktie (AdjID) four is an orphan
          back in the day back then (AdvID) #in the day vs. in the old days
          At bottom, he is a kind person (AdvID) #at a/the bottom (of N)
          By and large, the project was a success (AdvID) → Unusual coordination (preposition and adjective)
        • Elle est à bout (de forces/nerfs) at limit (of forces/nerves) She is exhausted (AdjID) bout is not determined (unusual) cf. au bout du couloir
          Elle est sous pression au travail She is under pressure at work (AdjID) pression is not determined (unusual) cf. sous une grande pression
          Un costume sur mesure on measure made-to-measure (AdjID) mesure is not determined (unusual) cf. sur la mesure de N
          Un plat aigre-doux sour-sweet A sweet and sour dish (AdjID) → Unusual coordination with hyphen
        • po raz piąty się zgodzić for the fifth time to agree to agree for the fifth time (AdvID) → #po piąty raz
          co roku every year.DAT every year (AdvID) → the adposition 'co' requires an accusative for all other nouns
      • Further tests are required
        • w celu manipulacji in the aim of manipulation in order to manipulate - w celu in the aim of in order to has a regular syntactic structure of a nominal
          daleko idące uogólnienia far going generalisations far reaching generalisation has a regular Adv-Adj structure

      Test AMWE.5 - [MODIF] - Prohibited modification

      Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?

      • It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
        • (OEG) 𓆓𓏏𓇿 𓂋 𓈖𓅘𓎛𓎛 č̣.t r nḥḥ for the linear-eternity (č̣.t) to (r) the circular-eternity (nḥḥ). for ever and ever (PT 414c, W) → No modification is attested in č̣.t r nḥḥ in Old Egyptian.
        • from time to time (AdvID) #from long time to short time
        • Au besoin, je vous aiderai at.the necessity If necessary, I'll help you (AdvID) #au besoin urgent, #au besoin que vous connaissez
          Dans tous les cas, c'est foutu in all the cases In any case, it's screwed (AdvID) #Dans tous les cas connus/possibles/qu'on connaît
        • μετ’ ἡμέρην met’ hēmerēn during the day in broad daylight during day.ACC.sg.f
        • szkoła średnia school middle college - #szkoła raczej średnia school rather middle
          na serio on seriously seriously - *na bardzo serio on very seriously
          z powrotem with return back - *z ostatecznym powrotem with final return
        • испод жита ispod žita under wheat furtive, sly → #испод пуно жита under a lot of grain
      • Further tests are required
        • The idea is dead and buried prematurely (AdjID)
        • L'idée est morte et enterrée dead and buried prématurément (AdjID)
        • ὡς ἔπος εἰπεῖν hōs epos eipein to speak the word almost word.ACC.sg.n say.AOR.INF.ACT → can appear with πᾶς pas ‘every’ and οὐδείς oudeis ‘no’
        • w celu manipulacji in the aim of manipulation in order to manipulate - w tym celu in this aim, w przestępczym celu in a criminal aim, w nieodgadnionym celu in a mysterious aim
        • бео као снег beo kao sneg white as snow - бео као планински снег white as mountain snow

      Test AMWE.6 - [LEX] - Lexical inflexibility

      Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

      • It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
        • (OEG) 𓇳 𓎟 rꜥw nb every (rꜥw) sun (nb). everyday i.e. daily (PT 263a, W) → *grḥ nb "every night".
        • The idea is dead and buried (AdjID) #The idea is dead and interred
          He is on cloud nine very happy (AdjID) #on cloud ten
          ice-cold drinks (AdjID) #snow-cold
          She thinks she is over the hill no longer young (AdjID) #over the mountain
          a hot pink dress (AdjID) #hot red, #cold pink
        • L'idée est morte et enterrée dead and buried (AdjID) L'idée est #décédée et enterrée
          À la limite, on reporte la réunion at the limit if necessary (AdvID) #à l'extrémité, #au seuil
          Par-dessus le marché, il a plu over above the market On top of that, it rained (AdvID) #sur le marché, #par-dessus le bazar/commerce/pacte
        • ὡς ἔπος εἰπεῖν hōs epos eipein to speak the word almost as/to word.ACC.sg.n say.AOR.INF.ACT → #ὡς λόγον εἰπεῖν hōs logon eipein
        • z powrotem with return back - #z powróceniem
          w celu manipulacji in the aim of manipulation in order to manipulate - w zamierzeniu manipulacji in the intention of, w zamyśle manipulacji in the intention of
          daleko idące uogólnienia far going generalisations far reaching generalisation - #daleko maszerujące/posuwające się/jadące uogólnienia
      • It is not a MWE, exit
        • τὸ πρὸ τοῦ to pro tou the before the before the.ACC.sg.n before the.GEN.sg.n
        • co dzień what day every day - co godzina what hour every hour, co stulecie what century every century, co kwadrans what quarter every quarter of an hour

      Section 8

      Tests for functional MWEs (FuncMWEs)

      If the DIST test has allowed us to decide that the MWE candidate has the distribution of a function word (determiner, adposition, conjunction or interjection), the status of this candidate (as a DetID, AdpID, ConjID, IntID or non-MWE) is to be checked with the decision diagram below. This diagram has a unique entry point and the tests should be applied in the defined order. Each test is clickable and explained with examples in the sections below.

      As with nominal, adjectival and adverbial MWEs, the tests below are ordered from the more specific to the more generic. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions.

      Decision tree for functional MWE candidates

      In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a FuncMWE.
      • Apply test FuncMWE.1 - [CRAN: Candidate contains a cranberry word?]
        • It is a DetID, AdpID, ConjID or IntID, exit.
        • Apply test FuncMWE.2 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
          • It is a DetID, AdpID, ConjID or IntID, exit.
          • Apply test FuncMWE.3 - [IRREG-STRUCT: Irregular syntactic structure?]
            • It is a DetID, AdpID, ConjID or IntID, exit.
            • Apply test FuncMWE.4 - [MODIF: Modification of a component prohibited?]
              • It is a DetID, AdpID, ConjID or IntID, exit.
              • Apply test FuncMWE.5 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
                • It is a DetID, AdpID, ConjID or IntID, exit.
                • It is not a MWE, exit
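
      The cascade above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify_func_mwe, the answers dictionary and the distribution keys are hypothetical, and the yes/no judgments themselves still come from the annotator applying each test by hand.

```python
from typing import Mapping, Optional, Tuple

# Test order as listed in the decision tree for functional MWE candidates.
FUNC_MWE_TESTS = ["CRAN", "MORPH", "IRREG-STRUCT", "MODIF", "LEX"]

# Hypothetical mapping from the distribution found by the DIST test
# to the category label used in the guidelines.
CATEGORY_BY_DISTRIBUTION = {
    "determiner": "DetID",
    "adposition": "AdpID",
    "conjunction": "ConjID",
    "interjection": "IntID",
}


def classify_func_mwe(
    distribution: str, answers: Mapping[str, bool]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (category, deciding_test), or (None, None) if no test says YES.

    `answers` holds the annotator's manual judgment for each test;
    a missing entry is treated as NO (an assumption of this sketch).
    """
    category = CATEGORY_BY_DISTRIBUTION[distribution]
    for test in FUNC_MWE_TESTS:
        if answers.get(test, False):
            # A single YES is sufficient: stop at the first positive test.
            return category, test
    return None, None
```

      For instance, a candidate with adpositional distribution and a cranberry word exits at the first test, while a candidate with only NO answers is not a MWE.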

    Test FuncMWE.1 - [CRAN] - Cranberry word

    Does the candidate expression contain a cranberry word?

    • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
      • in lieu of vacation (AdpID) instead of vacation - 'lieu' is not a standalone word in English
        by dint of repetition (AdpID) through repetition - 'dint' is not a standalone word in English
        on behalf of everyone (AdpID) instead of - 'behalf' is not a standalone word in English
      • parce que (ConjID) because - 'parce' is not a standalone word in French
        à l'instar de ces héros (AdpID) at the equivalent of as these heroes - 'instar' is not a standalone word in French
        la plupart de ces héros (DetID) the greater.part of most of these heroes - 'plupart' is not a standalone word in French
      • ととも(共)に COM.together.DAT with (AdpID) → 'とも(共)' is not a free word
    • Further tests are required
      • in front of - all components are standalone words
        in the end of - all components are standalone words
      • au lieu de in place of instead of - all components are standalone words
        dans un supermarché in a supermarket - all components are standalone words
      • na potrzeby wojska for needs of army for the army - all components are standalone words
        po to, by wiedzieć for it, to know in order to know - all components are standalone words

    Test FuncMWE.2 - [MORPH] - Morphological inflexibility

    Does the candidate contain a content word (noun, verb, adjective or adverb), and does a morphological change of this word that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

    • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
      • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth Śtẖ from Seth (PT 65b, N) → The noun ꜥw (arm) is only used in the singular in this compound preposition.
      • in place of vacation (AdpID) #in places of vacation
        Big deal! (IntID) #Big deals!
        a great deal of experience (DetID) #deals of
      • à la place de Luc (AdpID) at the place of Luc instead of Luc #aux places de Luc
        du fait de la crise sanitaire (AdpID) of the fact of the crisis sanitary due to the public health crisis #des faits de la crise sanitaire
      • καθ’ ὅτι kath’ hoti in that according.to that
      • w imię przyjaźni in name of friendship in the name of a friendship - #w imiona przyjaźni in the names of friendships
    • Further tests are required
      • on the ground that (ConjID) for reasons based on the fact that ground may be plural: on the grounds that
        after the meeting/meetings → compositional expressions
      • au côté de Luc (AdpID) at.the side of Luc on/at the side of Luc côté may be plural: combattre aux côtés des Alliés fight alongside the allies
      • in aice (AdpID) in the vicinity near → 'i' is inflected with definite article here
      • ὁ δέ ho de this the.NOM.sg.m PRT
      • w czasie wojny in time of war during war - w czasach wojen in times of wars during wars
        po to, by wiedzieć for it, to know in order to know - neither component inflects, so no morphological flexibility is possible

    Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.

    Test FuncMWE.3 - [IRREG-STRUCT] - Irregular syntactic structure

    Does the candidate have an irregular internal syntactic structure, i.e. the language's regular grammar rules do not allow a phrase with this structure?

    • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
      • (OEG) 𓈖 𓈖𓏏𓏏 n-n.tt “for (n) (the fact) that (n.tt) because (PT 716e, T) → A preposition, such as n, followed by the conjunction n.tt is an idiosyncratic feature in Egyptian.
      • in that (ConjID) that is a conjunction preceded by a preposition
        good gracious (IntjID) → Adj + Adj with no N head
        mercy me! (IntjID) → N + Pronoun with omitted verb and agent
      • bien que well that although (ConjID) que is a conjunction preceded by an adverb
        Ça alors! that well My! (IntjID) → Pronoun followed by an adverb
        peu de gens little of people few people (DetID) → Adv + Preposition
      • εἰ δὲ μή ei de mē if not if PRT not
      • након што nakon što after that after (ConjID) što is a conjunction preceded by a preposition
    • Further tests are required
      • w czasie wojny in time of war during war - regular structure of a noun phrase in Polish: adp-noun-noun.GEN w czasach wojen in times of wars during wars
        po to, by wiedzieć for it, to know in order to know - regular structure of an adverbial: adp-pron

    Test FuncMWE.4 - [MODIF] - Prohibited modification

    Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?

    • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
      • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth Śtẖ from Seth (PT 65b, N) → No modification is attested in m-ꜥw, e.g. * m-ꜥw pn "in this arm".
      • in addition to (AdpID) #in great addition
        spoons as well as knives (ConjID) → spoons *as well and good as knives
        a little salt (DetID) #a little but strong salt vs. a little but strong person
      • en guise de conclusion in guise of conclusion as a/in conclusion (AdpID) *en juste guise de conclusion
        en sorte que cela se calme in sort that it calms so that it calms (ConjID) *en bonne sorte que cela se calme
        des tas de choses Det.ind.pl lots of things lots of things (DetID) #des tas très hauts de choses vs. des tas énormes de blé
      • οὐ μὲν γάρ ou men gar for not
      • w imię przyjaźni in name of friendship in the name of our friendship - #w pierwsze/piękne/ważne imię przyjaźni
        jak to? how this? how come? - #jak samo to?
        po to, by wiedzieć for it, to know in order to know - *po samo to for only it
        w czasie wojny in time of war during war - #w długim/trudnym/niebezpiecznym czasie wojny
    • Further tests are required
      • w czasie wojny in time of war during war - w długim/trudnym/niebezpiecznym czasie wojny

    Test FuncMWE.5 - [LEX] - Lexical inflexibility

    Does the candidate contain a content word (noun, verb, adjective or adverb), and does a regular replacement of this component by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

    • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
      • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth Śtẖ from Seth (PT 65b, N) → If ꜥw “arm” is replaced with ꜥb “association”, it results m-ꜥb “in uniting” i.e. “together with” “in the company of”, e.g. m-ꜥb nčr(.w) (PT 736c, T) “in uniting the gods” i.e. “in the company of the gods”.
      • in the view of the evidence (AdpID) #in the perspective/perception of the evidence
        in consequence of the sentence (AdpID) #in result of the sentence
        as long as you finish your homework (ConjID) *as short/large as you finish your homework
        Give me a little money (DetID) → *Give me a small money
      • Je viens de la part de votre voisin (AdpID) of the part of I come on behalf of your neighbor → *Je viens du nom/de la direction de votre voisin
        Repas préparé par les soins de Madame X (AdpID) by cares of Meal prepared by Mme X → *Repas préparé par l'attention/la prévenance/la sollicitude de Mme X
        Il n'est pas venu sous prétexte qu' il était malade (ConjID) under pretext that He didn't come on the pretext that he was ill → *Il n'est pas venu sous excuse qu' il était malade
      • καὶ δὴ καὶ kai dē kai as well as and PRT and
      • mimo że nie wiedziałam although that I didn't know although I didn't know (ConjID) - *wbrew że nie wiedziałam
        jak też pretensje as also reproaches and reproaches (ConjID) - *jak oraz pretensje
        coś tam jeszcze something there more something more (PronID) - #coś tu jeszcze
        wpół do piątej at.half to five half past four (AdpID) - #wpół po piątej
    • It is not a MWE, exit

    Section 9

    Language-specific tests

    Language-specific tests may be necessary in one of 3 cases:

    • a VMWE category may be universal or quasi-universal but it may require different tests in different languages,
    • any category specific to a language must be associated with appropriate tests in the same language,
    • universal tests can build upon more elementary language-specific tests (e.g. to distinguish a particle from a preposition).

    Section 9.1

    Language-specific categories (LS)

    Language-specific categories can be proposed for annotation in this task provided that they are carefully defined and accompanied by linguistic tests that make it possible to distinguish them from other categories. We recommend not redefining the universal and quasi-universal categories described here, but introducing new names and abbreviations in order to answer such needs.

    When a new language(-group)-specific category is introduced, we encourage the use of the LS category with a dotted extension, e.g. LS.SIM or LS.PROV (for "language-specific simile" or "language-specific proverb").
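
    As a small illustration, a label of this shape could be checked mechanically. The sketch below is hypothetical: the function name is_ls_label is invented here, and the pattern assumes that the extension consists of uppercase letters and digits, which the guidelines do not state explicitly.

```python
import re

# Assumed shape: "LS." followed by an uppercase extension such as SIM, PROV or ICV.
LS_LABEL = re.compile(r"LS\.[A-Z][A-Z0-9]*")


def is_ls_label(label: str) -> bool:
    """Return True if `label` looks like a dotted language-specific category."""
    return LS_LABEL.fullmatch(label) is not None
```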


    Section 9.2

    Particles versus prepositions and prefixes

    The following tests make it possible to properly identify prepositional verb particles in cases where they might be homographic with prepositions in prepositional phrases (PPs) or with verbal prefixes. The word to be discriminated is referred to as a candidate word. The tests are language-specific and concern English, German and Swedish.

    English-specific tests for distinguishing particles from prepositions

    The following tests concern English words which can be either a preposition or a particle depending on the context, e.g. up, on, through, etc. If a candidate word passes either of the two tests, it can be categorized as a particle.

    Test PREP.EN.1 - [FIN-PART] - Sentence-final particle

    Can the sentence be reformulated so that the candidate word w occurs at the end of a clause which is: (i) affirmative or imperative, (ii) headed by the verb governing w, and (iii) not a relative clause?

    • the candidate word is a particle
      • n.a.
      • They got up a petition on Monday. They got it up.
        I took off my clothes. I took my clothes off.
        She tries to take in her clients. She tries to take her clients in.
      • n.a.
      • n.a.
      • n.a.
      • n.a.
    • go to the next test
      • n.a.
      • I got up the hill. *I got it up.
        He has been off alcohol. *He has been alcohol off.
      • n.a.
      • n.a.
      • n.a.
      • n.a.

    Test PREP.EN.2 - [AD-INS] - Adjunct insertion

    Is an insertion of a circumstantial adjunct prohibited between the governing verb and the candidate word?

    • the candidate word is a particle
      • n.a.
      • They finally got up a petition. *They got finally up a petition.
        I took off my clothes at once. *I took at once off my clothes.
        She always tries to take in her clients. *She tries to take always in her clients.
      • n.a.
      • n.a.
      • n.a.
      • n.a.
    • it is not a particle
      • n.a.
      • I got up the hill finally. I got finally up the hill.
        He has been off alcohol recently. He has been recently off alcohol.
      • n.a.
      • n.a.
      • n.a.
      • n.a.

    This test might be redundant with respect to test PREP.EN.1. If this turns out to be the case (after a large-scale annotation), it may be deleted.
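
    The way the two English tests combine can be sketched as follows. This is a minimal illustration, assuming that the annotator has already answered both questions by hand; the function name is_english_particle and its parameters are hypothetical.

```python
def is_english_particle(
    can_be_clause_final: bool, adjunct_insertion_prohibited: bool
) -> bool:
    """Combine the two manual judgments for an English candidate word.

    can_be_clause_final: answer to PREP.EN.1 (FIN-PART).
    adjunct_insertion_prohibited: answer to PREP.EN.2 (AD-INS).
    """
    # PREP.EN.1: a YES is enough to decide that the word is a particle.
    if can_be_clause_final:
        return True
    # PREP.EN.2 is only consulted when PREP.EN.1 was not passed.
    return adjunct_insertion_prohibited
```

    With the examples above, up in They got up a petition passes PREP.EN.1 (They got it up) and is a particle, whereas up in I got up the hill fails both tests and is not.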

    German-specific tests for distinguishing particles from prepositions and verbal prefixes

    The following tests concern German words which can be both a particle and either a preposition or a verbal prefix, depending on the context, e.g. mit, um, vor, etc. If a candidate word passes any of the three following tests it can be categorized as a particle.

    Test PREP.DE.1 - [FIN-PART] - Sentence-final particle

    Does the candidate word occur at the end of the sentence or can the sentence be reformulated so as to put the candidate word at the end?

    • it is a particle
      • Kommst Du mit? come you with? are you coming?
        Ich schlage vor allen zu verzeihen. I propose to forgive everyone Ich schlage es vor I propose it
        Der Mülleimer wurde umgefahren. The trash bin was knocked down Er fuhr den Mülleimer um. He knocked down the trash bin
      • n.a.
      • n.a.
      • Ik stel voor iedereen te vergeven. I propose to forgive everyone Ik stel het voor I propose it
      • n.a.
      • n.a.
    • other tests are needed
      • Kommst Du mit jemandem? Are you coming with someone? *Kommst Du jemandem mit?
        Er umfuhr den ganzen See mit dem Fahrrad. He drove around the whole lake with a bike *Er fuhr ihn um.
      • n.a.
      • n.a.
      • n.a.
      • n.a.

    Test PREP.DE.2 - [SEP-PART] - Separable particle

    Can the verb and the candidate word be spelled both separately and together?

    • it is a particle
      • Passen Sie auf die Autos auf! Be careful with the cars! Sie müssen auf die Autos aufpassen! You must be careful with the cars!
        Er fuhr das Schild um. He drove over the sign Er sollte das Schild nicht umfahren He should not drive over the sign
      • n.a.
      • n.a.
      • Let op de auto's! Pay attention to the cars! Je moet opletten! You must pay attention!
      • n.a.
      • n.a.
    • other tests are needed
      • Er umfuhr den ganzen See mit dem Fahrrad. He rode around the whole lake with a bike *Er fuhr den ganzen See mit dem Fahrrad um.
        Sprechen Sie mit ihm! Speak with him! *Sie sollen ihm mitsprechen.
      • n.a.
      • n.a.
      • zij aanbidden hem they worship him *zij bidden hem aan
      • n.a.
      • n.a.

    Swedish-specific tests for distinguishing particles from prepositions and verbal prefixes

    Many words are ambiguous between particles and prepositions, e.g. för, upp, … Accordingly, the following sentence may have two different senses:

    • Jag hälsade på Anna I greeted on.PART Anna I visited Anna
    • see: https://taalportaal.org/taalportaal/topic/pid/topic-13998813296768009#section_svl_rtr_rk
    • Jag hälsade på Anna I greeted on.PREP Anna I greeted Anna

    The difference can only be judged by the stress/intonation pattern. In the first case, with a particle, the stress is not on the verb but on the particle. In the second case, with a prepositional object, the main stress is on the verb, with only secondary stress on the preposition.

    Test PART.SV.1 - [PART-STRESS] - Stress on the particle

    Is the main stress on the candidate word rather than on the verb?

    • it is a particle
      • Ongelukken kunnen 'voorkomen Accidents may happen
      • Jag hälsade på Anna I greeted on.PART Anna I visited Anna → The main stress is on the particle
    • it is not a particle
      • Goede regels kunnen ongelukken voor'komen Good rules can prevent accidents
      • Jag hälsade på Anna I greeted on.PREP Anna I greeted Anna → The main stress is on the verb

    Section 9.3

    Identifying multiword tokens

    The relation between words and tokens is not always 1-to-1. If a single token contains more than one word, then it is a potential MWE. For the purpose of MWE annotation it is, therefore, important to provide as clear-cut a definition of a word as possible. This section contains language-specific tests for identifying multiword tokens (MWTs). Currently the tests concern Swedish.

    Swedish-specific tests for identifying MWTs

    Test MWT.SV.1 - [VERB-MWT] - Verbal MWT

    Does the candidate token function as a verb?

    • we do not have to decide if it is an MWT (for the purpose of VMWE annotation)
      • mätredskap measuring-tool measuring instruments
        sysselsättning task-setting employment
    • go to the next test
      • tillhandahålla to-hand-hold provide
        förklara for-clear explain
        klargöra clear-make clarify

    Test MWT.SV.2 - [SPLIT-MWT] - Splittable MWT

    Split the candidate token into its component parts. Can it be used as an expression in the split form (possibly with slightly shifted semantics)?

    • it is an MWT
      • tillvarata to-be-take take care of, ta till vara take to be take care of
        avbryta off-break cancel, bryta av break off break off
    • go to the next test

    Test MWT.SV.3 - [CRAN-MWT] - Cranberry component in a MWT

    If you split the token into its component words, is any of these words a cranberry word (i.e. it cannot be used as a standalone word, with the same part-of-speech)?

    • it is not an MWT
      • [No example]
      • beklaga be-complain lament → be is possible as a verb but not as a particle
        erbjuda er-offer offer → er is possible as a pronoun but not as a particle
        försvåra for-difficult make difficult → svåra is possible as an adjective but not as a verb
        jämföra compare → jäm is not used as a stand-alone word
    • it is an MWT
      • på|peka on|point point out
        för|klara for|clear explain
        klar|göra clear|make clarify
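
    The three Swedish tests form a short cascade, which can be sketched as below. The names MwtDecision and classify_swedish_token are hypothetical, and the three boolean inputs stand for judgments made manually by the annotator.

```python
from enum import Enum


class MwtDecision(Enum):
    NOT_NEEDED = "no decision needed for VMWE annotation"
    MWT = "multiword token"
    NOT_MWT = "not a multiword token"


def classify_swedish_token(
    functions_as_verb: bool, usable_in_split_form: bool, has_cranberry_component: bool
) -> MwtDecision:
    """Apply MWT.SV.1, MWT.SV.2 and MWT.SV.3 in order."""
    # MWT.SV.1: non-verbal tokens need no MWT decision here.
    if not functions_as_verb:
        return MwtDecision.NOT_NEEDED
    # MWT.SV.2: a token that can also be used in split form is an MWT.
    if usable_in_split_form:
        return MwtDecision.MWT
    # MWT.SV.3: a cranberry component means the token is not an MWT.
    if has_cranberry_component:
        return MwtDecision.NOT_MWT
    return MwtDecision.MWT
```

    For example, avbryta (verbal, also usable as bryta av) is decided at MWT.SV.2, while jämföra reaches MWT.SV.3 and is rejected because jäm is not a stand-alone word.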

    Section 9.4

    Language-specific inherently clitic verbs (LS.ICV)

    Inherently Clitic Verbs (LS.ICV), together with the Inherently Reflexive Verbs (IRV), are pronominal verbs. LS.ICVs are formed by a full verb combined with one or more non-reflexive clitics (CLI) that represent the pronominalization of one or more complements. LS.ICV is annotated when (a) the verb never occurs without a non-reflexive clitic, e.g. entrarci 'to be relevant to something' (colloquial form), or (b) the LS.ICV and the non-clitic versions have clearly different senses or subcategorization frames.

    LS.ICVs represent a specific category for some Romance languages, and they are particularly frequent in the Italian language. It is often challenging to distinguish LS.ICV from IRV, particularly because some clitics may be ambiguous, like se/si which is a polyfunctional clitic pronoun and grammatical marker (and has many functions such as reflexive, reciprocal, impersonal, passivizing, aspectual, middle).

    If the CLI has a clear reflexive meaning the VMWE might be an IRV.

    We start by listing the various categories of LS.ICVs before providing tests to decide whether to annotate a given occurrence as an LS.ICV.

    • Inherently clitic verbs ⇒ ANNOTATE as LS.ICV
      1. The verb without the CLI does not exist
        • infischiarsene (not worry about) vs *infischiare
      2. The verb without the CLI does exist, but has a very different meaning
        • darla (gl.: give it) (transl. fuck around) ≠ dare (give)
          prenderle (gl.: take them) (transl. be beaten) ≠ prendere (take)
          prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
          starci (gl.: stay there) (transl. agree) ≠ stare (stay)
      3. The verb has more than one CLI of which the second one is an invariable object complement.
        • fregarsene (gl.: matter self of-it) (transl. don’t care about)
          infischiarsene (transl. not worry about)
          curarsene (gl.: take care self of-it) (transl. care about)
          prendersela (gl.: take self it.FEM) (transl. be angry/upset)
          sentirsela (gl.: feel self it.FEM) (transl. be in the mood of)
          sentirselo (gl.: feel self it.MASC) (transl. feel)
          vedersela (gl.: see self it.FEM) (transl. to manage something)
      4. The verb has two non-reflexive invariable CLIs:
        • farcela (gl.: make there it.FEM) (transl. succeed)
      5. The verb has a different meaning with respect to an intensive use of the same two non-reflexive invariable CLIs:
        • andarsene (gl.: go away self from-there) (transl. die) ≠ andarsene (go away)
          bersela (gl.: drink self it.FEM) (transl. believe) ≠ bersela (drink)

    LS.ICV-specific decision tree

  • Test LS.ICV.1 - [CL-INHERENT] Inherent clitic

    Does the verb only exist with the CLI and never occur without it?

    • annotate as LS.ICV
      • infischiarsi ⇒ *infischiare
        infischiarsene ⇒ *infischiare
    • next test

    Test LS.ICV.2 - [CL-DIFF-SENSE] - Different sense

    Given the same verb without the CLI/CLIs, are all of its meanings clearly different from the inherently clitic form?

    • annotate as LS.ICV
      • smetterla (gl.: quit it) (transl. knock it off) ≠ smettere (quit)
        prenderle (gl.: take them) (transl. get beaten up) ≠ prendere (take)
        prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
        starci (gl.: stay there) (transl. up for it) ≠ stare (stay)
        curarsene (gl.: take care self of-it) (transl. care about) ≠ curare (take care)
        prendersela (gl.: take self it.FEM) (transl. be angry/upset) ≠ prendere (take)
        sentirsela (gl.: feel self it.FEM) (transl. be in the mood of) ≠ sentire (feel)
        darla (gl.: give it.FEM) (transl. fuck around) ≠ dare (give)
    • next test

    Test LS.ICV.3 - [CL-DIFF-SUBCAT] - Different subcategorization frame

    Is the subcategorization frame of the simple verb without the CLI different from the subcategorization frame of the LS.ICV?

    • annotate as LS.ICV
      • X se la prende con Y ⇔ X prende Y
    • Exit
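
    The LS.ICV-specific tests are applied one after another until one of them is positive. A minimal sketch, assuming the three judgments are supplied by the annotator (the function name ls_icv_deciding_test and its parameters are hypothetical):

```python
from typing import Optional


def ls_icv_deciding_test(
    verb_exists_without_cli: bool,
    all_bare_senses_differ: bool,
    subcat_frame_differs: bool,
) -> Optional[str]:
    """Return the label of the test that licenses LS.ICV, or None on exit."""
    # CL-INHERENT: the verb never occurs without the clitic.
    if not verb_exists_without_cli:
        return "CL-INHERENT"
    # CL-DIFF-SENSE: every sense of the bare verb differs from the clitic form.
    if all_bare_senses_differ:
        return "CL-DIFF-SENSE"
    # CL-DIFF-SUBCAT: the bare verb has a different subcategorization frame.
    if subcat_frame_differs:
        return "CL-DIFF-SUBCAT"
    # No test is positive: the candidate is not annotated as LS.ICV.
    return None
```

    For infischiarsene, the bare verb *infischiare does not exist, so the first test already decides; for smetterla, the verb smettere exists and the decision is made by the second test.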

    Section 9.5

    Italian-specific decision tree

    For Italian, a language-specific category called inherently clitic verbs (LS.ICV) has been defined. This implies a modified version of the annotation decision tree.

    Steps 1-4 are still valid in Italian. But Step 3 should be realized with the decision tree below instead of the generic decision tree.

    • Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
      • Apply the VID-specific tests: VID tests positive?
        • Annotate as a VMWE of category VID
        • It is not a VMWE, exit
      • Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
        • Apply the test IT.S.1 - [CLITICS-ONLY: Are all lexicalized dependents of the verb clitics?]
          • Apply the LS.ICV-specific tests: LS.ICV tests positive?
            • Annotate as a VMWE of category LS.ICV
            • It is not a VMWE, exit
          • Apply the VID-specific tests: VID tests positive?
            • Annotate as a VMWE of category VID
            • It is not a VMWE, exit
        • Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
          • Apply the VID-specific tests: VID tests positive?
            • Annotate as a VMWE of category VID
            • It is not a VMWE, exit
          • Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
            • Reflexive clitic ⇒ Apply IRV-specific tests: IRV tests positive?
              • Annotate as a VMWE of category IRV
              • It is not a VMWE, exit
            • Non-reflexive clitic ⇒ Apply LS.ICV-specific tests: LS.ICV tests positive?
              • Annotate as a VMWE of category LS.ICV
              • It is not a VMWE, exit
            • Particle ⇒ Apply IVPC-specific tests: IVPC tests positive?
              • Annotate as a VMWE of category IVPC.full or IVPC.semi
              • It is not a VMWE, exit
            • Verb with no lexicalized dependent ⇒ Apply MVC-specific tests: MVC tests positive?
              • Annotate as a VMWE of category MVC
              • Apply the VID-specific tests: VID tests positive?
                • Annotate as a VMWE of category VID
                • It is not a VMWE, exit
            • Extended NP ⇒ Apply LVC-specific decision tree: LVC tests positive?
              • Annotate as a VMWE of category LVC
              • Apply the VID-specific tests: VID tests positive?
                • Annotate as a VMWE of category VID
                • It is not a VMWE, exit
            • Another category ⇒ Apply the VID-specific tests: VID tests positive?
              • Annotate as a VMWE of category VID
              • It is not a VMWE, exit
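
    The S.4 branch of the Italian tree amounts to a dispatch on the category of d, with VID as a fallback for some categories. The sketch below covers only this branch and rests on several assumptions: the names S4_DISPATCH_IT and annotate_by_category are hypothetical, tests_positive stands for the annotator's manual application of the category-specific tests, the fallback after a negative MVC result is read as VID, and the choice between IVPC.full and IVPC.semi is left to the IVPC tests themselves.

```python
from typing import Callable, Optional

# Category of d -> ordered list of category-specific test sets to try.
S4_DISPATCH_IT = {
    "reflexive clitic": ["IRV"],
    "non-reflexive clitic": ["LS.ICV"],
    "particle": ["IVPC"],
    "verb with no lexicalized dependent": ["MVC", "VID"],
    "extended NP": ["LVC", "VID"],
}


def annotate_by_category(
    category_of_d: str, tests_positive: Callable[[str], bool]
) -> Optional[str]:
    """Return the VMWE category to annotate, or None if it is not a VMWE."""
    # Any category not listed above is sent to the VID-specific tests.
    for vmwe_category in S4_DISPATCH_IT.get(category_of_d, ["VID"]):
        if tests_positive(vmwe_category):
            return vmwe_category
    return None
```

    For example, an extended NP that fails the LVC tests but passes the VID tests is annotated as VID, whereas a reflexive clitic that fails the IRV tests is not a VMWE.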

    Test IT.S.1 - [CLITICS-ONLY] Clitics only

    Are all lexicalized dependents of the verb clitics?

    • apply LS.ICV tests
    • next test

    Section 9.6

    Hindi-specific decision tree

    For Hindi, LVCs can be formed by a verb and a noun, or by a verb and an adjective which is morphologically identical to an eventive noun. This implies a modified version of the annotation decision tree.

    Steps 1-4 remain valid for Hindi, but Step 3 should be carried out with the decision tree below instead of the generic decision tree.

    • Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
      • Apply the VID-specific tests. VID tests positive?
        • Annotate as a VMWE of category VID
        • It is not a VMWE, exit
      • Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
        • Apply the VID-specific tests. VID tests positive?
          • Annotate as a VMWE of category VID
          • It is not a VMWE, exit
        • Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
          • Apply the VID-specific tests. VID tests positive?
            • Annotate as a VMWE of category VID
            • It is not a VMWE, exit
          • Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
            • Reflexive clitic ⇒ Apply IRV-specific tests. IRV tests positive?
              • Annotate as a VMWE of category IRV
              • It is not a VMWE, exit
            • Particle ⇒ Apply IVPC-specific tests. IVPC tests positive?
              • Annotate as a VMWE of category IVPC.full or IVPC.semi
              • It is not a VMWE, exit
            • Verb with no lexicalized dependent ⇒ Apply MVC-specific tests. MVC tests positive?
              • Annotate as a VMWE of category MVC
              • Apply the VID-specific tests. VID tests positive?
                • Annotate as a VMWE of category VID
                • It is not a VMWE, exit
            • Extended NP or an adjective which is morphologically identical to an eventive noun ⇒ Apply LVC-specific decision tree. LVC tests positive?
              • Annotate as a VMWE of category LVC
              • Apply the VID-specific tests. VID tests positive?
                • Annotate as a VMWE of category VID
                • It is not a VMWE, exit
            • Another category ⇒ Apply the VID-specific tests. VID tests positive?
              • Annotate as a VMWE of category VID
              • It is not a VMWE, exit
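
The Hindi-specific tree above can be sketched as a small function. This is only an illustration under stated assumptions: the outcomes of the individual tests are passed in by the caller (the parameter names and the `passes` callback are hypothetical), and the YES/NO direction of tests S.1 to S.3 is assumed from how these tests are described elsewhere in the guidelines (for instance, a verb with more than one lexicalized dependent is sent to the VID tests).

```python
from typing import Callable, Optional

# Hypothetical labels for the category of dependent d (branches of test S.4).
SPECIFIC_CATEGORY = {
    "reflexive_clitic": "IRV",
    "particle": "IVPC",            # the guidelines further distinguish IVPC.full / IVPC.semi
    "verb_no_lex_dep": "MVC",
    "extended_np": "LVC",
    "eventive_adjective": "LVC",   # Hindi-specific: adjective identical to an eventive noun
}
# Branches that fall back to the VID-specific tests when their own tests fail.
FALLBACK_TO_VID = {"MVC", "LVC"}


def categorize_hindi(one_head: bool, one_dep: bool, lex_subj: bool,
                     dep_category: str,
                     passes: Callable[[str], bool]) -> Optional[str]:
    """Return a VMWE category, or None for "not a VMWE".

    `passes(name)` is a hypothetical callback telling whether the
    category-specific tests ("VID", "IRV", "IVPC", "MVC", "LVC") are positive.
    """
    def vid_or_none() -> Optional[str]:
        return "VID" if passes("VID") else None

    # Tests S.1, S.2, S.3 (branch directions are an assumption, see above).
    if not one_head or not one_dep or lex_subj:
        return vid_or_none()
    # Test S.4: dispatch on the morphosyntactic category of d.
    category = SPECIFIC_CATEGORY.get(dep_category)
    if category is None:           # "Another category"
        return vid_or_none()
    if passes(category):
        return category
    return vid_or_none() if category in FALLBACK_TO_VID else None
```

For example, `categorize_hindi(True, True, False, "eventive_adjective", lambda name: name == "LVC")` follows the Hindi-specific LVC branch.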

    Section 10

    Annotation management

    This section groups the documentation on practical aspects of annotation campaign management. Some of these aspects are specific to this shared task, such as the editing of examples by language leaders and the use of the annotation platform FLAT. Others are more generic and concern the guidelines in general, such as the FAQ section.


    Section 10.1

    Frequently Asked Questions (FAQ)

    Annotators often face questions and challenging examples. When several annotators ask the same question, we will update the list of frequently asked questions.

    However, we suggest that language teams set up another communication platform to deal with questions that are specific to a language. This can take the form of a shared online document, a wiki, a dedicated bug tracking system or a mailing list. We also suggest keeping track of decisions taken concerning borderline examples (with a list of expressions to which each decision applies). These should be kept in a centralized document or page that all annotators can access.

    Whenever you think that a question can also be interesting to other languages, please notify the organizers and we will try to update this page.

    1. How to define an unexpected change in meaning?
    2. How to annotate lexicalized words which belong to contractions, compounds, and acronyms?
    3. How to annotate coordinated VMWEs sharing some components?
    4. How to annotate elliptical occurrences of VMWEs?
    5. How to annotate VMWEs that seem to belong to more than one category?
    6. How to annotate embedded VMWEs?
    7. Are existential expressions with there is/are considered VMWEs?
    8. How to categorize VMWEs which seem LVCs but do not pass all LVC tests?
    9. Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?
    10. Does the IRV category include verbs with non-reflexive clitics?
    11. Should nominalizations of VMWEs be annotated?
    12. How to express hesitation between different VMWE categories?
    13. How can one decide what are the semantic arguments of a noun for borderline cases?
    14. How does one decide if a more or less frozen determiner is a lexicalized VMWE component?
    15. Should I annotate compound and serial verbs as VMWEs? Of which category?
    16. If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?
    17. In an LVC candidate, if the verb adds aspect to the predicative noun, does it imply failing Test LVC.3?
    18. In the LVC decision tree, should I test that the noun keeps its original meaning?
    19. How can I easily browse the already existing annotations in my corpus?

    1. How to define an unexpected change in meaning?

    Check the glossary entry that defines unexpected change in meaning.

2. How to annotate lexicalized words which belong to contractions, compounds and acronyms?

In some languages adpositions (pre- or post-positions), clitics and determiners are subject to contractions (i.e. they yield multiword tokens, MWTs). If they are properly split by the tokenizer, only the lexicalized parts of each contraction should be annotated. If you use FLAT for annotating, split contractions are displayed twice, in folded and in unfolded form. Only the unfolded form should be annotated, e.g. Jean bénéficie du de le traitement Jean benefits from the treatment, Jean donne du de le grain à moudre à son fils Jean gives grain to grind to his son Jean gives an occasion to act to his son.

Sometimes, however, tokenizers might not handle contraction splitting properly. In this case, a lexicalized component of a VMWE can be merged with an external word:

  • n.a.
  • haberse suicidado have+REFL suicided committed suicide
  • n.a.
  • aller au (à+le) secours go to+the rescue to rescue
  • n.a.
  • n.a.
  • n.a.

A similar problem occurs in languages with productive compounding, where a lexicalized component of a VMWE and a free modifier can form a multiword token (since compound splitting might not be a standard feature of a tokenizer):

  • unter Drogeneinfluss stehen to be under the influence of drugs
    Heisshunger haben to have hot hunger to be ravenously hungry
  • n.a.
  • reuzehonger hebben enormous hunger have to be ravenously hungry
  • n.a.

Yet another related phenomenon concerns acronyms whose spelled-out versions may contain predicative nouns which in the abbreviated versions boil down to single letters:

  • the patient has AIDS (acquired immunodeficiency syndrome)
    the book underwent OCR (optical character recognition)
    the program carries out a PCA (principal component analysis)
  • el paciente presenta un SCA (síndrome coronario agudo)
  • le patient présente un SCA (Syndrome coronarien aigu)
    le patient fait un AVC (accident vasculaire cérébral)

Since the current annotation format is token-based, annotators must not correct tokenization errors or split compounds themselves, for the sake of consistency. Therefore the annotation of such contractions, compounds and acronyms finds no fully satisfactory solution in our schema. We propose to annotate a whole MWT each time it contains a word which is part of a VMWE. Annotators should add a textual comment about the mixed status of this MWT:

  • Drogeneinfluss → MWT containing a lexicalized VMWE component Einfluss and an external word Drogen
    Heisshunger → MWT containing a lexicalized VMWE component Hunger and an additional modifier heiss
  • haberse → MWT containing a lexicalized VMWE component se and an external word haber
  • n.a.

3. How to annotate coordinated VMWEs sharing some components?

A component shared by two or more coordinated VMWEs should be annotated as belonging to both (or all) of them.

  • Regeln und Richtlinien aufstellen to set up rules and guidelines to draw up rules and guidelines aufstellen must be annotated both as part of Regeln aufstellen to lay down rules and of Richtlinien aufstellen to draw up guidelines
  • κάναμε βόλτες και ένα σωρό ψώνια στο εμπορικό κέντρο κάναμε we made must be annotated both as part of κάναμε βόλτες we made walks and of κάναμε ψώνια we were buying
  • to have a walk or a ride have must be annotated both as part of to have a walk and of to have a ride
  • darse un baño o una ducha give a bath or a shower to have a bath or a shower darse must be annotated both as part of darse un baño and of darse una ducha
  • hitz eta lan egin word and work do to speak and work egin must be annotated as part of both hitz egin and lan egin.
  • Regels en voorschriften opstellen to set up rules and guidelines to draw up rules and guidelines opstellen must be annotated both as part of regels opstellen to lay down rules and of voorschriften opstellen to draw up guidelines
  • odprawić mszę i pokutę celebrate a mass and a penance odprawić should be annotated both as part of odprawić mszę to celebrate a mass and of odprawić pokutę to celebrate a penance
  • a cere cuiva explicații sau socoteală to ask someone.to explanations or account cere should be annotated both as part of cere explicații and of cere socoteală
  • imeti dober želodec in dobre živce to have a good stomach to bear something well and good nerves to be mentally strong imeti have must be annotated both as part of imeti dober želodec and of imeti dobre živce
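
One way to picture a shared component is as a token whose annotation lists several VMWE identifiers. The sketch below assumes a per-token column in the style of the PARSEME .cupt format, where the first token of a VMWE carries `id:CATEGORY`, later tokens carry the bare `id`, `;` separates several VMWEs on one token, and `*` marks a token outside any VMWE. The helper name `group_vmwes` and the sample annotation are illustrative assumptions, not part of the guidelines.

```python
from collections import defaultdict


def group_vmwes(tokens):
    """Map each VMWE id to (category, list of member forms).

    `tokens` is a list of (form, mwe_column) pairs.
    """
    categories = {}
    members = defaultdict(list)
    for form, column in tokens:
        if column == "*":          # token outside any VMWE
            continue
        for code in column.split(";"):
            mwe_id, _, category = code.partition(":")
            if category:           # only the first token of a VMWE names the category
                categories[mwe_id] = category
            members[mwe_id].append(form)
    return {i: (categories.get(i), forms) for i, forms in members.items()}


# "to have a walk or a ride": `have` is shared by both coordinated VMWEs.
sentence = [
    ("have", "1:LVC;2:LVC"),
    ("a", "*"),
    ("walk", "1"),
    ("or", "*"),
    ("a", "*"),
    ("ride", "2"),
]
vmwes = group_vmwes(sentence)
```

Here `have` ends up as a member of both VMWE 1 (have + walk) and VMWE 2 (have + ride), which is the annotation the rule above asks for.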

4. How to annotate elliptical occurrences of VMWEs?

Instances of a VMWE in which all but one lexicalized component were omitted or pronominalized should not be annotated. This concerns in particular cases where a nominal component is replaced by an anaphoric pronoun. For instance, in this decision was hard but he took it, we should not annotate take and decision or it as an instance of a VMWE. We annotate only the transformations in which the syntactic dependency link between the head verb and the lexicalized complement is preserved, e.g. the decision which he took.

5. How to annotate VMWEs that seem to belong to more than one category?

Such hesitation issues should normally be solved by the structural tests. For instance, consider the German expression sich eine Frage stellen SELF a question put to doubt. It may seem to belong to both IRV, since sich is required only if stellen co-occurs with Frage, and LVC, since Frage keeps its original meaning and stellen brings no additional meaning. However, test S.2 [1DEP] indicates that an expression like this should be annotated as a VID, since the verb has more than one lexicalized syntactic dependent.

Similarly, the French expression avoir peur have fear to be afraid seems to have features of a VID. Unlike most LVCs, it does not allow a determiner *avoir une peur have a fear, except when the noun is modified avoir une grande peur have a great fear. However, test S.4 [CATEG] in the generic decision tree 2, and the LVC-specific decision tree indicate that it belongs to the LVC category.

6. How to annotate embedded VMWEs?

Candidate VMWEs embedded in other VMWEs should be annotated only if they have a VMWE status also outside the particular context. For instance, the VMWE to let the cat out of the bag should be annotated as a VID, and its embedded VMWE to let out as a VPC.

On the other hand, in the French expression se faire des idées SELF make DET.PL ideas to imagine things which are not true, se faire should not be annotated as IRV, since it is not inherently reflexive as a standalone verb+clitic combination.

7. Are existential expressions with there is/are considered VMWEs?

Hesitations about a possible LVC status can arise with respect to existential constructions with nouns introducing events or properties (see test LVC.1 [N-PRED]) as in:

  • es gibt Beschwerden there are complaints
  • υπάρχουν κατηγορίες there-are problems there are problems
  • there are complaints
  • hay quejas there are complaints
  • arazoak daude problems there-are there are problems
  • il existe des plaintes it there has complaints there are complaints
  • het is nodig it is necessary
  • n.a.
  • há queixas has complaints there are complaints

Namely, the noun keeps its original sense and the existential verb to be or to have brings no additional meaning. However, a candidate LVC must also pass test LVC.4 [V-REDUC]. This requires the modification of the noun by the verb's subject, which is impossible with impersonal and empty subjects like there. Therefore, such candidates cannot be LVCs.

Note, however, that existential expressions themselves can be VMWEs of the VID type. For instance, in the French example il y a des plaintes it there has complaints there are complaints, two dependents of the verb a has are lexicalized: il it and y there; therefore it is a VID (see test S.2 [1DEP]).

8. How to categorize VMWEs which seem LVCs but do not pass all LVC tests?

If at least one of the five LVC tests (LVC.0 to LVC.4) is not passed, the candidate is not considered an LVC. For the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we adopt a definition of an LVC which might seem more restrictive than what some linguistic studies assume. Thus, we exclude from the LVC scope:

  • expressions in which the verb's syntactic subject is not necessarily the noun's semantic subject, like to give courage or to make an impression. These candidates do not pass test LVC.4 [V-REDUC].
  • expressions where the lexicalized nominal dependent of the verb is its subject, as in the problem lies in something; these candidates do not pass test LVC.4 [V-REDUC].
  • expressions with aspectual verbs, as in to start, to pursue, to stop a walk. These do not pass test LVC.3 [V-LIGHT] since they add (aspectual) semantics to the noun. The only exception is when the noun itself is already aspectual, as in to come into bloom.
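
The all-or-nothing rule can be pictured as a simple conjunction over the test outcomes. The sketch below is only an illustration: the test names LVC.0 through LVC.4 are those mentioned in these guidelines, while the function names and the sample outcomes (apart from the failed LVC.4 for to give courage) are assumptions.

```python
LVC_TESTS = ("LVC.0", "LVC.1", "LVC.2", "LVC.3", "LVC.4")


def failed_lvc_tests(results):
    """Return the LVC tests not passed; a missing outcome counts as not passed."""
    return [test for test in LVC_TESTS if not results.get(test, False)]


def is_lvc(results):
    """A candidate is an LVC only if every LVC test is passed."""
    return not failed_lvc_tests(results)


# "to give courage": assumed outcomes, failing only LVC.4 [V-REDUC].
give_courage = {"LVC.0": True, "LVC.1": True, "LVC.2": True,
                "LVC.3": True, "LVC.4": False}
```

With these outcomes, `is_lvc(give_courage)` is false and `failed_lvc_tests(give_courage)` names the test that excludes the candidate from the LVC scope.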

9. Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?

Pure operator verbs, i.e. verbs that never have any semantics per se but only carry grammatical information (tense, mood etc.), seem to contradict the intuition behind a VMWE. Namely, they usually select a whole semantic class of nouns. For instance to commit selects any negative act (a crime, a suicide, a theft) and to perform selects any activity (a task, an experiment, a miracle). In this sense, their complements resemble open slots and the whole combinations resemble collocations. However, for the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we do include verb+noun combinations with pure operator verbs, such as to commit a crime and to perform a task, in the LVC category. This is because such combinations pass all tests (LVC.0 through LVC.4). We found no other reliable tests which would distinguish such productive cases from less productive ones like to make a decision. In particular, some studies (e.g. Bonial 2014) show that there exist no truly productive light verbs. Therefore, all examples cited here are to be classified as LVCs.

10. Does the IRV category include verbs with non-reflexive clitics?

No, the IRV category only includes (some) combinations of a head verb with a reflexive clitic. As indicated in the borderline cases page of the IRV category, other pronouns, whenever lexicalized, trigger the VID category. Recall that whenever more than one dependent of the verb is lexicalized (whether or not a reflexive clitic is among them), the VMWE is always categorized as a VID:

  • sich Fragen stellen SELF questions put to doubt
  • n.a.
  • s'en aller SELF of-there go to leave
  • n.a.
  • ucvreti jo to escape her to escape something/someone by running

11. Should nominalizations of VMWEs be annotated?

The only nominal VMWE variants within our annotation scope are those:

  • headed by the gerund stemming from the head verb of the VMWE - taking of the decision, and
  • in which a noun stemming from a VMWE is modified by a participle or a relative clause headed by the verb stemming from the same VMWE - the decisions taken yesterday, the decision which he took.

Other nominalizations are excluded:

  • Wortbruch word-break a promise which has not been kept
  • a break-down, a forget-me-not
  • toma de decisiones taking of decisions decision making
    puesta a punto setting to point set-up
  • izen-emate, esker-egite name-giving, thanks-doing inscription, thanks-giving
  • la prise en compte the taking into account the fact of taking something into account, peut-être may-be maybe, porte-feuilles carry-sheets wallet
  • vergeet-mij-nietje forget-me-not
  • zabawa czyimś kosztem a play at someone else's expense derived from bawić się czyimś kosztem to enjoy oneself at someone else's expense
  • un pierde-vară a loses-summer a lazy person
  • šala na tuj račun a joke at someone else's expense derived from šaliti se na tuj račun to play a joke on someone

For practical reasons (e.g. compatibility with an existing annotation, or usefulness for a particular application) they can be considered language-specific VMWEs, but then a new category should be defined for them, so as to keep the universal and the quasi-universal categories intact.

12. How to express hesitation between different VMWE categories?

Once identified in a text, each VMWE is to be assigned to exactly one category. Note that in this version of the guidelines we no longer admit "hesitation labels" (e.g. LVC/VID) used in the pilot annotation. Hesitation can, however, be expressed in a comment and a particular value of the annotator's confidence assigned to a particular VMWE occurrence.

13. How to decide what are the semantic arguments of a noun for borderline cases?

The goal of test LVC.1 is to identify whether a noun is predicative, that is, whether it requires at least one semantic argument. For many classes of abstract nouns, however, it can be tricky to apply the test. We advise listing in a separate document those classes of nouns that pass test LVC.1 in your language. Language teams can also provide links to the documentation of semantic annotation projects such as NomBank for English, which usually include tests and descriptions that help identify semantic arguments.

We suggest considering that the following categories pass test LVC.1:

  • Illnesses, symptoms and health conditions:
    Ο Γιάννης έχει συνάχι = ο Γιάννης είναι άρρωστος (αρρώστεια is a hypernym of συνάχι)
    Relations:
    Ο Γιάννης έχει σχέση με κάποιον = Ο Γιάννης σχετίζεται με κάποιον
    Ο Γιάννης έχει επαφές με κάποιον = Ο Γιάννης επικοινωνεί με κάποιον (επικοινωνία is a synonym of επαφή)
    Mental content (internal to a cognizer):
    Ο Γιάννης έχει ανησυχία = Ο Γιάννης ανησυχεί
    Ο Γιάννης έχει μια ιδέα = Ο Γιάννης σκέφτεται (σκέψη is a synonym of ιδέα)
    Ο Γιάννης έχει την άποψη = Ο Γιάννης κρίνει (κρίση is a synonym of άποψη)
  • Illnesses, symptoms and health conditions:
    John has a flu = John is ill (illness is a hypernym of flu)
    Relations:
    John has contact with somebody = John contacts somebody
    John has an affair with somebody = John is involved with somebody (involvement is a synonym of affair)
    Mental content (internal to a cognizer):
    John has a worry = John worries
    John has an idea = John thinks (thought is a synonym of idea)
    John has an opinion = John believes (belief is a synonym of opinion)
  • Mental content (internal to a cognizer):
    Miha je v dvomih Miha is in doubts = Miha dvomi Miha doubts
    Miha je mnenja Miha is of opinion = Miha meni Miha believes
    Miha ima predstavo/pojma Miha has an idea = Miha meni Miha thinks (predstava, pojem are synonyms of idea in this context)

Please notice that events and states that have no semantic arguments do not pass test LVC.1, even if they have verbal/adjectival paraphrases:

  • Natural phenomena: rain, snow, tornado, flood, earthquake
    Informational content (external to a cognizer): information, news
  • Natural phenomena: dež, sneg, tornado, poplava, potres rain, snow, tornado, flood, earthquake
    Informational content (external to a cognizer): informacije, novice information, news

Finally, note that not every verb + predicative noun combination forms an LVC. In addition, the verb needs to be "light", i.e. it must not add semantics to the noun. The remaining LVC tests guarantee this.

14. How does one decide if a more or less frozen determiner is a lexicalized VMWE component?

Most of the time, it is easy to test whether a determiner is lexicalized by searching alternatives in corpora (or on the web). For instance, the is lexicalized in to kick the bucket because searches for other determiners (this, a, some, three, many, etc.) either do not return any result or return only literal uses of this verb phrase.

However, borderline cases do exist, in which alternatives are rare but possible, especially for LVCs and decomposable VIDs. For instance, while the standard form of the idiom spill the beans forbids some determiners (#spill three/twenty beans), it is possible to find some variation (spill these/many/all/my/his/more/no beans).

We argue that the selection of some determiners (but not all) by a VMWE is comparable to selected prepositions for verbs. Thus, it can be seen as a regular grammatical phenomenon, suggesting that when the determiner varies, then it should not be included in the annotation scope. Possessive pronouns (my, her, their, etc.) and reflexive clitics (myself, herself, themselves, etc.) are exceptions to this rule (see also Section 1.4). Namely, when they are constrained to agree in number and person with the subject (I do my best, *I do your best), they are realized by different lexemes, i.e., strictly speaking, they are not lexicalized. We consider, however, that - with respect to lexicalization - they constitute single lexemes inflecting for number and gender.

Particular language teams may of course adopt their own criteria for annotating partly frozen determiners. Then, these decisions should be documented in language-specific guidelines.

15. Should I annotate compound and serial verbs as VMWEs? Of which category?

It depends. In many Indo-European languages (including Germanic, Romance and Balto-Slavic families), verbal chains using auxiliary and modal verbs are used to express tense, mood, modality and aspect. This is a regular linguistic phenomenon, fully productive, that can be applied to any verb and should not be annotated at all.

On the other hand, some languages have idiomatic compound and serial verbs, that is, VMWEs whose lexicalized components are two verbs, and where one of them does not express tense, mood, modality and/or aspect with respect to the other one. Therefore, we have created a new category in edition 1.1 to annotate these constructions, called multi-verb construction (MVC), covering examples such as:

  • will sagen want to say that is to say
  • to let go
    to make do
  • querer decir want say to mean
  • ?
  • laisser tomber let fall to give up
    vouloir dire want say to mean
  • lasciar andare let go to unhand
    voler dire want say to mean
  • wil zeggen want to say that is to say
  • dać komuś żyć to let someone live not to bother someone
    można wytrzymać one can stand the situation is reasonably good
  • querer dizer want say to mean
    ouvir falar hear speak to know/remember vaguely
  • n.a.
  • n.a.

16. If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?

The guidelines determine that only lexicalized components should be annotated. Therefore, we suggest that, in such cases, if the NP is compositional, only the head of the NP is included in the scope of the LVC. This may lead to the annotation of odd LVCs that actually never occur by themselves without a modifier. This is not a problem and is already the case for other VMWEs, e.g. the ones that only occur with a determiner, but the determiner is not lexicalized. The only case where the NP should be included as a whole is when the complement is a non-compositional MWE, so that it would not make any sense to annotate only the head.

  • παίζω το χαρτί του ευρωσκεπτικισμού to-play the paper the.SG.GEN euroscepticism.SG.GEN to use the asset of euroscepticism, to use euroscepticism as an asset
    κάνω στάση εργασίας to-make stop work.SG.GEN to go on strike, to strike → the expression στάση εργασίας is non-compositional (term)
  • darse una larga ducha caliente give.self a long shower hot to have a long and hot shower
  • présenter un Syndrome Coronarien Aigu to present an acute coronary syndrome
    mener une vie de débauche to have a life of pleasures
    faire un faux pas make a false step to commit a faux pas → the expression faux pas is non-compositional
  • mieć wyrzuty sumienia to have reproaches of the conscience to feel guilty
  • fazer uma sessão de fotos/autógrafos to make a photo/autograph session
    fazer roleta russa to make russian roulette to play russian roulette → the expression roleta russa is non-compositional
    ter uma situação financeira/profissional/estável to have a financial/professional/stable situation

Notice that these suggestions also apply to LVCs whose nominal complements are introduced by prepositions (i.e. verb+PP LVCs). As usual, the preposition should be included if it is lexicalized and then the NP introduced by the preposition is analyzed exactly as described above.

If the complex dependent is an acronym, you may want to add the textual comment "PART" to indicate that only part of the full version is lexicalized (generally, the head), just like for contractions and compounds.

17. In an LVC candidate, if the verb adds aspect to the predicative noun, does it imply failing Test LVC.3?

Depending on the language, aspect can be realised by various lexical, morphological and syntactic means.

  1. We consider aspect a morphological feature in the following cases:
    • Perfective or continuous aspect introduced by inflection and/or analytical tenses:
      • John was making a presentation
        he called her while having a walk
      • Jan was een presentatie aan het maken Jan was making a presentation
    • Perfective or imperfective aspect inherent to the verb (independently of its inflected form), recognisable either by a prefix or by an ending:
      • pełnić rolę fulfil.IMPERF a role to play a role
        wypełnić rolę fulfil.PERF a role to play a role
        wypełni rolę fulfil.PERF a role to play a role
      • Taja je postavljala vprašanja Taja was asking questions
        ves čas je dajal napačne napovedi he was always giving wrong forecasts
  2. We consider aspect a semantic feature in the following cases:
    • Starting, continuation or completion is expressed by precise verbs which usually modify other verbs:
      • η Μαρία άρχισε τη συζήτηση Maria started the conversation
        ο Γιάννης διέκοψε την κουβέντα John interrupted the discussion
      • Anthony started his presentation in advance
        the weather interrupted the transmission twice
        we kept our show regardless of the reactions
      • de regen onderbrak de wedstrijd the rain interrupted the match
      • Tomaž je začel svoje predavanje Tomaž started his lecture
        Politik je nadaljeval svojo napoved reform the politician continued his forecast about reforms
        naredili bomo konec onesnaževanju we will make end to pollution we will put an end to pollution

In Test LVC.3, we verify whether the verb adds "light" semantics to the predicative noun. When aspect is expressed as a morphological feature, such as in the first item above, we consider that the verb is light and test LVC.3 passes. However, when aspect is a semantic feature rather than a morphological feature, test LVC.3 fails and we do not have an LVC.

18. In the LVC decision tree, should I test that the noun keeps its original meaning?

The previous version (1.0) of the annotation guidelines contained Test 10 [N-SEM], which checked if the noun in an LVC candidate preserves one of its original senses. If it did not, the candidate was not an LVC.

In the current version of the guidelines we have abandoned this test because:

  • it proved hard to establish the list of original senses of a noun,
  • this test was superfluous with respect to Test LVC.4 [V-REDUC],
  • in some verbal idioms (VIDs) the noun also keeps its original sense, so the test can be misleading for the LVC vs. VID distinction.

19. How can I easily browse the already existing annotations in my corpus?

Grew-match is well suited to this purpose. It can be used in two modes:

  • As a corpus browser - here you can run Grew queries and the MWEs matching these queries will be displayed. The 3 latest versions of your corpus are uploaded on Grew-match (select the correct language). In particular, the latest version is the one which is loaded in the development branch of your language repository (see here).
  • As a consistency check tool - available from the language table in the PARSEME wiki. This tool groups all sentences containing the same MWEs (like here for Polish).

Section 10.2

Adding new examples in your language

It is often useful to have examples of a phenomenon shown in your own language. Examples in the guidelines are presented as in the template below:

  • MWEs with their lexicalized components in Arabic are indicated like this.
  • MWEs with their lexicalized components in Bulgarian are indicated like this.
  • MWEs with their lexicalized components in Czech are indicated like this.
  • MWEs with their lexicalized components in German are indicated like this.
  • MWEs with their lexicalized components in Greek are indicated like this.
  • MWEs with their lexicalized components in English are indicated like this.
  • MWEs with their lexicalized components in Spanish are indicated like this.
  • MWEs with their lexicalized components in Basque are indicated like this.
  • MWEs with their lexicalized components in Farsi are indicated like this.
  • MWEs with their lexicalized components in French are indicated like this.
  • MWEs with their lexicalized components in Irish are indicated like this.
  • MWEs with their lexicalized components in Hebrew are indicated like this.
  • MWEs with their lexicalized components in Hindi are indicated like this.
  • MWEs with their lexicalized components in Croatian are indicated like this.
  • MWEs with their lexicalized components in Hungarian are indicated like this.
  • MWEs with their lexicalized components in Indonesian are indicated like this.
  • MWEs with their lexicalized components in Italian are indicated like this.
  • MWEs with their lexicalized components in Japanese are indicated like this.
  • MWEs with their lexicalized components in Lithuanian are indicated like this.
  • MWEs with their lexicalized components in Maltese are indicated like this.
  • MWEs with their lexicalized components in Dutch are indicated like this.
  • MWEs with their lexicalized components in Polish are indicated like this.
  • MWEs with their lexicalized components in Portuguese are indicated like this.
  • MWEs with their lexicalized components in Romanian are indicated like this.
  • MWEs with their lexicalized components in Slovene are indicated like this.
  • MWEs with their lexicalized components in Swedish are indicated like this.
  • MWEs with their lexicalized components in Turkish are indicated like this.
  • MWEs with their lexicalized components in Chinese are indicated like this.

Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. Languages use color codes according to their language groups. See the section on notation for more information.

In order to see the ID of all examples, make sure the ID button is toggled on the header of the current page. Now look at the template above. You should see this ID: 7.2_A_template-mwe. The 7.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
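
The structure of such an ID can be illustrated with a short parsing sketch. The pattern below is an assumption inferred from the single ID shown above (section number, underscore, position letter, underscore, readable name); the function name `parse_example_id` is hypothetical and not part of the examples platform.

```python
import re

# Assumed shape: <section>_<position letter(s)>_<human-readable name>
EXAMPLE_ID = re.compile(
    r"^(?P<section>\d+(?:\.\d+)*)_(?P<position>[A-Z]+)_(?P<name>.+)$"
)


def parse_example_id(example_id):
    """Split an example ID into its section, position and name parts."""
    match = EXAMPLE_ID.match(example_id)
    if match is None:
        raise ValueError(f"not a recognised example ID: {example_id!r}")
    return match.groupdict()


parts = parse_example_id("7.2_A_template-mwe")
```

Here `parts["section"]` holds the section number, `parts["position"]` the letter giving the position of the example on the page, and `parts["name"]` the human-readable identifier.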

Editing or adding examples

The shared examples edition spreadsheet used in previous versions of the guidelines is no longer used: all modifications are made online and are visible immediately. To edit or add examples to the guidelines, you need to create an account on the guidelines 2.0 examples edition platform. You also have to ask Takuya Nakamura, Agata Savary or Carlos Ramisch to grant you the edition rights for your language.

Once you are logged in, you will see some buttons close to each example.

  • The 'copy' button copies the source of the example, and is useful if you want to copy the example of another language and then translate it.
  • The 'source' button is always available for languages you have the right to edit, and allows you to edit the example's XML-like source code, as described below.
  • The 'edit' button is only shown for examples that follow the formatting rules, and allows you to edit the example using a user-friendly interface.

Instructions to create well formatted examples (or correct the ill-formatted ones in 'source') are available in the example edition instructions.

When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language (use the 'copy' button), and then adapting it to your language. You can then paste the example in your language's 'source' mode. Remember that you should not translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation or not. Therefore, before entering an example, you should always check if it is relevant in the context.

If there is something wrong or suspicious with your example, the interface will show an error or warning message.

If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the example empty or add an n.a. comment.

Examples with tags

Let us analyse the English example below, shown in 'source' mode:

MWEs with <lex>their lexicalized components</lex> in English are indicated like this.

As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.

Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:

  • He will take a shower

Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:

  • This is not an MWE

Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is <u>not</u> an MWE</nmwe> yields the following:

  • This is not an MWE

Latin-script transcription: You can optionally provide a Latin-script transcription if your language does not use Latin characters. Latin-script transcriptions must be surrounded by the tags <latin> and </latin>. For example, the code الدرس <latin>ad-dars</latin> generates the example below. The Latin transcription should always appear after the example in the original script, and before glosses and translations.

  • الدرس ad-dars

Gloss icon: You should also provide English glosses and translations for your examples. Glosses and translations should always be provided in English, and never in another language. Glosses must be surrounded by the tags <gl> and </gl>. Translations must be surrounded by <trans> and </trans>. English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex> <gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text <latin>transcription (optional)</latin> <gl>the gloss</gl> <trans>the translation</trans>.

  • défendre son bifteck defend one's beefsteak to defend one's interests
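Before pasting an example in 'source' mode, you may want to check it locally. The following is a minimal sketch, assuming plain string matching is sufficient: it extracts the lexicalized components and checks that the optional <latin>, <gl> and <trans> parts appear in the recommended order. The function names are hypothetical and the platform's own validation may work differently.

```python
import re
from typing import List

# Recommended order of the optional parts that follow the original text.
EXPECTED_ORDER = ["latin", "gl", "trans"]


def lexicalized_components(source: str) -> List[str]:
    """Return the text of every <lex>...</lex> span, in order of appearance."""
    return re.findall(r"<lex>(.*?)</lex>", source)


def follows_recommended_order(source: str) -> bool:
    """Check that the <latin>, <gl> and <trans> tags that are present appear in that order."""
    positions = []
    for tag in EXPECTED_ORDER:
        index = source.find(f"<{tag}>")
        if index != -1:  # tags that are absent are simply skipped
            positions.append(index)
    return positions == sorted(positions)


source = (
    "<lex>défendre</lex> son <lex>bifteck</lex> "
    "<gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans>"
)
print(lexicalized_components(source))
print(follows_recommended_order(source))
```

This sketch does not verify that the original text precedes the transcription; it only compares the positions of the three optional tags.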

Comments: Some examples are presented followed by an explanation or comment, in normal font (black color). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:

  • some words → further details

Newline: Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they can be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). This tag will be treated by the 'edit' interface to break examples that can be edited separately. For example, the code example 1 <br/> example 2 <br/> example 3 will be rendered as follows:

  • example 1
    example 2
    example 3

Inside normal text, you may also use tags such as <i> (italics), <strong> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.
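Since the example source is XML-like, a quick local check can catch unclosed tags before the interface reports an error. The sketch below is an assumption-laden illustration, not the platform's validator: it wraps the source in a made-up <example> root element and relies on Python's standard XML parser, and it also shows how a source string could be split on <br/>.

```python
import re
import xml.etree.ElementTree as ET
from typing import List


def split_examples(source: str) -> List[str]:
    """Split a source string into separate examples at each <br/> tag."""
    parts = re.split(r"<br\s*/>", source)
    return [part.strip() for part in parts if part.strip()]


def is_well_formed(source: str) -> bool:
    """Return True if every opened tag is closed and properly nested."""
    try:
        # The <example> wrapper is hypothetical; it only gives the parser a single root.
        ET.fromstring(f"<example>{source}</example>")
    except ET.ParseError:
        return False
    return True


print(split_examples("example 1 <br/> example 2 <br/> example 3"))
print(is_well_formed("<nmwe>This is <u>not</u> an MWE</nmwe>"))
print(is_well_formed("He will <lex>take a shower"))
```

Note that a strict XML parser also rejects characters such as a bare & in the text, so a negative result is a hint to inspect the source rather than a definitive verdict.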


Section 10.3

Annotation platform FLAT

The annotation will be performed using the online annotation platform FLAT. The documentation of the annotation platform is provided in a separate document. Check the useful links below:


Section 10.4

Best practices

Annotating VMWEs in text is a hard task. Many tests are semantic and require not only a strong knowledge about the language, but also knowledge of advanced notions in linguistics. As a consequence, ensuring annotation quality and, above all, intra- and inter-annotator consistency, is a challenge. We provide here a set of hints that you can use to try to optimize the annotation effort and ensure the quality of the resulting corpus.

Resources and people

This website only covers the annotation guidelines. Do not forget that many other resources are available on the PARSEME shared task 1.1 website. That website is not for system authors, but for language leaders, annotators and organizers. It contains much useful data, notably the names and contacts of people that can help you, and user manuals for FLAT, for the language leaders, etc. Also, you can use the mailing lists if you need to ask questions that could be relevant for other teams as well. In short, don't be shy to ask if you would like to do something but you're not exactly sure where to start :-)

NotVMWE label

The new FLAT configurations for edition 1.1 allow you to use an optional annotation label called NotVMWE. This is not a new VMWE category, but an auxiliary label which simply means "this is not a VMWE". NotVMWE is an optional and useful label you can use to indicate that something should not be annotated, especially if it is a borderline case. Adding this annotation allows you to add a textual comment saying why you decided not to annotate this construction (e.g. after discussing it with fellow annotators and recording the decision in the list of solved cases).

While you don't need to use this label, we recommend that you use it for challenging/hard cases which, in the end, you decide not to annotate as a VMWE. This kind of annotation will be useful when performing consistency checks. Of course, NotVMWE labels will all be removed in the final released corpora, since this kind of information is irrelevant for shared task participants.

List of solved cases

In edition 1.0, some languages have ensured consistency by keeping a separate shared document (e.g. a Google spreadsheet) where hard/challenging cases were documented. We advise language leaders to implement such a list of solved cases. This allows all annotators to contribute to the discussion of hard cases, and to reach a common decision that can be later applied systematically to all occurrences of the expression and for similar expressions. From our experience, this greatly enhances the satisfaction of annotators and saves some valuable time during the consistency checks. Even for languages that have a single annotator, she/he can keep a personal list of difficult cases and their decisions, to ensure intra-annotator consistency.

Consistency checks

Once all files have been annotated, language leaders will perform the final consistency checks using semi-automatic tools. During these consistency checks, all occurrences of a single expression annotated by all annotators will be shown together. There, language leaders may change annotations performed by individual annotators if they are inconsistent with the other annotations. Therefore, do not worry too much if you are unsure about an annotation. Try to be as consistent as possible, but if you do not remember a particular annotation performed earlier, it is not necessary to search through the corpus on FLAT (this is quite time-consuming). If there is some minor inconsistency, it will probably be corrected later by the language leader. But note your decision down on the list of solved cases so that next time you come across the same expression (or a similar one) you do not spend so much time thinking about it.
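The idea behind such a check can be sketched as follows. This is not the actual semi-automatic tool used by language leaders: the input format (expression, annotator, label) and the function name are assumptions made for illustration, and the annotations shown are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record is (normalised expression, annotator, label); the format is hypothetical.
Annotation = Tuple[str, str, str]


def find_inconsistencies(annotations: Iterable[Annotation]) -> Dict[str, List[str]]:
    """Group annotations by expression and keep those that received more than one label."""
    labels_by_expression = defaultdict(set)
    for expression, _annotator, label in annotations:
        labels_by_expression[expression].add(label)
    return {
        expression: sorted(labels)
        for expression, labels in labels_by_expression.items()
        if len(labels) > 1
    }


# Invented toy data, for illustration only.
toy_annotations = [
    ("give one's word", "annotator-1", "VID"),
    ("give one's word", "annotator-2", "NotVMWE"),
    ("go astray", "annotator-1", "VID"),
    ("go astray", "annotator-2", "VID"),
]
print(find_inconsistencies(toy_annotations))
```

Expressions flagged this way are good candidates for the list of solved cases.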

Intuition and tradition vs. guidelines

You may sometimes (often) find that the guidelines do not reflect your intuition about a given construction, or that they contradict the linguistic tradition and literature in your language. We understand that this is frustrating, but please, remember that our main objective is achieving universal modelling of MWEs while preserving diversity. Therefore, please refrain from using undocumented criteria (a.k.a. intuition), or tests that are only known/documented in your language.

The guidelines were designed taking feedback from many language teams into account. They are also meant to continuously evolve, and we do count on you to play an active role in this process. Therefore, if you disagree with their current version, please, choose one of the two options:

  • Follow the guidelines anyway to ensure the corpus-to-guidelines consistency, but express your criticism (documented with glossed and translated examples in your language), best via Gitlab issues. You may also add comments to those annotations which you would like to modify once the guidelines have been enhanced.
  • Create a language-specific section for the guidelines, describing your own tests and decision trees. We will be happy to publish it online.

Inter-annotator agreement

Usually, data annotation campaigns require measuring inter-annotator agreement (e.g. kappa) to verify that the guidelines are clear and that the annotators are well trained. We encourage language teams to measure inter-annotator agreement. However, in the PARSEME shared task, the organizers do not set any hard threshold on the kappa value required to accept your annotations as part of the shared task. This is a collaborative effort, so we do not feel comfortable with imposing such requirements on language teams.

Furthermore, VMWE annotation is a very hard task so inter-annotator agreement is expected to be low. We recommend that language teams use complementary tools and resources to compensate for the low agreement, such as the list of solved cases and consistency checks mentioned on this page. After the annotation is completed, we may ask you to double-annotate a sample of your data so that we can calculate inter-annotator agreement, for instance, to report it on a corpus description article. But you should not worry too much about this: do your best in trying to understand the guidelines, do not hesitate to suggest improvements, and try to train annotators as much as possible, for instance, with pilot annotations and discussions. This way, you will ensure that the data released in the shared task for your language will be of high quality. And remember you will have the opportunity to improve it incrementally for the next shared task.
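For teams who want to compute agreement themselves, here is a minimal sketch of Cohen's kappa for two annotators. It assumes that both annotators labelled the same list of candidates, one label per candidate; how candidates are aligned across annotators is left open here, and the labels in the example are invented.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance that both pick the same label independently.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["VID", "VID", "NotVMWE", "NotVMWE"]
annotator_2 = ["VID", "NotVMWE", "NotVMWE", "NotVMWE"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three items out of four (0.75), chance agreement is 0.5, and kappa is therefore 0.5.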

TODO label

We have introduced a new label on FLAT called "{change-me} TODO". This label is a temporary mark-up used to indicate that a given VMWE must be dealt with by a human annotator. It will be used when a corpus is automatically converted and some annotations must be manually checked. For instance, the OTH category from shared task 1.0 disappeared in edition 1.1. Therefore, all VMWEs annotated as OTH in the 1.0 corpora will be automatically converted using the TODO label. This means that all TODO labels must be changed into a valid new category (e.g. VID). In the final annotated corpora, any remaining TODO label will be removed, since this is not actually a VMWE category but just an auxiliary label.
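The conversion logic amounts to a simple relabelling followed by a check for leftovers. The sketch below is illustrative only: it works on a plain list of labels rather than on real corpus files, abbreviates the FLAT label to "TODO", and uses hypothetical function names.

```python
from typing import List

TODO_LABEL = "TODO"           # stands for the "{change-me} TODO" label on FLAT
REMOVED_CATEGORIES = {"OTH"}  # categories of edition 1.0 that no longer exist


def convert_labels(labels: List[str]) -> List[str]:
    """Replace every removed category by the temporary TODO label."""
    return [TODO_LABEL if label in REMOVED_CATEGORIES else label for label in labels]


def pending_todo_indices(labels: List[str]) -> List[int]:
    """Return the positions that still carry a TODO label and need manual checking."""
    return [index for index, label in enumerate(labels) if label == TODO_LABEL]


converted = convert_labels(["VID", "OTH", "OTH"])
print(converted)
print(pending_todo_indices(converted))
```

An empty result from `pending_todo_indices` would indicate that every converted annotation has been assigned a valid category.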

Existence questions and corpus queries

Some tests ask whether it is possible or impossible to find an attested variant of a candidate. While for many cases this is straightforward (the variant can be easily found), some borderline cases will inevitably occur in which it is hard to tell if a given variant is impossible or just very rare.

Decisions for hard cases like this should not be made based solely on introspection and intuition. In case of doubts, we recommend that annotators:

  1. check existing lexicons for their languages
  2. perform corpus queries using any available large raw monolingual corpus
  3. run web queries, e.g. using Sketch Engine, Linguee or plain Google
  4. discuss the case with other annotators, reach a decision and mark it in the list of solved cases

In all cases, the list of lexicons, monolingual corpora and/or web platforms to consult should be agreed upon in advance by all annotators.
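A corpus query of the kind recommended in step 2 can be as simple as a regular-expression search over raw sentences. The sketch below assumes the corpus is available as a list of sentence strings; the sentences and patterns are invented, and the helper `attested_variants` is hypothetical. Real queries would normally rely on lemmas and a much larger corpus.

```python
import re
from typing import List


def attested_variants(sentences: List[str], pattern: str) -> List[str]:
    """Return the sentences in which the given regular expression is found."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [sentence for sentence in sentences if regex.search(sentence)]


# Invented toy corpus, for illustration only.
toy_corpus = [
    "I give you my word.",
    "She gave me her word on it.",
    "He looked up the term.",
]

# Original candidate: a form of 'give' followed later by 'word'.
original = r"\bg(?:ive|ave|iven)\b.*\bword\b"
# Variant under test: 'word' replaced by a near-synonym.
variant = r"\bg(?:ive|ave|iven)\b.*\b(?:term|expression|headword)\b"

print(len(attested_variants(toy_corpus, original)))
print(len(attested_variants(toy_corpus, variant)))
```

A count of zero in a small corpus does not prove that a variant is impossible, which is why the decision should still be discussed and recorded in the list of solved cases.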


Section 11

Glossary

Candidate VMWE

A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it is to be annotated as a true VMWE or, instead, it was a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.

Collocation

A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:

  • цените се покачват prices rise
    играя футбол to play football
  • eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to take a bus
  • μετανιώνω πικρά metaniono pikra regret.1SG bitterly
  • the graphic shows
    drastically drop
  • responder a una petición to answer a request
    el diagrama muestra the diagram shows
    coger el tren to take the train
  • de bus nemen to take the bus
  • zalać rynek to flood the market to dominate the market
    przyznać rację to admit right to admit that someone is right
    uprawiać sport to practice sports
    wzruszać ramionami to shrug one's shoulders
  • občutno zmanjšati significantly reduce
    drastično zmanjšati drastically reduce

Cranberry word

A cranberry word is a token that does not have the status of a stand-alone word, has no proper distribution, and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in different contexts, as the underlined words below:

  • вземам на мушка някого/нещо take on target to criticise heavily somebody/something
  • jemandem Angst einjagen to-someone chase-in fear to frighten someone
    jemandem einen Besuch abstatten
  • to go astray
  • sin decir ni chus ni mus chus is not a stand-alone word without to_say neither chus nor mus without saying a word
    no decir ni chus ni mus chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
    hacer algo a troche y moche troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
  • se mettre martel en tête SELF put a hammer in head to worry a lot
  • odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
    sprawiedliwości stało się zadość justice has been done
  • pune pe roate - roate is a form found only in expressions in the literary language
  • biti si kvit owe nothing to somebody; each party got what it deserved/asked for

Extended nominal phrase

An extended nominal phrase (ENP) is a notion covering, in a universal way, various types of phrases which convey similar lexical relations in morpho-syntactically different ways (prepositions, post-positions, case markers, etc.), depending on the language. Extended NPs include:

  • noun phrases, i.e. phrases headed by a noun, with its possible syntactic modifiers/complements
    • въпрос question, зелена светлина green light
    • εξήγηση, ο σκύλος, το ίδιο βιβλίο, πολλά παλιά ρούχα ejigisi, o skilos, to idio vivlio, pola palia rucha explanation, the dog, the same book, many old clothes
    • explanation, the dog, many old documents
    • explicación, el perro, muchos documentos antiguos
    • explication, le chien, quelques documents anciens
    • uitleg explanation, de hond the dog, veel oude documenten many old documents
    • ludzie people, najbliżsi współpracownicy closest collaborators
    • razlaga, pes, številni stari dokumenti explanation, the dog, many old documents
  • prepositional phrases, in which a preposition directly governs a noun, or the opposite, depending on a particular linguistic theory
    • за здраве for (good) health
      преди всичко before everything
    • για το αγόρι, με φόβο, από χαρά γia to aγori, me fovo, apo chara for the boy, with fear, by happiness for the boy, with fear, of happiness
    • on the bed, after the lesson, in front of the window
    • en la cama, después de la clase, enfrente de la ventana
    • sur le lit, après le cours, devant la fenêtre
    • op het bed on the bed, na de les after the lesson
    • ze stanowiska from a position
      dla wszystkich for everyone
      z prawdziwego zdarzenia from a true event genuine
    • na postelji, po pouku, pred hišo, za steno on the bed, after the lesson, in front of the house, behind the wall
  • noun phrases with case markers
    • предавам богу дух give to god.GEN soul to die
    • n.a.
    • ludzi people.GEN, najbliższymi współpracownikami closest.INST collaborators.INST
    • n.a.
    • mačka cat (nominative), mačke cat (genitive), mački cat (dative), mačko cat (accusative), o mački cat (prepositional), z mačko cat (instrumental)
  • noun phrases with postpositions
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

ENP is close to the UD understanding of the nominal phrase.

Particles

Particles are hard to distinguish from homographic prepositions:

  • ich schlage vor allen zu verzeihen I propose to forgive everyone
    ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
  • to get up a petition
    to get up a hill
  • n.a.
  • jestem za I am for I am in favor
    jestem za ustawą I am for the law I am in favor of the law
  • n.a.
  • n.a.

The fundamental property to capture is that a preposition governs a prepositional group, while a particle functions as an adverbial. In some languages particles can also be homographic with verbal prefixes:

  • das Schild um|fahren to drive over the sign
    den See umfahren to drive around the lake
  • n.a.
  • Ongelukken kunnen voorkomen Accidents can happen
    Ongelukken kunnen worden voorkomen accidents may be prevented
  • n.a.

Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.

Reflexive clitics

Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines for the IRV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which are a relatively small closed class of words:

  • се, си
  • mich, dich, sich, uns, euch
  • me, te, se, nos, os
  • me, te, se, nous, vous
  • mi, ti, si, ci, vi
  • me, je, zich, ons, jullie
  • się, sobie
  • me, te, se, nos, vos
  • mă/m-, te, se/s-, ne, vă/v-, se/s- (for accusative); îmi/mi-/-mi, îți/ți-/-ți, își/și-/-și, ne, vă/-vă/v-, își/și-/-și (for dative)
  • se, si

Semantic argument

A semantic argument of a predicative lexical unit (verb, noun, etc.) is a participant of the situation described by the predicative lexical unit that (a) can be realized as a syntactic dependent of the predicative lexical unit, (b) is semantically mandatory, and (c) is specific to that predicative lexical unit.

  • Semantically mandatory participants: a participant is semantically mandatory when it must be mentioned to specify the meaning of the predicative lexical unit. In other words, the realization of the predicative lexical unit implies the existence of its semantically mandatory participants. For instance, a visit cannot hold if there is no visitor or no visitee, courage is a property of a being, a presentation implies the existence of a presenter, of an audience and of a presented topic. Some participants are not semantically mandatory, for instance the addressee is not semantically mandatory for a whisper because one can whisper without an addressee. We restrict semantic arguments to semantically mandatory participants because we believe that this restriction helps delimit the semantic arguments without resorting to the difficult syntactic argument/adjunct distinction, while not being prejudicial to LVC tests. Notice that semantically mandatory participants do not necessarily occur in a sentence containing the predicative lexical unit, and can sometimes be omitted (e.g. due to coreference or ellipsis).
    • To define a заем loan one needs to mention two participants: the beneficiary and the source of the benefit. In other words, the existence of a loan implies the existence of its arguments.
    • To define a presentation one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a presentation implies the existence of its arguments.
    • To define an opinión opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinión implies the existence of its arguments.
    • To define a conseil advice one needs to mention two participants: the adviser and the advised person. In other words, the existence of a conseil implies the existence of its arguments.
    • To define a dochód profit one needs to mention two participants: the patient who benefits and the source of the benefit. In other words, the existence of a benefit implies the existence of its arguments.
    • To define an opinião opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinião implies the existence of its arguments.
    • To define a prezentare presentation one needs to mention three participants: the one who presents, the topic of the presentation and the person to whom the topic is presented. In other words, the existence of a prezentare implies the existence of its arguments.
    • priti v poštev to come into consideration to be considered
      imeti mnenje to have an opinion to believe
  • Specific participants: some semantically mandatory participants are generic and we do not consider them to be semantic arguments. For instance, the existence of a presentation implies that it occurred in a given time and place, so these are semantically mandatory participants. However, time and place are implicit to any event, and are not specific to the predicative noun presentation. Participants that denote non-specific characteristics of the predicative lexical unit and thus can be interpreted independently of the predicative lexical unit (for a large class of predicative lexical units), such as time, place and manner for most predicates, are not considered as semantic arguments.

Semantic arguments are generally mentioned in the dictionary definition of a predicative lexical unit. Useful sources for determining the semantic arguments of a given lexical unit are semantic lexicons such as FrameNet and PropBank. Our definition of semantic argument is closely related to FrameNet's core frame elements. Language teams are encouraged to use available resources and/or to provide language-specific documentation to help identify semantic arguments.

Subcategorization frame

A subcategorization frame of a verb describes how syntactic arguments are realized as the verb's dependents, for a given sense of the verb. A subcategorization frame indicates morphological and syntactic features of a verb's dependents, namely the required prepositions, postpositions and case markers of the subject, direct and oblique objects. For instance, one subcategorization frame for to return meaning to give back would be:

  • return: [NP]subject + [NP]direct object + [to NP]oblique
    • Example: [my sister]subject returned [the book]direct-object [to the library]oblique

Notice that the semantic characteristics of the dependents (a.k.a. selectional restrictions or preferences) are not considered as part of the subcategorization frame. For instance, the fact that the subject is animate (somebody) or inanimate (something) is irrelevant for subcategorization frames. Verbs can have many senses and each sense can have many subcategorization frames. For instance, the verb to return in the same sense can also be used with the subcategorization frames NPsubject + NPdirect-object ([my sister]subject returned [the book]direct-object) and NPsubject + NPoblique + NPdirect-object ([my sister]subject returned [me]oblique [the book]direct-object).
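To make the notion concrete, the frames of to return can be written down as a small data structure. This is only one possible encoding, sketched under the assumption that a frame is an ordered list of slots, each with a grammatical function, a phrase type and an optional marker (preposition, postposition or case); the class names `Slot` and `Frame` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    function: str                 # e.g. "subject", "direct-object", "oblique"
    phrase: str = "NP"            # phrase type of the dependent
    marker: Optional[str] = None  # required preposition, postposition or case marker

    def __str__(self) -> str:
        inner = f"{self.marker} {self.phrase}" if self.marker else self.phrase
        return f"[{inner}]{self.function}"


@dataclass(frozen=True)
class Frame:
    verb: str
    slots: Tuple[Slot, ...]

    def __str__(self) -> str:
        return f"{self.verb}: " + " + ".join(str(slot) for slot in self.slots)


# Three frames for the same sense of 'return' (to give back).
# Selectional restrictions (animate vs. inanimate) are deliberately not encoded.
frames = [
    Frame("return", (Slot("subject"), Slot("direct-object"), Slot("oblique", marker="to"))),
    Frame("return", (Slot("subject"), Slot("direct-object"))),
    Frame("return", (Slot("subject"), Slot("oblique"), Slot("direct-object"))),
]
for frame in frames:
    print(frame)
```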

Syntactic and semantic heads

The syntactic head of a construction is the part of the construction which determines the morphosyntactic valence constraints of the whole construction. For instance, in The producers of tobacco use a form of asbestos in this kind of filter, the syntactic head of producers of tobacco is producers, since it determines e.g. the plural form of the verb use.

The semantic head of a construction is the part of the construction which determines the lexico-semantic selectional restrictions of the whole construction. In the sentence above producers is also the semantic head, since it determines the semantic type of the whole construction (here: human), which agrees with the constraints imposed by the verb use.

Cases in which syntactic and semantic heads differ include transparent nouns: part of the room, liter of wine, her jerk of a husband, etc. For instance in The majority of tobacco producers uses a form of asbestos in this kind of filter, the syntactic head of majority of tobacco producers is majority and the semantic head is producers.

Bibliography:

  • Charles J. Fillmore, Collin F. Baker, and Hiroaki Sato. 2002. Seeing Arguments through Transparent Structures. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
  • Alan Cruise. 2006. A Glossary of Semantics and Pragmatics, Edinburgh University Press.
  • Adam Przepiórkowski. 2007. On heads and coordination in valence acquisition. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing (CICLing 2007), Lecture Notes in Computer Science 4394, pages 50–61, Berlin. Springer-Verlag.

Syntactic argument

Typically, verbal lexical units have dependents that can be syntactic arguments or adjuncts, depending on their status (mandatory/specific or not). For instance, in John walked in the forest yesterday all three dependents (the entity walking, the time and the place) add semantics to the predicate, but time and place can be interpreted independently of the semantics of the verb, and could be omitted. Thus, John is a syntactic argument while the other dependents are syntactic adjuncts. Typically, time and place are considered as syntactic adjuncts, and never as syntactic arguments.

Beyond verbs, nouns, adjectives and adverbs can also have arguments. For example, the noun cause cannot normally appear by itself; rather, one must always talk about the cause of X, with X as the syntactic argument of the noun cause. Similarly, the noun contact has two arguments: the contact of X with Y.

Distinguishing between syntactic arguments and adjuncts can be tricky, and we will not go into the details of the controversial argument/adjunct distinction. In addition to the usual tests for the argument/adjunct distinction described in the linguistic literature, we advise language teams to use language-specific resources (e.g. valency dictionaries) that sometimes encode the syntactic argument structure of lexical units.

Most of the time, syntactic and semantic arguments coincide, but not always. For instance, in I translated a book., there is no syntactic argument expressing the source and target languages, which are semantic arguments of translate. Therefore, we distinguish both notions in our guidelines. Syntactic arguments describe the linguistic structure of lexical items whereas semantic arguments are related to the conceptual structure of predicates.

Syntactic operator

A syntactic operator is a verb that only bears the grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restricted than the traditional notion of a light verb. Notably, aspectual light verbs (which add aspectual semantics to the complement), as in to start a walk, to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:

  • отдавам почит to give tribute to pay tribute
  • eine Entscheidung treffen to make a decision
    Angst haben to have fear
    ein Verbrechen begehen to commit a crime
  • παίρνω απόφαση perno apofasi take.1SG decision.SG.ACC to make a decision, to decide
  • to make a decision
    to have fear
    to commit a crime
  • tomar una decisión
    tener miedo
    hacer ilusión
  • een beslissing nemen to make a decision
    een misdrijf plegen to commit a crime
  • oddać hołd to give-back tribute to pay tribute
  • priti v poštev to come into consideration to be considered

Unexpected change in meaning

An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification which yields an unexpected acceptability or meaning shift, that is, beyond what would be expected by the initial modification. In order to judge whether a shift in acceptability or meaning is unexpected, one can try to apply the same modification to a similar compositional construction, using analogy. For example, book and word have synonyms including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:

  • давам ти книга I give you a book давам ти тетрадка/роман/том/учебник I give you a notebook/novel/volume/textbook
  • Ich gebe dir mein Buch I give you my book Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
  • Te doy mi libro I give you my book Te doy mi(s) publicación/tesis doctoral/capítulo/novela/edición I give you my publication/thesis/chapter/novel/edition
  • I give you my book I give you my notebook/novel/volume/publication
  • daję ci książkę I give you a book daję Ci zeszyt/powieść/tom/publikację I give you a notebook/novel/volume/publication
  • îți dau cartea I give you the book îți dau caietul/romanul/volumul/publicația I give you the notebook/novel/volume/publication
  • dam ti besedo I give you a word, i.e. I promise #dam ti izraz/zlog/glagol I give you a word/syllable/verb

the same does not hold for:

  • давам ти дума I give you a word, i.e. I give you my word #давам ти слово/израз/текст I give you a word/expression/text
  • Ich gebe Dir mein Wort I give you my word, i.e. I promise #Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
  • Te doy mi palabra I give you my word, i.e. I promise #Te doy mi(s) publicación/tesis doctoral/capítulo/novela/edición I give you my publication/thesis/chapter/novel/edition
  • I give you my word #I give you my term/expression/headword
  • Ik geef je mijn woord I give you my word, i.e. I promise #Ik geef je mijn boek I give you my book
  • daję ci słowo I give you a word, i.e. I give you my word daję Ci wyraz/sylabę/czasownik I give you a word/syllable/verb
  • Îți dau cuvântul I give you my word #Îți dau caietul/romanul/volumul/publicația I give you my notebook/novel/volume/publication
  • dati komu besedo to give (someone) a word, i.e. to promise someone

That is, the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test VID.2 [LEX] applies and:

  • давам своята дума to give one's word to someone
  • jmd. sein Wort geben to give one's word to s.o.
  • to give one's word to someone
  • dar a alguien tu palabra to give one's word to s.o.
  • zijn woord geven to give one's word
  • dać komuś słowo to give someone a word, i.e. to give one's word to someone
  • a-ți da cuvântul cuiva to give your word to someone
  • n.a.

is a VMWE.
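
The analogy-based comparison above can be sketched as a small bookkeeping aid. This is only an illustration, not part of the guidelines: the names Judgment, make_variants, mark and lexically_inflexible are hypothetical, the judgments themselves must still come from a native-speaker annotator, and the rule "every substitution must yield an unexpected shift" is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Judgment:
    """One annotator judgment about a substituted variant (hypothetical structure)."""
    variant: str
    unexpected_shift: bool  # True if the annotator would mark the variant with '#'


def make_variants(template: str, replacements: List[str]) -> List[str]:
    """Fill the '{}' slot of the template with each replacement word."""
    return [template.format(word) for word in replacements]


def mark(judgment: Judgment) -> str:
    """Prefix the variant with '#' when the shift in meaning is unexpected."""
    return ("#" if judgment.unexpected_shift else "") + judgment.variant


def lexically_inflexible(judgments: List[Judgment]) -> bool:
    """Assumed decision rule: inflexible only if every substitution shifts unexpectedly."""
    return bool(judgments) and all(j.unexpected_shift for j in judgments)


# Compositional control: replacing 'book' shifts the meaning only as expected.
book_j = [Judgment(v, False) for v in make_variants(
    "I give you my {}", ["notebook", "novel", "volume", "publication"])]

# Candidate expression: replacing 'word' loses the 'promise' reading.
word_j = [Judgment(v, True) for v in make_variants(
    "I give you my {}", ["term", "expression", "headword"])]

for j in book_j + word_j:
    print(mark(j))

print("book:", lexically_inflexible(book_j))
print("word:", lexically_inflexible(word_j))
```

Running the control and the candidate side by side mirrors the reasoning in the text: the shift counts as unexpected only in contrast with a comparable compositional construction.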

Similarly, Test IVPC.1 [PART-REDUC] refers to an unexpected change in the meaning of the verb stemming from the addition of the particle. This is verified by checking whether the situation described by the verb with the particle implies the one described by the verb without the particle:

  • n.a.
  • Ich fange das Buch an I begin to read the book does not imply Ich fange das Buch I catch the book
    Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
  • to check in upon arrival does not imply to check upon arrival (it is an IVPC)
    to look up into the sky implies to look into the sky (it is not an IVPC)
  • n.a.
  • n.a.
  • n.a.
  • n.a.

Ungrammaticality

Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We assume that judging ungrammaticality is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).


Section 12

Contact

These guidelines were written by many authors. If you have questions, comments, suggestions, you can contact the people in charge of the PARSEME corpora initiative.

You are welcome to also contribute to this initiative in other ways - see why and how.

