Annotation guidelines
corpora annotated for multiword expressions
Welcome to the official annotation guidelines of the PARSEME corpora version 1.3.
For previous versions, you can check the index of versions. See also what is new in the guidelines version 1.3 as compared to version 1.2.
Here, you'll find detailed definitions, examples and linguistic tests to guide your decision as to whether a given combination in your language is a verbal multiword expression. Use the table of contents on the left to navigate between sections and the header buttons to show/hide examples.
In addition to these general guidelines, language teams may also provide extra documentation, like lists of borderline cases and decisions taken concerning them. They should all be compatible with these general guidelines.
If you spot errors or if something remains unclear after reading the guidelines, please contact us and we'll do our best to correct the problems.
Authors and contributors (alphabetical order)
Chérifa Ben Khelil, Archna Bhatia, Claire Bonial, Marie Candito, Fabienne Cap, Silvio Cordeiro, Vassiliki Foufi, Polona Gantar, Voula Giouli, Najet Hadj Mohamed, Carlos Herrero, Uxoa Iñurrieta, Mihaela Ionescu, Iskandar Keskes, Alfredo Maldonado, Verginica Mititelu, Johanna Monti, Joakim Nivre, Mihaela Onofrei, Viola Ow, Carla Parra Escartín, Manfred Sailer, Carlos Ramisch, Renata Ramisch, Monica-Mihaela Rizea, Agata Savary, Nathan Schneider, Ivelina Stoyanova, Sara Stymne, Ashwini Vaidya, Veronika Vincze, Abigail Walsh, Hongzhi Xu.
Developers (alphabetical order)
Quentin Barrouyer, Carlos Ramisch, Baptiste Souche
Table of contents
- 1 Definitions and scope
- 2 Textual annotation scope
- 3 Categories of VMWEs
- 4 Annotation process - decision tree
- 5 Cross-lingual tests
- 6 Language-specific tests
- 7 Annotation management
- 8 Glossary
- 9 Contact
Section 1
Definitions and scope
In this shared task, we aim at identifying verbal Multiword Expressions (VMWEs) in running texts in about 20 languages from several language families. VMWEs are of particular interest to the PARSEME COST action since they frequently introduce discontinuity and long-distance dependency issues, which are central to deep parsing and to other Natural Language Processing tasks.
This document defines the annotation scope and puts forward a classification of VMWEs together with linguistic tests for their identification and categorization.
Section 1.1
Notation
The notational convention used throughout the document is the following:
- Italic is used to display example sentences and expressions.
- Bold is used to highlight the lexicalized components of a candidate VMWE inside an example (positive or negative).
- Underline is used to focus the reader's attention on the important part of an example.
- An asterisk (*) precedes ungrammatical examples.
- A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
- Different colors are used to display examples:
- Red is used for counter-examples, that is, expressions which look like VMWEs but are not one, whatever the language.
- According to the language, different colors are used for other examples, that is, positive examples of the phenomenon being discussed:
- Shades of green are used for positive examples in Germanic languages.
- Shades of blue are used for positive examples in Romance languages.
- Shades of orange are used for positive examples in Slavic languages.
- Shades of pink are used for positive examples in other language families.
- Examples are preceded by the 2-letter language code in parentheses.
- Examples can be shown and hidden using the toggle buttons in the header.
Section 1.2
Words and tokens
While the definition of an MWE inherently relies on the notion of a word, manual annotation and automatic identification of VMWEs in our task are performed on texts which are automatically tokenized. It is therefore important to understand the distinction between words and tokens in the context of VMWEs.
A word is a linguistically (notably semantically) motivated unit. The detection of words is, thus, language-dependent and annotation experts should have a clear idea of how to define it for their own language (even if this definition proves hard in general).
A token is a technical and pragmatic notion, defined according to more or less linguistically motivated clues and depending on the particular tokenization tool at hand. Note that the notion of a token is ambiguous in NLP. It can also mean an individual occurrence of a certain linguistic unit, as opposed to a type, i.e. the set of all surface realisations of a unit. In these guidelines, we refrain from using this second sense.
Tokens should ideally be as close as possible to words. However, in practice, because (automatic) tokenization is a hard task, the relation between tokens and words is not always one-to-one. The following cases occur:
- A token coincides with a word:
παίρνω perno take
ένας enas a
απόφαση apofasi decision
καλός kalos beautiful
περί peri about
- Several tokens build up one word, as in abbreviations, possessive markers, words with "accidental" separators, inflected or derived forms of foreign names, etc. In this case we speak of a multitoken word (MTW). The pipe symbol '|' indicates token separation in these examples:
год|. year
Wie geht|'|s How goes it How are you
υπΔρ υποψήφιος διδάκτορας PhD candidate
pp|. pages
Pandora|'|s
a|/|f|. a favor in favor
Rte|. remitente sender
Pandora|'|s Pandora's
SMS|-|ować to write an SMS
d|-|voastră polite "you"
str|. pages
le|-|to
tweet|-|овање tweet|-|ovanje to write tweets
- One token can contain several words, as in contractions and compounds. In this case we speak of a multiword token (MWT). Identifying MWTs is important because they are potential candidates for VMWEs. However, defining what counts as a word and as an MWT is a hard, language-specific question, and language-specific MWT tests are being designed to this end. See also the representation of MWTs in Universal Dependencies. The precise word forms cannot always be straightforwardly deduced from the MWT containing them and vice versa, as in don't, della, du, etc. Examples of MWTs include:
Apfelbaum = Apfel+Baum apple tree apple tree
al = a+el to+the to the
compárese = compare+se compare SE_PARTICLE be it compared
suicidarse = suicidar+se suicide SELF to commit suicide
jarleku = jar(ri)+leku sit+place seat
b'fhearr = ba+fhearr be.COND better prefer
appelboom = appel+boom apple tree apple tree
pannenkoek = pan + koek pancake
robiłem = robi+łem do.3.SG.PRES+be.1.SG.PAST.AGL I did
żeśmy = że+śmy that+be.1.PL.AGL that-we
новосадски = ново + садски novosadski = novo + sadski Novi Sad (an adjective from a city name)
While a VMWE always contains at least two words, the relation between VMWEs and tokens can be twofold:
- A VMWE contains several tokens, whether each of them coincides with a word or not:
прочитам от корица до корица to read from cover to cover (5 words, 5 tokens)
wie geht's (2 words, 4 tokens) how goes it how are you
παίζω στα δάχτυλα pezo sta dachtyla play in-the fingers to know very well (3 words, 4 tokens)
to open Pandora's box (3 words, possibly 5 tokens)
dar por sentado (3 words, 3 tokens) to give for seated to take for granted
irse de rositas (3 words, 4 tokens) to go_self of little_roses to get off scot free
cavalcare l'onda (3 words, 4 tokens) ride the wave ride the wave
robił|em z igły widły made.3.SG.M1+be.1.SG.AGL a pitchfork out of a needle I made a mountain out of a molehill (4 words, 5 tokens)
cair de pára-quedas to fall with parachute to arrive unprepared in the middle of a situation (3 words, possibly 5 tokens)
queixar-se-ia complain-self-would would complain (2 words, possibly 5 tokens)
vreči puško v koruzo throw a rifle in the corn to give up (4 words, 4 tokens)
hedh një sy (3 words, 3 tokens) throw an eye take a look
причати на|памет pričati na|pamet to talk by heart to talk not relying on facts (3 words, 2 tokens)
- A VMWE contains one (multiword) token:
anfangen at-catch to begin
aanvangen at-catch to begin
Note finally that multitoken words are not considered verbal MWEs since they contain one (multitoken) word only:
Whenever the distinction between a word and a token is judged by a particular language team as hard to tackle, a possible option is to consider these two notions equivalent for the needs of this shared task.
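The token counts quoted in the examples of this section can be reproduced mechanically. The sketch below is only an illustration: it assumes that examples are written with whitespace between orthographic chunks and with the pipe symbol '|' between tokens inside a chunk, and the function names `count_tokens` and `classify_unit` are hypothetical (they do not belong to any PARSEME tool). Word counts are not computed, since deciding what a word is remains a language-dependent judgement of the annotator.

```python
def count_tokens(example: str) -> int:
    """Count tokens in an example that uses '|' to mark token boundaries.

    Assumption: whitespace also separates tokens, and no token is empty.
    """
    return sum(len(chunk.split("|")) for chunk in example.split())


def classify_unit(n_tokens: int, n_words: int) -> str:
    """Name the token/word relation for a single unit (illustrative labels)."""
    if n_tokens < 1 or n_words < 1:
        raise ValueError("a unit has at least one token and one word")
    if n_tokens == 1 and n_words == 1:
        return "token coincides with word"
    if n_words == 1:
        return "multitoken word (MTW)"
    if n_tokens == 1:
        return "multiword token (MWT)"
    return "several tokens, several words"


print(count_tokens("Pandora|'|s"))            # 3 tokens forming one word
print(classify_unit(3, 1))                    # multitoken word (MTW)
print(classify_unit(1, 2))                    # multiword token (MWT), e.g. al = a+el
print(count_tokens("robił|em z igły widły"))  # 5 tokens
```

The word count passed to `classify_unit` has to be supplied by hand, which mirrors the fact that tokens come from a tool while words come from linguistic analysis.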
Section 1.3
Verbal Multiword expressions
Multiword expressions (MWEs) are (continuous or discontinuous) sequences of words with the following compulsory properties:
- They show some degree of orthographic, morphological, syntactic or semantic idiosyncrasy with respect to what are considered the general grammar rules of a language. Collocations, i.e. word co-occurrences whose idiosyncrasy is of a statistical nature only (e.g. the graphic shows, drastically drop), are not annotated.
- Their component words include a head word and at least one other syntactically related word. Most often the relation they maintain is a syntactic (direct or indirect) dependence but it can also be e.g. a coordination. Depending on the category of the head word, the whole MWE can be nominal, adjectival, prepositional, verbal, sentential, etc.
- At least two components of such a word sequence have to be lexicalized. In this task we only annotate the lexicalized components and ignore open slots.
Probably the most salient property of MWEs is semantic non-compositionality. In other words, it is often impossible to deduce the meaning of the whole unit from the meanings of its parts and from its syntactic structure. For instance, while it is easy to interpret phrases like to kick the ball or to spill some water from the words that compose them, it is almost impossible to guess, without knowing it beforehand, that
However, as non-compositionality is a subjective notion, we use inflexibility as a proxy in the tests. Our underlying hypothesis is that (verbal) MWEs have some degree of semantic non-compositionality that implies limited flexibility.
Verbal MWEs (VMWEs) are simply multiword expressions whose syntactic head in the prototypical form is a verb.
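Two of the properties above lend themselves to a mechanical check: at least two components must be lexicalized, and the syntactic head of the prototypical form must be a verb. The following sketch shows one possible data model for a candidate. All names (`Component`, `Candidate`, `may_be_vmwe`) and the part-of-speech labels are assumptions made for illustration, and the check covers necessary conditions only: idiosyncrasy still has to be established with the linguistic tests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    form: str
    pos: str            # coarse part of speech, e.g. "VERB" (assumed tagset)
    lexicalized: bool   # False for open slots, which are not annotated


@dataclass
class Candidate:
    components: List[Component]
    head: int           # index of the syntactic head in `components`


def lexicalized_forms(c: Candidate) -> List[str]:
    """Return only the components that would be annotated (open slots dropped)."""
    return [x.form for x in c.components if x.lexicalized]


def may_be_vmwe(c: Candidate) -> bool:
    """Necessary conditions only: two or more lexicalized components, verbal head."""
    return len(lexicalized_forms(c)) >= 2 and c.components[c.head].pos == "VERB"


# "took this to heart", assuming that "this" fills an open slot
cand = Candidate(
    components=[
        Component("took", "VERB", True),
        Component("this", "PRON", False),
        Component("to", "ADP", True),
        Component("heart", "NOUN", True),
    ],
    head=0,
)
print(lexicalized_forms(cand))  # ['took', 'to', 'heart']
print(may_be_vmwe(cand))        # True
```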
Section 1.4
Syntactic variants of VMWEs
VMWEs occurring in a corpus can have various syntactic structures. Since the linguistic tests are structure-driven (cf. e.g. structural tests), variation must be neutralized before the tests are applied. This section introduces the definitions needed for this purpose.
Prototypical forms
A (candidate) VMWE in its prototypical form (if it exists) is a verbal phrase in active voice whose head verb is in a finite form and whose other lexicalized components depend either on the verb or on another lexicalized component. The VMWE can also contain coordinated verbs. These phrases can be:
- Partly saturated, where only some of their arguments are lexicalized:
- Partly saturated, where the lexicalized arguments include the subject:
- Partly saturated, where lexicalized head verbs are coordinated:
- Fully saturated:
أنظار الخطف he kidnapped the sight to grab attention
كلمة ألقى he threw a word make a speech
вземам трудно решение make a difficult decision to make a difficult decision
nahm sich das zu Herzen took this to heart
παίρνω τα μέτρα μου perno ta metra mou take-1.SG the-NE.PL.AC measures-NE.PL.AC my-1.SG.GE.POSS to take precautions
γράφω στα παλιά μου τα παπούτσια κάποιον grafo sta palia mu ta paputsia kapion him-MA.SG.AC write-1.SG to-the-NE.PL.AC old-NE.PL.AC my-1.SG.GE.POSS shoes-NE.PL.AC I ignore someone
θα μπορούσα να έχω πάρει μία άλλη απόφαση θa borusa na echo pari mia ali apofasi could-1SG to have-1SG take-INF one.SG.ACC other.SG.ACC decision I could have made another decision
break her heart
took this to heart
could take this to heart
would have been making a decision
could have made a different decision
le hubiera roto el corazón him/her would_have broken.he/she the heart he/she would have broken his/her heart
se lo tomaría muy a pecho him/her it would_take very to breast he/she would take it deeply to heart
erabaki bat hartu decision one take make a decision
erabaki garrantzitsuak hartzen ari ziren decision importants taking they-were they were making important decisions
déan dearmad ar rud do forgetfulness on something forget something
spezzare il cuore break the heart break the heart
prendere a cuore take to heart take to heart
nam het ter harte took this to heart
a trece ceva sub tăcere to keep something under silence.ACC to keep quiet about something
vzeti si k srcu take something to heart to think about something seriously
bi si lahko to vzel k srcu could take this to heart could think about this seriously
bo v pomoč will be in help will be helpful
ka në dorë have in hand have control over
i bie shkurt hit it short cut to the chase
mbaj mend hold mind remember
thyej zemrën e dikujt break the heart of someone break someone's heart
hap zemrën open heart open up, confide
чашата на търпението ми прелива glass.DET of patience my.POS overflows my patience runs out
geen haar op mijn hoofd die eraan denkt no hair on my head which it of thinks I would not dream of it
пао некоме мрак на очи pao nekome mrak na oči darkness fell on someone's eyes someone lost control over oneself
seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
мислити и цвеће брати (није исто) misliti i cveće brati (nije isto) to think and (to) pick flowers (is not the same) to think and put into action are different things
de kogel is door de kerk the bullet is through the church the die is cast
kości zostały rzucone the dice have been thrown alea iacta est
še pes ima rad pri jedi mir even the dog does not want to be disturbed during its meals do not bother people during their meals
удри бригу на весеље udri brigu na veselje turn worries to joy do not worry
Meaning-preserving variants
Meaning-preserving variants of a (candidate) VMWE include notably:
- Verbal expressions with analytical tenses and modals:
- Nominal groups (headed by nominal complements from the prototypical VMWEs) with relative clauses:
- Non-finite verbal clauses (with infinitives, participles, gerunds, regular nominalizations, masdars, etc.):
- Diathesis alternation (passive, impersonal, middle, etc.):
- Expressions with interposed modifiers (e.g. complex determiners and quantifiers, such as half a dozen, an impressive number of, …):
- Max took the bull by the horns.
- The news took John by surprise.
- Bob took part in the inquiry.
- Money burns a hole in Bob’s pocket.
- Two universal categories, i.e. valid for all languages participating in the task:
- Light verb constructions (LVCs) with two subcategories:
- LVCs in which the verb is semantically totally bleached (LVC.full)
حكم أصدر pronounce judgment he pronounced a judgment
държа под контрол to keep under control
eine Rede halten a speech hold to give a speech
(OEG) 𓇋𓁹 𓊨𓏏 𓎡 ꞽr ś.t ⸗k Make (ꞽr) your (⸗k) place (ś.t)! Take your place! (PT 651d, T)
παίρνω μία απόφαση perno mia apofasi take-1SG a decision to decide
δίνω μια εξήγηση dino mia exigisi give.1SG an explanation to explain
ασκώ κριτική asko kritiki to criticise
to give a lecture
hacer una promesa to_make a promise to make a promise
min hartu pain take to hurt oneself
lo egin sleep do to sleep
avoir du courage to have courage
bain triail as extract trial from try
διάνοιαν ἔχειν dianoian ekhein thought.ACC have.INF to have a thought
τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish
λόγοις χράομαι logois khraomai words.DAT use.1SG I speak
ἐν νῷ ἔχω en nо̄ ekhо̄ en mind.DAT have.1SG I have in mind
ἐν ὀργῃ ἔχω en orgē ekhо̄ in anger.DAT have.1SG I am angry
držati govor hold a speech to give a speech
fare un discorso to_make a speech to give a speech
fare una promessa to_make a promise to make a promise
pieņemt lēmumu to take a decision to make a decision
ħa deċizjoni took a decision
een toespraak houden a speech hold to give a speech
podjąć decyzję to take a decision
fazer uma promessa to make a promise
a lua o decizie to take a decision to make a decision
imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion
jap mësim give lesson give a lecture
bëj një premtim do a promise make a promise
донети одлуку doneti odluku to bring a decision to take a decision
hålla ett tal hold a speech to give a speech
做 讲座 do speech to give a speech
- LVCs in which the verb adds a causative meaning to the noun (LVC.cause)
قيمه أعطى give a value to give a value for something or someone
давам възможност give an opportunity
(OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
δίνω προτεραιότητα
to grant rights
to give a headache
to provoke the destruction of the building
dar dolor de cabeza to_give pain of head to give a headache
hacer ilusión to_make excitement to make excited/to look forward to
cuir lúcháir ar put joy on give delight to
τιμωρίαν ἀποδίδωμι timо̄rian apodidо̄mi punishment.ACC give.1SG I inflict punishment
ὀργὰς παρασκευάζομαι orgas paraskeuazomai anger.ACC.PL cause.1SG I make angry
δίκην ἐπιτίθημι dikēn epitithēmi justice.ACC impose.1SG I fine (sb)
τιμωρίαν ποιέω timо̄rian poieо̄ punishment.ACC do.1SG I inflict punishment
zadati glavobolju komu to give a headache to someone, izazvati nezadovoljstvo to cause dissatisfaction
dare il mal di testa to_give pain of head to give a headache
dare noia to_give trouble to annoy
nest nelaimi to carry misfortune to bring misfortune
rechten verlenen rights grant to grant rights
nakłada obowiązek na użytkowników put a duty on the users
dać prawo to give the right to grant the right
narazić na straty expose to losses
stawiać komuś cel to put an aim to someone to set a goal to someone
da cuiva bătăi de cap give sb. a hard time
dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)
jap të drejtë give the right grant rights
држати реч držati reč to hold a word to keep a promise
задати главобољу некоме zadati glavobolju nekome to give a headache to someone to make problems to someone
授予 权力 give power to grant power
- verbal idioms (VIDs):
إجتماع عقد tie a meeting to lead a meeting
правя се на дръж ми шапката to behave myself as 'hold my hat' pretend to be naive and innocent
цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
река и отсека to say and cut to say firmly, decisively
schwarz fahren to drive black take a ride without a ticket, in Kraft treten into force step to come into effect, in die Waagschale werfen in the weighing pan throw to bring to bear
einen drauf setzen going one better
(OEG) 𓐣𓂝𓏝 𓃹𓈖𓇋𓋴 𓌃𓅱𓏝 𓈖 𓋹𓈖𓐍𓅱 wč̣ꜥ Wnꞽś mṭw n ꜥnḫ.w Unas (Wnꞽś) shall-separate (wč̣ꜥ) the word (mṭw) for (n) the living (ꜥnḫ.w). Unas shall judge the living (PT 273b, W)
κόβω φλέβες kovo fleves cut veins to be at a complete state of boredom
απορώ και εξίσταμαι wonder1SG.PST and be-amazed1SG.PST to wonder
παίρνω των ομματιών μου perno ton omation mu take the eyes mine to leave (in despair)
χάνω τα αυγά και τα καλάθια chano ta avga ke ta kalathia lose-1SG the eggs and the baskets to be at a complete and utter loss
κόβει το μάτι μου kovi to mati mu cut.3SG the.SG.NOM eye.SG.NOM my to be sharp-eyed
παίρνουν τα μυαλά μου αέρα pernun ta miala mu aera take.3PL the.PL.NOM brain.PL.NOM air.SG.ACC to become arrogant
δεν δίνω του αγγέλου μου νερό den dino tu agelu mu nero not give my angel water to be stingy
to go bananas
fortune favors the bold
to drink and drive
to voice act
to pretty-print
to short-circuit
to tumble dry
hacer de tripas corazón make of intestines heart to pluck up the courage
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
dar gato por liebre to_give cat for hare to rip off, to take for a ride
adarra jo horn play to pull (somebody's) leg, to be kidding
burua hautsi head break to rack one's brains, to think very hard
ikusi eta ikasi see and learn
hortxe dago koska just-there is the-crux that's the crux of the matter
défendre son bifteck defend one's beefsteak to defend one's interests
court-circuiter to short-circuit
ag cur is ag cúiteamh arguing and debating arguing back and forth
περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem
οἷον τ'ἦν hoion t’ēn of.what.sort.NOM and was.3SG it was possible
δίκην δίδωμι dikēn didо̄mi justice.ACC give.1SG I get punished
mlatiti praznu slamu to beat empty straw to talk aimlessly, mazati komu oči to blur eyes to someone to cheat someone
gettare le perle ai porci to_throw the pearls to the pigs to waste something good on someone who doesn't care about it
andare e venire to_come and go back and forth
corto-circuitare to short-circuit
atstiept kājas to stretch one's legs to die
għasfur żgħir qalli a bird small told me to hear something from the grapevine
iqum u joqgħod jump and stay to fidget
het ijs breken ice break to break the ice
rzucać grochem o ścianę throw peas against a wall to try to convince somebody in vain
pluć i łapać to spit and catch to be lazy, to do nothing useful
fazer das tripas coração transform the tripes into heart to try everything possible
pintar e bordar paint and knit to abuse
a trage pe sfoară to pull on rope to fool
a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
ubiti dve muhi na en mah to kill two flies with one strike to achieve two aims at once, spati kot ubit to sleep like dead to sleep soundly
i bie murit me kokë hit the wall with head to try the impossible
i vë flakën to it put flame to cause trouble
храбре срећа прати hrabre sreća prati fortune follows the bold fortune favors the bold
китити се туђим перјем kititi se tuđim perjem decorate oneself with someone else's feathers steal someone's thunder / take credit for someone else's accomplishments
吃 闭门羹 eat closed-door-soup to be locked out
哑巴 吃 黄连 dumb-person eat bitter-medicine a dumb person eats bitter medicine, and he cannot speak out the bitterness
- Three quasi-universal categories, valid for some language groups or languages but non-existent or very exceptional in others:
- inherently reflexive verbs (IRV):
усмихвам се to smile
sich bemühen to endeavour, sich enthalten himself contain to abstain
(OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. V 141, 14).
NA in Modern Greek
to find oneself in a difficult situation
to help oneself to the cookies
suicidarse to suicide
quejarse to complain
n.a.
se suicider to suicide
se soucier to worry
n.a.
This category does not apply to Ancient Greek.
smijati se to laugh
suicidarsi to suicide
lamentarsi to moan
zich bemoeien to get involved, zich vergissen to be mistaken
bać się to fear SELF to be afraid
se queixar to complain
a se gândi to think
bati se to be afraid, smejati se to laugh, drzniti si to dare to do something
gëzohem rejoice myself to be happy
pendohem repent myself to regret
kujdesem to care myself to take care
бојати се bojati se to be afraid
коцкати се kockati se to gamble
- verb-particle constructions (VPC) with two subcategories:
- fully non-compositional VPCs (VPC.full), in which the particle totally changes the meaning of the verb
not applicable to Bulgarian
er gibt auf he gives up, er wirft ihr das vor he throws her that against he reproaches her for that
μπαίνω μέσα get in get in to go bankrupt
βάζω μπρος vazo bros put forward to start
to do in
n.a.
n.a.
cas chuig turn towards happen to have
This category does not apply to Ancient Greek.
postaviti za to set for to appoint
buttare giù to_throw down to swallow
hij geeft op he gives up
not applicable to Polish
jogar fora This seems to be the only VPC in Portuguese. We annotate it as VID and do not use the VPC category.
n.a.
n.a.
hedh poshtë
n.a.
- semi non-compositional VPCs (VPC.semi), in which the particle adds a partly predictable but non-spatial meaning to the verb
not applicable to Bulgarian
κάνω πίσω kano piso do back to back off
to eat up
n.a.
tabhair suas give up
This category does not apply to Ancient Greek.
andare avanti to_go forward to move on
opeten to eat up
opdrinken to drink up
n.a.
n.a.
eci para
n.a.
把握 住 机会 grasp hold opportunity to grasp the opportunity successfully → a Chinese Resultative Verbal Construction (RVC)
- multi-verb constructions (MVC):
will sagen want to say that is to say
(MEG) 𓁹𓏏 𓀀 𓈝𓅓𓏏𓂻 𓅓 𓏃𓈖𓏏𓇋𓇋𓏏𓊛 ꞽr.t (⸗ꞽ) šm.t m ḫnt.yt My (⸗i) making (ir.t) of going (šm.t) southwards (m ḫnt.yt) I made a departure southwards. (Sin. B 5-6)
έχω να κάνω echo na kano have to do to cope
έδωσα πήρα edosa pira give.1PST take.1PST I struggled
to let go
to make do
querer decir to_want to_say to mean
?
laisser tomber let fall to give up
vouloir dire want say to mean
?
φθάνουσι ἐρχόμενοι phthanousi erkhomenoi overtake.3PL go.PTC they go first
τυγχάνουσι ἐρχόμενοι tugkhanousi erkhomenoi get.3PL go.PTC they happen to go
može biti can be it is possible
lasciar andare to_let go to unhand
voler dire to_want say to mean
wil zeggen want to say that is to say
laten vallen let fall to give up
leren kennen to learn know to become acquainted
dać komuś żyć to let someone live not to bother someone
można wytrzymać one can stand the situation is reasonably good
querer dizer want say to mean
ouvir falar hear speak to know/remember vaguely
n.a.
n.a.
do të thotë
дај шта даш daj šta daš give what you give to be satisfied with little (from someone)
ићи куда некога ноге носе ići kuda nekoga noge nose to go where one's feet carry someone to go without an aim
排列 成 arrange become to arrange to be
试试 看 try see to try and see
- language-specific categories, each defined for a particular language in separate documentation.
- inherently adpositional verbs (IAVs)
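The category inventory listed above can be written down as a small lookup table, which is convenient when validating annotation labels. This is only a sketch: the `Scope` enumeration, the `CATEGORIES` table and the helper `labels_in_scope` are hypothetical names, the short descriptions paraphrase the list above, and language-specific categories are left out because they are defined in separate documentation.

```python
from enum import Enum
from typing import List


class Scope(Enum):
    UNIVERSAL = "universal"
    QUASI_UNIVERSAL = "quasi-universal"
    OPTIONAL = "optional and experimental"


# Label -> (description, scope); language-specific categories are not listed.
CATEGORIES = {
    "LVC.full": ("light verb construction, verb totally bleached", Scope.UNIVERSAL),
    "LVC.cause": ("light verb construction, verb adds causative meaning", Scope.UNIVERSAL),
    "VID": ("verbal idiom", Scope.UNIVERSAL),
    "IRV": ("inherently reflexive verb", Scope.QUASI_UNIVERSAL),
    "VPC.full": ("verb-particle construction, fully non-compositional", Scope.QUASI_UNIVERSAL),
    "VPC.semi": ("verb-particle construction, semi non-compositional", Scope.QUASI_UNIVERSAL),
    "MVC": ("multi-verb construction", Scope.QUASI_UNIVERSAL),
    "IAV": ("inherently adpositional verb", Scope.OPTIONAL),
}


def labels_in_scope(scope: Scope) -> List[str]:
    """List the category labels that belong to the given scope."""
    return [label for label, (_, s) in CATEGORIES.items() if s is scope]


print(labels_in_scope(Scope.UNIVERSAL))  # ['LVC.full', 'LVC.cause', 'VID']
```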
- Step 1 - identify a candidate, that is, a combination of a verb with at least one other word which could form a VMWE. Recall that a candidate can be composed of only one token if it contains several words (cf. the MWT tests). If the candidate has the structure of a meaning-preserving variant, find the corresponding canonical form. The following steps should be applied to this canonical form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
- Step 2 - determine which components of the candidate (or of its canonical form) are lexicalized, that is, if they are omitted, the VMWE does not occur any more. Corpus and web searches may be required to confirm intuitions about acceptable variants.
- Step 3 - depending on the syntactic structure of the candidate's canonical form, formally check if it is a VMWE using the generic and category-specific decision trees and tests below. Notice that your intuitions used in Step 1 to identify a given candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
- Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied once the 3 previous steps are complete, i.e. the IAV overlays the universal annotation.
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
  - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
    - YES ⇒ Annotate as a VMWE of category VID
    - NO ⇒ It is not a VMWE, exit
  - YES ⇒ Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
    - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
      - YES ⇒ Annotate as a VMWE of category VID
      - NO ⇒ It is not a VMWE, exit
    - YES ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
      - YES ⇒ Apply the VID-specific tests ⇒ VID tests positive?
        - YES ⇒ Annotate as a VMWE of category VID
        - NO ⇒ It is not a VMWE, exit
      - NO ⇒ Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
        - Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
          - YES ⇒ Annotate as a VMWE of category IRV
          - NO ⇒ It is not a VMWE, exit
        - Particle ⇒ Apply VPC-specific tests ⇒ VPC tests positive?
          - YES ⇒ Annotate as a VMWE of category VPC.full or VPC.semi
          - NO ⇒ It is not a VMWE, exit
        - Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
          - YES ⇒ Annotate as a VMWE of category MVC
          - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            - YES ⇒ Annotate as a VMWE of category VID
            - NO ⇒ It is not a VMWE, exit
        - Extended NP ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
          - YES ⇒ Annotate as a VMWE of category LVC
          - NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            - YES ⇒ Annotate as a VMWE of category VID
            - NO ⇒ It is not a VMWE, exit
        - Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
          - YES ⇒ Annotate as a VMWE of category VID
          - NO ⇒ It is not a VMWE, exit
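The decision tree above can be read as a short procedure. The sketch below is one possible reading, not an official implementation: it assumes that a negative answer to S.1 or S.2, or a positive answer to S.3, sends the candidate to the VID-specific tests, and that the category-specific tests are supplied from outside as yes/no functions standing in for the annotator's judgement. The names `StructuralProfile` and `categorize` and the category strings used for S.4 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class StructuralProfile:
    unique_verbal_head: bool         # outcome of test S.1 (1HEAD)
    one_lexicalized_dependent: bool  # outcome of test S.2 (1DEP)
    lexicalized_subject: bool        # outcome of test S.3 (LEX-SUBJ)
    dependent_category: str          # outcome of test S.4 (CATEG)


Test = Callable[[StructuralProfile], bool]


def categorize(p: StructuralProfile, tests: Mapping[str, Test]) -> Optional[str]:
    """Return a category label, or None when the candidate is not a VMWE."""

    def vid_or_exit() -> Optional[str]:
        return "VID" if tests["VID"](p) else None

    # Assumed polarity: failing S.1 or S.2, or having a lexicalized subject,
    # leads straight to the VID-specific tests.
    if (not p.unique_verbal_head
            or not p.one_lexicalized_dependent
            or p.lexicalized_subject):
        return vid_or_exit()

    if p.dependent_category == "reflexive clitic":
        return "IRV" if tests["IRV"](p) else None
    if p.dependent_category == "particle":
        # The VPC tests also choose between VPC.full and VPC.semi (not modelled).
        return "VPC" if tests["VPC"](p) else None
    if p.dependent_category == "verb":
        return "MVC" if tests["MVC"](p) else vid_or_exit()
    if p.dependent_category == "extended NP":
        return "LVC" if tests["LVC"](p) else vid_or_exit()
    return vid_or_exit()  # any other category


# Stub tests with fixed answers, for demonstration only.
stubs = {
    "VID": lambda p: True,
    "IRV": lambda p: True,
    "VPC": lambda p: True,
    "MVC": lambda p: False,
    "LVC": lambda p: False,
}
print(categorize(StructuralProfile(True, True, False, "particle"), stubs))     # VPC
print(categorize(StructuralProfile(True, True, False, "extended NP"), stubs))  # VID
print(categorize(StructuralProfile(False, False, False, ""), stubs))           # VID
```

Keeping the linguistic tests as injected functions makes the control flow of the tree explicit while leaving every actual judgement to the annotator.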
- Structural tests (S)
- Light-verb constructions (LVC)
- Verbal idioms (VID)
- Inherently reflexive verbs (IRV)
- Verb-particle constructions (VPC)
- Multi-verb constructions (MVC)
- Inherently adpositional verbs (IAV) - optional and experimental
- Apply the VID-specific tests
تنل اصبر be patient you get if you stay patient you will get what you want → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
цъфна и вържа → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
leben und leben lassen live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
έδωσε πήρε edose pire gave3SG.PA took3SG.PA he succeeded → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
to pretty-print → there is an unusual case of an adjective modifying a verb
to drink and drive → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
coser y cantar to_sew and to_sing easy as pie, a piece of cake
ikusi eta ikasi see and learn → none of the verbs is clearly the head
ag cur is ag cúiteamh arguing and debating arguing back and forth → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
ἠντεβόλει καὶ ἱκετεύε ēntebolei kai iketeue supplicate.3SG and beseech.3SG he begged and beseeched
žariti i paliti to stoke and to burn to be powerful, vedriti i oblačiti to brighten and to cloud to be powerful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
vivi e lascia vivere live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
leven en laten leven live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
pluć i łapać to spit and catch to be lazy, to do nothing useful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
pintar e bordar paint and knit to abuse
živi in pusti živeti to live and let live to live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
hyr e dil come and go come and go → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
ведрити и облачити vedriti i oblačiti to brighten and cloud to be very powerful
што не иде не иде što ne ide ne ide what doesn't go, doesn't go don't force something → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
det knallar och går it trots and walks it is OK/as usual → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
- continue to the next test
ريح للرجليه أسلم he gave his feet to the wind he runs away so fast → أسلم to give is the head and the NP depends on it
гушна букета to hug the bunch of flowers to die → гушна is the head and the NP depends on it
правя на салата to make into salad to scold → правя is the head and the PP depends on it
eine Fratze ziehen a grimace pull to make a face → ziehen is the head and the NP depends on it
er gibt auf he gives up → gibt is the head and auf is the particle depending on it
κάνω γκριμάτσα kano grimatsa to make grimace to make a face → κάνω is the head and the NP depends on it
παίρνω μία απόφαση perno mia apofasi take a decision to make a decision, to decide → παίρνω is the head and the NP depends on it
βάζω μπρος vazo bros put forward to start → βάζω is the head and μπρος depends on it
to make a face → make is the head and the NP depends on it
to give up → give is the head and up is a particle depending on it
dar la cara to_put the face face the consequences → dar is the head and the NP depends on it
hacer muecas to_make grimaces to make a face → hacer is the head and the NP depends on it
lan egin work do to work → the verb egin is the head and the NP depends on it
éirigh as rise out of quit → the verb éirigh is the head and the particle as depends on it
χάριν ἔχει kharin ekhei gratitude.ACC have.3SG he is grateful → ἔχει is the head and the NP depends on it
složiti facu make a face to show reaction → složiti is the head and the NP depends on it
fare le linguacce to_make the grimaces → fare is the head and the NP depends on it
far fuori to_make out to kill → fare is the head and fuori is a particle depending on it
naar de bekende weg vragen to ask for the known road up → vragen is the head and naar de bekende weg is the extended NP depending on it
zbijać bąki to smash farts to fool around, to do nothing useful → zbijać is the head and the NP bąki depends on it
dać komuś popalić to let someone smoke to make someone's life hard → dać is the head and the infinitive popalić depends on it
bater as botas → bater is the head and the NP depends on it
criar vergonha na cara → criar is the head and the two NPs depend on it
a face baie to make bath to bath → face is the head and the NP depends on it
a ieși înainte to go forth to greet → ieși is the head and înainte is a particle depending on it
imeti krompir to have potatoes to be lucky → imeti is the head and the NP depends on it
heq dorë remove hand give up → heq is the head, and dorë depends on it
обесити нос obesiti nos hang one's nose to feel down → обесити is the head and the NP нос depends on it
седети скрштених руку to sit with arms crossed to be inactive, without initiative → седети is the head and the NP (in the instrumental case) скрштене руке depends on it
att ge upp to give up → ge is the head and upp is the particle depending on it
- Apply the VID-specific tests
لسانه القط أكل the cat ate his tongue used to talk about someone who was known to talk a lot, then suddenly we see him silent → two dependents, لسانه his tongue and القط the cat
на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → two dependents, на стар краставичар (PP) and краставици (NP)
прочитам от корица до корица to read from cover to cover → two dependents, от корица (PP) and до корица (PP)
правя (нечий) живот черен make someone's life black to ruin someone's life → two dependents, (нечий) живот (NP) and черен (small clause)
die Katze aus dem Sack lassen to let the cat out of the bag → two dependents, die Katze and aus dem Sack
κάνω την καρδιά μου πέτρα kano tin kardia mu petra make the heart mine stone → two dependents, την καρδιά and πέτρα
δίνω τόπο στην οργή dino topo stin orγi give place to anger to hold in one's anger → two dependents, τόπο and στην οργή
to make ends meet → two dependents, ends and meet
to let the cat out of the bag → two dependents, the cat and out of the bag
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more → two dependents, con la miel and en los labios
dar gato por liebre to_give cat for hare to rip off, to take for a ride → two dependents, gato and por liebre
odolkiak ordainetan eman black-puddings in-exchange give to do something as a response to something somebody has done to oneself (similar to 'what goes around comes around')
ići glavom kroz zid to go with head through the wall to be stubborn → two dependents, glavom and kroz zid
mettere il carro davanti ai buoi to_put the cart in front of the oxen put the cart in front of the horse → two dependents, carro and davanti ai buoi
een kat in de zak kopen to buy a pig in a poke → two dependents, kat and in de zak
chować głowę w piasek to hide head in sand to pretend not to see a problem → two dependents, głowę head and w piasek in sand
bać się własnego cienia to fear SELF one's own shadow to be very timid → two dependents, się SELF and własnego cienia own shadow
tapar o sol com a peneira to hide the sun with a sieve to sugar-coat → two dependents
a da bir cu fugiții to give tribute with fugitives the to disappear → two dependents, bir and cu fugiții
a-i ieși ochii din cap to his come out eyes the from head to stare → three dependents, i, which is a non-RCLI, ochii, and din cap
skrivati glavo v pesek to hide head in sand to pretend not to see a problem → two dependents, glava head and v pesek in sand
vlečeš me za nos you are pulling my nose you're pulling my leg → two dependents, me me and za nos my nose
I hedh benzinë zjarrit I throw gasoline on the fire to make a situation worse (aggravate a problem) → two dependents, benzinë and zjarrit
ићи линијом мањег отпора ići linijom manjeg otpora go down the line of less resistance to take the path of least resistance → two dependents, линијом linijom line and мањег отпора manjeg otpora less resistance
продати рог за свећу prodati rog za sveću to sell a horn for a candle to deceive somebody on purpose → two dependents, рог rog horn and за свећу za sveću for a candle
att sätta sig upp mot någon to sit oneself up against someone to defy someone → two dependents, sig and upp
- Continue to the next test
مثلاً ضرب hit an example to give an example → the single dependent is a noun phrase, مثلاً example
ритам камбаната kick the bell to die → the single dependent is a noun phrase, камбаната
ставам на кайма turn into mince to be destroyed → the single dependent is a prepositional phrase, на кайма
одирам жив skin alive to make someone suffer → the single dependent is a small clause (adjective), жив
eine Fratze ziehen a grimace pull to make a face → the single dependent is a noun phrase, Fratze
in Betracht ziehen to take into consideration → the single dependent is a prepositional phrase, in Betracht
er gibt auf he gives up → the single dependent is a particle, auf
παίρνω σκληρά μέτρα perno sklira metra take hard measures take strict measures → the single dependent is a noun phrase, μέτρα
φέρω βαρέως fero vareos bring heavily to resent → the single dependent is an adverb, βαρέως
to make a face → the single dependent is a noun phrase, face
to take into account → the single dependent is a prepositional phrase, into account
to take turns → the single dependent is a noun, turns
to give up → the single dependent is a particle, up
hacer muecas to_make grimaces to make faces → the single dependent is a noun phrase, muecas
tener en cuenta to_have in account to take into account → the single dependent is a prepositional phrase, en cuenta
min eman pain give to hurt (somebody) → the single dependent is a noun phrase, min
kontuan hartu into-account take to take into account → the single dependent is a noun phrase with a postpositional suffix, kontuan
bain triail get trial try → the single dependent is a noun
éirigh as rise out of quit → the single dependent is a particle
περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem → the single dependent is a prepositional phrase
τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish → the single dependent is an NP
imati osjećaj to have a feeling → the single dependent is a noun, osjećaj
fare le linguacce to_make the grimaces to make a face → the single dependent is a noun phrase, linguacce
prendere in considerazione to take into consideration → the single dependent is a prepositional phrase, in considerazione
egli lo fa fuori he kills him → the single dependent is a particle, fuori
opgeven to give up → the single dependent is a particle, op
bić na alarm to strike on alarm to raise the alarm → the single dependent is a prepositional phrase, na alarm on alarm
cholera wie cholera knows I have no idea → the single dependent is the nominal subject cholera
cometer um crime to commit a crime → one dependent
a face față to make face to deal with → the single dependent is a noun phrase, față
a ieși înainte → the single dependent is an adverb, înainte
gre za it is about → the single dependent is a particle, za
smejati se to laugh → the single dependent is a reflexive clitic, se
imeti mačka to have a hangover → the single dependent is a noun, maček
hedh poshtë throw down to reject or dismiss → the single dependent is an adverb, poshtë
ићи као алва ići kao alva go like halva to sell well → the single dependent is a prepositional phrase, као алва kao alva as halva
језик прегризао bite off your tongue do not foresee bad things → the single dependent is the NP језик jezik tongue
att ge upp to give up → the single dependent is the particle upp
- Apply the VID-specific tests
أوزارها الحرب وضعت the war put its weights the war is over → الحرب is the subject of وضعت
чашата преля the glass overflowed this is the last straw → чашата is the subject of преля
ein kleines Vöglein hat mir gezwitschert a little bird told me
μου είπε ένα πουλάκι mu ipe ena pulaki me told a little-bird a little bird told me → a little bird is the subject of told
a little bird told someone → a little bird is the subject of told
ha llegado tu hora has arrived your time your time has come → tu hora is the subject of ha llegado
me lo ha dicho un pajarito it to_me has told a little_bird a little bird has told me → un pajarito is the subject of ha dicho
txoritxo batek esan → txoritxo batek is the subject of esan
ptičica mi je šapnula a little bird whispered to me → ptičica is the subject of šapnula
me lo ha detto l'uccellino a little bird told me → l'uccellino is the subject of ha detto
boontje komt om zijn loontje he that mischief hatches, mischief catches
licho wie devil knows I have no idea
a sua hora chegou your time has arrived your time has come
um passarinho me contou que ... a little-bird me.DAT told that ... a little bird told me that ...
a șoptit o păsărică whispered a bird little a little bird told someone
srce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something → srce heart is the subject of pade falls
sekira pade v med komu (someone's) hatchet falls in honey one gets lucky → sekira hatchet is the subject of pade falls
Më zuri koka my head caught me I got a headache → koka (head) is the single lexicalized dependent, functioning as the subject of the verb zuri (caught)
иде некоме карта ide nekome karta the card goes for someone to have luck → карта is the subject of иде
пасти некоме камен са срца pasti nekome kamen sa srca a stone falls from one's heart to feel relieved → камен is the subject of пасти
- Continue to the next test
زيارة ب قام he did with visit to make a visit → زيارة is the object of قام
обичам чашката love the glass to be an alcoholic
вземам назаем take in loan to borrow
намирам се find SELF to be situated
κάνω μια ευχή kano mia efchi do a wish to make a wish → μία ευχή is the object of κάνω
to make a wish → a wish is the object of make
pedir un deseo to_ask a wish to make a wish → un deseo is the object of pedir
hitz eman → hitz is the object of eman
λόγοις χράομαι logois khraomai word.DAT use.1SG I speak → λόγοις is the object of χράομαι
napraviti prekršaj to make an offense → prekršaj is the object of napraviti
dare spettacolo to_make a scene → spettacolo is the object of dare
een toespraak houden → toespraak is the object of houden
bać się fear SELF to be afraid
chodzić prostą drogą to go (on) a straight road.INST to avoid complications
zacząć od zera to start from zero to start from scratch
plouă cu găleata rains with bucket-the it rains heavily → cu găleata is the adverbial of plouă
imeti glavo na ramenih to have head on shoulders to be sensible → glava head is the object of imeti have
marr hua take loan to borrow → hua (loan) is the single lexicalized dependent, functioning as the object of the verb marr (take)
тврдити пазар tvrditi pazar to secure shopping to pretend not to be interested in order to gain more → пазар is the object of тврдити
обрати бостан obrati bostan to pick melon to be ruined → бостан is the object of обрати
- Reflexive clitic - apply IRV tests. If the outcome is negative, discard the VMWE candidate.
- Particle (as opposed to an adposition) - apply VPC tests. If the outcome is negative, discard the VMWE candidate.
- Verb with no lexicalized dependent - apply MVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
не искам и да чуя don't want to even hear to oppose strongly → и да чуя is a VP
will sagen want to say that is to say
έχω να κάνω have to do concern
to let go
to make do
querer decir to_want to_say to mean
n.a.
laisser tomber let fall to give up
vouloir dire want say to mean
τυγχάνουσι ἐρχόμενοι tugkhanousi erkhomenoi get.3PL go.PTC they happen to go
pustiti koga živjeti to let someone live not to bother someone, znati raditi to know to work to be capable
lasciar andare to_let go to unhand
voler dire want say to mean
wil zeggen want to say that is to say
dać komuś żyć to let someone live not to bother someone
można wytrzymać one can stand the situation is reasonably good
querer dizer want say to mean
ouvir falar hear speak to know/remember vaguely
n.a.
n.a.
може бити može biti can be it is possible though unlikely
- Adposition (preposition or postposition, as opposed to a particle) - in step 3 of the annotation process adpositions are not annotated unless they introduce a lexicalized dependent. Adpositions are covered optionally and experimentally in the post-annotation step (step 4), following the inherently adpositional verb (IAV) guidelines.
разчитам на to rely on
излизам със to come out with
Modern Greek does not have IAV expressions.
to come across
to rely on
confiar en to_trust in to trust in
entender de to_understand of to know about
n.a.
This category does not apply to Ancient Greek.
izlaziti s kim to go out with someone
confidare su to_trust in to trust in
intendersi di to_understand of to know about
behoren tot to belong to
conta pe count on
n.a.
- Extended nominal phrase (possibly including modifiers, prepositions, postpositions or case markers) - apply LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
زيارة ب قام make a visit → ب زيارة is a noun phrase composed of a preposition and a noun
ритам камбаната kick the bell to die → камбаната is a noun phrase composed of a single noun
давам зелена светлина give green light to allow → зелена светлина is a noun phrase composed of an adjective and a noun
ставам на кайма turn into mince to be destroyed → на кайма is a prepositional phrase composed of a preposition governing a noun
die Nase rümpfen the nose wrinkle turn up one's nose at sth. → die Nase is a noun phrase composed of a determiner and a noun
in Kraft treten into
κάνω μία ευχή kano mia efchi make a wish to make a wish → μία ευχή is a noun phrase composed of a determiner and a noun
δίνω εξηγήσεις dino exigisis give explanations to explain → εξηγήσεις is a noun phrase composed of a single plural noun
to make a wish → a wish is a noun phrase composed of a determiner and a noun
to take turns → turns is a noun phrase composed of a single plural noun
pedir un deseo → un deseo is a noun phrase composed of a determiner and a noun
entrar en vigor → en vigor is a prepositional phrase composed of a preposition and a noun
kontuan hartu into-account take to take into account → the NP, kontuan, is composed of a noun (kontu), a determiner (a) and a postposition (-n)
urratsak egin steps do to take steps → the NP, urratsak, is composed of a single plural noun (urrats+ak)
τὴν ἴσην χάριν αποδίδωμι tēn isēn kharin apodidо̄mi the same gratitude.ACC give.1SG I show the same gratitude → τὴν ἴσην χάριν is an NP composed of a DP and an adjective
doći do zaključka to come to conclusion, to conclude → do zaključka to conclusion is a prepositional phrase composed of a preposition governing a noun
prendere in considerazione take into account → in considerazione is a prepositional phrase composed of a preposition and a noun
rompere il silenzio to break the silence → il silenzio is a noun phrase composed of an article and a singular noun
mettere radici → radici is a noun phrase composed of a single plural noun
een wandeling maken to take a walk → een wandeling is a noun phrase composed of a determiner and a noun
te koop zetten to put for sale → te koop is an extended noun phrase composed of a preposition and a noun
in aanmerking komen in comment come to qualify → in aanmerking is an extended noun phrase composed of a preposition and a noun
podjąć decyzję to take a decision → decyzję decision is a nominal phrase composed of a single noun
chodzić prostą drogą to go (on) a straight road.INST to avoid complications → prostą drogą (on) a straight road is a noun phrase composed of an adjective and a noun (in the instrumental case)
bujać w obłokach to swing in the clouds to fantasize → w obłokach in the clouds is a prepositional phrase composed of a preposition and a noun
tomar banho to take a shower → banho is a noun phrase composed of a single noun
a rupe tăcerea to break silence the to start talking → tăcerea is a noun phrase composed of a single noun
a face baie to do bath to take a shower → baie is a noun phrase composed of a single noun
biti v dvomih to be in doubts to doubt → v dvomih in doubts is a prepositional phrase composed of a preposition governing a noun
klicati jelene to call deer to vomit → jeleni deer is a noun phrase composed of a single plural noun
узети маха uzeti maha to take swing/moment to spread → маха maha swing/moment is a nominal phrase composed of a single noun
дати часну реч dati časnu reč to give an honorable word to promise firmly → часну реч časnu reč (honorable word) is a noun phrase composed of an adjective and a noun (in the accusative case)
пасти на ум некоме pasti na um nekome to drop on one's mind to get an idea → на ум na um on mind is a prepositional phrase composed of a preposition and a noun
- (Hindi-specific) Adjective which is morphologically identical to an eventive noun: Apply the LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
- Adjective: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
излизам сух от водата to come out dry from the water to avoid taking responsibility
одирам жив skin alive to make someone suffer
гоня дивото chase the wild.ADJ to take risks → дивото is a substantive
rot sehen to see red
τα βάφω μαύρα them-NE.PL.ACC paint-1.SG black-NE.PL.ACC be very sad
to stand firm, to see red
me las vi negras me the saw black I saw myself in trouble
ponerse negro put.self black to get/become irritated
poner verde put green to criticise (someone)
zuriak eta beltzak aditu white and black hear to hear all sorts of things
voir rouge to see red to be very angry
ostati svoj to stay one's own to be consistent
vedere nero to see black
blauw zien van de kou to be blue/perished with the cold
zwartrijden black drive to take a ride without a ticket
zrobić swoje to do one's own to do what one is supposed to do
pensar grande to think big
a vedea roșu to see red
a o face lată to CL.ACC make wide to party
narediti svoje to do one's own to do what one is supposed to do
бити зелен biti zelen to be green to be young, inexperienced
- Adverb: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
изваждам наяве take out in the open to uncover
хващам натясно catch in a tight place to coerce, to pressure
φέρω βαρέως fero vareos bring heavily to resent
to get well
caer bien fall well to be liked by
alferrik galdu uselessly get-lost to ruin, to spoil
καλῶς εἶχεν kalо̄s eikhen beautifully have.IMPF.3SG he was well
dobro proći to go well to be successful
fare passi avanti to_make steps forward to make progress
beter worden to get well
chcieć dobrze to want well to have good intentions
robić komuś dobrze to do someone.DAT well to please someone
źle/marnie skończyć badly finish to come to a bad end
cair bem fall well to be appropriate
a se face bine to himself make well to get well
a face bine to make well to help
obrniti se na bolje to turn for better to be better, iti predaleč to go too far to demand too much or to do something inappropriate
добро доћи dobro doći to come well to be useful
боље рећи bolje reći to say better to say in other words, more precisely
- Pronoun: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
мързи ме (it feels) lazy me.ACC to be lazy
τα καταφέρνω ta kataferno them achieve to make it
την πατάω tin patao her step-on to fail
to make it
jugársela play.self.it to risk it
elkar hartu each-other take to get on with somebody, to agree
suarekin jolasean ibili with-fire playing be to play with fire
le faire it make to be enough/successful
farcela to make it to manage
het maken it make to be successful
No example found in Polish
dá-lhe João! give to him/her, João! show them what you got, João!
a o coti CL.ACC.F.3SG turn to turn with the non-anaphoric feminine clitic 'o' functioning as an expletive
imeti ga pod kapo to have him under one's hat to be drunk, mahniti jo to hit her to start going (somewhere)
n.a.
- Verb with lexicalized dependents including fully lexicalized clauses: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
السيف العذل سبق The sword preceded the blame said when someone does something without thinking and regrets it
не мога да кажа две думи на кръст cannot say two words on a cross to not be able to speak or express oneself
правя сам да си говори make someone talk to himself to drive someone crazy
ανοίγω τον ασκό του Αιόλου open the bag of Aeolus open the bag of Aeolus to open the floodgates
και οι τοίχοι έχουν αυτιά ke i tichi echun aftia and walls have ears everyone might be listening
to make ends meet, to know on which side the bread is buttered
hacer de tripas corazón make of intestines heart to pluck up the courage
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
dar gato por liebre to_give cat for hare to rip off, to take for a ride
n.a.
okretati se kako vjetar puše to turn how the wind blows to be inconsistent
sbarcare il lunario to_land the living to make ends meet
non avere peli sulla lingua do not have hair on the tongue to be outspoken
lachen als een boer die kiespijn heeft laughing on the other side of his/her face/mouth
wiedzieć, co w trawie piszczy to know what in the grass squeaks to know what is going on, to be well informed
vedeti, koliko je ura to know what time it is to realize the truth
знати у ком грму лежи зец znati u kom grmu leži zec to know in which bush the rabbit lies to know what is going on, to be well informed
- Other: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
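The routing above, from the category of the lexicalized dependent to the test batteries to apply, can be sketched as a small lookup table. This is only an illustrative sketch: the category keys, the `TEST_ORDER` table, the `route` function and the `passes` callback are hypothetical names that do not come from the guidelines, and the tests themselves are linguistic judgments made by annotators, not code. The LVC battery is treated as a single step here, although in practice it distinguishes LVC.full from LVC.cause.

```python
from typing import Callable, Optional

# Hypothetical keys for the category of the lexicalized dependent.
# Each value lists the test batteries to try, in order.
TEST_ORDER = {
    "reflexive_clitic": ["IRV"],
    "particle": ["VPC"],
    "verb_without_lexicalized_dependent": ["MVC", "VID"],
    "extended_nominal_phrase": ["LVC", "VID"],
    "hindi_adjective_identical_to_eventive_noun": ["LVC", "VID"],
    "adjective": ["VID"],
    "adverb": ["VID"],
    "pronoun": ["VID"],
    "verb_with_lexicalized_dependents": ["VID"],
    "other": ["VID"],
}


def route(dependent_category: str, passes: Callable[[str], bool]) -> Optional[str]:
    """Return the first test battery that succeeds, or None to discard.

    `passes` stands in for the annotator's judgment on one battery.
    """
    if dependent_category == "adposition":
        # Adpositions are not annotated in step 3 (unless they introduce a
        # lexicalized dependent); IAVs are handled optionally in step 4.
        return None
    for battery in TEST_ORDER.get(dependent_category, TEST_ORDER["other"]):
        if passes(battery):
            return battery
    return None  # every applicable battery failed: discard the candidate


# Example: a nominal dependent that fails the LVC tests but passes the VID tests.
print(route("extended_nominal_phrase", lambda battery: battery == "VID"))  # VID
```

Keeping the order of batteries in data rather than in nested conditionals makes the fallback to the VID-specific tests explicit for each category.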
- They are formed by a verb v and a (single or compound) noun n,
which either directly depends on v (and possibly contains a case marker or a postposition), or is introduced
by a preposition.
In the case of Hindi, the noun can be replaced by an adjective which is morphologically identical to an eventive noun. If you annotate Hindi, wherever this page refers to the noun, you should read it as the noun or the adjective.
إتخذ إجراء make action → verb + direct-object noun
قام بزيارة make a visit → verb + prepositional-object noun
أدى التحية العسكرية do the military salute salute → verb + compound noun
вземам решение to make a decision
държа под контрол to keep under control
zum Einsatz kommen to the use come to be called into action
eine Rede halten a speech hold to give a speech
(OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
παίρνω μία απόφαση perno mia apofasi make a decision to decide → verb + direct-object noun
δίνω στα νεύρα dino sta nevra give to-the nerves cause to be nervous → verb + prepositional-object noun
έχω στην κατοχή μου echo stin katochi mu have.1SG to-the possession my to possess → verb + prepositional-object noun
to give a lecture → verb + direct-object noun
to come into bloom → verb + prepositional-object noun
to make a high five → verb + compound noun
hacer una promesa make a promise to make a promise
poner en peligro put in danger endanger, jeopardise → verb + prepositional-object noun
tener dolor de cabeza have pain of head to have a headache → verb + compound noun
lan egin work do to work, aurrera egin front-to do to go ahead
faire une présentation make a presentation → verb + direct-object noun
procéder à une analyse proceed to an analysis to make an analysis → verb + prepositional-object noun
faire un faux pas make a faux-pas → verb + compound noun
ἐν ὀργῃ ἔχω en orgē ekhо̄ in anger.DAT have.1SG I am angry
τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish
stupiti na snagu step into force come into force
držati predavanje to hold a speech to give a speech
chiamare in causa to_call in cause to single out
fare una passeggiata to_make a walk to have a walk
een toespraak houden a speech hold to give a speech → verb + direct-object noun
in bloei staan in bloom stand to be in bloom → verb + prepositional-object noun
odnieść sukces carry-away success to be successful
mieć wyrzuty sumienia to have reproaches of conscience to blame oneself
wykonać rzut karny to perform a penalty kick
fazer um aborto to make an abortion → verb + direct-object noun
estar com fome be with hunger to be hungry → verb + prepositional-object noun
fazer uma mesa redonda make a table round to have a round table (discussion) → verb + compound noun
a duce dorul to carry yearning.the to miss somebody
a da divorț to give divorce to divorce
a da în clocot to give in boil to come to the boil
a da în fiert to give in boil to come to the boil
biti v dvomih to be in doubts to doubt → verb + prepositional-object noun
imeti predavanje to give a lecture → verb + direct-object noun
дати на знање dati na znanje give on knowledge to inform
поднети жалбу podneti žalbu to submit an appeal to file a complaint
- The (single or compound) noun n is predicative and refers to an event (e.g. decision, visit) or a state (e.g. fear, courage). Predicative nouns are nouns that have semantic arguments, that is, they express predicates whose meaning is only fully specified by their semantic arguments:
قرار أخذ make a decision → noun refers to an event, there are 2 arguments: a decider and a decision
كلمة ألقى to give a word → noun refers to an event, there are 2 arguments: the talker and the speech
вземам решение to make a decision → noun refers to an act or event
давам съгласие to give permission → noun refers to an act or event
имам притеснения to have concerns → noun refers to a feeling or state
имам готовност to be ready → noun refers to a feeling or state
eine Entscheidung treffen to make a decision → noun refers to an event
Angst haben to have fear → noun refers to a state
(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W)
παίρνω μία απόφαση perno mia apofasi take decision to decide → noun refers to an event
κάνω βόλτα kano volta make walk to walk → noun refers to an event
έχω αγωνία echo agonia have anxiety to be anxious → noun refers to a state
κάνω κουράγιο kano kuragio make courage to be courageous → noun refers to a state
to make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
to pay a visit → noun refers to an event, there are 2 arguments: a visitor and a visited place/person
to have fear → noun refers to a state, there are 2 arguments: somebody who is afraid and something frightening
to have courage → noun refers to a state, there is 1 argument: the courageous person
dar un consejo give an advice to give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
tener valor to have courage → noun refers to a state, there is 1 argument: the courageous person
negar egin cry do to cry → noun refers to an act or event
lo egin sleep do to sleep → noun refers to a state
donner un conseil give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
avoir du courage to have courage → noun refers to a state, there is 1 argument: the courageous person
μου εἰς τὴν γνώμην εἰσῄει mou eis tēn gnо̄mēn eisēei I.GEN into the opinion.ACC come.into.IMPF.3sg it came to my mind → noun refers to a state
ἐξέτασιν ποιέομαι exetasin poieomai inspection.ACC do.1SG I inspect → noun refers to an event
donijeti odluku to bring a decision to make a decision → noun refers to an event
imati osjećaj to have feeling → noun refers to a state
fare una domanda → noun refers to an event
avere paura, avere coraggio → noun refers to a state
een beslissing nemen to make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
moed hebben to have courage → noun refers to a state, there is 1 argument: the courageous person
prowadzić rozmowy to lead conversations to lead negotiations → the noun refers to an event
mieć rację to have right to be right → the noun refers to a state
fazer uma prece to make a prayer → noun refers to an event, there are 2 arguments: the prayer and the thing she/he prays for
ter sintomas to have symptoms → noun refers to a state, there are two arguments: the person having symptoms and the disease causing these symptoms
a lua o decizie to make a decision, a face o vizită to pay a visit → noun refers to an event
a avea curaj → noun refers to a state
biti v dvomih to be in doubts to have doubts → noun refers to a state
imeti predavanje to give a lecture → noun refers to an event
kam frikë
kam kurajë
донети одлуку doneti odluku to bring a decision to make a decision (to decide) → the noun refers to an event
имати право imati pravo to have right to be right → the noun refers to a state
- We retain two sub-categories of verbs, which define two sub-categories of LVCs:
- The verb v is "light" in that it contributes to the meaning of the whole only by bearing
morphological features: person, number, tense, mood, as well as morphological
aspect. This implies that v's syntactic subject
is n's semantic argument. In this case, we annotate the construction as LVC.full.
نصيحة أسدى to weave an advice to give advice
تاريخالصنع fabricate the history to make history
إستراتيجية ال وضع put a strategy to make a strategy
давам изявление give a statement to make a statement
нанасям щети spread damages to cause damages
(OEG) 𓇋𓁹 𓊨𓏏 𓎡 ꞽr ś.t ⸗k Make (ꞽr) your (⸗k) place (ś.t)! Take your place! (PT 651d, T)
κάνω μία παρουσίαση kano mia parusiasi make presentation to present
κάνω επίσκεψη kano episkepsi make visit to pay a visit, to visit
παίρνω απόφαση perno apofasi take decision to decide
to make a presentation
to pay a visit
to have rights
to have a headache
to carry out a destruction
dar un paseo give a walk to go for a walk
tener valor to have courage
tener dolor de cabeza have pain of head to have a headache
faire une présentation to make a presentation
faire une visite to make a visit
avoir le droit to have the right
avoir un mal de tête to have a headache
ἐλπίδα / ἐλπίδας ἔχω elpida / elpidas ekhо̄ hope.SG / hope.PL have.1SG I have hope(s)
napraviti pogrešku to make a mistake
fare una presentazione to make a presentation
fare una visita to make a visit
avere il diritto to have the right
avere un mal di testa to have a headache
een presentatie geven to give a presentation
een bezoek brengen to make a visit
onder stress staan under stress stand to be stressed
odnieść sukces carry-away success to be successful
mieć rację to have right to be right
cierpieć na anemię to suffer from anemia
realizar uma apresentação to make a presentation
fazer uma visita to make a visit
ter um direito to have a right
ter dor de cabeça have pain of head to have a headache
a face o prezentare to make a presentation
a face o vizită to pay a visit
imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion, biti v pomoč to be in help to be helpful, delati razlike to make differences to differentiate
jap një shfaqje
kam dhimbje koke
вршити претрес vršiti pretres to do a search to conduct a search
имати право imati pravo to have right to be right
- The verb v is "causative" in that it indicates that the subject of v is the cause or
source of the event or state expressed by n. In other words, the noun has semantic arguments expressed as
non-subject elements in the sentence, and the subject of the verb brings additional information, indicating
the cause or source of the event/state. In this case, we annotate the construction as
LVC.cause. These constructions are expected to be less idiomatic than other VMWEs and can be
understood as complex predicates with a causal support verb.
حربالأعلن to declare war
حقوق أعطى to give rights
أملأعطى to give hope
давам възможност to give an opportunity
нося късмет to bring luck
(OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
δίνω ικανοποίηση dino ikanopiisi give satisfaction to satisfy
προκαλώ καταστροφή cause destruction
δίνω χαρά dino chara give joy to make happy
to grant rights
to give a headache
to provoke a reaction
dar derecho to grant the right
dar vértigo give vertigo to make dizzy
causar un accidente to provoke an accident
donner le droit to grant the right
donner le vertige give the vertigo to make dizzy
provoquer un accident to provoke an accident
ἐλπίδα / ἐλπίδας παρέχω elpida / elpidas parekhо̄ hope.SG / hope.PL give.1SG I make hope(s)
dati mogućnost to give an opportunity
dare il diritto to grant the right
dare le vertigini to_give the vertigo to make dizzy
causare un incidente to provoke an accident
rechten verlenen to grant the right
een ongeluk veroorzaken to provoke an accident
to sprawia nam kłopot this causes us trouble
nakłada obowiązek na użytkowników put a duty on the users
dać prawo to give the right to grant the right
narazić na straty expose to losses
stawiać komuś cel to put an aim to someone to set a goal to someone
dar o direito to grant the right
dar tontura give vertigo to make dizzy
provocar um acidente to provoke an accident
a da dureri de cap to give pains of head to give a headache
dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)
provokoj një debat
bëj aksident
изнети мишљење izneti mišljenje to take out one's opinion to state one's opinion
задати главобољу zadati glavobolju to cause a headache to give a headache
- Apply test LVC.0 - [N-ABS: Is the noun abstract?]
  - NO ⇒ It is not an LVC, exit
  - YES ⇒ Apply test LVC.1 - [N-PRED: Is the noun predicative?]
    - NO ⇒ It is not an LVC, exit
    - YES ⇒ Apply test LVC.2 - [V-SUBJ-N-ARG: Is the subject of the verb a semantic argument of the noun?]
      - YES ⇒ Apply test LVC.3 - [V-LIGHT: The verb only adds meaning expressed as morphological features?]
        - NO ⇒ It is not an LVC, exit
        - YES ⇒ Apply test LVC.4 - [V-REDUC: Can a verbless NP-reduction refer to the same event/state?]
          - NO ⇒ It is not an LVC, exit
          - YES ⇒ It is an LVC.full
      - NO ⇒ Apply test LVC.5 - [V-SUBJ-N-CAUSE: Is the subject of the verb the cause of the noun?]
        - NO ⇒ It is not an LVC, exit
        - YES ⇒ It is an LVC.cause
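The LVC tests above can be read as a small decision procedure. The following Python sketch is only an illustration: the `LvcAnswers` record and the `classify_lvc` function are hypothetical names, and the branch order is an assumption (a failed test LVC.2 leads to test LVC.5, while a passed one leads to tests LVC.3 and LVC.4). The yes/no answers themselves still have to come from a human annotator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LvcAnswers:
    """Hypothetical record of an annotator's yes/no answers to tests LVC.0 to LVC.5."""
    n_abs: bool                   # LVC.0 [N-ABS]: is the noun abstract?
    n_pred: bool = False          # LVC.1 [N-PRED]: is the noun predicative?
    v_subj_n_arg: bool = False    # LVC.2 [V-SUBJ-N-ARG]
    v_light: bool = False         # LVC.3 [V-LIGHT]
    v_reduc: bool = False         # LVC.4 [V-REDUC]
    v_subj_n_cause: bool = False  # LVC.5 [V-SUBJ-N-CAUSE]


def classify_lvc(a: LvcAnswers) -> str:
    """Return "LVC.full", "LVC.cause" or "not an LVC" for one candidate."""
    if not a.n_abs or not a.n_pred:
        return "not an LVC"          # test LVC.0 or LVC.1 failed
    if a.v_subj_n_arg:               # LVC.2 passed: follow the LVC.full branch
        if a.v_light and a.v_reduc:  # LVC.3 and LVC.4 must both pass
            return "LVC.full"
        return "not an LVC"
    if a.v_subj_n_cause:             # LVC.2 failed: apply test LVC.5
        return "LVC.cause"
    return "not an LVC"


# Answers encoded by hand for three English examples from the guidelines.
pay_a_visit = LvcAnswers(n_abs=True, n_pred=True, v_subj_n_arg=True,
                         v_light=True, v_reduc=True)
grant_rights = LvcAnswers(n_abs=True, n_pred=True, v_subj_n_arg=False,
                          v_subj_n_cause=True)
make_a_cake = LvcAnswers(n_abs=False)  # a cake is a physical entity

for name, answers in [("to pay a visit", pay_a_visit),
                      ("to grant rights", grant_rights),
                      ("to make a cake", make_a_cake)]:
    print(name, "->", classify_lvc(answers))
```

Tests that are never reached default to `False`, so a record only needs the answers that lie on the path actually followed.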
- Apply test LVC.3 - [V-LIGHT:
The verb only adds meaning expressed as morphological features?]
- continue to next test
... قرار decision ، علم science ، أمل hope ، إجتماع meeting
проблем problem, възможност opportunity, изявление statement, план plan
(OEG) 𓈖𓂋𓃭𓅱 nr.w fear fear (PT § 302c-d, W)
απουσία apusia absence
θυμός thimos anger
αγάπη aγαpi love
δυσκολία δiskolia difficulty
υπόσχεση iposchesi promise
παρουσίαση parusiasi presentation
εμφάνιση emfanisi appearance
priority, anger, love, opinion, difficulty, speech, presentation, birth
paseo walk, derecho right, ilusión excitement, fe faith, duelo grief
pas step, édition edition, discours speech, explication explanation, lutte fight
ὀργή orgē anger anger
τιμωρία timо̄ria punishment punishment
πίστις pistis trust trust
problem problem, mogućnost opportunity, ideja idea
priorità priority, rabbia anger, amore love, opinione opinion, difficoltà difficulty, discorso discourse, presentazione presentation
所有 possession, 検討 examination, 名誉会長 honorary chairman
liefde love, mening opinion, strijd fight
kłopot problem, wysokość height, praca work, prawo right, zysk profit
prioridade priority, festa party, fé faith, nascimento birth, distinção distinction, problema problem, gol goal (soccer)
răspuns answer, prezentare presentation
dvom doubt, mnenje opinion, ime name, vloga role, odločitev decision
dëshirë, mendim, vështirësi, fjalim, përparësi, zemërim
мишљење mišljenje opinion, претрес pretres search, побуна pobuna rebellion, одлука odluka decision
- it is not an LVC
طاولة table، ورقة paper، شخص person ، يد hand
правя торта to make a cake → a cake is a physical entity (not abstract)
давам пари to give money → money is a physical entity (not abstract)
подавам ръка to give out hand to help in a difficult situation → hand is a physical entity (not abstract)
(OEG) 𓊹 nčr god god (PT 460a-b, W)
καρέκλα karekla chair , τραπέζι trapezi table , χέρι cheri hand , άνθρωπος anθropos human
chair, keyboard, hand, person
mesa table, silla chair, mano hand, foto picture
aulki, teklatu, esku, pertsona
chaise chair, clavier keyboard, main hand, personne person
παῖς pais child child
οἶκος oikos house house
ἀγορά agora market square market square
stol table, ruka hand, kruna crown
sedia chair, tastiera keyboard, mano hand, persona person
家 house, 車 car, 家族 family
stoel chair, hand hand, persoon person
złożyć kartkę to fold a sheet → a sheet is a physical entity (not abstract)
złożyć broń to lay down arms → arms is a physical entity (not abstract)
bić pianę to beat foam to exaggerate about a problem → foam is a physical entity (not abstract)
wystawić fakturę to issue a bill → a bill is a physical entity (not abstract)
mieć brata to have a brother → a brother is a physical entity (not abstract)
cadeira chair, teclado keyboard, mão hand, pessoa person, pedra rock
scaun chair, pian piano
oseba person, mačka cat, kapa hat, avtomobil car, roka hand
karrige, tastierë, dorë, njeri
изнети јело izneti jelo to take out a dish → a dish is a physical entity (not abstract)
- continue to next test
إجتماععقد tie a meeting to lead a meeting → event with two arguments: the meeting and the person who organizes the meeting
حوار أجرى make a discussion → event with two arguments: the discussion and the person who contributes to the discussion
поставям акцент to emphasize → event, with two arguments: the agent and the object being emphasized
имам право → property, with one semantic argument: the possessor of the propertyeinen Besuch abstatten to pay a visit → event, with two arguments: the visitor and the visitee
Angst haben to have fear → property with one semantic argument: the entity having fear
einen Blick auf etwas werfen a glance at sth. throw to take a glance at sth → an event with two arguments the entity glancing and the entity glanced at(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → property with one semantic argument: the entities having fear.κάνω μία επίσκεψη kano mia episkepsi to-make a visit pay a visit, visit → event, with two arguments: the visitor and the visitee
έχω τη δυνατότητα echo ti δinatotita have.1SG the ability to be able → property, with two core semantic arguments: the entity having the ability and the object of the ability
έχω μίσος echo misos have hate to hate → state, with two arguments: the entity being in the state of hating and the entity hated
βγάζω λόγο vγazo loγo take-out.1SG speech to make a speech → event, with one obligatory argument: the entity making the speech
παίρνω απόφασηperno apofasi take decision to decide event, with two arguments: the entity taking the decision and the decisionpay a visit → event, with two arguments: the visitor and the visitee
have strength → property, with one semantic argument: the entity having strength
take a glance at something → event, with two arguments: the entity glancing and the entity glanced at
make a contribution → event, with two arguments: the contributor and the beneficiary (notice that contribution could refer to both the event and the thing being contributed, but we always prefer the former reading when possible)hacer una visita make a visit to pay a visit → event, with two arguments: the visitor and the visitee
tener valor to have courage → property, with one semantic argument: the entity having courage
echar un vistazo a algo give a glance to something to take a quick look at something → event, with two arguments: the entity glancing and the entity glanced atbisita egin visit do to pay a visit → event with two arguments: the visitor and the visitee
itxaropena ukan hope have to hope, to have hope → event with one single argument: the person who hopesavoir du courage to have courage→ state(property), with one argument: the entity having courageπροσέχω τὸν νοῦνprosekhо̄ ton noun hold.to.1SG the thought I pay attention (to sth/sb) → an event with two arguments the entity paying attention and the entity paid attention to
ἐν ὀργῃ ἔχωen orgē ekhо̄ in anger.DAT have.1SG I am angry → property with one semantic argument: the entity being angryimati osjećaj to have a feeling → property with one semantic argument: the entity having feeling
otići u posjet to go to a visit to someone to pay a visit → event, with two arguments: the visitor and the visiteefare una visita → event, with two arguments: the visitor and the visitee
avere forza → property, with one semantic argument: the entity having strength
dare uno sguardo a qualcosa → event, with two arguments: the entity glancing and the entity glanced at評価するevaluation.makeevaluate
評価を得るevaluation.acc obtainobtain an evaluationeen bezoek brengen to pay a visit → event, with two arguments: the visitor and the visiteezłożyć wizytę to submit a visitto pay a visit→ event, with two arguments: the visitor and the visitee
złożyć skargę to submit a complaintto make a complaint → event, with two arguments: the complaining person and the one he/she complains about
mieć prawo to have the right→ state, with two arguments: the person having the right and the thing (s)he has the right to
budzić zastrzeżenia to wake-up reservations to provoke reservations → state, with two arguments: the person having reservations and the object of the reservationster fome to have hunger to be hungry → property, with one argument: the entity that is hungry
ter idade para fazer algo to have age (to do something) to be old enough (to do something) → state, with one argument: the entity that is old enough
In PT, we consider that the following classes of predicative nouns pass the test: diseases (gripe, trombose, infarto), physical sensations (fome, sede, sono), emotions (medo, paixão, nojo), cognitive entities internal to the cognizer (ideia, opinião, preocupação), characteristics (coragem, teimosia, fraqueza), relations (contato, conflito, amizade) and nouns expressing communication or speech acts (conversa, discussão, briga, conselho).
a face o vizită to make a visit to pay a visit → event, with one argument: the entity that visits
a avea curaj to have courage → property, with one semantic argument: the entity having courage
imeti predavanje to give a lecture → event, with two arguments: a lecturer and the people who are attending the lecture
jap një kontribut
kam fuqi
i hedh një shikim
поднети жалбу podneti žalbu to submit an appeal to file a complaint → event, with two arguments: the complaining person and the one he/she complains about
имати право imati pravo to have the right → state, with two arguments: the person having the right and the thing (s)he has the right to
- it is not an LVC
كتابه أحمد أعطى gave Ahmed his book Ahmed gave his book → the nounكتاب is a physical entity that does not pass test LVC.0, even though أحمد could be considered its semantic argument
إعصارًا أحمدشهد Ahmed experienced a tornado→ the noun إعصارًا tornado is an event, but has no semantic argumentsИван хвърли боклука Ivan threw out the garbage → physical entity (not event/state)Joe macht einen Kuchen→physical entity (not event/state), even though Joe could be considered a semantic argument(OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire (PT 376b, W)Ο Γιάννης παίρνει τα ρούχα τουO Yanis perni ta rucha tu The John take.3SG the clothes his → the noun is a physical entity (not event/state) that does not pass test LVC.0
Ο Γιάννης έχει ωραίο σπίτιO Γianis echi oreo spiti The John has nice house → the noun is a physical entity (not event/state) that does not pass test LVC.0Joe makes a cake → the noun is a physical entity that does not pass test LVC.0, even though Joe could be considered its semantic argument
Joe experienced a tornado → the noun is an event, but has no semantic arguments
Joe has a lot of money → the noun is abstract and Joe could be considered its semantic argument, but we consider that money (as well as other goods such as car and bananas) can exist independently of a possessor, so the possessor (owner) should not be considered as semantic argument of money
Ana tiene una bicicleta Anna has a bicycle → noun is not abstract, so it does not pass test LVC.0
Ana hace una foto Ana takes a picture → noun is not abstract, so it does not pass test LVC.0
pastela egin cake make to make a cake → physical entity (not event/state)
Anna a un vélo Anna has a bicycle → noun is not abstract, so it does not pass test LVC.0
Anna affronte la tempête Anna faces the storm → noun is abstract but has no arguments
ἔχει δύναμιν καὶ πεζὴν καὶ ἱππικην καὶ ναυτικήν ekhei dunamin kai pezēn kai hippikēn kai nautikēn have.3SG force.ACC and on.foot.ACC and on.horseback.ACC and naval.ACC he has an (army force) on foot, on horseback, and at sea → the noun is a physical entity (not event/state)
Ivan ima olovku Ivan has a pencil → noun is not abstract, so it does not pass test LVC.0
Joe fa un dolce → physical entity (not event/state), even though Joe could be considered its semantic argument
Joe ha vissuto un tornado → event, but has no semantic argument
Jan maakt een taart → physical entity (not event/state), even though Jan could be considered a semantic argument
przetrwać burzę to survive a storm → burza storm has no semantic arguments although it is abstract
quebrar a cabeça to break one's head to rack one's brain → physical entity, does not pass test LVC.0
In PT, we consider that the following classes of abstract nouns do not pass this test: informational content that does not require agents (informações, notícias), natural phenomena (chuva, neve, tornado).
Joe a făcut o prăjitură Joe made a cake → physical entity (not event/state), even though Joe could be considered its semantic argument
Janez ima avto → the person that has a car could be considered as a semantic argument, but the car is not an event or a state
Joe bën një ëmbëlsirë
Joe ka shumë para
преживети земљотрес preživeti zemljotres to survive the earthquake → земљотрес zemljotres earthquake has no semantic arguments although it is abstract
- continue to next test
- Go to test LVC.5
- continue to next test
- it is not an LVC
- annotate as LVC.full
دورا يلعب أحمد Ahmed plays a role → دور أحمد Ahmed's role
تحقيق أحمد ب قام Ahmed made an inquiry → تحقيق أحمد Ahmed's inquiryИван пое отговорност Ivan took responsibility → отговорността на Иван — both refer to the same property/event
Иван взе решение Ivan made a decision → решението на Иван — both refer to the same property/eventPaul hat eine Rede gehalten Paul has given a speech → Paul's speech both refer to the same speech event
Ich habe ihm einen Besuch abgestattet I have paid him a visit → mein Besuchmy visit both refer to the same visiting event(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W) → (*) nr.w ⸗f m ꞽb ⸗śn His fear (is) in their hearts — both refer to the same fearing event.Ο Γιάννης έκανε μία παρουσίασηO Yanis ekane mia parusiasi John made a presentation John's presentation --> both refer to the same presenting event
Η Μαρία έδωσε μία υπόσχεσηI Maria edose mia iposchesi Maria gave a promise Maria promised Η υπόσχεση της Μαρίας --> --> both refer to the same promising eventPaul had a walk → Paul's walk — both refer to the same walking event
I paid him a visit → my visit to him — both refer to the same visiting event
Hester gave birth to Pearl → Pearl's birth to Hester — both refer to the same birthing event (note that the key criterion is that Hester, the subject of the verb, is a (prepositional) dependent of birth in the paraphrase)
The party gave priority to senior members → the priority of senior members for the party — both refer to the same prioritization event
Pedro dio un paseo Pedro gave a walk Pedro took a walk → el paseo de Pedro Pedro's walk — both refer to the same walking event
El capitán da la orden de partir The captain gives the order to leave The captain orders to leave → la orden del capitán de partir The captain's order to leave
Pellok bisita egin zidan → Pelloren bisita -- both refer to the same visiting event
Paul a fait une enquête Paul made an inquiry → L'enquête de Paul Paul's inquiry
Paul procède à une perquisition Paul makes a search→ La perquisition de/par Paul the search of/by Paul
Le général donne l'ordre de partir The general gives the order to leave The general orders to leave → l'ordre du général de partir The general's order to leave
Les soldats reçoivent l'ordre de partir The soldiers receive the order to leave The soldiers are ordered to leave→ l'ordre aux soldats de partir The order to the soldiers to leave
Jean souffre de troubles psychiques John suffers from psychic troubles → Les troubles psychiques de Jean John's psychic troubles
Jean présente une hypersensibilité John presents a hypersensibility John has a hypersensibility→ l'hypersensibilité de Jean John's hypersensibility
Paul reçoit des menaces de (la part de) Pierre Paul receives threats from (the part of) Peter Paul is threatened by Peter → les menaces de Pierre à Paul Peter's threats to Paul
Ce médicament présente un risque This medicine presents a risk This medicine poses a risk → le risque de ce médicamentthis medicine's risk
Ce fait attire l'attention de la justice This fact attracts the attention of the justice → l'attention de la justice pour/sur ce fait the attention of the justice on/about this fact
Κῦρος ἐξέτασιν ποιεῖται Kuros exetasin poieitai Cyrus inspection.ACC do.3SG Cyrus inspected → ἐξέτασιν (τοῦ Κύρου) refers to the same event
Istraživač je donio zaključak The researcher made a conclusion → njegov zaključak his conclusion both refer to the same event
Paolo ha fatto una conquista Paul made a conquest → la conquista di Paolo
Il generale dà l'ordine di partire. The general gives the order to leave The general orders to leave → L'ordine di/da parte del generale di partire
Paolo riceve delle minacce da (parte di) Piero → le minacce di Piero a Paolo
Paul heeft een toespraak gehouden Paul has given a speech → Paul's toespraak both refer to the same speech event
Obecni oddali hołd poległym The present gave-back tribute to the fallen The audience paid tribute to the fallen → hołd obecnych the tribute of the audience
Jan miał na myśli Marię Jan had on thought Maria Jan meant Maria→ myśl JanaJan's thought
Jan otrzymał wymówienieJan received a dismissal→ wymówienie dla Jana dismissal for Jan
Inwestycja przynosi zyski the investment brings profit→ zyski z inwestycji profit from the investmentJoão cometeu um deslize → o deslize do João — both refer to the same event
O jogador cobrou um pênalti the player charged a penalty kick the player took a penalty kick → o pênalti do jogador the player's penalty kick — both refer to the same event
João tem consciência do perigo John has conscience of the danger John is aware of the danger → a consciência do João sobre o perigo John's awareness of the danger — both refer to the same state
João recebeu a remuneração John received the remuneration → a remuneração do João John's remuneration — both refer to the same event
O paciente recebeu a visita dos familiares The patient received the visit of the relatives → a visita dos familiares ao paciente the visit of the relatives to the patient — both refer to the same event
João apresenta lesões John presents lesions → as lesões do João John's lesions — both refer to the same state
Paul a făcut o plimbare Paul had a walk → plimbarea lui Paul Paul's walk — both refer to the same walking event
i-am făcut o vizită I paid him a visit → vizita mea — both refer to the same visiting event
imeti dvome to have doubts to doubt → imeti have adds no meaning to dvomi doubts besides that of having a property
delati razlike to make differences to differentiate → delati in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the event
Професор држи предавање Profesor drži predavanje The professor is holding a lecture → професорово предавање profesorovo predavanje The professor's lecture
Овај лек представља ризик Ovaj lek predstavlja rizik this drug presents a risk this drug poses a risk → ризик од овог лека rizik od ovog leka risk of this drug this drug's risk
- it is not an LVC
في عام 2001 النور رأى أفاد التقرير بأن برنامج الصحة The report states that the Health Programme saw the light in 2001 The report states that the Health Programme began with its current components in 2001 → نور برنامج الصحة# the light of health program — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP ( نور برنامج الصحة the light of health program ) fails to refer to the original event ( رأى برنامج الصحة النور ) the health program saw the light ( started )Иван хвърли поглед на вестника Ivan threw a glance at the newspaper → #погледът на Иван върху вестника — different semantics; and requires a different prepositionPaul hat einen guten Eindruck gemachtPaul has made a good impression → #Paul's Eindruck auf seine Freunde Paul's impression on his friends has a different semantics(OEG) 𓂧𓈖 𓃹𓈖𓇋𓋴 𓌴𓐙𓂝𓏏 (w)ṭ.n Wnꞽś mꜣꜥ.t Unas set Right Unas set Right (PT 265c, W) → (*) mꜣꜥ.t Wnꞽś 'Unas's Right' fails to refer to the original event (Unas set Right).ο Παύλος πήρε νέα από τον αδερφό του O Pavlos pire nea apo ton aδerfo tu The Paul take.3PST news from his brother → #Τα νέα του Παύλου από τον αδερφό του Paul's news from his brother -- one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (τα νέα του Παύλου) fails to refer to the original event (Ο Παύλος πήρε νέα)Paul got news from his brother → #Paul's news from his brother — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Paul's news) fails to refer to the original event (Paul got news)Juan recibió la noticia de su hermano Juan got the news from his brother → #La noticia de Juan — one cannot remove the verb because the sense of communication is missed, so the verb is not light. 
As a consequence, the verbless NP (la noticia de Juan) fails to refer to the original event (Juan recibió una noticia)
Hizlariak interesa piztu zuen Speaker interest switched-on The speaker awakened interest → #Hizlariaren interesa, #the speaker's interest -- different semantics
Son comportement porte une atteinte grave à l'honneur des soldats His behaviour seriously jeopardises the soldiers' honour → #l'atteinte de son comportement the jeopardy of his behaviour
ἡ γυνὴ πίστιν ἔλαβε hē gunē pistin elabe the woman assurance get.AOR.3SG the woman got an assurance → πίστις τῆς γυναικός ‘the woman’s assurance’ fails to refer to the original event (the woman got an assurance)
Petar je dobio poruku od direktora Petar received a message from his boss → #Petar's message from his boss — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Petar's message) fails to refer to the original event (Petar received a message)
Paul kreeg nieuws van zijn broer Paul got news from his brother → #Pauls nieuws van zijn broer — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Pauls nieuws) fails to refer to the original event (Paul kreeg nieuws)
Michael Phelps pobił rekord sprzed 2 tysięcy lat Michael Phelps broke the record from 2 thousand years ago → #Michael Phelps' record
Ulica nosi imię sławnego poety The street carries the forename of a famous poet The street carries the name of a famous poet.→ imię ulicy the forename of the street
Adam jest tego samego zdania Adam is of the same opinion Adam has the same opinion → #zdanie Adama Adam's opinion refers to the contents of his opinion, not to the fact of having an opinion
O jogador cobrou uma falta the player charged a foul the player took a free kick → a falta do jogador the player's foul — the focus changes from taking a free kick to being one of the parties involved in a foul (it's a VID)
O jogador provocou uma lesão the player provoked a lesion → a lesão do jogador the player's lesion — In the reduced NP, the focus changes from hurting somebody else to getting hurt
O músico apresenta suas composições the musician presents his compositions → as composições do músico the musician's compositions — the reduced NP does not keep the sense of presenting, so it does not refer to the same event as the verbal construction
Paul a făcut o impresie bună Paul made a good impression → #Impresia lui Paul despre soția sa Paul's impression on his wife — different semantics
to začeti predavanje to begin a lecture → začeti to begin adds an aspectual meaning to the noun
Бранко је оборио рекорд у трци на 100 метара Branko je oborio rekord u trci na 100 metara Branko broke the record in 100m race → #Бранков рекорд #Brankov rekord
- annotate as LVC.cause
- it is not an LVC
- verbs that are typically used to express the cause of predicative nouns in general (e.g. cause, provoke), or
- verbs that are only used to express the cause of particular predicative nouns (e.g. grant in to grant a right).
- verbs which encode a manner of causation:
to call a meeting entails communication to schedule the meeting
to hold a meeting entails leadership
to organize classes entails preparation
συνἠγαγεν ἐκκλησίαν sunēgagen ekklēsian lead.together.3SG meeting.ACC he held a meeting entails leadership
- verbs which encode modality:
to allow dialogue entails permission
to foster dialogue entails assistance
to require dialogue entails necessity
- aspectual verbs whose subject is a semantic argument of the noun:
αρχίσαμε τη συζήτηση archisame ti syzitisi we started the conversation
τελειώσαμε τη συζήτηση teliosame ti sizitisi finished.01.PST the conversation We finished the conversation
we started the meeting
we ended the meeting
we continued the meeting
ἄρχειν τοῦ λόγου arkhein tou logou start the speech to begin speaking
we begonnen de vergadering we started the meeting
- some LVCs have no derivationally-related equivalents, such as to have a flu, to have faith and to commit a crime;
- some constructions that are not LVCs do have a derivationally-related equivalent such as to write an email and to email;
- some LVCs have derivationally-related equivalents that do not mean the same as the LVC, such as to make a face and to face, or that have a different argument structure from the LVC, such as to have a problem and to be problematic.
- Subject
- Direct object
- Circumstantial or adverbial complement
- Reflexive clitic or particle: the VMWE is either an IRV (reflexive pronoun) or a VPC (particle), never a VID.
- Verb with no lexicalized dependent: fine-grained tests need to be applied in order to discriminate between an MVC and a VID. See the section on Structural tests.
- Extended nominal phrase: fine-grained tests need to be applied in order to discriminate between an LVC and a VID. See the section on Structural tests.
- Adjectival phrase
- Verb with lexicalized dependents
- Relative clause
- Non-reflexive pronoun
- Apply test VID.1 - [CRAN: Candidate contains cranberry word?]
  - YES ⇒ It is a VID, exit.
  - NO ⇒ Apply test VID.2 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
    - YES ⇒ It is a VID, exit.
    - NO ⇒ Apply test VID.3 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
      - YES ⇒ It is a VID, exit.
      - NO ⇒ Apply test VID.4 - [MORPHSYNT: Regular morphosyntactic change ⇒ unexpected meaning shift?]
        - YES ⇒ It is a VID, exit.
        - NO ⇒ Apply test VID.5 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
          - YES ⇒ It is a VID, exit.
          - NO ⇒ It is not a VID, exit
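The VID tests form a simple cascade: they are applied in order, and the first test answered positively settles the decision. The sketch below illustrates that reading in Python; the `classify_vid` function and the dictionary of answers are hypothetical, and the answers are assumed to come from an annotator rather than from any automatic check.

```python
from typing import Mapping, Optional, Tuple

# Test labels in the order in which the guidelines apply them (VID.1 to VID.5).
VID_TESTS = ("CRAN", "LEX", "MORPH", "MORPHSYNT", "SYNT")


def classify_vid(answers: Mapping[str, bool]) -> Tuple[bool, Optional[str]]:
    """Return (is_vid, deciding_test) for one candidate.

    `answers` maps a test label to the annotator's yes/no answer.
    Missing labels are treated as "no" (an assumption of this sketch).
    """
    for label in VID_TESTS:
        if answers.get(label, False):
            return True, label   # first positive test: it is a VID, exit
    return False, None           # all five tests negative: it is not a VID


# A candidate that only shows an unexpected meaning shift under lexical replacement.
print(classify_vid({"CRAN": False, "LEX": True}))
# A candidate for which every test is negative.
print(classify_vid({label: False for label in VID_TESTS}))
```

Returning the label of the deciding test alongside the verdict keeps a trace of why a candidate was annotated as a VID, which can help when annotators compare decisions.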
- it is a VID
- further tests are required
- it is a VID
- further tests are required
- it is a VID
- further tests are required
- it is a VID
- further tests are required
- it is a VID
- It is not a VID, exit
- Inherently reflexive ⇒ ANNOTATE as IRV
- The verb without the RCLI does not exist
усмихвам се to smile, страхувам се to be afraid
stydět se to be ashamed, divit se to wonder
sich schämen to be ashamed, sich wundern to wonder
(OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. V 141, 14).
suicidarse to commit suicide, abstenerse to abstain
n.a.
s'évanouir to faint, se suicider to commit suicide
suicidarsi to commit suicide, arrabbiarsi to get angry
zich schamen to be ashamed, zich vergissen to be mistaken
dowiedzieć się to find out, bać się to be afraid
queixar-se to complain, abster-se to abstain
a se teme to be afraid with obligatory ACC reflexive clitic
a își însuși to appropriate with obligatory DAT reflexive clitic
sramovati se to be ashamed, bati se to be afraid
стидети се stideti se to be ashamed,
бојати се bojati se to be afraid
att försova sig to sleep in
att gifta sig to get married
- The verb without the RCLI does exist, but has a very different meaning
смея ≠ смея се to dare ≠ to smile, намирам ≠ намирам се to find ≠ to be situated
sich enthalten ≠ enthalten to abstain ≠ to contain, sich (um etw.) handeln ≠ handeln to be ≠ to handle
(OEG) 𓊪𓈙𓈙𓂻𓈖 𓋴 𓅐𓏏 𓎡 𓏌𓏏𓇯 𓁷𓂋 𓎡 pšš.n ś(ꞽ) mw.t ⸗k Nw.t ḥr ⸗k Your (⸗k) mother (mw.t) Nut (Nw.t) spread (pšš.n) herself (ś(ꞽ)) over (ḥr) you (⸗k). Your mother Nut protected you. (PT 638a, T) → pšš means 'spread' without a reflexive pronoun (Wb. I 560).
to find oneself in a difficult situation
to help oneself to the cookies
recoger ≠ recogerse to gather ≠ to go home, empeñar ≠ empeñarse to pawn ≠ to insist
n.a.
s'apercevoir ≠ apercevoir to realize ≠ to see, s'agir ≠ agir to be ≠ to act
riferire ≠ riferirsi to report, tell ≠ to refer
zich aanstellen ≠ aanstellen to put on airs, to act ≠ to appoint, zich begeven ≠ begeven to proceed ≠ to break down, zich realiseren ≠ realiseren to realise (achieve) ≠ to realise (be aware)
znajdować ≠ znajdować się to find ≠ to be, radzić ≠ radzić sobie to advise ≠ to manage
encontrar-se ≠ encontrar to be ≠ to meet, referir-se ≠ referir to concern ≠ to refer
a se îndura ≠ a îndura to have the heart ≠ to suffer
a se face ≠ a face to become ≠ to make even if it is inchoative (Dindelegan 2013: 79) a se face (=to become) is IRV (it passes Test15)
dati se it is possible (to do something) ≠ dati to give, dobiti se to meet ≠ dobiti to get
губити ≠ губити се gubiti ≠ gubiti se to lose ≠ to pass out
att känna sig ledsen/arg to feel sad/angry ≠ to touch
- The verb without the RCLI does not exist
- Reciprocal ⇒ NOT ANNOTATED
- The RCLI has a sense of mutually:
целувам се to kiss each other, срещам се to meet each otherlíbat se to kiss each other, potkávat se to meet each othersich küssen to kiss each other, sich treffen to meet each otherbesarse to kiss each other, verse to see each othern.a.s'embrasser to kiss each other, se rencontrer to meet each otherbaciarsi to kiss each othercałować się to kiss each other, spotykać się to meet each othercumprimentar-se to greet each other, ver-se to see each othera se saluta to greet each otherpoljubljati se to kiss each other, srečati se to meet each otherпољубити се poljubiti se to kiss,
срести се sresti se to meet
- Reflexive ⇒ NOT ANNOTATED
- The RCLI marks the reflexive or reciprocal construction, that is, the clitic plays the role of self in English
мия се to wash oneself, реша се to comb oneself
mýt se to wash oneself, drbat se to scratch oneself
sich waschen to wash oneself, sich kratzen to scratch oneself
(OEG) 𓇋𓅱 𓈖𓐩𓈖 𓇓𓅱 𓃹𓈖𓇋𓋴 ꞽw nč̣.n św Wnꞽś Unas (Wnꞽś) has-protected (nč̣.n) himself (św). Unas has protected himself. (PT 290c, W)
mirarse to look at oneself, vestirse to dress oneself
n.a.
se laver to wash oneself, se parler to talk to oneself
lavarsi to wash oneself, vestirsi to dress oneself
zich wassen to wash oneself, zich scheren to shave oneself
myć się to wash oneself, drapać się po głowie to scratch oneself on the head
apressar-se to hurry oneself, vestir-se to dress oneself
a se spăla to wash oneself
umivati se to wash oneself, praskati se to scratch oneself
умивати се umivati se to wash one's face,
чешати се češati se to scratch oneselfatt tvätta sig to wash oneself
- Body part, also called possessive reflexive ⇒ NOT ANNOTATED
- Specific type of reflexive use in which the direct object is a body part or, more generally, an inalienable part of the subject
мия си ръцете wash REFL.POSSESSIVE hands wash one's hands
mýt si nohy wash RCLI.DAT the feet wash one's feet
sich das Bein brechen RCLI the leg break break one's leg
(OEG) 𓂜 𓂻𓅱𓈖 𓇋𓋴 𓃹𓈖𓇋𓋴 𓆓𓋴 𓆑 nꞽ ꞽw.n ꞽś Wnꞽś č̣ś ⸗f Indeed (ꞽś), Unas (Wnꞽś), his (⸗f) body (č̣ś), cannot-come (nꞽ ꞽw.n). Indeed, Unas himself cannot come. (PT 333b, W)
rascarse el brazo scratch.RCLI the arm scratch one's arm
n.a.
se gratter la tête RCLI scratch the head scratch one's head
grattarsi la testa RCLI scratch the head scratch one's head
myć sobie nogi wash RCLI.DAT the feet wash one's feet
impossible, uses possessive instead
a-şi rupe mâna RCLI.DAT break arm break one's arm
umivati noge wash RCLI.DAT the feet wash one's feet, zlomiti roko RCLI.DAT break arm break one's arm
сломити си ногу to break RCLI the foot slomiti si nogu to break one's own leg,
умити си лице umiti si lice to wash RCLI the face to wash one's own face
- Middle with preverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
- The clitic marks a regular syntactic alternation for transitive verbs. Just like in regular passive alternation, the direct object of the transitive version appears as the subject of the REFLV version, and thus the verb agrees with the subject.
- Unlike in the inchoative (see below), the subject of the transitive version is absent from the REFLV version but necessarily exists, although it is underspecified
книги се пишат трудно books write.PL RCLI difficult it is difficult to write booksdie Häuser verkaufen sich gut the houses sell RCLI well the houses sell welllas casas se venden bien the houses RCLI sell well the houses sell welln.a.les pots se vendent bien the pots RCLI sell well the pots sell wellle case si affittano the houses RCLI rent the houses are renteddomy dobrze się sprzedają houses sell.PL RCLI well houses sell wellas casas se vendem bem the houses RCLI sell well the houses sell wellcasele se vând bine houses-the RCLI sell well houses sell wellhiše se dobro prodajajo the houses sell RCLI well the houses sell wellземља се добро продаје zemlja se dobro prodaje the land RCLI well sell the land's selling well
- Middle with postverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
- In some languages, middle alternation with a preverbal subject sounds unnatural and middle alternation with a postverbal subject is preferred. Depending on the language, the noun phrase is viewed as a postverbal subject (ES, PL, PT, RO) or as an object that agrees with the unaccusative verb form (IT). Middle alternation with a postverbal subject is impossible in FR and DE.
трудно се пишат книги difficult RCLI write.PL books it is difficult to write booksse alquilan casas RCLI rent houses people rent housesn.a.si affittano case RCLI rent houses people rent housesdobrze sprzedają się te domy well sell RCLI these houses these houses sell wellalugam-se casas rent-RCLI houses people rent housesse vând bine apartamentele din blocurile noi RCLI sell well apartments-the from blocks-the new Apartments from new blocks sell well
se construiesc locuințe noi RCLI build houses new new houses are built
nove hiše se gradijo new houses RCLI build new houses are built
добро се продаје ова роба dobro se prodaje ova roba well RCLI sell these goods these goods are selling well
- Impersonal ⇒ NOT ANNOTATED
- The RCLI marks an impersonal verb alternation possible for various transitivity classes, depending on the language: only transitive verbs (FR), only intransitive verbs with manner adjuncts (DE), preferably intransitive but tolerated for transitive verbs (PT), either transitive or intransitive verbs (IT, ES, RO, PL)
- There is no noun phrase before the verb (empty subject slot), the presence of the RCLI indicates a verb interpreted with a generic and underspecified subject
- The verb is in third person singular, even when the object is plural
не се вечеря късно not RCLI have dinner late it is not good to have dinner latehier tanzt es sich gut here dances it RCLI well people dance well herese busca a actores RCLI searches to actors people look for actors
se trabaja mejor aquí RCLI works better here people work better heren.a.il se dit des bêtises it RCLI says silly things people say silly thingssi lavora troppo RCLI works too much people work too much
si affitta molte case RCLI rents many houses people rent many housesza dużo się pracuje too much RCLI works people work too much
bzdury się opowiada nonsense RCLI tells people tell nonsensedorme-se muito sleeps-RCLI much people sleep a lot
conta-se histórias tells-RCLI stories people tell storiesse lucrează până târziu RCLI works until late people work until late transitive verbs can be impersonal in RO only when they are null-object verbs (se lucrează până târziu - *este lucrat până târziu) or when their subject is realized by a clause headed by a complementizer Dindelegan 2013: 174
se suferă din cauza sărăciei RCLI suffer because of poverty one suffers because of poverty RO impersonal reflexive verbs are mostly intransitive Dindelegan 2013: 173
se aleargă dimineața RCLI run in the morning people run in the morninggovori se/govorijo se neumnosti it says/they say RCLI silly things people say silly thingsради се превише radi se previše it works RCLI too much there's too much work being done,
говоре се глупости govore se gluposti they say RCLI nonsense nonsense is being said
- Inchoative ⇒ NOT ANNOTATED
- Similar to middle, but the RCLI marks a less productive syntactic alternation:
- the direct object of the transitive version appears as subject of the REFLV
- the subject of the transitive version is not only absent, it is also semantically unclear or nonexistent
вратата се отваря the door opensdveře se otvírají the door opensdie Tür öffnet sich the door opensla puerta se abrió the door openedn.a.la porte s'est subitement ouverte the door suddenly openedla porta si apre the door opensdrzwi się otwierają the door openso vaso se quebrou the vase brokemașina s-a stricat the car broke down
ușa s-a deschis the door openedvrata se odpirajo the door opensврата се отварају vrata se otvaraju the doors are openingdörren öppnar sig the door opens
- Apply test IRV.1 - [INHERENT]
- Annotate as IRV
- Apply test IRV.2 - [DIFF-SENSE]
- Annotate as IRV
- Apply test IRV.3 - [DIFF-SUBCAT]
- Annotate as IRV
-
- verb has no subject ⇒ Apply test IRV.4 - [IMPERS]
- It is not a VMWE, exit
- Annotate as IRV
- verb has a subject ⇒ Apply test IRV.5 - [MIDDLE-INCHO]
- It is not a VMWE, exit
- Apply test IRV.6 - [REFL]
- It is not a VMWE, exit
-
- subject is SINGULAR ⇒ Apply test IRV.7 - [REFL-MUTUAL]
- It is not a VMWE, exit
- Annotate as IRV
- subject is PLURAL ⇒ Apply test IRV.8 - [RECIPRO]
- It is not a VMWE, exit
- Annotate as IRV
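The decision tree above can be read as a short procedure. The following Python sketch is one possible encoding and is not part of the official guidelines: the `IrvAnswers` record, its field names and the function name are hypothetical, and the yes/no polarity of each branch is inferred from the outcomes listed under the individual tests (for example, a positive answer to IRV.4 indicates a regular impersonal alternation, hence no VMWE).

```python
from dataclasses import dataclass


@dataclass
class IrvAnswers:
    """An annotator's yes/no answers to tests IRV.1-IRV.8 (hypothetical record)."""
    inherent: bool = False        # IRV.1 [INHERENT] answered yes
    diff_sense: bool = False      # IRV.2 [DIFF-SENSE] answered yes
    diff_subcat: bool = False     # IRV.3 [DIFF-SUBCAT] answered yes
    has_subject: bool = True      # structural question asked before IRV.4 / IRV.5
    impersonal: bool = False      # IRV.4 [IMPERS] answered yes
    middle_incho: bool = False    # IRV.5 [MIDDLE-INCHO] answered yes
    reflexive: bool = False       # IRV.6 [REFL] answered yes
    subject_plural: bool = False  # subject is plural or coordinated
    refl_mutual: bool = False     # IRV.7 [REFL-MUTUAL] answered yes
    reciprocal: bool = False      # IRV.8 [RECIPRO] answered yes


def classify_reflexive_verb(a: IrvAnswers) -> str:
    """Walk the IRV decision tree and return 'IRV' or 'not a VMWE'."""
    # IRV.1-IRV.3: any positive answer is sufficient to annotate as IRV.
    if a.inherent or a.diff_sense or a.diff_subcat:
        return "IRV"
    # No subject: only the impersonal test (IRV.4) applies.
    if not a.has_subject:
        return "not a VMWE" if a.impersonal else "IRV"
    # With a subject: regular middle/inchoative (IRV.5) or reflexive (IRV.6) uses exit.
    if a.middle_incho or a.reflexive:
        return "not a VMWE"
    # The last test depends on the number of the subject (IRV.7 vs IRV.8).
    regular_use = a.reciprocal if a.subject_plural else a.refl_mutual
    return "not a VMWE" if regular_use else "IRV"


print(classify_reflexive_verb(IrvAnswers(inherent=True)))
print(classify_reflexive_verb(IrvAnswers(has_subject=False, impersonal=True)))
```

Keeping the answers in a single record makes the order of the tests explicit: later fields are simply ignored once an earlier test has settled the outcome.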
- annotate as IRV
страхувам се ⇒ *страхувам to be afraid
усмихвам се ⇒ *усмихвам to smilesich schämen ⇒ *schämen to be ashamed
sich wundern ⇒ *wundern to wonder(OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. (V 141, 14).suicidarse ⇒ *suicidar to suicide
abstenerse ⇒ *abstener to abstainn.a.s'évanouir ⇒ *évanouir to faint
se suicider ⇒ *suicider to suicidesuicidarsi ⇒ *suicidare to suicidezich schamen ⇒ *schamen to be ashamed
zich vergissen ⇒ *vergissen to be mistakendowiedzieć się ⇒ *dowiedzieć to find out
bać się ⇒ *bać to be afraid
wydarzyć się ⇒ *wydarzyć to happenqueixar-se ⇒ *queixar to complain
abster-se ⇒ *abster to abstaina se teme ⇒ *a teme to be afraid
a își însuși ⇒ *a însuși to appropriatesramovati se ⇒ *sramovati to be ashamed
čuditi se ⇒ *čuditi to wonderбавити се ⇒ *бавити baviti se ⇒ *baviti to deal with,
дивити се ⇒ *дивити diviti se ⇒ *diviti to admire - next test
- annotate as IRV
намирам се ≠ намирам to be situated ≠ to find
радвам се≠ радвам to feel happy ≠ to make happysich verstehen ≠ verstehen to get along well ≠ to understand(OEG) 𓊪𓈙𓈙𓂻𓈖 𓋴 𓅐𓏏 𓎡 𓏌𓏏𓇯 𓁷𓂋 𓎡 pšš.n ś(ꞽ) mw.t ⸗k Nw.t ḥr ⸗k Your (⸗k) mother (mw.t) Nut (Nw.t) spread (pšš.n) herself (ś(ꞽ)) over (ḥr) you (⸗k). Your mother Nut protected you. (PT 638a, T) → pšš means 'spread' without a reflexive pronoun (Wb. I 560).to find oneself in a difficult situation
to help oneself to the cookies
recogerse ≠ recoger to go home ≠ to pick up, to gather
n.a.
s'apercevoir ≠ apercevoir to realize ≠ to see
s'agir ≠ agir to be ≠ to actriferirsi ≠ riferire to refer ≠ to report, to tellzich voordoen ≠ voordoen to arise ≠ to showznajdować się ≠ znajdować to find oneself ≠ to be
sprawdzić się≠ sprawdzić to prove appropriate ≠ to check
wybrać się≠ wybrać to go ≠ to chooseencontrar-se ≠ encontrar to be ≠ to meet
referir-se ≠ referir to concern ≠ to refera se îndura ≠ a îndura to have the heart to ≠ to sufferrazumeti se ≠ razumeti to get along well ≠ to understandзнати ≠ знати се znati ≠ znati se to know ≠ to know someone,
забављати ≠ забављати се zabavljati ≠ zabavljati se to amuse someone else ≠ to amuse oneself; to amuse someone ≠ to date someone
- next test
- annotate as IRV
X verliert sich in Y ⇔ X verliert Y X loses RCLI in Y ⇔ X loses Y
X se olvidó de Y ⇔ X olvidó Y X RCLI forgot of Y ⇔ X forgot Y
n.a.
X se confesse de Y ⇔ X confesse Y (but *X confesse de Y) X RCLI confesses of Y ⇔ X confesses Y (but not *X confesses of Y)
X se plaint de Z ⇒ *Y plaint (à) X de Z X RCLI complains of Z ⇒ *Y complains (to) X of Z → the verb without the RCLI, plus a direct or indirect object, does not subcategorize for the PP with the preposition de
X se refuse à Vinf ⇒ *Y refuse (à) X à Vinf X RCLI refuses to Vinf ⇒ *Y refuses (to) X to VinfX si è dimenticato di Y ⇔ X ha dimenticato Y X RCLI forgot of Y ⇔ X forgot YX verwondde zich aan Y ⇔ X verwondde Y X wounded/injured RCLI to Y ⇔ X wounded/injured Y
X toonde zich ADJ ⇔ X toonde NOUN X showed RCLI ADJ ⇔ X showed NOUN ?? elle se trouve grosse, because se trouver has the same meaning here as trouver
X tłumaczy się z Y ⇔ X tłumaczy Y X explains SELF of Y ⇔ X explains Y
X dziwi się Y.dat ⇔ Y dziwi X ⇔ Z dziwi X Y.inst X surprises SELF Y.dat ⇔ Y surprises X ⇔ Z surprises X Z.instX se esqueceu de Y ⇔ X esqueceu Y X RCLI forgot of Y ⇔ X forgot YX se gândeşte la Y ⇔ X gândeşte că Y X RCLI thinks of Y ⇔ X thinks that YА се објаснио с Б ⇔ А је објаснио Б A se objasnio s B A resolved the issues with B ⇔ A explained something to B - next test
- do NOT annotate as verbal MWE
не се вечеря късно ⇔ хората не вечерят късно not RCLI have dinner late it is not good to have dinner latehier tanzt es sich gut ⇔ hier tanzen die Leute gut people dance well herese duerme mucho ⇔ las personas duermen mucho people sleep a lot
se busca a actores ⇔ la gente busca a actores people look for actorsn.a.il se dit des bêtises ⇔ les personnes disent des bêtises people say silly thingssi dorme molto ⇔ le persone dormono molto people sleep a lot
si affitta molte case ⇔ le persone affittano molte case people rent many housespracuje się za dużo ⇔ ludzie pracują za dużo people work too much
opowiada się bzdury ⇔ ludzie opowiadają bzdury people tell nonsensedorme-se muito ⇔ as pessoas dormem muito people sleep a lot
conta-se histórias ⇔ as pessoas contam histórias people tell storiesse lucrează până târziu ⇔ lumea lucrează până târziu people work until late
se aleargă dimineața ⇔ lumea aleargă dimineața people run in the morninggovorijo se neumnosti ⇔ ljudje govorijo neumnosti people tell nonsenseради се превише. ⇔ људи раде превише. radi se previše. ⇔ ljudi rade previše. there's too much work being done ⇔ people are working too much. - annotate as IRV
- do NOT annotate as verbal MWE
някой отваря вратата ⇒ вратата се отваря somebody opens the door ⇒ the door opensman kann die Häuser gut verkaufen ⇒ die Häuser verkaufen sich gut people can sell the houses well ⇒ the houses sell well
jemand öffnet die Tür ⇒ die Tür öffnet sich somebody opens the door ⇒ the door opensla gente cuenta historias ⇒ se cuentan historias people tell stories ⇒ stories are told
alguien abrió la puerta ⇒ la puerta se abrió somebody opened the door ⇒ the door openedn.a.on vend bien ce produit ⇒ ce produit se vend bien people sell this product well ⇒ this product sells well
quelqu'un ouvre la porte ⇒ la porte s'ouvre somebody opens the door ⇒ the door opens
qualcuno vende bene questo prodotto ⇒ questo prodotto si vende bene someone sells this product well ⇒ this product sells well
qualcuno apre la porta ⇒ la porta si apre somebody opens the door ⇒ the door opensktoś sprzedaje te domy ⇒ te domy się sprzedają somebody sells these houses ⇒ these houses sell well
ktoś otwiera drzwi ⇒ drzwi się otwierają somebody opens the door ⇒ the door opens
ktoś nasila skargi ⇒ skargi nasilają się somebody increases complaints ⇒ complaints increase
ktoś rozgrywa mecz ⇒ mecz rozgrywa się somebody plays a game ⇒ the game playsalguém conta histórias ⇒ contam-se histórias somebody tells stories ⇒ tell.PL-RCLI stories somebody tells stories ⇒ stories are told
alguém acalmou o menino ⇒ o menino se acalmou somebody calmed the boy ⇒ the boy RCLI calmed somebody calmed the boy down ⇒ the boy calmed down
o juiz casou João com Maria ⇒ João se casou com Maria the judge married João with Maria ⇒ João RCLI married with Maria the judge married João with Maria ⇒ João got married to Maria
o juiz casou Maria e João ⇒ Maria e João se casaram the judge married Maria and João ⇒ Maria and João RCLI married the judge married Maria and João ⇒ Maria and João got married
alguém lembrou João do meu aniversário ⇒ João se lembrou do meu aniversário somebody reminded João of my birthday ⇒ João RCLI reminded of my birthday somebody reminded João of my birthday ⇒ João remembered my birthdaycineva spune glume ⇒ se spun glume somebody tells jokes ⇒ jokes are told
cineva a deschis ușa ⇒ ușa s-a deschis somebody opened the door ⇒ the door openednekdo pripoveduje šale ⇒ šale se pripovedujejo somebody tells jokes ⇒ jokes are told
nekdo je odprl vrata ⇒ vrata so se odprla somebody opened the door ⇒ the door openedнеко је отварао врата ⇒ врата се отварају neko je otvarao vrata ⇒ vrata se otvaraju someone was opening the doors ⇒ the doors were being opened,
неко шири гласине ⇒ гласине се шире neko širi glasine ⇒ glasine se šire someone's spreading the rumors ⇒ the rumors are being spread
- next test
- do NOT annotate as verbal MWE
Павел лекува себе си ⇒ Павел се лекува Pavel heals himselfPaul kratzt nur sich selbst ⇒ Paul kratzt sich Paul scratches himselfPaul washes only himself ⇒ Paul washes himselfPablo se lava a sí mismo ⇒ Pablo se lava Paul washes himselfn.a.Paul ne soigne que lui-même ⇒ Paul se soigne Paul heals himself
Paul ne parle qu'à lui-même ⇒ Paul se parle Paul talks to himselfPaolo cura solo se stesso ⇒ Paolo si cura Paul heals himself
Paolo parla solo a se stesso ⇒ Paolo si parla Paul talks to himselfPaul wast alleen zichzelf ⇒ Paul wast zich(zelf) Paul washes himselfPaweł leczy tylko siebie ⇒ Paweł leczy się Paul heals himself
Paweł bogaci tylko siebie ⇒ Paweł bogaci się Paul enriches himself Paul gets rich
Paweł myje tylko siebie ⇒ Paweł myje się Paul washes himself
Paulo só lava a si mesmo ⇒ Paulo se lava Paul washes himself
Paul se spală doar pe sine ⇒ Paul se spală. Paul washes himself
Pavel praska sam sebe ⇒ Pavel se praska Paul scratches himself
Марко лечи сам себе ⇒ Марко се лечи Marko leči sam sebe ⇒ Marko se leči Marko is treating himself ⇒ Marko is getting treated
- next test
- The subject is singular: test REFL-MUTUAL
- The subject is plural or coordinated (Bob and Alice): test RECIPRO
- do NOT annotate as verbal MWE
Павел се мие ⇔ те се мият един друг they wash each otherPaul wäscht sich ⇔ Sie waschen sich gegenseitig / einander they wash each otherPablo se lava ⇔ ellos se lavan mutuamente / los unos a los otros they wash each othern.a.Paul se lave ⇔ ils se lavent mutuellement / les uns les autres they wash each otherPaolo si lava ⇔ essi si lavano reciprocamente / l'un l'altro they wash each otherPaul wast zich ⇔ Zij wassen elkaar they wash each otherPaweł się myje ⇔ oni myją się nawzajem they wash each otherPaulo se lava ⇔ eles se lavam mutuamente / uns aos outros they wash each otherel se spală ⇔ ei se spală unul pe altul they wash each otherPavel se umiva ⇔ umivajo drug drugega they wash each otherМарко се забавља ⇔ они један другог забављају Marko se zabavlja ⇔ oni jedan drugog zabavljaju Marko is amusing himself ⇔ they are amusing one another
- annotate as IRV
- Coordinated subject: A and B PronV ⇔ A V [to/with] B and B V [to/with] A?
- Plural subject: A.PL PronV ⇔ A.PL V [to/with] A.PL?
- do NOT annotate as verbal MWE
Павел и Елена се целуват ⇔ Павел целува Елена и Елена целува Павел Pavel and Elena kissPaul und Anna umarmen sich ⇔ Paul umarmt Anna and Anna umarmt Paul Paul and Anna hug each other
die Affen kratzen sich ⇔ die Affen kratzen die Affen the monkeys scratch each otherPablo y Ana se abrazan ⇔ Pablo abraza a Ana and Ana abraza a Pablo Paul and Ann hug each other
los niños se abrazan ⇔ los niños abrazan a los niños the children hug each othern.a.Paul et Anne s'embrassent ⇔ Paul embrasse Anne and Anne embrasse Paul Paul and Ann kiss
les jours se suivent ⇔ les jours suivent les jours the days follow each otherGiovanni e Anna si baciano ⇔ Giovanni bacia Anna and Anna bacia Giovanni John and Ann kiss
i giorni si seguono ⇔ i giorni seguono i giorni the days follow each other
Paweł i Elena całują się ⇔ Paweł całuje Elenę i Elena całuje Pawła, Paweł i Elena całują się nawzajem Paweł kisses Elena and Elena kisses Paweł, Paweł and Elena kiss
João e Ana se beijam ⇔ João beija Ana and Ana beija João John and Ann kiss
os presos se agridem ⇔ os presos agridem os presos the prisoners aggress each otherIon şi George se salută ⇔ Ion îl salută pe George and George îl salută pe Ion Ion and George greet each other
participanții se salută ⇔ participanții îi salută pe participanți the participants greet each otherPavel in Ana se objemata ⇔ Pavel objema Ano in Ana objema Pavla Paul and Anna hug each otherМ и Н су се пољубили ⇔ М је пољубио Н и Н је пољубила М M i N su se poljubili ⇔ M je poljubio N i N je poljubila M M and N kissed ⇔ M kissed N and N kissed M - annotate as IRV
- In French, orthography and pronunciation rules require the clitic to be attached to a verb that begins with a vowel sound, with the clitic's final vowel replaced by an apostrophe (elision):
- s'abstenir to abstain
- In Spanish and Italian, the clitic can appear concatenated after the verb in some verbal forms (e.g. infinitives, gerunds):
- enamorarse to fall in love
- alzarsi to get up
- In Portuguese, postposed clitics (enclisis) are always attached with a hyphen, but in the conditional tense the clitic appears in the middle of the verb (mesoclisis), separating the root from the suffix:
- queixar-se-ia would complain
- In Romanian the clitic and the verb are either separate or have a hyphen between them:
-
se aude un clopot RCLI hears a bell a bell is heard
s-aude un clopot RCLI-hears a bell a bell is heard
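When inflected forms are matched against a list of reflexive verbs, the orthographic conventions above have to be undone first. The sketch below is a naive illustration of that step and not the tokenization used in the PARSEME corpora: the function name `split_rcli` and the use of upper-case language codes as arguments are assumptions, and only the third-person patterns shown in the examples above are covered.

```python
import re
from typing import Optional, Tuple


def split_rcli(token: str, lang: str) -> Optional[Tuple[str, str]]:
    """Return (clitic, verb) for a token with an attached reflexive clitic.

    Hypothetical helper: purely orthographic, so it over-generates (any
    Spanish word ending in -rse is split). Returns None when no attached
    clitic pattern is recognised.
    """
    token = token.lower()
    if lang == "FR":
        # Elided proclitic: s'abstenir -> s' + abstenir
        m = re.fullmatch(r"(s['’])(.+)", token)
        return (m.group(1), m.group(2)) if m else None
    if lang == "RO":
        # Hyphenated proclitic: s-aude -> s- + aude
        m = re.fullmatch(r"(s-)(.+)", token)
        return (m.group(1), m.group(2)) if m else None
    if lang == "PT":
        # Mesoclisis: queixar-se-ia -> se + queixaria (root and suffix rejoined)
        m = re.fullmatch(r"(.+)-(se)-(.+)", token)
        if m:
            return (m.group(2), m.group(1) + m.group(3))
        # Enclisis: queixar-se -> se + queixar
        m = re.fullmatch(r"(.+)-(se)", token)
        return (m.group(2), m.group(1)) if m else None
    if lang in ("ES", "IT"):
        # Enclitic written as one word, infinitives only: enamorarse -> se + enamorar.
        # The Italian stem lacks the final -e of the infinitive (alzar, not alzare).
        clitic = "se" if lang == "ES" else "si"
        m = re.fullmatch(rf"(.+r)({clitic})", token)
        return (m.group(2), m.group(1)) if m else None
    return None


for form, lang in [("s'abstenir", "FR"), ("enamorarse", "ES"), ("alzarsi", "IT"),
                   ("queixar-se-ia", "PT"), ("s-aude", "RO")]:
    print(lang, form, split_rcli(form, lang))
```

A real system would rely on the morphological analysis provided with the corpus rather than on regular expressions; the sketch only makes the five orthographic patterns concrete.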
- If a syntactically comparable literal construction is impossible or the REFLV would not be annotated in syntactically comparable literal constructions, annotate only the VID:
пилците се броят наесен chicken REFL are counted in the autumn the true results can be seen only at the end ⇒ кокошките се броят the hens REFL countedsich über etwas im Klaren sein dass S RCLI about s.th. in.the clear be to be aware of s.th./that S ⇒ *sich in N sein, dass for any noun Ndarse cuenta de to realize ⇒ *darse N de for any noun N
meterse en líos to get in trouble ⇒ REFLV not annotated in literal equivalents like meterse en una tienda to get in a storen.a.se rendre compte de to realize ⇒ *se rendre N de for any noun N
s'arracher les cheveux RCLI tear the hair to worry ⇒ REFLV not annotated in literal equivalents like s'arracher un ongle to tear off one's nail
rendersi conto di to realize ⇒ *si rende N di for any noun N
si strappa i capelli RCLI tear the hair to worry ⇒ REFLV not annotated in literal equivalents like strapparsi un'unghia to tear off one's nail
zich uit de voeten maken RCLI out of the feet make to get out of the way ⇒ *zich uit de N maken for any noun N
zich in de kijker spelen RCLI in the field-glass play to attract attention with one's skills ⇒ *zich in de N spelen for any noun Nzdawać sobie sprawę z to realize ⇒ *zdawać sobie N z for any noun Ndar-se mal to fail ⇒ dar-se ADV intransitive is acceptable only for antonym bem well
meter-se numa fria to get-RCLI in a cold to get in trouble ⇒ REFLV not annotated in literal equivalents like meter-se numa cabine to get into a cabin
a-și smulge părul din cap
puliti si lase tear RCLI the hair to worry ⇒ REFLV not annotated in literal equivalents like puliti si obrvi to pluck one's eyebrows
китити се туђим перјем kititi se tuđim perjem decorate RCLI someone else's feathers steal someone's thunder; take credit for someone else's accomplishments
- If the REFLV would be annotated as IRV in syntactically comparable literal constructions, annotate both the IRV and the VID as embedded MWEs (rare):
смея се през сълзи laugh REFL through tears to laugh bitterlyn.a.rozlatywać się w proch scatter itself into dust disappearvirar-se nos trinta turn-RCLI in-the thirty contains virar-se to get by ≠ virar to turn/becomea i se face rău to CL.DAT RCLI.ACC make ill to feel sick this is a case when both a non-reflexive, dative clitic and a RCLI.ACC appear in the structure; the REFLV is annotated as IRV; both the IRV and the ID are annotated as embedded MWEs; note that the non-reflexive clitic is also considered as part of a VID (6.4_R)
a se duce pe apa sâmbetei RCLI go on water-the Saturday-of to get lost the REFLV is annotated in the literal equivalent a se duce pe apa Bistriței he goes on the river Bistriţa
there is a notable difference in meaning between the non-REFLV a duce to take and the REFLV a se duce to go
režati se kot pečen maček to laugh RCLI like a baked tomcat to laugh loudly režati se is IRV
смејати се као луд smejati se kao lud to laugh like crazy
- They are formed by a lexicalized head verb v and a lexicalized particle p dependent on v.
- The meaning of the VPC is fully or partly non-compositional.
- In fully non-compositional VPCs (VPC.full), the change in the meaning of v goes significantly beyond adding the meaning of p:
n.a.die Fische sind eingegangen the fish went in the fish diedto do in to kill, destroy, cheat or harm severelyn.a.rondkomen round-come to make ends meetn.a.n.a.
- In semi-non-compositional VPCs (VPC.semi), p adds a partly predictable but non-spatial meaning to v
n.a.to eat up to eat completelyn.a.opeten to eat completelyn.a.n.a.
- For intransitive verbs, the particle can occur without an NP. The fact that there is no NP that could be governed by the particle to form a PP shows that it is a particle rather than a preposition.
- For transitive verbs, the particle can occur either before or after the direct object. The fact that it is mobile and can go before or after the NP shows that it is a particle rather than a preposition
- Apply test VPC.1 - [PART-REDUC: Can the verb without the particle refer to the same event?]
- It is a VPC.full.
- Apply test VPC.2 - [PART-SPATIAL: Is the particle spatial?]
- It is not a VPC, exit
- Apply test VPC.3 - [PART-SPATIAL-LIT: Is the particle spatial in a literal reading?]
- It is a VPC.semi
- It is not a VPC, exit
- It is a VPC.full.
- Go to the next test.
- It is not a VPC, exit.
- Go to the next test
- It is not a VPC, exit.
- It is a VPC.semi.
- They usually have the same subject.
- They usually denote actions that are closely connected and may be seen as part of the same event.
- They function together as a single predicate.
- They are unaccompanied by any explicit coordination, subordination, or dependency marker.
- They only have a single tense, aspect and polarity value.
- They may be idiomatic or indicate successions of events.
- The V-gov (vector) verb is semantically delexicalized and the V-dep (polar) verb contains the core meaning of the whole. Note that V-dep might be seen as the head and V-gov as the dependent, in dependency frameworks such as Universal Dependencies, where the principle of the primacy of content words is applied.
- Apply Test MVC.1.BASE - [MVC-STRUCT-BASE: V-dep is non finite and V-gov bears inflection?]
- It is not a VMWE, exit
- Apply Test MVC.3.KAR - [INS-REDIRECT-KAR: kar or ke appears just after V-dep?]
- Apply Test MVC.6 - [MANNER: V-gov indicates the manner/means/direction of V-dep?]
- It is a manner serial verb, not a VMWE, exit
- Apply Test MVC.7 - [REASON: V-gov indicates the reason for V-dep?]
- It is a reason serial verb, not a VMWE, exit
- Apply Test MVC.8 - [SEQ: V-gov and V-dep bound by temporal sequence?]
- It is a temporal sequence serial verb, not a VMWE, exit
- Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
- It is a serial verb expressing simultaneous actions, not a VMWE, exit
- Continue to the next test
- Apply Test MVC.10 - [LIGHT: V-gov in the closed list of light verbs?]
- Annotate as MVC
- Apply Test MVC.13 - [V-LEX: V-dep refers to the same event/state as V-gov+V-dep?]
- It is not a VMWE, exit
- Annotate as an MVC
- Apply Test MVC.2.ASPECT - [INS-DISCARD-ASP: V-gov can take an aspect marker –le or –guo?]
- It is not an MVC, exit
- Apply Test MVC.5 - [MODAL: V-gov is a modal or an auxiliary verb?]
- It is not an MVC, exit
- Apply Test MVC.6 - [MANNER: V-gov indicates the manner/means/direction of V-dep (or vice versa)?]
- It is not an MVC, exit
- Apply Test MVC.7 - [REASON: V-gov indicates the reason for V-dep (or vice versa)?]
- It is not an MVC, exit
- Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
- It is not an MVC, exit
- Apply Test MVC.4 - [SHARE-ARGS: V-gov and V-dep share arguments?]
- Annotate as an MVC
- It is not an MVC, exit
- Apply directly Test MVC.13 - [COMP: V-dep refers to the same event/state as V-gov+V-dep?]
- It is not a VMWE, exit
- Annotate as an MVC
- continue to the next test
- it is not an MVC
- continue to the next test
- it is not an MVC
- it is NOT an MVC
我 wǒ I 看出来 kànchūlái figure out → 我看 wǒkàn I see 了 le aspect marker 出来 chūlái exit → The insertion of the aspect marker 了 le is grammatically sound
- continue to next test
我 wǒ I 听说 tīngshuō heard → *我听 wǒtīng I heard 了 le aspect marker 说 shuō say → The insertion of the aspect marker 了 le makes the phrase ungrammatical
- it is NOT an MVC
- continue to next test
- it is NOT an MVC
- continue to next test
- it is a purpose serial verb, not an MVC
n.a.Saya I bersiap pergi get ready to go = Saya I bersiap untuk pergi get ready for the purpose of going → The insertion of untuk for/to is grammatically sound and does not change the meaning of the sentence.n.a.n.a.
- continue to next test
- it is NOT an MVC, but an honorific construction.
n.a.n.a.お-話し-する o-hanasi-suru I humbly talkn.a.n.a.
- continue to next test
- Go directly to test MVC.6 [MANNER].
- Go directly to test MVC.10 [LIGHT].
- it is an MVC
- it is not an MVC
n.a.
- it is NOT an MVC
n.a.n.a.可以 kéyǐ can, 可能 kěnéng might, 会 huì will, 必须 bìxū must, 需要 xūyào need to, 要 yào want to, 能 néng able to, 应该 yīng gāi should
- continue to next test
- it is a manner serial verb, not an MVC
n.a.n.a.us-ne ciikh-kar mujh-e bulaa-yaa He-erg yell-ConjPpl I-dative call-perf he called me by screamingpulang melalui return-home pass-through go home by passing through (a place)投げ込み nage komi throw go in throw into
なぐり殺し naguri korosi punch kill kill by punching走进来 zǒu jìnláiwalk enter walk into (a place) - continue to next test
- it is a reason serial verb, not an MVC
n.a.n.a.vo melaa jaa-kar khush hu-aa he fair go-ConjPpl happy become-perf he got happy having gone to the fairn.a.
- continue to next test
- it is a sequential serial verb, not an MVC
n.a.n.a.us-ne gilaas banaa-kar bec-aa he-erg glass make-ConjPpl sell-perf having made the glass, he sold itbersiap pergi prepare go prepare in order to go (somewhere) → the first verb must happen before the second verb happens, otherwise the sentence will not make sense.夫人が最初にfujin ga saisho ni the wife first叩き起こさtataki okosa hit to awakenれre verb suffix != #夫人が最初にfujin ga saisho ni the wife first起き叩さtataki okosa hit to awakenれ re verb suffix→ The two verbs 叩き tataki hitand 起こさ okosa awakenare bound by temporal sequence, such that if the order is switched, the sentence does not make sense.n.a.
- continue to next test
- it is a serial verb expressing simultaneous actions, not an MVC
n.a.n.a.berlari menuju run head-towards run and go towardsn.a.
- continue to next test
- it is a (light) MVC
n.a.n.a.
- continue to next test
- it is a preposition-like MVC
n.a.n.a.n.a.排列成 páiliè chéng arrange become arrange into (something)
- continue to next test
- it is a deverbalized V1/V2 MVC
n.a.
(JA) 響き渡る hibiki wataru echo spread-widely reverberate → The first verb is a noun-like argument of the second verb [deverbalized V2]
聞き違え kiki chigae listen be-different mishear/misunderstand → The second verb is a noun-like argument of the first verb [deverbalized V1]
n.a.
n.a.
- continue to next test
- it is an MVC
- it is not an MVC
it will make me think → it will make me build/solve/construct
quiero leer tu tesis want.I read your thesis I want to read your thesis → quiero adquirir/descargar/imprimir tu tesis want.I acquire/download/print your thesis I want to get/download/print your thesis
je l'ai laissé finir la présentation I him have let finish the presentation I let him finish the presentation → je l'ai laissé commencer/lancer/interrompre la présentation I him have let start/launch/interrupt the presentation
ce garçon veut dire autre chose this boy wants say other thing this boy wants to say something else → ce garçon veut chuchoter/communiquer/crier autre chose this boy wants whisper/communicate/scream another thing
ik heb mijn trui laten wassen I had my sweater washed → ik heb mijn trui laten strijken/verven/maken I had my sweater ironed/dyed/repaired
dał jej pospać he let her sleep → dał jej odpocząć/poleżeć he let her rest/lie down
n.a.
- the dependents of the adposition are not lexicalized
разчитам на някого/нещо to rely on somebody/something is annotated as IAV because the object is not lexicalized,
but in the ID вземам на мушка някого/нещо take on target to criticise heavily somebody/something cannot be annotated as IAV because мушка is also lexicalized in the ID
to stand for something is annotated as IAV because the object is not lexicalized,
but in the ID to take something for granted, to take for cannot be annotated as IAV because granted is also lexicalized in the ID
entender de algo understand of something to know about something is annotated as IAV because the object is not lexicalized, whereas entender algo would not be any type of VMWE.
n.a.
pristati na kaj to land on (something) to agree (with something) is annotated as IAV because the object is not lexicalized,
but in the ID ostati na trdnih tleh to remain on solid ground to remain realistic ostati na to remain on cannot be annotated as IAV because trdnih tleh solid ground is also lexicalized in the ID
- the adposition is integral, that is, "it cannot be omitted without markedly altering the meaning of the verb"
في رغب want to he has a desire to do something → رغب في * can occur without the preposition في * in , but it will never have a sense of رغب في
считам за to take for → *считам can never occur without the preposition за
разчитам на to rely on → разчитам can occur without the preposition, but it will never have a sense of to depend/rely on
to rely on → *to rely can never occur without the preposition on
to count on → to count can occur without the preposition, but it will never have a sense of to depend/rely on
entender de understand of something to know about something → entender to understand can occur without the preposition, but it will never have a sense of to be an expert about something
contar con count with to rely on → contar to count can occur without the preposition, but it will never have a sense of to rely on.
n.a.
grenzen aan → *grenzen can never occur without the preposition aan
behoren tot → behoren can occur without the preposition, but it will never have a sense of to belong to
temeljiti na to be based on → *temeljiti can never occur without the preposition na
biti za to be for to agree with or support (something or someone) → biti to be can occur without the preposition, but it will never have a sense of to agree with or to support
- it is not an IAV
- annotate as an IAV
- a VMWE category may be universal or quasi-universal but it may require different tests in different languages,
- any category specific to a language must be associated with appropriate tests in the same language,
- universal tests can build upon more elementary language-specific tests (e.g. to distinguish a particle from a preposition).
- the candidate word is a particle
- go to the next test
- the candidate word is a particle
- it is not a VPC
- it is a particle
- other tests are needed
- it is a particle
- other tests are needed
- it is a particle
- it is not a particle
- we do not have to decide if it is an MWT (for the purpose of VMWE annotation)
- go to the next test
- it is an MWT
- go to the next test
- it is not an MWT
- it is an MWT
- Inherently clitic verbs ⇒ ANNOTATE as LS.ICV
- The verb without the CLI does not exist
infischiarsene (not worry about) vs *infischiare
- The verb without the CLI does exist, but has a very different meaning
darla (gl.: give it) (transl. fuck around) ≠ dare (give)
prenderle (gl.: take them) (transl. be beaten) ≠ prendere (take)
prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
starci (gl.: stay there) (transl. agree) ≠ stare (stay)
- The verb has more than one CLI of which the second one is an invariable object complement.
fregarsene (gl.: matter self of-it) (transl. don’t care about)
infischiarsene (transl. not worry about)
curarsene (gl.: take care self of-it) (transl. care about)
prendersela (gl.: take self it.FEM) (transl. be angry/upset)
sentirsela (gl.: feel self it.FEM) (transl. be in the mood of)
sentirselo (gl.: feel self it.MASC) (transl. feel)
vedersela (gl.: see self it.FEM) (transl. to manage something)
- The verb has two non-reflexive invariable CLIs:
farcela (gl.: make there it.FEM) (transl. succeed)
- The verb has a different meaning with respect to an intensive use of the same two non-reflexive invariable CLIs:
andarsene (gl.: go away self from-there) (transl. die) ≠ andarsene (go away)
bersela (gl.: drink self it.FEM) (transl. believe) ≠ bersela (drink)
- The verb without the CLI does not exist
- Apply test LS.ICV.1 - [CL-INHERENT]
- Annotate as LS.ICV
- Apply test LS.ICV.2 - [CL-DIFF-SENSE]
- Annotate as LS.ICV
- Apply test LS.ICV.3 - [CL-DIFF-SUBCAT]
- Annotate as LS.ICV
- Exit
- annotate as LS.ICV
infischiarsi ⇒ *infischiare
infischiarsene ⇒ *infischiare
- next test
- annotate as LS.ICV
smetterla (gl.: quit it) (transl. knock it off) ≠ smettere (quit)
prenderle (gl.: take them) (transl. get beaten up) ≠ prendere (take)
prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
starci (gl.: stay there) (transl. up for it) ≠ stare (stay)
curarsene (gl.: take care self of-it) (transl. care about) ≠ curare (take care)
prendersela (gl.: take self it.FEM) (transl. be angry/upset) ≠ prendere (take)
sentirsela (gl.: feel self it.FEM) (transl. be in the mood of) ≠ sentire (feel)
darla (gl.: give it.FEM) (transl. fuck around) ≠ dare (give)
- next test
- annotate as LS.ICV
X se la prende con Y ⇔ X prende Y
- Exit
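The LS.ICV cascade above (tests LS.ICV.1 to LS.ICV.3, then exit) can be sketched in Python as follows. This is only an illustration, not part of the guidelines: the record type `CliticVerb`, its field names and the function `classify_icv` are hypothetical, the three booleans stand for judgements an annotator makes by hand, and reading LS.ICV.3 as "the subcategorization frame changes" is an assumption based on the test's name. The form mangiarlo (eat it) is an added contrast example of an ordinary clitic use.

```python
from dataclasses import dataclass


@dataclass
class CliticVerb:
    """Hypothetical record of an annotator's judgements for one verb+clitic form."""
    form: str
    bare_verb_exists: bool     # LS.ICV.1: does the verb exist without the CLI?
    same_sense_as_bare: bool   # LS.ICV.2: is the meaning the same without the CLI?
    same_subcat_as_bare: bool  # LS.ICV.3 (assumed reading): same argument frame?


def classify_icv(c: CliticVerb) -> str:
    """Apply the three tests in order; the first positive one decides."""
    if not c.bare_verb_exists:
        return "LS.ICV"  # [CL-INHERENT], e.g. infischiarsene vs *infischiare
    if not c.same_sense_as_bare:
        return "LS.ICV"  # [CL-DIFF-SENSE], e.g. prenderle vs prendere
    if not c.same_subcat_as_bare:
        return "LS.ICV"  # [CL-DIFF-SUBCAT]
    return "exit"        # no test was positive


candidates = [
    CliticVerb("infischiarsene", False, False, False),
    CliticVerb("prenderle", True, False, True),
    CliticVerb("mangiarlo", True, True, True),  # added contrast example
]
for cand in candidates:
    print(cand.form, "->", classify_icv(cand))
```

The order of the checks matters only for documentation purposes here, since every positive test leads to the same label; it mirrors the order in which the tests are listed.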
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
- Apply the test IT.S.1 - [CLITICS-ONLY: Are all lexicalized dependents of the verb clitics?]
- Apply the LS.ICV-specific tests ⇒ LS.ICV tests positive?
- Annotate as a VMWE of category LS.ICV
- It is not a VMWE, exit
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
- Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
- Annotate as a VMWE of category IRV
- It is not a VMWE, exit
- Non-reflexive clitic ⇒ Apply LS.ICV-specific tests ⇒ LS.ICV tests positive?
- Annotate as a VMWE of category LS.ICV
- It is not a VMWE, exit
- Particle ⇒ Apply VPC-specific tests ⇒ VPC tests positive?
- Annotate as a VMWE of category VPC.full or VPC.semi
- It is not a VMWE, exit
- Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
- Annotate as a VMWE of category MVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Extended NP ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
- Annotate as a VMWE of category LVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- apply LS.ICV tests
- next test
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
- Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
- Annotate as a VMWE of category IRV
- It is not a VMWE, exit
- Particle ⇒ Apply VPC-specific tests ⇒ VPC tests positive?
- Annotate as a VMWE of category VPC.full or VPC.semi
- It is not a VMWE, exit
- Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
- Annotate as a VMWE of category MVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Extended NP or an adjective which is morphologically identical to an eventive noun ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
- Annotate as a VMWE of category LVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
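The nesting of the structural tests S.1 to S.4 in the tree above can be summarised in a short Python sketch. It is an illustration under stated assumptions: `Candidate`, `categorize` and the category keys are hypothetical names; each entry of `tests` stands for a whole battery of category-specific tests applied by a human annotator; the branch polarity (S.1 and S.2 must hold and S.3 must not hold in order to reach S.4) is assumed, since the yes/no labels are not shown in the list; and language-specific branches such as the clitics-only check are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Candidate:
    """Hypothetical record of the structural judgements for one candidate."""
    unique_head_verb: bool             # S.1 [1HEAD]
    one_lexicalized_dependent: bool    # S.2 [1DEP]
    lexicalized_subject: bool          # S.3 [LEX-SUBJ]
    dependent_category: str = "other"  # S.4 [CATEG]


# Maps a category name to a stand-in for its category-specific tests.
CategoryTests = Dict[str, Callable[[Candidate], bool]]


def categorize(c: Candidate, tests: CategoryTests) -> Optional[str]:
    """Return a category label, or None when the candidate is not a VMWE."""

    def vid_or_exit() -> Optional[str]:
        return "VID" if tests["VID"](c) else None

    # Assumed polarity of S.1, S.2 and S.3 (see the lead-in).
    if not c.unique_head_verb or not c.one_lexicalized_dependent:
        return vid_or_exit()
    if c.lexicalized_subject:
        return vid_or_exit()

    category = c.dependent_category
    if category == "reflexive_clitic":
        return "IRV" if tests["IRV"](c) else None
    if category == "particle":
        # The VPC tests themselves decide between VPC.full and VPC.semi.
        return "VPC" if tests["VPC"](c) else None
    if category == "verb":          # verb with no lexicalized dependent
        return "MVC" if tests["MVC"](c) else vid_or_exit()
    if category == "extended_np":
        return "LVC" if tests["LVC"](c) else vid_or_exit()
    return vid_or_exit()            # any other category
```

Note the asymmetry the sketch makes visible: failing the IRV or VPC tests ends the procedure, whereas failing the MVC or LVC tests falls back to the VID tests.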
- How to define an unexpected change in meaning?
- How to annotate lexicalized words which belong to contractions, compounds, and acronyms?
- How to annotate coordinated VMWEs sharing some components?
- How to annotate elliptical occurrences of VMWEs?
- How to annotate VMWEs that seem to belong to more than one category?
- How to annotate embedded VMWEs?
- Are existential expressions with there is/are considered VMWEs?
- How to categorize VMWEs which seem LVCs but do not pass all LVC tests?
- Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?
- Does the IRV category include verbs with non-reflexive clitics?
- Should nominalizations of VMWEs be annotated?
- How to express hesitation between different VMWE categories?
- How can one decide what are the semantic arguments of a noun for borderline cases?
- How does one decide if a more or less frozen determiner is a lexicalized VMWE component?
- Should I annotate compound and serial verbs as VMWEs? Of which category?
- If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?
- In an LVC candidate, if the verb adds aspect to the predicative noun, does it imply failing Test LVC.3?
- In the LVC decision tree, should I test that the noun keeps its original meaning?
Η Μαρία θα πάρει την απόφασή της i Maria tha pari tin apofasi tis the Maria will take the decision hers Maria will decide
we are making a decision
γνώμην ἔσχε gnо̄mēn eskhe opinion.ACC have.AOR.3SG he had an opinion
eles haviam tomado uma decisão they had made a decision
Herzen die wir gebrochen haben hearts which we broke have hearts which we have broken
heart which we broke
no te imaginas la ilusión que le hizo not you imagine_you the excitement which to_him/to_her made_it you cannot imagine how excited he/she was by it
i cuori che abbiamo spezzato the hearts which we have broken hearts which we have broken
قرارالأخذ ربما تكرهني للمخاطرة و لكن كان علي you may hate me for taking risks but i have to make a decision
خروجا خرج he+exit exit he went out
ه نهضت تحقيق على المجتمع on the society achievement renaissance its The society must achieve its renaissance
броящ звезди who counts stars
неразбитите все още от него сърца the hearts not yet broken by him
вземайки това решение (while) making this decision
eine Entscheidung treffen a decision meet to make a decision
früher getroffene Entscheidungen earlier made decisions
παίρνοντας αποφάσεις pernontas apofasis taking decisions making decisions
κάνοντας πλάκα kanontas plaka making fun having fun
to break one's heart is easy
we avoid making decisions
heart breaking the act of breaking hearts
decisions previously made
all hearts broken by him
breaking her heart
they passed a piece of watered-down legislation
solo te tiene que hacer ilusión only to_you has that to_make excitement you only need to be excited about it
las decisiones tomadas ayer son decisivas the decisions taken yesterday are decisive the decisions made yesterday are final
el trato hecho previamente será respetado the agreement made previously will_be observed the previously made agreement will be observed
erabaki asko hartu decision many take make many decisions
emandako pausoak given steps the steps (which were) taken
pauso-ematea step-give the step taking, the action of taking steps
les décisions prises hier sont bonnes the decisions taken yesterday are good the decisions that were made yesterday are good
les personnes subissant plusieurs opérations sont fragiles the people undergoing several surgical operations are fragile
γνώμην ἔχοντας gnо̄mēn ekhontas opinion.ACC have.PTC.ACT having an opinion
unaprijed donesena odluka the decision made in advance
decisioni prese precedentemente earlier made decisions
prendendo questa decisione by making this decision
een beslissing nemen to make a decision
eerder genomen beslissingen earlier made decisions
podejmowanie decyzji the making of a decision decision making
podejmujący trudne decyzje making hard decisions
eu evito tomar decisões precipitadas I avoid to-take decisions precipitated I avoid taking hasty decisions
a decisão tomada ontem the decision made yesterday
a mulher tomando um banho the woman taking a shower
decizia recent luată the decision recently made
luând decizia making the decision
lomljenje src breaking (people's) hearts hurting (people's) feelings
nedavno zlomljeno srce recently broken heart
duke e marrë nëpër këmbë dikë while taking someone through feet to mistreat someone
доношење одлука donošenje odluka decision bringing decision making
έγιναν σημαντικές αλλαγές (middle alternation) were-made important changes important changes were made
las decisiones importantes se toman con calma (middle alternation) the decisions important SE_PARTICLE take with calm important decisions are made quietly
νόμον ἔθηκε nomon ethēke law put.AOR.ACT.3SG he established a law / legislated
le decisioni importanti si prendono con calma (middle alternation) the decisions important SE_PARTICLE take with calm important decisions are made quietly
decyzje nie podejmują się same decisions do not take SELF alone decisions are not taken on their own (middle alternation)
tomam-se decisões importantes aqui (middle alternation) take-SELF decisions important here important decisions are made here
vendimet e rëndësishme merren me qetësi (middle alternation) the decisions important take-themselves with calmness important decisions are made calmly
nie mieć cienia wątpliwości not to have a shadow of a doubt to have no doubt → apply tests to mieć wątpliwość have a doubt
Canonical form
For a given (candidate) VMWE occurrence, if its prototypical forms exist and keep the same meaning, these forms are called canonical.
a canonical form for нанасяйки тежки щети is нанасям щети
a canonical form for Wortbruch word-break a promise which has not been kept is Wort brechen to break the word not to keep a promise
a canonical form for making an impression on him is she makes an impression on him
δόξαν ἔχουσι doxan ekhousi opinion.ACC have.3PL they hold an opinion (canonical form)
a canonical form for njegov piskrček je bil pristavljen k nečemu his little pot was added to something to join something in order to profit from it is pristaviti svoj piskrček k nečemu to add (one's) pot (to something) to join something in order to profit from it
For some VMWEs, the only possible forms are not prototypical. For instance, some VMWEs appear in passive voice but never in active voice. If no prototypical form exists, or if none preserves the meaning, the given occurrence is itself considered canonical.
κρέμομαι από μία κλωστή kremome apo mia klosti hang.1SG.PASS from a thread but not #με κρεμάνε από μία κλωστή me kremane apo mia klosti me.1SG.ACC hang.3PL from a thread
The linguistic tests for identification and categorization of VMWEs are always to be applied to a canonical form of the candidate VMWE. Note that, for brevity, many of the VMWE examples in these guidelines are given in their infinitive variants. Still, it is most often a canonical form that is implicitly meant.
Non-verbal variants (not annotated)
Expressions of the syntactic categories mentioned above are considered VMWEs only if they function as verb phrases (in prototypical forms) or nominal phrases (under meaning-preserving variants). Other kinds of variants are not considered VMWEs. This concerns nominalizations morphologically derived from verbs and describing a process, result, state, agent, etc.
удар в гърба a stab in the back
високо вдигната летва highly raised bar high bar
играч на карти card player
puesta a punto setting to point set-up
une mise à disposition the fact of making available
o tomador de decisão the decision-maker
We also do not annotate MWEs containing verbs but functioning as adverbials, adjectives or nominals that are not meaning-preserving variants:
разбира се (it) is understood of course
a run-down apartment
porte-feuille carry-sheets wallet
couru d'avance run in advance forgone conclusion
um faz-de-conta a make-as-story a make-believe
кафа за понети kafa za poneti coffee to take takeaway coffee
Particular language teams may decide to extend the annotation scope to these variants. It is recommended in this case to introduce a new category for them (e.g. NVPC: nominal verb-particle constructions) so as to keep the (quasi-)universal categories intact.
Section 1.5
Lexicalized components and open slots
Just like a regular verb, the head verb of a VMWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this VMWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise.
Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone is not. The head verb of a VMWE is always considered lexicalized. When it can be replaced by another verb, like in to make/take a decision, we consider that these are two different VMWEs, although possibly synonymous.
Conversely, a component of a compulsory argument which can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. In the following VMWE examples (cited after Gross 1994), all having the same syntactic structure NP V NP Prep NP, the lexicalized arguments are highlighted in bold:
Note on terminology: our definition of lexicalization applies to the component words of a VMWE, and not to the whole VMWE. This might be counter-intuitive, given the traditional definition of lexicalization as a diachronic process by which a lexeme (word or phrase) acquires the status of an autonomous lexical unit, that is, "a form which it could not have if it had arisen by the application of productive rules" (Bauer 1983, p. 50, apud Lipka et al. 2004, p. 6). In other words, traditionally linguistic studies would use the term "lexicalized" to refer to the whole VMWE, as it has idiosyncratic behavior and thus must be listed in the language's lexicon. Our definition, however, stems from computational linguistics and in particular from the parsing literature, in which lexicalized rules refer to rules containing terminal lexemes attached to non-terminal symbols, and a lexicalized grammar is a grammar in which the rules are lexicalized (Manning and Schütze 1999, p. 417; Jurafsky and Martin 2009, p. 507). In this sense, we regard VMWEs as syntactic subtrees in which some of the nodes are annotated with the corresponding terminal symbols that are always realized by the same lexeme (i.e. the lexicalized components) and others are non-terminal nodes that can be realized by any lexeme taken from a larger class (i.e. the open slots).
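The distinction between lexicalized components and open slots can be made concrete with a small Python sketch. It encodes the example to take someone by surprise from the text above, where the head verb and by surprise are lexicalized while someone is an open slot. The class `Component` and the helper `lexicalized_tokens` are hypothetical names introduced for illustration only, and the flat list is a simplification of the syntactic subtree the note describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """One component of a VMWE's compulsory arguments (hypothetical model)."""
    text: str
    lexicalized: bool  # True: always the same lexeme; False: open slot


def lexicalized_tokens(components: List[Component]) -> List[str]:
    """Keep only the components that would be annotated as part of the VMWE."""
    return [c.text for c in components if c.lexicalized]


take_by_surprise = [
    Component("take", True),      # the head verb is always lexicalized
    Component("someone", False),  # open slot: any suitable noun phrase
    Component("by", True),        # introduces a lexicalized complement
    Component("surprise", True),
]

print(lexicalized_tokens(take_by_surprise))  # ['take', 'by', 'surprise']
```

Under this model, to make a decision and to take a decision would be two separate entries, since the head verb is a lexicalized component of each.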
Special cases
Prepositions have a special status with respect to the notion of lexicalization. In the first, second and fourth example above, the prepositions by and in are lexicalized since they introduce lexicalized complements (the horns, surprise and pocket). However, in the third case the preposition in introduces an open slot whose meaning compositionally combines with the meaning of the VMWE took part. We say in this case that the preposition is selected by the VMWE, i.e. it belongs to the valency properties of the verb. Selected prepositions were discarded in edition 1.0 of the guidelines, and are now re-introduced experimentally and optionally via the inherently adpositional verbs (IAV). If the language team decides to take them into account, they are to be considered in the post-annotation step (step 4), i.e. when all other categories have previously been identified and categorized in the given sentence.
Reflexive clitics in inherently reflexive verbs and possessive pronouns in verbal idioms also have a special lexicalization status (see also the note on more or less frozen determiners). In some languages, the same reflexive clitic or possessive pronoun is used regardless of the person and number, inflecting for case only:
намирам се find se.REFL to be (somewhere)
smiješ se laugh.2.SG self You laugh
smiju se laugh.3.PL self they laugh
znajdujesz się find.2.SG.PRES self you find yourself
znajdują się find.3.PL.PRES self they find themselves
pójdą na swoje they will go on one's own they will establish their own household
pójdziemy na swoje we will go on one's own we will establish our own household
smejiš se laugh.2.SG self You laugh
smejijo se laugh.3.PL self they laugh
радујеш се raduješ se look.2.SG.PRES forward to you look forward to
радује се raduje se look.3.SG.PRES forward to She/He looks forward to
In other languages, reflexive clitics and possessive pronouns agree with the subject and the verb:
ihr wundert euch you.PL wonder.2.PL self.2.PL you wonder
Τα παιδιά έκαναν την πλάκα τους Ta pedia ekanan tin plaka tus The kids made the fun their The kids had fun
tú te quejas you self.2.SG complain You complain
tu te trouves you self.2.SG find you find yourself
je vide mon sac I empty my bag I express my secret feelings
elle vide son sac she empties her bag she expresses her secret feelings
tu ti meravigli you self.2.SG wonder you wonder
wij vergissen ons we are mistaken self.1.PL we are mistaken
tu te queixas you self.2.SG complain You complain
tu te gândești you Refl.Cl.2sg.Acc. think you are thinking
In this case, the clitic or the pronoun is realized by different lexemes, depending on the number and gender. Strictly speaking, it is not lexicalized. However, we admit that, regardless of the language, the reflexive clitic and the possessive pronoun are each a unique lexeme (with lemma się, se, sich, etc. or swój, son, one's) inflecting for person and number. It is thus lexicalized in inherently reflexive verbs and verbal idioms.
Section 1.6
Verbal multiword expressions versus collocations
Collocations are not considered VMWEs in this task and should not be annotated. However, the boundary between both categories is not always easy to define and should be handled with care.
We understand collocations as combinations of words whose idiosyncrasy is purely statistical. In other words, tokens in collocations tend to co-occur with each other more often than expected by chance, but they show no substantial orthographic, morphological, syntactic and (most notably) semantic idiosyncrasy. In this way we oppose MWEs to collocations.
Note that other authors understand collocations slightly differently. For instance, for Sag et al. (2002), collocations are any statistically significant co-occurrences, i.e. they include all forms of MWE. For Baldwin and Kim (2010), collocations form a proper subset of MWEs. According to Mel'čuk (2010), collocations are binary, semantically compositional combinations of words subject to lexical selection constraints, i.e. they intersect with what is here understood as MWEs.
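To make "co-occur more often than expected by chance" concrete, here is a minimal Python sketch using pointwise mutual information (PMI). PMI is only one common association measure, chosen here for illustration; these guidelines do not prescribe any particular measure, and the corpus counts below are invented. The function name `pmi` and its parameters are assumptions of this sketch.

```python
import math


def pmi(pair_count: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) of a word pair.

    Compares the observed co-occurrence probability with the probability
    expected if the two words were statistically independent.
    """
    p_xy = pair_count / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Invented counts for "take" + "bus" in a corpus of one million word pairs.
score = pmi(pair_count=100, count_x=2000, count_y=500, total=1_000_000)
print(f"PMI(take, bus) = {score:.2f}")  # positive: more frequent than chance
```

A positive score signals statistical idiosyncrasy only. As the section explains, a combination such as to take a bus can score highly and still not be a VMWE, because it shows no semantic, syntactic or morphological idiosyncrasy.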
Some combinations happen to be very frequent and are perceived as "frozen":
كتاب إشترى buy a book
وجبة قدم serve a meal
the graphic shows
to take a bus
el gráfico muestra the graphic shows
coger el autobús to take the bus
galdera bati erantzun question one-to answer answer a question
autobusa hartu bus take to take the bus
il grafico mostra the graphic shows
prendere un bus to take a bus
entrar em cartaz enter into poster arrive in theaters (for a movie) (the MWE is em cartaz in poster in theaters, the verb just usually collocates with this MWE)
据 报道 according-to report according to what is reported
However, applying regular lexical alternations to them does not markedly impact their meaning.
فطور القدم serve a breakfast
جريدة إشترى buy a newspaper
παίρνω το τραίνο perno to treno take the train
el diagrama muestra the diagram shows
coger el tren to take the train
zalantza bati erantzun doubt one-to answer answer a doubt
trena hartu train take to take the train
il diagramma mostra the diagram shows
o recorde foi quebrado the record was broken
entrar/estar/permanecer/ficar/continuar/ter em cartaz enter/be/remain/stay/continue/have in poster
The difficulty of distinguishing collocations from VMWEs lies in the fact that lexical variability is relevant to some VMWEs:
имам твърда/дебела глава to have a thick head, to be stubborn and not listen to advice
darse/tomar una ducha give.self/take a shower take a shower
eskola/klasea eman class give to give a class →'eskola' and 'klasea' are synonyms in Basque
zamarznąć na kość/lód/sopel to freeze to bone/ice/icicle to freeze strongly
chutar o balde/pau da barraca to kick the bucket/the tent's stick to act irresponsibly
However, the extent of the vocabulary concerned by this variability is different for collocations and VMWEs. Namely, a head verb in a collocation usually selects a whole semantic class for each of its required arguments. For instance, the verb to take
Some light-verb constructions (LVCs) and multiverb constructions (MVCs) belong to the gray zone between MWEs and collocations in the sense that some operator (light) verbs seem to select large classes of nouns, as in to make a speech/declaration/remark/etc. However, some studies (e.g. Bonial 2014) show that there is no such thing as truly productive light verbs (e.g. to give a look vs. to give a stare). Therefore, we do include LVCs and MVCs in our annotation scope.
Section 1.7
Verbal multiword expressions versus metaphor
Another phenomenon closely related to VMWEs is metaphor. According to (Shutova 2010), "a metaphor occurs when one concept is viewed in terms of the properties of the other. In other words it is based on similarity (presence of common characteristics) between two concepts".
Many VMWEs, especially idioms, are based on metaphors. For instance, to take the bull by the horns means to address a problem (the bull) starting with its most challenging aspect (the horns). To set the world on fire is to do something extraordinary and get the admiration (set on fire) of other people (the world). To put all one's eggs in one basket means to rely on one particular course of action (a basket) for success rather than giving oneself several possibilities.
However, verbal metaphors are not always VMWEs. Consider the newspaper title "simple steps to lift your dark cloud of stress", and the extract of a poem by Wordsworth, cited by Shutova: "and then my heart with pleasure fills, and dances with the daffodils". The metaphorical expressions to lift dark cloud of stress (to relax) and my heart ... dances with the daffodils (I am happy) are not semantically compositional. These expressions, however, were probably constructed for the needs of one article or poem only and are not sufficiently established in the common vocabulary to be considered VMWEs.
The distinction between MWEs and metaphors is a relatively unstudied and open question. There are few precise tests, other than statistical, which would allow human annotators to resolve it reliably. Gross (1982) gives some clues on the reproducibility and predictability of metaphors. It remains to be seen how heavily this problem will impact the annotation of texts selected for our shared task. We suggest that the annotators take notes of such cases and discuss them within their communities, both local and international.
Section 2
Textual annotation scope
In this annotation task, all occurrences of all syntactic types of VMWEs are to be annotated in the text.
We annotate, as integral parts of VMWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated but case suffixes are only annotated if the noun they modify is also lexicalized. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see VPC tests), while in (HU) döntést hoz valamiről
Similarly, auxiliaries and modals accompanying the main verb of a VMWE are only annotated if they are themselves lexicalized, not when they simply mark syntactic variants of the VMWE. For instance, will is lexicalized, and is to be annotated as such, in even a worm will turn (even a meek person will resist if pushed too far) but not in they will spill the beans.
Both continuous and discontinuous sequences of lexicalized components of VMWEs are annotated.
Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or verb-particle constructions. In this version of the guidelines, verb+preposition combinations like to rely on somebody, to come across something or to put up with somebody are re-introduced optionally and experimentally via the inherently adpositional verbs (IAVs).
The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a VMWE or not. We do not annotate their internal syntactic structure. We do annotate, however, VMWEs embedded in other VMWEs. For instance, the VMWE to let the cat out of the bag contains the embedded VMWE let out and both are to be annotated as different VMWEs. Embeddings are discussed on each category's page, in the "Problematic cases and remarks" sections (e.g. IRVs overlapping with VIDs).
Once identified in a text, VMWEs are also to be assigned to exactly one of the categories described in the following sections. We do not admit assigning two different categories to a single VMWE in order to express hesitation. A comment and a particular value of the annotator's confidence should be used instead.
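The constraints described in this section (flat tokenized sentences, possibly discontinuous VMWEs, embedding, exactly one category per VMWE) can be pictured with a small Python sketch. The class `VMWE` and both helper functions are hypothetical and do not describe the actual corpus file format; the category labels assigned to the two expressions are illustrative guesses, since the text above does not state them.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class VMWE:
    """One annotated VMWE over a flat token list (hypothetical model)."""
    token_ids: FrozenSet[int]  # 0-based token positions, need not be adjacent
    category: str              # a single label; no double categories


def is_discontinuous(v: VMWE) -> bool:
    """True if at least one non-member token lies between the members."""
    ids = sorted(v.token_ids)
    return ids[-1] - ids[0] + 1 != len(ids)


def is_embedded(inner: VMWE, outer: VMWE) -> bool:
    """True if every token of `inner` also belongs to `outer`, but not vice versa."""
    return inner.token_ids < outer.token_ids


tokens = ["he", "let", "the", "cat", "out", "of", "the", "bag"]
# Category labels below are assumptions made for this example.
idiom = VMWE(frozenset({1, 2, 3, 4, 5, 6, 7}), "VID")  # let the cat out of the bag
let_out = VMWE(frozenset({1, 4}), "VPC.full")           # let ... out

print(is_discontinuous(let_out))      # True: "the cat" intervenes
print(is_embedded(let_out, idiom))    # True: both are annotated separately
```

Storing token positions rather than strings keeps the two occurrences of the apart and makes the embedding check a plain subset test.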
Section 3
Categories of verbal MWEs
In edition 1.1 of this task we distinguish the following categories of verbal MWEs:
We also introduce an optional experimental category which (if admitted by the given language) is to be considered in the post-annotation step:
излизам със становище come out with a statement
to rely on
mieć do czynienia z czymś to have to do with sth
odwieść kogoś od czegoś to dissuade someone from doing sth
In practice, to identify and categorize verbal MWEs during manual annotation, one must use the rigorous generic decision tree and the structural and category-specific cross-lingual tests provided.
For a summary of changes with respect to edition 1.0 of the guidelines, see the what's new file.
Section 4
Annotation process and decision tree
We propose the following methodology for VMWE annotation:
The decision tree below indicates the order in which tests should be applied in step 3. The decision trees are a useful summary to consult during annotation, but contain very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.
Generic decision tree
If you are annotating Italian or Hindi, go to the Italian-specific decision tree or Hindi-specific decision tree. For all other languages follow the tree below.
Section 5
Specific tests for categorizing verbal MWEs
Once a candidate VMWE has been pre-identified in steps 1 and 2 of the annotation process, the confirmation of its status as a VMWE, as well as its categorization, is done according to the decision tree referring to the following cross-lingual tests:
Additionally, language-specific categories (LS) can be defined and tests for them can be used to annotate them in a given language or language group only.
Section 5.1
Structural tests (S)
Structural tests are quite simple preliminary tests that help determine the syntactic structure of the VMWE. This is required in order to point at the right category-specific identification tests. In practice, annotators will rarely need them since they will already have an intuition about the VMWE candidate category when they identify it.
Test S.1 - [HEAD] - Syntactic head
Does the candidate contain a unique verb functioning as the functional syntactic head of the whole?
The aim of this test is to categorize (as VID or no VMWE) those candidates which have no single clearly identified head verb. This is necessary because all other tests refer to the single head verb v and its dependents. Note that for VMWE candidates having the structure of a meaning-preserving variant, the test should be applied to their canonical form instead. This is required because there may be no verb or the verb may not be the syntactic head in such variants.
δόξαν ἣν ἔνιοι ἔχουσι περὶ doxan hēn enioi ekhousi peri opinion.ACC which some have.3PL about the opinion which some hold about is a variant and passes the test
Test S.2 - [1DEP] - Single dependent
Does the VMWE contain exactly one lexicalized (functional) syntactic dependent d of the head verb v?
The test covers only lexicalized dependents. There may be other, non-lexicalized dependents, which the test ignores. We explicitly call the non-verbal elements dependents instead of arguments or complements because the argument-adjunct distinction is irrelevant here. The outcome of the test is positive if the verb has a single lexicalized dependent, which can be the subject, the direct or indirect object, but also an adverbial complement, adverb, particle, relative clause, etc.
Test S.3 - [LEX-SUBJ] - Lexicalized subject
Is the single lexicalized (functional) syntactic dependent d of the head verb v its subject?
This test captures the fact that VMWEs with lexicalized subjects always belong to the VID category. Note that for the VMWE candidates having the structure of a meaning-preserving variant, the test should be applied to their canonical form instead. This is required because there may be no verb or the verb may not be the syntactic head in such variants.
Test S.4 - [CATEG] - Category of the dependent
What is the morphosyntactic category of the (functional) dependent d that co-occurs with the head verb v?
радвам се feel joy myself.REFL to feel joy
I found myself in a difficult situation
a se holba to stare with obligatory ACC reflexive clitic
откравити се otkraviti se to melt SELF to relax, to cheer up
ich schlage vor I propose
ik stel voor I propose
The aim of this test is to determine which category-specific identification tests should be applied. Note that for the VMWE candidates having the structure of a meaning-preserving variant, the test should be applied to their canonical form instead. This is required because there may be no verb or the verb may not be the syntactic head in such variants.
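The way the four structural tests chain together can be sketched as a small routing function. This is a hedged illustration, not the official decision tree. The function name `next_step`, the string labels, the routing of a failed S.2 to the VID tests and the whole `S4_ROUTING` table are assumptions made for the sketch. Only two points come directly from the test descriptions above: candidates without a unique head verb are checked as VID or no VMWE, and a lexicalized subject leads to VID.

```python
from typing import Optional

# Assumed mapping from the category of the dependent d (test S.4)
# to the category-specific tests to apply next.
S4_ROUTING = {
    "reflexive": "IRV tests",
    "particle": "VPC tests",
    "noun": "LVC tests",
}


def next_step(unique_head_verb: bool,
              n_lexicalized_dependents: int,
              dependent_is_subject: bool,
              dependent_category: Optional[str]) -> str:
    """Return which category-specific tests to apply to a candidate."""
    if not unique_head_verb:              # S.1 [HEAD] fails: VID or no VMWE
        return "VID tests"
    if n_lexicalized_dependents != 1:     # S.2 [1DEP] fails (assumed routing)
        return "VID tests"
    if dependent_is_subject:              # S.3 [LEX-SUBJ]: lexicalized subject
        return "VID tests"
    # S.4 [CATEG]: route on the dependent's category, defaulting to VID tests
    return S4_ROUTING.get(dependent_category, "VID tests")


# "ich schlage vor": one lexicalized dependent, a particle
step = next_step(True, 1, False, "particle")
```

For variants that preserve meaning, the answers passed to such a function would describe the canonical form, as required by the notes on tests S.1, S.3 and S.4.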
Section 5.2
Light verb constructions (LVC)
Light verb constructions (LVC) constitute a universal category. We retain the following key characteristics:
The following decision tree should be applied to decide whether a candidate should be annotated as LVC.full, as LVC.cause, or not as an LVC at all.
LVC-specific decision tree:
Note: test 10 [N-SEM] from the previous version of the guidelines (1.0) was considered unnecessary and has been abandoned in the current version of the guidelines.
Note: LVC tests are often hard to apply. If you hesitate at some intermediary test, continue to the next one, since the last tests of LVC.full and LVC.cause will help you reach your final decision.
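As a reading aid, the order in which tests LVC.0 to LVC.5 interact can be sketched in Python. The class `LVCAnswers`, the function `categorize_lvc` and the exact branching are assumptions for this sketch. They are reconstructed from the test descriptions and remarks in this section: LVC.full takes priority when LVC.2 passes, and LVC.cause is only considered when the subject is not an argument of the noun. The sketch does not replace the decision tree itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LVCAnswers:
    """Annotator's yes/no answers to the LVC tests for one verb+noun candidate."""
    n_abstract: bool               # LVC.0 [N-ABS]
    n_predicative: bool            # LVC.1 [N-PRED]
    subj_is_noun_arg: bool         # LVC.2 [N-SUBJ-N-ARG]
    v_light: bool = False          # LVC.3 [V-LIGHT]
    v_reducible: bool = False      # LVC.4 [V-REDUC]
    subj_is_cause: bool = False    # LVC.5 [V-SUBJ-N-CAUSE]


def categorize_lvc(a: LVCAnswers) -> Optional[str]:
    """Return 'LVC.full', 'LVC.cause', or None if the candidate is not an LVC."""
    if not (a.n_abstract and a.n_predicative):
        return None                        # noun fails LVC.0 or LVC.1
    if a.subj_is_noun_arg:                 # LVC.2 passes: only LVC.full is possible
        return "LVC.full" if (a.v_light and a.v_reducible) else None
    return "LVC.cause" if a.subj_is_cause else None


# Examples taken from this section
make_decision = categorize_lvc(LVCAnswers(True, True, True, v_light=True, v_reducible=True))
give_headache = categorize_lvc(LVCAnswers(True, True, False, subj_is_cause=True))
# "excessive heat provokes fire": fire is not predicative, whatever LVC.0 says
provoke_fire = categorize_lvc(LVCAnswers(True, False, False, subj_is_cause=True))
```

A `None` result only means that the candidate is not an LVC. It may still be a VMWE of another category, as with to give birth, which is annotated as VID.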
Test LVC.0 - [N-ABS] Noun is abstract
Is the noun n abstract?
Some concrete nouns may be predicative (test LVC.1). For instance, a relational noun such as daughter is semantically incomplete without its argument: daughter of X, so daughter is predicative. However, concrete predicative nouns should not pass test LVC.0.
Some nouns may have both concrete and abstract interpretations. For instance, money is concrete when it refers to banknotes (paper money, bills): I didn't have money so I paid by credit card. However, money is abstract when referring to a conventional value used in transactions between people: He spent a lot of money in the mall. If one cannot be sure that the noun is used in its concrete interpretation, test LVC.0 passes.
Test LVC.1 - [N-PRED] Noun is predicative
Does the noun n have at least one semantic argument, implying that it is a predicative noun?
We only retain nouns n that have at least one semantic argument, which we define as a semantically mandatory and specific participant of the event or state expressed by the predicative noun.
Sometimes, it might be useful to consider verbs and adjectives derivationally related to the noun to reason about its semantic arguments.
Test LVC.2 - [N-SUBJ-N-ARG] Verb's subject is noun's semantic argument
Is the subject of the verb a semantic argument of the noun? In other words, is the verb linking the predicative noun to one of its semantic arguments that occurs as the subject of the verb?
Президентът получи покана за посещение в Германия The president received an invitation to visit Germany → Президентът president is the subject of the verb and a semantic argument (the receiver) of the invitation
Президентът получи награда The president received an award → Президентът president is the subject of the verb and a semantic argument (the receiver) of награда award
Ο Γιάννης πρόβαλε αντίσταση στις αρχές o γianis provale antistasi stis arches The John presented resistance to the authorities John resisted the authorities
Susjed je dobio dozvolu za gradnju Neighbour received a permission for construction → Neighbour is the subject of the verb and a semantic argument (the receiver) of the permission
聴衆が彼を高く評価した(こと) audience.nom he.acc highly evaluation.made (the fact) The audience gave him high praise → The subject is a 'praiser'
Piotr dostał pozwolenie na budowę Piotr received a permission for construction → Piotr is the subject of the verb and a semantic argument (the receiver) of the permission
Beata ma marzenia o spokoju Beata has dreams about peace → Beata is the subject of the verb and a semantic argument (the possessor) of the dreams
wyborcy ponoszą za to winę the electorate bears the responsibility for this → wyborcy electorate is the subject of the verb and a semantic argument (the agent) of the guilt
ustawa budzi zastrzeżenia the law wakes-up reservations the law raises reservations → ustawa law is the subject of the verb and a semantic argument (the theme) of zastrzeżenia reservations
Јелена је Бранку узвратила посету Jelena je Branku uzvratila posetu Jelena returned Branko's visit. → Jelena is the subject of the verb and a semantic argument of the visit (visitor)
The report provides information about the economy → information only has one argument, which is its topic. The provider/source of information is not one of its semantic arguments.
El informe facilita información clave the report provides crucial information → information only has one argument, which is its topic. The provider/source of information is not one of its semantic arguments.
Le rapport fournit des informations cruciales the report provides crucial information → information only has one argument, which is its topic. The provider/source of information is not one of its semantic arguments.
Incydent ten podważył zaufanie wyborców do kandydata This incident undermined the electorate's confidence in the candidate → Incydent incident is the subject of the verb but not a semantic argument of the confidence
komisja przeprowadziła wybory the committee carried out the vote → komisja committee is the subject of the verb but not a semantic argument of wybory vote, which only requires the voters and the matter of the vote
O relatório traz informações polêmicas the report provides polemic information → information only has one argument, which is its topic. The provider/source of information is not one of its semantic arguments.
комисија је спровела гласање komisija je sprovela glasanje the committee carried out the vote → комисија komisija committee is the subject of the verb but not a semantic argument of гласање glasanje vote, which only requires the voters and the matter of the vote
It is not always easy to determine if the verb's subject is an argument of the noun. You can use the former syntactic version of this test to verify your intuitions.
Test LVC.3 - [V-LIGHT] Verb with light semantics
Is v semantically light, that is, is the semantics that v adds to n restricted to: (i) what stems from its morphological features (e.g. future, plural, perfective aspect, etc.), (ii) pointing at the semantic role of n played by v's subject?
معروف قدم present a favor to give a favor → قدم to give adds no meaning to معروف favor besides that of performing an activity
زيارةبقام to do a visit to pay a visit → قام to do adds no meaning to visit زيارة besides that of performing an activity
държа реч to make a speech → държа adds no meaning to реч besides that of performing an act
поемам отговорност to take responsibility → поемам adds no meaning to отговорност besides that of having a property
Angst haben to have fear → haben adds no meaning to Angst besides that of having a property.
παίρνω μία απόφαση → παίρνω take adds no meaning to απόφαση decision besides that of performing an activity
δίνω μία απάντηση → δίνω give adds no meaning to the noun απάντηση besides that of performing an activity
διενεργώ έλεγχο perform a check → διενεργώ perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
διαπράττω ένα έγκλημα → διαπράττω commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
ασκώ δριμεία κριτική → ασκώ commit adds no meaning to the noun κριτική besides that of performing a cognitive activity
νιώθω πολύ άγχος → νιώθω feel adds no meaning to άγχος besides that of being in a mental state
έχω άγχος have anxiety → έχω have adds no meaning to άγχος anxiety besides that of being in a mental state
προβαίνω σε καταγγελία to make a complaint, to complain → προβαίνω make adds no meaning to καταγγελία complaint besides that of performing an activity
make a decision → make adds no meaning to decision besides that of performing an activity
have fear → have adds no meaning to fear besides that of having a property
perform a check → perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
commit a crime → commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
pay a visit → the verb in its usual sense means 'to spend some money on a visit', but here it is not used in this sense and does not add any semantics to the "visiting" event
deliver a speech → the verb in its usual sense means 'to move from one place to another', but here it is not used in this sense and does not add any semantics to the "speech" event
undergo a surgery → undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
tomar una decisión to make a decision → tomar adds no meaning to decisión besides that of performing an activity
tener miedo to have fear → tener adds no meaning to miedo besides that of having a property
lo egin sleep do to sleep → the verb egin adds no meaning to the noun lo besides that of performing an activity
ils reçoivent l’ordre de partir they receive the order of leaving they are ordered to leave → receive adds no meaning to order besides indicating that the subject is the recipient of the order
il a subi une intervention chirurgicale he has undergone an intervention surgery he underwent surgery → undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
τιμωρίαν ποιέομαι timōrian poieomai punishment.ACC do.1SG I punish → ποιέομαι adds no meaning to τιμωρίαν besides that of performing an activity
donijeti odluku to make a decision → donijeti in its usual sense means 'to bring', but here it is not used in this sense and does not add any semantics to the event
prendere una decisione → prendere adds no meaning to decisione besides that of performing an activity
avere paura → avere adds no meaning to paura besides that of having a property
eseguire un controllo → eseguire is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
commettere un crimine → commettere is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
fare una visita → the verb in its usual sense means 'make', but here it is not used in this sense and does not add any semantics to the "visiting" event
fare un discorso → the verb in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the "speech" event
een wandeling maken to take a walk → maken adds no meaning to wandeling besides that of performing an activity
schrik hebben to have fear → hebben adds no meaning to schrik besides that of having a property
wystąpić z wnioskiem to stand out with a proposal to put forward a motion → wystąpić z stand out with adds no meaning to wniosek motion besides that of performing an activity
apresentar uma lesão present a lesion to have a lesion → to present adds no meaning to lesion besides that of having a property
estar com medo be with fear to be afraid → to be with adds no meaning to fear besides that of being in a state
a lua o decizie to make a decision → lua adds no meaning to decizie besides that of performing an activity
изрећи казну izreći kaznu to pronounce a sentence → изрећи izreći to pronounce adds no meaning to казну kaznu sentence besides that of performing an activity
donner son avis to give one's opinion → donner adds the information that the opinion is communicated
Ce fait attire l'attention de la justice This fact attracts the attention of the justice → attirer indicates the attention starts
πολέμου παύσασθαι polemou pausasthai war end to stop fighting → παύσασθαι adds an aspectual meaning to the noun πολέμου
przejść na emeryturę to cross to retirement to take retirement → przejść adds an inchoative (change-of-state) meaning to the noun
dopełnić obowiązku to fulfill one's duty → dopełnić fulfill adds a fulfillment meaning to obowiązek duty
dar uma opinião to give an opinion → dar to give adds the meaning of communication, which is not present in the noun itself (one can ter uma opinião to have an opinion without communicating it).
испунити дужност ispuniti dužnost to fulfill one's duty → испунити ispuniti fulfill adds a fulfillment meaning to дужност dužnost duty
Note that this light semantics of the verb is either usual for that verb (i.e. the verb is a pure syntactic operator, like commit, perform), or occurs in the context of the particular noun (e.g. for pay in to pay a visit). Both types of verbs pass the test.
In our view of LVCs, we do not require a light verb to be "bleached", as it is sometimes described in the literature. We simply do not take into account the relation between the verb's use as a light verb and its other uses. While the specific meanings added by light verbs to predicative nouns have been extensively studied and described (e.g. by Miriam Butt and Tafseer Ahmed), we do not adopt any fine-grained classification here. If you have a doubt about a verb's "lightness", proceed to the next test: if you can evoke the same event/state without using the verb, then it is considered light.
Test LVC.4 - [V-REDUC] - Verb reduction
Try to build an NP without the verb, in which v's subject s becomes n's dependent. You might need to test several prepositions (of, by, for, from), possessives (my, her, somebody's), postpositions, case markers, as long as you use no verb. Can this verbless NP refer to the same event or state as the candidate v+n construction does?
This test has a simple formulation but its application has some important subtleties which are central to our definition of the LVC.full category. The goal of this test is to keep only constructions in which the predicative noun is an event or state, excluding "gray-zone" predicates.
First, if it is not possible to build an acceptable NP where the verb v's subject s becomes a dependent of the noun n, e.g. using any preposition, postposition and/or case marker, this means that the verb is not light, and the construction cannot be annotated as LVC.full. This may remove constructions in which there is control, that is, both the noun and the verb share the same subject. However, control is not sufficient to characterize an LVC.full. In such cases, LVC.4 fails, the verb is not completely light, and you cannot annotate the construction as LVC.full, even if intuitively it resembles an LVC.full due to control:
Paul a eu l'occasion de dormir Paul has had the opportunity to sleep Paul had the opportunity to sleep → *l'occasion de Paul de dormir is unacceptable
Politik je dal napoved The politician made a forecast → njegova napoved his forecast refers to the same event
Second, the fact that the NP is acceptable does not suffice to characterise an LVC.full. Furthermore, the NP version in which the verb was omitted, if acceptable, must evoke the same event or state as the LVC. Here are some tricky examples and some recommendations about how to interpret them:
отправих покана към приятелите си I sent an invitation to my friends → покана invitation can be interpreted both as the act of inviting and as its contents; for the first reason we count this candidate as LVC.full
Η Μαρία έστειλε ένα γράμμα Maria send.03.SG a letter → Το γράμμα της Μαρίας refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
Η Μαρία έχει την άποψη i maria echi tin apopsi Maria has the opinion Maria believes and more generally, cases of έχω + a noun referring to the state of having a mental content (άποψη, γνώμη, πεποίθηση) → η άποψη της Μαρίας is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
Η Μαρία έδωσε την υπόσχεση i maria eδose tin iposchesi the maria give.3.PST the promise Maria promised and more generally, cases of δίνω + a noun referring to a speech act (υπόσχεση, διαταγή, απάντηση, κατάθεση) → Η υπόσχεση της Μαρίας refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Η Μαρία πήρε μία απόφαση I maria pire mia apofasi The Maria take.03.PR a decision Maria decided → απόφαση can refer to the deciding event (μία δύσκολη απόφαση) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
Mary sent a letter → Mary's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
Mary has an opinion and more generally, cases of have + a noun referring to the state of having a mental content (opinion, belief) → Mary's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
Mary made a speech and more generally, cases of make + a noun referring to a speech act → Mary's speech refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Mary made a decision → decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
María envió una carta María sent a letter → La carta de María María's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
María dio un discurso María made a speech and more generally, cases of dar + a noun referring to a speech act → el discurso de María refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
María tomó una decisión María made a decision → decisión decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
οὐκ ἂν ἐπιστολὴν ἔπεμπον ouk an epistolēn epempon not PRT letter.ACC send.3pl they would not have sent a letter → ἐπιστολήν refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
τὴν γὰρ γνώμην εἶχε tēn gnōmēn eikhe the thus opinion have.3SG he thus held the opinion and more generally, cases of have + a noun referring to the state of having a mental content → γνώμην is ambiguous between the fact that he has an opinion and the content of his opinion. We recommend that these cases should be annotated as LVC.full
ὁ δὲ Σιτάλκης πρός τε τὸν Περδίκκαν λόγους ἐποιεῖτο ho de Sitalkēs pros te ton Perdikkan logous epoieito the Sitalkes to also the Perdikkas speech.ACC do.3SG Sitalkes spoke to Perdikkas and more generally, cases of make + a noun referring to a speech act → λόγους refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Maria wysłała wiadomość Maria sent a message → wiadomość Marii Maria's message refers to the contents of the message sent by Maria, rather than to the sending event itself
Maria jest zdania, że Mary has the opinion that... → zdanie Marii Mary's opinion refers to the content of the opinion, and not to the state of having an opinion
miał na celu awans He had promotion on the aim His aim was a promotion → jego cel refers to the aim itself, and not to the state of having an aim
ta partia w wyborach miała większość this party had a majority in the elections → #większość tej partii the majority of the party provokes a considerable shift in meaning
złożył zeznania na policji he gave testimony at the police station → jego zeznania can be interpreted both as the act of testimony and as its contents; for the first reason we count this candidate as LVC.full
Finally, some nouns, especially nominalisations, are ambiguous between events and their participants. For instance, a construction may be an event (the construction of the bridge took 2 years) or its result (this bridge is a spectacular construction). In that case, if the verbless NP can refer to the event, then you should prefer this reading over the "participant" interpretation. For example, in John made a construction, you may ask if John's construction refers to the construction event or to its result. In this case, it can refer to the event, so it should be annotated as LVC.full.
Test LVC.5 - [V-SUBJ-N-CAUSE] Verb's subject is noun's cause
Is the subject of the verb expressing the cause of the predicate expressed by the noun? In other words, does the verb bring an additional participant to the scene, representing the source or cause of the event or state referred to by the noun?
to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provoke, which is a prototypical causative verb. Here, the subject is not the agent of destruction, but its cause. Notice that if the sentence was the explosion provoked the destruction of the building, then the construction would be an LVC.full
residents seek to build consensus on the development of the territory → the semantic argument of consensus is the topic on which everybody agrees, the subject of build consensus expresses an external participant responsible for the consensus to exist.
dar dolor de cabeza → X has a headache, the cause of the headache, indicated as the subject of dar is not a semantic argument
la nueva ley provocó la destrucción del edificio the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provocar to provoke, which is a prototypical causative verb. Here, the subject is not the agent of destrucción destruction, but its cause. Notice that if the sentence was la explosión provocó la destrucción del edificio the explosion provoked the destruction of the building, then the construction would be an LVC.full
dać podstawy prawne to give legal foundation
nakładać na kogoś powinność to put a duty on sb.
narazić kogoś na straty to expose someone to losses
stawiać komuś cel to set an aim to someone
ślady krwi wzbudziły podejrzenia policji the traces of blood aroused the suspicion of the police
to give birth → tricky case, since the subject of give actually is a semantic argument of birth, so it cannot be its cause. This construction must be annotated as VID (it does not pass test LVC.4 either).
excessive heat provokes fire → even though provoke prototypically expresses a cause, in this case fire is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
dar a luz to give birth → tricky case, since the subject of dar to give actually is a semantic argument of a luz, so it cannot be its cause. This construction must be annotated as VID (it does not pass test LVC.4 either).
un calor excesivo provoca incendios excessive heat provokes fires → even though provocar prototypically expresses a cause, in this case incendios is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
komisja przeprowadziła wybory the committee carried out the vote → komisja committee is neither a semantic argument of wybory vote nor its cause
mocny zapach uśpił czujność psów the strong scent lulled the vigilance of the dogs → the scent is the opposite of the cause of vigilance
Marija je poslala pismo Marija sent a letter → Marijino pismo Marija's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
Marija ima mnenje Marija has an opinion and more generally, cases of imeti to have + a noun referring to the state of having a mental content (mnenje, predstava, dvom opinion, idea, doubt) → Marijino mnenje Marija's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
Marija je postavila vprašanje/trditev Marija posed a question/statement and more generally, cases of postaviti make + a noun referring to a speech act → Marijino vprašanje Marija's question refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Constructions annotated as LVC.cause involve:
When the construction involves a typically causative verb (e.g. cause, provoke), it might seem counter-intuitive to annotate it as a VMWE because it looks perfectly regular and presents no VMWE idiosyncrasy. However, it turned out to be difficult to distinguish idiosyncratic from regular LVC.cause constructions, so both should be annotated, as for LVC.full. In other words, some LVC.cause constructions are compositional, but all of them are treated as complex predicates with a causal support verb, regardless of their compositionality.
Typically causative verbs (e.g. cause, provoke) can sometimes be light. In this case, according to the LVC decision tree, LVC.full has priority over LVC.cause. For instance, the announcement provoked an unexpected reaction should be annotated as LVC.full and not LVC.cause, although provoke is a typically causative verb. Indeed, reaction has two arguments (reaction of X to Y), one of which is the subject of the verb (test LVC.2 passes). In other words, typically causative verbs may be used in either LVC.full or LVC.cause, depending upon whether the cause subject of the verb is a normal, canonical argument to the predicative noun (LVC.full) or an "external" non-canonical cause (LVC.cause).
Some verbs could be considered causative, but their interpretation goes beyond purely indicating the cause of the event/state. Therefore, you should NOT annotate as LVC.cause constructions involving:
Problematic cases and remarks
The (single or compound) noun n functions as a regular syntactic dependent, so LVCs exhibit regular syntactic variants.
δόξαν ἔχουσι doxan ekhousi opinion.ACC have.3PL they hold an opinion is the canonical form
narediti konec nečemu to make an end (to something) to end (something) → the result of this action is that something is finished, which is caused by the subject of narediti to make
As explained in the section on syntactic variants of VMWEs, all LVC tests should be applied to the canonical form, that is, one in which the verb is in active voice and in finite form. If there is no canonical form, this is an indication that the target construction might not be an LVC, but a verbal idiom instead.
In many cases of LVCs, it can be said that there is some degree of selection of the verb by the noun.
имам право to be right vs *притежавам право
παίρνω απόφαση vs. #κάνω απόφαση
run a race vs *run a walk
bisita egin visit do to pay a visit vs. bisita eman visit give
mieć rację to have right to be right vs. *posiadać rację to possess right
Yet some regularities exist. For example, large classes of nouns function with have (e.g. +property) or commit (+negative achievement). Therefore, we chose not to retain the selection of the verb as a criterion for LVC categorization. Instead, the decision tree should be applied to decide whether a candidate should be annotated as LVC.
Many authors distinguish support verbs from light verbs, still others differentiate between true light verbs and vague action verbs.
On the one hand, we take a narrower scope than what is usually considered in the literature by ignoring aspectual support verbs (except when aspect is morphological). We believe that aspectual verbs do contribute an additional (change of state) meaning to the expression, and most of the time they are completely productive, not forming interesting VMWEs. For instance, for the predicative noun walk, we will consider the light verb to have, but not the aspectual verbs to start, to pursue, to stop a walk. Thus, to have a walk is an LVC.full. Note that for some nouns such as bloom, which are themselves inchoative, we do consider to come into bloom as LVC.full, as both the verb and the noun are inchoative, so the verb does not add any semantics to the noun.
On the other hand we take a broader scope than what is usually considered in the literature by taking in cases in which the verb has light semantics per se (it only bears morphology, such as the tense and mood, in any case), which hence cannot be described as "bleached" as is usually said of support verbs. For instance, whereas to pay does not have its usual meaning in to pay a visit, it cannot really be said that commit does not have one of its meanings in commit a crime (note that commit can be used with any negatively charged achievement noun, e.g. suicide, crime, fraud, felony...). Nonetheless, we annotate to commit a crime as LVC.full since it passes all tests.
One test often used in the literature is the existence of a morphologically related verb or adjective that means the same as the LVC. For instance, to make a visit is equivalent to to visit, to have an illness is equivalent to to be ill. Note however that it is neither sufficient nor compulsory:
Nonetheless, it might be useful to reason about the derivationally related equivalents to decide whether a noun is predicative in test LVC.1. Here are some questions that might help decide on the predicative nature of the noun in the LVC candidate:
Verb paraphrase Is the abstract noun derivationally related to a verb with the same semantics? Then, there is probably a semantic argument, which coincides with the subject of the verb, so test LVC.1 passes:
правя грешка to make a mistake = греша/сгрешавам to make a mistake
ο Γιάννης κάνει ένα ταξίδι John makes a trip = o Γιάννης ταξιδεύει
ο Γιάννης έχει θάρρος John has courage = ο Γιάννης είναι θαρραλέος John is courageous → and, more generally, characteristics and attributes
ο Γιάννης έχει πείνα/δίψα John has hunger/thirst = ο Γιάννης πεινάει/διψάει John is hungry/thirsty → and, more generally, physical sensations
ο Γιάννης έχει πάθος/φόβο/θυμό John has passion/fear/anger = ο Γιάννης παθιάζεται/φοβάται/θυμώνει John is passionate/afraid/angry → and, more generally, feelings, emotions, states
John has a walk = John walks
Juan da un paseo Juan takes a walk = Juan pasea Juan walks
Janica je odnijela pobjedu Janica carried away a win = Janica je pobijedila Janica won
Ewa odniosła zwycięstwo Eva carried away a victory = Ewa zwyciężyła Eva won
Марко је узео учешће Marko je uzeo učešće Marko took participation = Марко је учествовао Marko je učestvovao Marko participated
Adjective paraphrase: Is the abstract noun derivationally related to an adjective with the same semantics? Then, there is probably a semantic argument, which coincides with the noun that is modified by the adjective, so test LVC.1 passes.
нямам търпение to not have patience = съм нетърпелив to be impatient
нося отговорност to carry responsibility = съм отговорен to be responsible
Ο Γιάννης έχει δύναμη = Ο Γιάννης είναι δυνατός O Γianis echi δinami = O Γianis ine δinatos → and, more generally, characteristics and attributes
John has hunger/thirst = John is hungry/thirsty → and, more generally, physical sensations
John has passion/fear/anger = John is passionate/afraid/angry → and, more generally, feelings and emotions
John has problems/difficulties = Something is problematic/difficult for John → and, more generally, states
Juan tiene hambre Juan has hunger = Juan está hambriento Juan is hungry → and, more generally, physical sensations
Anek = Ane gosetuta Ane hunger has = Ane hungry is Ane has hunger = Ane is hungry → and, more generally, physical sensations
nositi odgovornost to carry responsibility = biti odgovoran to be responsible
mieć straty to have losses = być stratnym to have lost sth
mieć sens to have a sense to make sense = być sensownym to be reasonable
Synonym verb/adjective paraphrase: Does the abstract noun have a synonym/hypernym derivationally related to a verb or adjective with the same semantics? Then, the questions above can be applied to the synonym verb/adjective.
John has a chance to do something = John is likely to do something → chance has no corresponding verb or adjective, but likelihood is a synonym
dokonać inwazji to perform an invasion = wtargnąć to invade
The existence of a related verb is not a definitive test, but a hint that the noun is probably predicative. Since determining whether a noun is predicative is tricky, we advise language teams to provide additional documentation and examples for borderline cases.
The previous version of the guidelines had a syntactic test which you can still use to verify if the verb's subject is an argument of the noun. However, this test was considered hard to apply in the previous guidelines, and is not mandatory anymore.
The syntactic test consists in trying to add the semantic argument as a complement of the noun in the presence of the verb. In other words, does the noun n, in the presence of v, prohibit at least one syntactic argument a which it normally licensed in the absence of v?
An alternative formulation for this test is the following: Let s be the subject of v, and let r be the semantic role that s plays with respect to the noun n. Is it prohibited for r to be realized both by s and by a syntactic argument a of n, except when a is in the whole–part relation with s?
Paul hat eine Entscheidung über das Budget getroffen Paul made a decision on the budget + die Entscheidung des Rates über das Budget the council's decision on the budget → *Paul traf die Entscheidung des Rates über das Budget *Paul made the council's decision on the budget - the decision maker cannot modify Entscheidung
ο πρωθυπουργός έκανε επίσημη επίσκεψη του υπουργού στον Αμερικανό πρόεδρo o proθypurgos ekane episimi episkepsi tu ypurgu ston amerikano proedro - the visitor cannot be a modifier of επίσκεψη
Paul made a decision on the budget + the committee's decision on the budget → *Paul made the committee's decision on the budget — the decision maker cannot modify decision
Paul had a discussion with Mary + Peter's discussion → *Paul had Peter's discussion with Mary
Bjarnson scored a goal + Arnason's goal → *Paul scored Arnason's goal but Paul scored the goal of Iceland — the scoring entity can only modify goal in the last case, when they are part of the Iceland team
Pablo tomó una decisión con respecto al presupuesto Pablo made a decision on the budget + la decisión del comité con respecto al presupuesto the committee's decision on the budget → *Pablo tomó la decisión del comité con respecto al presupuesto Pablo made the committee's decision on the budget - the decision maker cannot modify decisión
Bjarnson a marqué un but + le but d'Arnason → *Paul a marqué le but d'Arnason but Paul a marqué le but de l'Islande — the scoring entity can only modify but (goal) in the last case, when they are part of the Iceland team
Paweł prowadzi rozmowy → *Paweł prowadzi rozmowy Piotra Paweł leads Piotr's talks, Paweł prowadzi rozmowy komisji Paweł leads the talks of the commission - the discussing entity komisja commission can only modify rozmowy talks if Paweł belongs to the commission.
Jan otrzymał wymówienie Jan received a dismissal + wymówienie dla Pawła dismissal for Paweł → *Jan otrzymał wymówienie dla Piotra
Pedro sofreu prejuízo com a compra Pedro suffered financial loss with the purchase + o prejuízo do José José's financial loss → *Pedro sofreu o prejuízo do José com a compra - the financial loss cannot be modified by the affected entity
A Maria fez um aborto Maria made an abortion + o aborto da Joana Joana's abortion → #A Maria fez o aborto da Joana — the noun cannot be modified by another patient
O médico realizou o parto com sucesso The doctor performed the childbirth with success + o parto do Dr. Pedro Dr. Pedro's childbirth → *O médico realizou o parto do Dr. Pedro com sucesso - the childbirth could be modified by the mother (patient) but not by another doctor (agent).
The rationale for this test is that a semantic argument of n cannot be realized as n's syntactic dependent, since it is already realized as v's syntactic dependent instead (usually as v's subject). For instance the noun visit takes two semantic arguments, the visitor and the visited entity, as in the visit of the Queen to the Prime Minister. When used in to pay a visit, the visitor semantic argument is realized as the subject of to pay (The Queen paid a visit to the Prime Minister), and cannot be realized at the same time within the NP headed by visit (*The Queen paid a visit of the Lady to the Prime Minister).
Note that the syntactic formulation may be tricky to apply. It is sometimes possible to add the semantic argument as a complement of the noun in the presence of the verb, if we change the interpretation of the argument (and thus its thematic role). For instance, even though the construction John took Luke's decision may be acceptable, the interpretation would be comparative (John took a decision that Luke should have taken). Therefore, the test passes since the verb is still connecting a predicate (decision) to its argument (John, the decider).
Section 5.3
Verbal idioms (VID)
Verbal idioms constitute a universal category. A verbal idiom (VID) has at least two lexicalized components including a head verb and at least one of its dependents. The dependent can be of different types. Here are some examples:
κόβει το μάτι μου kovi to mati mu cut the eye my to notice
Бог некога погледао Bog nekoga pogledao God looked at someone to be lucky
ђаво је умешао прсте đavo je umešao prste the Devil mixed in his fingers an unfavorable outcome
пао некоме мрак на очи pao nekome mrak na oči darkness fell on someone's eyes to blow a fuse
λαμβάνω μέρος take part
κρατάω τα μπόσικαkratao ta bosika
добити ногу dobiti nogu to get a leg to get dumped
држати банку držati banku to hold a bank to dominate the conversation
правя сам да си говори make (someone) to talk to himself to drive (someone) crazy
ir ao ar go to the air to go on air
ударити на велика звона udariti na velika zvona to bang on big bells to spread the news
бити као запета пушка biti kao zapeta puška to be like a tense rifle to be ready for action
It is often challenging to distinguish VIDs from other VMWE categories if only one dependent of the head verb is lexicalized. The VMWE categorization depends on the category of this dependent:
With a dependent of any other category, the VMWE is always a VID, including the following:
κρατάω πισινή
tykać cudze to touch someone else's to take something that does not belong to you
dopiąć swego to button up one's own to fulfill one's plans
правя сам да си говори make someone talk to himself to drive someone crazy → сам да си говори is a clause
mostrar com quantos paus se faz uma canoa show with how many sticks one makes a canoe to punish or take revenge
не знати где је некоме глава ne znati gde je nekome glava not to know where one's head is to be out of one's mind
дај шта даш дај šta daš give what you give be satisfied with anything that is given to you
την πατάωtin patao
a o întinde to her extend to fly the coop synonymous expressions with the non-anaphoric feminine ACC personal clitic 'o' functioning as an expletive
Sentential expressions with no open slots, such as proverbs and conventionalized sentences, are included in the scope of VIDs.
Fortune favors the bold
The pleasure is mine
I beg your pardon!
Po toči zvoniti je prepozno there is no use ringing the bells after hail it is too late
било па прошло bilo pa prošlo happened and it's done let bygones be bygones
рекла казала rekla kazala said and told hearsay
If more than one dependent of the head verb is lexicalized, then the candidate VMWE is always classified as a VID.
κάνω τη ζωή ποδήλατο kano ti zoi poδilato make.1SG the life bicycle to torture
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
dar gato por liebre to_give cat for hare to rip off, to take for a ride
бежати као ђаво од крста bežati kao đavo od krsta to run away like Satan from a cross to run like a bat out of hell
забити главу у песак zabiti glavu u pesak to stick your head in the sand to bury your head in the sand
ићи линијом мањег отпора ići linijom manjeg otpora to go with the line of least resistance to take the path of least resistance
att dra sitt strå till stacken to draw one's straw to stack.the to contribute (in a small way)
Cases when there is no single clearly identifiable head verb, because of coordinated verbs or of an irregular syntactic structure, are also covered by the VID category.
coś kogoś ani ziębi, ani grzeje something neither cools nor warms someone someone is indifferent to something
bądź tak dobry i zrób coś be so good and do something be so good as to do something
seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
нити смрди нити мирише niti smrdi niti miriše neither stinks nor has a nice scent neither good nor bad
to pretty-print
to short-circuit
to tumble dry
In case of several lexicalized dependents, special care must be taken to identify and also annotate embedded VMWEs.
a-și da arama pe față to give his/her copper.the on face to reveal his/her true (evil) nature → this is even more complicated since, besides the ID a da pe față, the IRV has to be annotated as well - a three-level embedding
Idioms whose head verb is the copula (to be) can pose special challenges because their complements may be (nominal, adjectival, etc.) MWEs themselves. In this task, we consider constructions with a copula to be VMWEs only if the complement does not retain the idiomatic meaning when used without the verb.
съм на червено be on red to be in debt → non-VMWE because the copula can be omitted, as in в края на месеца винаги оставам на червено at the end of the month I always get into debt
to be somebody → idiom because #somebody loses the meaning of being important or successful
it is double Dutch to me → non-VMWE because the copula can be omitted, as in he seems to speak double Dutch
być do rzeczy to be to the thing to be relevant → non-VMWE because the copula can be omitted, as in dał parę argumentów całkiem do rzeczy he gave a couple of quite relevant arguments
być w trakcie (czegoś) to be in the road (of sth) to be doing sth → non-VMWE because the copula can be omitted, as in wyszedł w trakcie zebrania he went out during the meeting
não ser flor que se cheire to not be a flower that one may smell to be an untrustworthy person → idiom because #flor que se cheire loses the meaning
isso é grego pra mim that's greek to me → non-VMWE because the copula can be omitted, as in você está falando grego
a fi un papă-lapte to be a eat-milk to be a piker → idiom because #un papă-lapte preserves the meaning
бити једном ногом у гробу biti jednom nogom u grobu to be with one leg in the grave to be close to death → idiom because #једном ногом у гробу with one leg in the grave loses its meaning
бити зелен biti zelen to be green to be a greenhorn/to be inexperienced → idiom because #зелен green loses its meaning
Note that special care must be taken in languages in which the copula omission is a regular or even a compulsory phenomenon (e.g. in Russian). In those cases, language-specific tests are required to distinguish a copula-based idiom from a non-verbal MWE.
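The copula criterion above amounts to a single yes/no check. The sketch below is only an illustration: the function name and its boolean argument are hypothetical, and the judgement itself (whether the complement keeps its idiomatic meaning without the copula) still has to come from the annotator.

```python
def copula_candidate_is_vmwe(complement_keeps_meaning_without_copula: bool) -> bool:
    """Apply the copula criterion: a copula construction counts as a VMWE
    only if its complement loses the idiomatic meaning without the verb."""
    return not complement_keeps_meaning_without_copula


# Annotator judgements taken from the English examples above.
judgements = {
    "to be somebody": False,           # "somebody" alone loses the meaning
    "it is double Dutch to me": True,  # "double Dutch" keeps it without the copula
}

for expression, keeps_meaning in judgements.items():
    label = "VMWE" if copula_candidate_is_vmwe(keeps_meaning) else "non-VMWE"
    print(f"{expression}: {label}")
```

For languages with regular or compulsory copula omission, this check alone is not enough and language-specific tests are still needed.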
Idioms typically have both a literal and an idiomatic reading. Thus, they are closely connected to the phenomenon of a metaphor (see also the section on VMWEs versus metaphors). This often makes them semantically totally non-compositional, i.e. none of their lexicalized components retains any of their original meanings.
VID-specific decision tree:
In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a VID. Note however that this tree is to be applied only after it was referred to by the generic decision tree containing structural tests.
Test VID.1 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
правя на бъзе и коприва to turn into elder and nettle to scold, to tell off → бъзе is an old word, very rarely used independently
вземам предвид, имам предвид to → предвид (as adverb) is only used in MWEs
стоя диван чапраз to stay upright as in Osman council to stay ready to serve → чапраз is an old word, very rarely used independently
no decir ni chus ni mus → chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
hacer algo a troche y moche → troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
fare lo gnorri to play dumb → gnorri is not a stand-alone word
scendere in lizza to enter the lists → lizza is not a stand-alone word
矢面に立つ arrow.face.LOC stand to face direct attack → 矢面 is not a stand-alone word
wyjść na jaw to come-out to light to transpire, to become known
читати (некоме) вакелу čitati (nekome) vakelu to read somebody a scolding to scold somebody → вакела vakela is not a stand-alone word
имати на претек imati na pretek to have an abundance → претек pretek is not a stand-alone word
не часити ne časiti don't jump the gun → часити časiti is not a stand-alone word
Test VID.2 - [LEX] - Lexical inflexibility
Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
всяка жаба да си знае гьола every frog to know its own puddle → #всяка жаба да си знае локвата
eine Entscheidung treffen to meet a decision to make a decision → #eine Entscheidung machen/herstellen a decision make/produce #to make/produce a decision
φέρω βαρέως → #φέρω ελαφρώς
μπαίνει το νερό στ' αυλάκι → #μπαίνει το νερό στο ποτάμι
to go on → *to go upon
to stand firm/fast → *to stand hard/rigid/solid
tomar una decisión to_take a decision to make a decision → #hacer/coger/producir una decisión to_make/grab/produce a decision #to make/grab/produce a decision
sputare il rospo spit the toad spit it out → #sputare la rana #spit the frog
生計を立てる means.of.living.acc stand earn an income →生計を*起こす
een beslissing nemen to take a decision to make a decision → #een beslissing produceren a decision produce #to produce a decision
nie wchodzić w rachubę not to come into count to be out of question → #wchodzić w liczenie/rachunek
wodzić kogoś za nos to lead someone by the nose to cheat on someone → #wodzić za nozdrza/ucho/wargi
iti rakom žvižgat to go whistling to crabs to fail, to die → #iti jastogom pet to go singing to the lobsters
пустити буву pustiti buvu to let go of the flea to start a rumour/to spread news → #пустити вашку #pustiti vašku to let go of the louse
отети се контроли oteti se kontroli to break away from control to lose control → #отети се провери #oteti se proveri to break away from the examination
Usual modifications for [LEX] include replacing content words in the candidate by synonyms, hypernyms, hyponyms, antonyms, troponyms, meronyms, and related words in general.
Test VID.3 - [MORPH] - Morphological inflexibility
Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
хващам бика.DEF за рогата take the bull by the horns → #хващам бик.INDEF за рогата
не мога да си намеря място cannot find a place for myself to be extremely nervous → only exists in negative form
to pretty-print → *to prettier-print
to take turns → #to take a turn
entrar en vigor to_enter in vigor to come into effect → #entrar en vigores to_enter in vigors #to come into effects
cercare il pelo nell'uovo to look for the hair in the egg to be pedantic → #cercare i peli nell'uovo
mucha kogoś ugryzła a fly bit someone someone is in a bad temper → #mucha kogoś ugryzie a fly will bite someone
wyciągnąć nogi to stretch.PERF legs to die → #wyciągać nogi to stretch.IMPERF legs (imperfective aspectual variant prohibited)
бити у свакој чорби мирођија biti u svakoj čorbi mirođija to be the dill in every broth to meddle → #бивај у свакој чорби мирођија bivaj u svakoj čorbi mirođija be the dill in every broth
дође као кец на једанаест dođe kao kec na jedanaest comes as an ace on an eleven an unfavorable outcome → #дође као кечеви на једанаест dođe kao kečevi na jedanaest comes as aces on an eleven
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, tense, mood, aspect, etc. - depending on the target language's morphology.
Test VID.4 - [MORPHSYNT] - Morpho-syntactic inflexibility
Does a regular morpho-syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
аз си продавам душата I sell my soul → #аз продавам неговата душа (I sell his soul)
Ο Γιάννης έριξε μαύρη πέτρα πίσω του → #Ο Γιάννης έριξε μαύρη πέτρα πίσω μας
I give you my word for that → #I give you his word for that
he was pulling my leg → #I was pulling my leg
Io ti do la mia parola→ #Io ti do la sua parola
eu perdi meu tempo I wasted my time → eu perdi teu/seu/nosso tempo
Pojdi se solit! to go salt oneself Get lost! → *Pojdi ga solit go salt him
Usual modifications for [MORPHSYNT] involve agreement or loss of agreement between some components in the candidate.
Test VID.5 - [SYNT] - Syntactic inflexibility
Does a regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
бълвам змии и гущери → #бълвам гущери и змии
to go bananas to get crazy → #bananas are gone
to drink and drive → #drive and drink
to kick the bucket → #the bucket was kicked
perder la cabeza to_lose the head to go bananas → #perder las cabezas to_lose the heads
andare in malora go to ruin go to ruin → #nella malora è andata in ruin was gone
vivi e lascia vivere live and let live → #lascia vivere e vivi let live and live
robić bokami to do with-sides to have serious financial problems → #robić swoją robotę bokami to do one's job with sides (regular modification blocked)
dobrze komuś z oczu patrzy well someone.DAT from eyes looks someone looks like a good person → #uprzejmość dobrze komuś z oczu patrzy kindness well someone.DAT from eyes looks (subject prohibited)
nie zagrzać miejsca w pracy not to warm a place at work not to stay long in one job → #zagrzać miejsce w pracy to warm a place at work (negation is compulsory)
zdechł pies! died the dog! it is a lost cause → #pies zdechł the dog died (a regular word order variability is blocked)
wziąć w łeb to take into head to fail → #wziąć porażkę w łeb to take failure into head (direct object prohibited for the normally transitive verb wziąć to take)
ведрити и облачити vedriti i oblačiti to brighten and to cloud to call the shots → #облачити и ведрити oblačiti i vedriti (regular word order variability is blocked) to cloud and to brighten
не вредети пишљивог боба ne vredeti pišljivog boba to not be worth a single bean to be worthless → #вредети пишљивог боба vredeti pišljivog boba (negation is compulsory) to be worth a single bean
носити на души nositi na duši to carry something on one's soul to carry the burden of guilt → #ношење на души nošenje na duši (nominalization blocked) carrying on a soul
jogar futebol to play football → ?futebol é jogado football is played
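The rule stated for the VID-specific tree (a single YES to any of the tests VID.1 to VID.5 is sufficient) can be sketched as a small helper. This is a minimal illustration, not part of the guidelines: the function name, the dictionary format and the use of the bracketed test labels as keys are assumptions, and each answer is an annotator's judgement rather than something computed.

```python
# Bracketed labels of tests VID.1 to VID.5, in the order they are listed.
VID_TESTS = ("CRAN", "LEX", "MORPH", "MORPHSYNT", "SYNT")


def classify_vid(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (is_vid, labels of the tests answered YES).

    `answers` maps a test label to the annotator's YES/NO judgement.
    Tests missing from `answers` are treated as not (yet) answered YES.
    """
    unknown = set(answers) - set(VID_TESTS)
    if unknown:
        raise ValueError(f"unknown test labels: {sorted(unknown)}")
    passed = [test for test in VID_TESTS if answers.get(test, False)]
    # A single YES is sufficient to decide that the candidate is a VID.
    return bool(passed), passed


# "to kick the bucket": the passive "#the bucket was kicked" loses the
# idiomatic reading, so the annotator answers YES to [SYNT].
print(classify_vid({"CRAN": False, "SYNT": True}))  # (True, ['SYNT'])
```

A result of `False` only means that none of the recorded answers was YES. The helper also does not model the generic decision tree, which must have referred the candidate to these tests in the first place.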
Section 5.4
Inherently reflexive verbs (IRV)
Reflexive clitics (RCLI) are clitic pronouns that refer to the subject of the verb, like oneself in English. They are very common in many languages and play several semantic roles depending on the context, as detailed below.
Reflexive verbs (REFLV), sometimes also called pronominal verbs, are formed by a full verb combined with a RCLI, although the clitic does not always have a reflexive meaning. REFLV can be categorized into different classes, some of which should be annotated as verbal MWEs.
Namely, we will only annotate a REFLV as an inherently reflexive verb (IRV) when (a) it never occurs without the clitic, or (b) the REFLV and non-reflexive versions have clearly different senses or subcategorization frames. Inherently reflexive verbs constitute a quasi-universal category.
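Conditions (a) and (b) above can be written down as a small predicate. This is a hedged sketch only: the function and parameter names are hypothetical, and the three inputs are annotator judgements about one occurrence in context, not properties that can be looked up automatically.

```python
def is_irv_candidate(occurs_without_clitic: bool,
                     senses_clearly_differ: bool = False,
                     subcat_frames_differ: bool = False) -> bool:
    """Decide whether a reflexive verb (REFLV) occurrence qualifies as IRV."""
    # (a) The verb never occurs without the reflexive clitic.
    if not occurs_without_clitic:
        return True
    # (b) The reflexive and non-reflexive versions have clearly different
    #     senses or different subcategorization frames.
    return senses_clearly_differ or subcat_frames_differ


# French "se trouver" meaning "to be located": "trouver" exists on its own,
# but with clearly different senses, so the occurrence qualifies.
print(is_irv_candidate(True, senses_clearly_differ=True))  # True

# "Elle se trouve grosse": same sense as non-reflexive "trouver".
print(is_irv_candidate(True))  # False
```

Because the judgement is made per occurrence, the same verb form can qualify in one sentence and not in another.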
IRVs are a difficult category to annotate due to various problematic cases. Note in particular that in some languages, e.g. Slavic, the reflexive clitics inflect and should be considered not only in their most frequent case, i.e. accusative.
We start by listing the various categories of REFLV before providing tests to decide whether to annotate a given occurrence as IRV.
IRV-specific decision tree
Test IRV.1 - [INHERENT] - Inherent clitic
Does the verb only exist with the RCLI and never occur without it?
Test IRV.2 - [DIFF-SENSE] - Different sense
Given the same verb without the RCLI, are all of its meanings clearly different from the REFLV form?
Test IRV.3 - [DIFF-SUBCAT] - Different subcategorization frame
Is the subcategorization frame of the simple verb without the RCLI different from the subcategorization frame of the REFLV, except for the addition of a direct or indirect object corresponding to the same syntactic argument as the RCLI in the REFLV version?
Test IRV.4 - [IMPERS] - Impersonal
When you replace the RCLI by an underspecified subject such as one or people, does the sentence keep its meaning?
Test IRV.5 - [MIDDLE-INCHO] - Middle or Inchoative
When you move the subject to the object position, remove the RCLI and add a generic subject (people, somebody), thus building a transitive version, does it imply the REFLV version? In other words, people/somebody V [to] X ⇒ X REFLV?
Test IRV.6 - [REFL] - Reflexive
When you replace the RCLI by oneself only or to oneself only, does it imply the REFLV version? In other words, X V [to] himself only ⇒ X REFLV?
Test IRV.7 - [REFL-MUTUAL] - Reflexive-mutual
Is a reciprocal version possible? Namely: Is it acceptable to replace the singular subject by a plural and add each other to the REFLV form without changing the REFLV's meaning?
Test IRV.8 - [RECIPRO] - Reciprocal
Is it possible to remove the RCLI and replace the coordinated subject (A and B) or plural subject (A.PL) by a singular subject (A or A.PL) and a singular object, often introduced by to/with (B or A.PL), without changing the REFLV's meaning? That is:
Problematic cases and remarks
Keep in mind that both simple and reflexive verbs can have several senses. In test IRV.2, we ask that ALL senses you can think of are different from the REFLV form in the given context. For example, the French verb trouver can mean to find something, to have an opinion about something, to discover something, etc. But it has a totally different and unrelated meaning of to be (located at) in the sentence L'église se trouve à Paris the church is located in Paris. It should thus be annotated as a MWE. As the REFLV is polysemous itself, it should NOT be annotated as IRV in sentences like Elle se trouve grosse she finds herself fat, where it means to have an opinion about (herself), equivalent to the non-reflexive version.
In some languages the clitics are joined to the verb, sometimes using a hyphen but not always. When there is no hyphen, the REFLV will probably be tokenized as a single token in the corpus.
The current annotation format allows annotating a single token as a MWE if it is a multiword token. Therefore, it should be annotated as an MWE.
Some idiomatic constructions include reflexive clitics. Two cases are possible:
It is rare, although possible, to find light verb constructions in which a reflexive clitic changes the original meaning significantly, thus characterizing an IRV:
In this case, the whole construction, including the verb, the noun and the reflexive clitic, must be annotated as VID, since there are two syntactic arguments:
Notice that annotating only the verb and the RCLI as IRV would be wrong, since it will have a completely different meaning without the noun, sometimes even coinciding with another IRV:
In some languages, e.g. Polish, clitics inflect for case. Most cases of IRV seem to be restricted to the accusative case:
a se sfii to RCLI.ACC be.shy to be shy
a se căi to RCLI.ACC repent to repent
However, other cases can appear in IRV:
a-și apropria to-RCLI.DAT appropriate to appropriate - with a dative clitic
Some expressions can have double clitics. Only the first two words belong to the IRV:
radzić sobie z sobą to advise RCLI.DAT with RCLI.INST to manage with oneself
This category does not cover other types of pronouns and clitics. They are covered by regular VID tests and should be annotated as such. Examples of constructions that should be annotated as VID rather than IRV include:
s'en aller to self from-it go to leave
en avoir marre to have from-it enough to be fed up
il y avoir it at-it have to exist
prender-le to take it to be beaten
a o lua pe jos to take CL.ACC on foot to walk → according to the current guidelines, such examples pass the ID tests (see also 6.3_B5); both have literal correspondents that are not characterized by an obligatory non-reflexive clitic: a arde to burn and a lua to take
a-i repugna to CL.DAT loathe to loathe
a-i prii to CL.DAT to be favourable to sb.
Section 5.5
Verb-particle constructions (VPC)
Verb-particle constructions (VPCs), sometimes called phrasal verbs or phrasal-prepositional verbs, like
constitute another quasi-universal category. They have the following general characteristics:
VPCs are pervasive in English, German, Swedish, Hungarian and possibly some other languages but irrelevant to or infrequent in Romance and Slavic languages or in Farsi and Greek for instance.
In some Germanic languages and also in Hungarian, verb-particle constructions can be spelled either as one (multiword) token or separated. Both types of occurrences are to be annotated:
Herr Müller, passen Sie auf! Mr. Müller, be careful
Ongelukken kunnen voorkomen Accidents can happen
The first challenge in identifying a VPC is to properly distinguish the particle from a possibly homographic preposition, e.g.:
or a verbal prefix:
Namely, a particle, contrary to a preposition, cannot govern a complement. This can be tested depending on the verb's subcategorization frame:
transitive The fire did in the whole block or The fire did it in
???transitive Hans is zijn moeder aan het opbellen or Hans is zijn moeder op aan het bellen
Prefixes, contrary to particles, can never be spelled separately from the verb, nor can the past tense of prefixed verbs be formed with the infix -ge-
*er hat den See umgefahren, instead: er hat den See umfahren he drove around the lake but: er hat das Schild umgefahren he ran over the sign
See the language-specific tests for more details on distinguishing particles from prepositions and verbal prefixes.
Note that in this shared task we do not account for compositional verb-particle combinations, i.e. those whose meaning can be deduced from the meaning of the preposition and of the verb:
Some combinations may have both compositional and non-compositional meanings depending on the context and only the latter should be annotated:
The following decision tree should be applied to decide whether a candidate should be annotated as a VPC or not.
VPC-specific decision tree:
Test VPC.1 - [PART-REDUC] - Verb without the particle refers to the same event/state
Can a sentence without the particle refer to the same event/state as the sentence with the particle? Special care must be taken when the same construction might or might not be a valid VPC depending on its context.
Die Bäuerin hat sich wieder eingefangen the farmer’s wife has herself again caught the farmer’s wife has calmed down again does not imply #Die Bäuerin hat sich wieder gefangen the farmer’s wife has caught herself again
Der Schüler legt die Prüfung ab the pupil lays the exam off the pupil takes the exam does not imply #der Schüler legt die Prüfung the pupil lays the exam
Das Schiff legt vom Hafen ab the boat lays from the harbor off the ship leaves the harbor does not imply #das Schiff legt vom Hafen the boat lays from the harbor
to check in upon arrival does not imply #to check upon arrival
Nem jött be ez a koktél nekem I didn’t like this cocktail → Bejött ez a koktél nekem I liked this cocktail does not imply #Jött ez a koktél nekem this cocktail bumped into me
Der Lehrer legt das Buch auf dem Tisch ab the teacher lays the book on the table apart the teacher puts the book away on the table implies Der Lehrer legt das Buch auf den Tisch the teacher puts the book on the table
Der Lehrer legt den Mantel ab the teacher lays the coat off the teacher takes off his coat implies Der Lehrer legt den Mantel the teacher puts the coat
to eat up the cookies implies to eat the cookies
Nem jött be a szobába He did not come into the room → Bejött a szobába he entered the room implies Jött a szobába he came into the room
Test VPC.2 - [PART-SPATIAL] - Spatial particle
Is the particle spatial in the context of the verb, i.e. does it express direction or position?
to give something back
to stay up tonight
You may go in now
to mix ingredients together
aankijken look at
iets optillen to lift something up
slijm ophoesten cough up phlegm
to mix ideas together
Test VPC.3 - [PART-SPATIAL-LIT] - Spatial particle in a literal reading
Does the VPC candidate have a literal counterpart in which the particle is spatial, i.e. expresses direction or position?
Section 5.6
Multi-verb constructions (MVC)
Multi-verb constructions (MVC) constitute a quasi-universal category. They are VMWEs composed of a sequence of two adjacent verbs (in a language-dependent order), a functionally governing verb V-gov (also called a vector verb) and a functionally dependent verb V-dep (also called a pole/polar verb), which have the following characteristics:
The behavior of MVCs is very heterogeneous across languages. Therefore, most tests for the detection of MVCs are language specific. The current tests were designed for Indonesian, Hindi, Japanese and Chinese. The generalization of these tests cross-lingually is planned as future work.
MVC-specific decision tree for Hindi
MVC-specific decision tree for Chinese
MVC-specific decision tree for Indonesian and Japanese
- TODO (in the meantime, follow the tests one by one)
MVC-specific decision tree for any other language
Test MVC.1 - [MVC-STRUCT] MVC-like structure
Does the candidate respect the necessary structural (language-dependent) requirements for an MVC?
Hindi
Test MVC.1.BASE [MVC-STRUCT-BASE]: Is V-dep non finite and does V-gov carry the tense, aspect and agreement inflections?
Japanese
Test MVC.1.IMORPH: Does the first verb (V-dep) contain the i-morph suffix?
Any other language
Go to the next test
Test MVC.2 - [INS-DISCARD] Insertion which discards
Does the candidate sequence appear, or could it appear, with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that this candidate is a regular combination and should be discarded?
Chinese
Test MVC.2.ASPECT - [INS-DISCARD-ASP]: Can the aspect marker 了 -le perfective or 过 -guo [provide the meaning of the prefix] be inserted between V-gov and V-dep (or the opposite)?
Indonesian
Test MVC.2.PRON - [INS-DISCARD-PRON]: Can a pronoun like dia he/she be inserted between the first [AS: between V-gov and V-dep or the opposite?] and second verb?
Test MVC.2.CLAUSE - [INS-DISCARD-CLAUSE]: Can a that-clause like bahwa that, or a whether-clause like apakah whether be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?], where the first verb [AS: V-gov?] is a saying verb like mengatakan say or an asking verb like menanyakan ask?
Test MVC.2.PURPOSE - [INS-DISCARD-PURP]: Can untuk for/to be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?]?
Japanese
Test MVC.2.HONOR - [INS-DISCARD-HONOR]: Is the first verb [AS: V-gov or V-dep?] preceded by the honorific particle お o and is the second verb する/できる suru/dekiru?
Any other language
Go to the next test
Test MVC.3 - [INS-REDIRECT] Insertion which redirects
Does the candidate sequence appear with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that a particular test should be applied next?
Hindi
Test MVC.3.KAR - [INS-REDIRECT-KAR]: Does conjunctive participle kar or ke appear attached to or immediately after V-dep?
Any other language
Go to the next test
Test MVC.4 - [SHARE-ARGS] Shared arguments
Do V-gov and V-dep share arguments?
Test MVC.5 - [MODAL] Modal or auxiliary verb
Chinese
Is V-gov a modal or an auxiliary verb?
Any other language
Go to the next test
Test MVC.6 - [MANNER] Manner verb
Chinese, Hindi, Indonesian, Japanese
Does V-gov indicate the manner or means (and possibly a direction) of the action expressed by V-dep (in Chinese: or vice versa)?
Any other language
Go to the next test
Test MVC.7 - [REASON] Reason verb
Hindi and Chinese
Does V-gov indicate the reason of the action expressed by V-dep (in Chinese: or vice versa)?
Any other language
Go to the next test
Test MVC.8 - [SEQ] Temporal sequence
Hindi, Indonesian, Japanese
Are the verbs bound by a temporal sequence?
Any other language
Go to the next test
Test MVC.9 - [SIMULT] Simultaneous actions
Do the verbs indicate rapid and simultaneous actions (without resorting to a coordination conjunction)?
Test MVC.10 - [LIGHT] Light verb
Hindi
Does V-gov belong to a closed list of light verbs: aa come, baiTh sit, chal go, chuk finish, choR leave, Daal throw, de give, ja go, jataa declare, khaa eat, lagaa put, le take, maar hit, paa get/obtain, paRh fall, rakh keep, uTh rise?
Any other language
Go to the next test
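Since test MVC.10 relies on a closed list, the check can be sketched as a simple lookup. This is only an illustration: the names HINDI_LIGHT_VERBS and is_listed_light_verb are hypothetical, and the input is assumed to be the romanized verb stem spelled exactly as in the list above.

```python
# Closed list of Hindi light verbs, copied from test MVC.10
# (romanized stem -> English gloss).
HINDI_LIGHT_VERBS = {
    "aa": "come", "baiTh": "sit", "chal": "go", "chuk": "finish",
    "choR": "leave", "Daal": "throw", "de": "give", "ja": "go",
    "jataa": "declare", "khaa": "eat", "lagaa": "put", "le": "take",
    "maar": "hit", "paa": "get/obtain", "paRh": "fall", "rakh": "keep",
    "uTh": "rise",
}


def is_listed_light_verb(v_gov_stem: str) -> bool:
    """Return True if the romanized stem of V-gov is in the closed list.

    Assumption: the caller uses the same transliteration as the list.
    The comparison is case-sensitive because this transliteration uses
    capital letters contrastively (e.g. 'Daal', 'paRh').
    """
    return v_gov_stem in HINDI_LIGHT_VERBS
```

A lookup of this kind only answers the membership question; whether the candidate is finally annotated still depends on the Hindi decision tree.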
Test MVC.11 - [PREP-LIKE] Preposition-like verb
Chinese
[Hongzhi Xu: this test is not very clear and is only specific to one particular MVC (it should probably be deleted in future editions)] Is the second verb in the candidate [AS: V-gov or V-dep?] a preposition-like verb like 成 chéng become?
Any other language
Go to the next test
Test MVC.12 - [NOUN-LIKE] Noun-like verb
Japanese
Are any of the components [AS: V-gov or V-dep?] in the candidate noun-like arguments?
Any other language
Go to the next test
Test MVC.13 - [V-LEX] Lexical inflexibility
Does a regular replacement of V-dep by a related verb taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
ce mot veut dire autre chose this word wants say other thing this word means something else → #ce mot veut chuchoter/communiquer/crier autre chose this word wants whisper/communicate/scream another thing
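Because most MVC tests name the languages they concern, it can help to see which tests an annotator of a given language actually has to answer. The sketch below simply transcribes the language lists given under each test in this section; the names MVC_TEST_LANGUAGES and applicable_mvc_tests are hypothetical, and tests that list no language are assumed to apply to every candidate. It says nothing about the outcome of each test, which is defined by the language-specific decision trees.

```python
# Languages explicitly named under each MVC test in this section.
# An empty set means the test names no language and is assumed to apply to all.
MVC_TEST_LANGUAGES = {
    "MVC.1": {"Hindi", "Japanese"},
    "MVC.2": {"Chinese", "Indonesian", "Japanese"},
    "MVC.3": {"Hindi"},
    "MVC.4": set(),
    "MVC.5": {"Chinese"},
    "MVC.6": {"Chinese", "Hindi", "Indonesian", "Japanese"},
    "MVC.7": {"Hindi", "Chinese"},
    "MVC.8": {"Hindi", "Indonesian", "Japanese"},
    "MVC.9": set(),
    "MVC.10": {"Hindi"},
    "MVC.11": {"Chinese"},
    "MVC.12": {"Japanese"},
    "MVC.13": set(),
}


def applicable_mvc_tests(language: str) -> list[str]:
    """List, in order, the MVC tests to be answered for `language`.

    Tests that only name other languages are skipped, which corresponds
    to the instruction "Go to the next test".
    """
    return [
        test
        for test, languages in MVC_TEST_LANGUAGES.items()
        if not languages or language in languages
    ]
```

For a language without dedicated tests, such as French in the example above, only MVC.4, MVC.9 and MVC.13 remain.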
Section 5.7
Inherently adpositional verbs (IAVs)
Inherently adpositional verbs (IAVs) form a special, optional and experimental category. It corresponds to the IPrepV category in the first pilot annotations and to what is sometimes called prepositional verbs in English. An IAV consists of a verb or VMWE and an idiomatically selected preposition or postposition that is either always required or, if absent, changes the meaning of the verb or VMWE significantly. Language teams who decide to annotate IAVs should do so after annotating other categories (step 4 of the annotation process), since overlap with other categories can be quite frequent, as detailed below. Language teams are not required to use this category.
Our definition of inherently adpositional verbs is a generalization (applying to many languages) of the annotation guidelines of the English STREUSLE corpus, which define guidelines for annotating prepositional verbs.
IAVs are verb+adposition combinations in which:
Note that idiomatic adpositional valency, in which the adposition opens a slot for a complement, should not be mistaken for verb-particle constructions. Tests distinguishing particles from prepositions can be used to disambiguate these categories.
Particles can occur after the object: to wake somebody up but prepositions cannot *to come a new restaurant across
Not only single verbs but also VMWEs may be inherently adpositional. This is why IAV annotation needs to be the last step, after all other VMWEs in a sentence have been identified and categorized. In case of overlap between another category and IAV, the whole VMWE annotation needs to be repeated with the addition of the lexicalized adposition, and the whole is annotated as an IAV.
1. to put up is annotated as VPC
2. the whole sequence to put up with is annotated as IAV
1. atenerse is annotated as IRV
2. the whole sequence atenerse a is annotated as IAV
1. ubadati se to deal RCLI is annotated as IRV, since the verb without the RCLI does not exist
2. the whole sequence ubadati se z to deal RCLI with is annotated as IAV, since the verb also does not exist without the preposition
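The two-step procedure illustrated above (first annotate the inner VMWE, then repeat the annotation with the lexicalized adposition as an IAV) can be sketched as follows. The Vmwe class and the extend_to_iav function are hypothetical names used for illustration only, and the adposition is assumed to follow the other components in the token sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vmwe:
    tokens: tuple[str, ...]  # lexicalized components, in sentence order
    category: str            # e.g. "VPC", "IRV", "IAV"


def extend_to_iav(base: Vmwe, adposition: str) -> Vmwe:
    """Repeat the annotation of an already identified VMWE with its
    lexicalized adposition added; the whole sequence is labelled IAV.

    The original annotation (step 1) is left untouched, so both the
    inner VMWE and the enclosing IAV can be kept.
    """
    return Vmwe(tokens=base.tokens + (adposition,), category="IAV")


# Step 1: the inner VMWE; step 2: the whole sequence as IAV.
put_up = Vmwe(("put", "up"), "VPC")
put_up_with = extend_to_iav(put_up, "with")
```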
Test IAV.1 - [CIRCUM-QUEST] Circumstantial question with no adposition
This is an adaptation of STREUSLE's guideline on prepositional verbs by Nathan Schneider and Meredith Green. In response to a declarative sentence with the verb+adposition combination, is there a natural way to query the circumstances of the verbal event using the verb, but not the adposition?
- Why do you care?
→ to care about is not annotated as IAV
- ¿Por qué te preocupas? why you worry.you? Why are you worried?
→ preocuparse por is not annotated as IAV
- Se lahko zaneseš, da ti bo kdo pomagal? Can you rely that someone will help you? Can you rely on that someone will help you?
→ zanesti se to rely on is not annotated as IAV
- #When did you come?
to come across is annotated as IAV
- #¿Desde cuándo entiende? Since when understands.she? Since when does she know?
entender de is annotated as IAV
- #Kaj gre? #What goes?
gre za is annotated as IAV
Section 6
Language-specific tests
Language-specific tests may be necessary in one of 3 cases:
Section 6.1
Language-specific categories (LS)
Language-specific categories can be proposed for annotation in this task provided that they are carefully defined and accompanied by linguistic tests that distinguish them from other categories. We recommend not redefining the universal and quasi-universal categories described here, but introducing new names and abbreviations to answer such needs.
When a new language(-group)-specific category is introduced, we encourage the use of the LS category with a dotted extension, e.g. LS.SIM or LS.PROV (for "language-specific simile" or "language-specific proverb").
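As a minimal sketch, the naming convention above can be checked with a regular expression. The pattern assumes that the extension consists of upper-case letters only, which is inferred from the two examples and is not stated as a rule; the names LS_LABEL and is_ls_label are hypothetical.

```python
import re

# "LS", a dot, then an upper-case extension, e.g. "LS.SIM" or "LS.PROV".
# The upper-case restriction is an assumption based on the examples.
LS_LABEL = re.compile(r"LS\.[A-Z]+")


def is_ls_label(label: str) -> bool:
    """Return True if `label` follows the LS.<EXTENSION> convention."""
    return LS_LABEL.fullmatch(label) is not None
```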
Section 6.2
Particles versus prepositions and prefixes
The following tests make it possible to identify prepositional verb particles in cases where they might be homographic with prepositions in prepositional phrases (PPs) or with verbal prefixes. The word to be discriminated is referred to as a candidate word. The tests are language-specific and concern English, German and Swedish.
English-specific tests for distinguishing particles from prepositions
The following tests concern English words which can be either a preposition or a particle depending on the context, e.g. up, on, through, etc. If a candidate word passes either of the two tests it can be categorized as a particle.
Test PREP.EN.1 - [FIN-PART] - Sentence-final particle
Can the sentence be reformulated so that the candidate word w occurs at the end of a clause which is: (i) affirmative or imperative, (ii) headed by the verb governing w, and (iii) not a relative clause?
I took off my clothes. I took my clothes off.
She tries to take in her clients. She tries to take her clients in.
He has been off alcohol. *He has been alcohol off.
Test PREP.EN.2 - [AD-INS] - Adjunct insertion
Is an insertion of a circumstantial adjunct prohibited between the governing verb and the candidate word?
I took off my clothes at once. *I took at once off my clothes.
She always tries to take in her clients. *She tries to take always in her clients.
He has been off alcohol recently. He has been recently off alcohol.
This test might be redundant with respect to test PREP.EN.1. If it turns out to be so (after a large-scale annotation), it may be deleted.
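The way the two English tests combine can be sketched as follows: a candidate word counts as a particle as soon as one test is passed. The function name is_particle and the judgments dictionary are hypothetical; the boolean values record the judgments shown in the three examples above (an asterisk on the reformulated sentence means PREP.EN.1 fails, while an asterisk on the sentence with an inserted adjunct means PREP.EN.2 is passed).

```python
def is_particle(passes_fin_part: bool, passes_ad_ins: bool) -> bool:
    """A candidate word is categorized as a particle if it passes at least
    one of PREP.EN.1 [FIN-PART] and PREP.EN.2 [AD-INS]."""
    return passes_fin_part or passes_ad_ins


# (PREP.EN.1 passed, PREP.EN.2 passed) for the examples given in the tests.
judgments = {
    "took off my clothes": (True, True),
    "take in her clients": (True, True),
    "has been off alcohol": (False, False),
}
results = {example: is_particle(*tests) for example, tests in judgments.items()}
```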
German-specific tests for distinguishing particles from prepositions and verbal prefixes
The following tests concern German words which can be both a particle and either a preposition or a verbal prefix, depending on the context, e.g. mit, um, vor, etc. If a candidate word passes any of the three following tests it can be categorized as a particle.
Test PREP.DE.1 - [FIN-PART] - Sentence-final particle
Does the candidate word occur at the end of the sentence or can the sentence be reformulated so as to put the candidate word at the end?
Ich schlage vor allen zu verzeihen. I propose to forgive everyone Ich schlage es vor I propose it
Der Mülleimer wurde umgefahren. The trash bin was knocked down Er fuhr den Mülleimer um. He knocked down the trash bin
Er umfuhr den ganzen See mit dem Fahrrad. He drove around the whole lake with a bike *Er fuhr ihn um.
Test PREP.DE.2 - [SEP-PART] - Separable particle
Can the verb and the candidate word be spelled both separately and together?
Er fuhr das Schild um. He drove over the sign Er sollte das Schild nicht umfahren He should not drive over the sign
Sprechen Sie mit ihm! Speak with him! *Sie sollen ihm mitsprechen.
Swedish-specific tests for distinguishing particles from prepositions and verbal prefixes
Many words are ambiguous between particles and prepositions, e.g. för, upp, … Accordingly, the following sentence may have two different senses:
The difference can only be judged by the stress/intonation pattern. In the first case, with a particle, the stress is not on the verb but on the particle. In the second case, with a prepositional object, the main stress is on the verb, with only secondary stress on the preposition.
Test PART.SV.1 - [PART-STRESS] - Stress on the particle
Is the main stress on the candidate word rather than on the verb?
Section 6.3
Identifying multiword tokens
The relation between words and tokens is not always 1-to-1. If a single token contains more than one word then it is a potential MWE. For the purpose of MWE annotation it is therefore important to provide as clear-cut a definition of a word as possible. This section contains language-specific tests for identifying multiword tokens (MWTs). Currently the tests concern Swedish.
Swedish-specific tests for identifying MWTs
Test MWT.SV.1 - [VERB-MWT] - Verbal MWT
Does the candidate token function as a verb?
sysselsättning task-setting employment
förklara for-clear explain
klargöra clear-make clarify
Test MWT.SV.2 - [SPLIT-MWT] - Splittable MWT
Split the candidate token into its component parts. Can it be used as an expression in the split form (possibly with slightly shifted semantics)?
avbryta off-break cancel, bryta av break off break off
Test MWT.SV.3 - [CRAN-MWT] - Cranberry component in a MWT
If you split the token into its component words, is any of these words a cranberry word (i.e. it cannot be used as a standalone word, with the same part-of-speech)?
erbjuda er-offer offer → er is possible as a pronoun but not as a particle
försvåra for-difficult make difficult → svåra is possible as an adjective but not as a verb
jämföra compare → jäm is not used as a stand-alone word
för|klara for|clear explain
klar|göra clear|make clarify
Section 6.4
Language-specific inherently clitic verbs (LS.ICV)
Inherently Clitic Verbs (LS.ICV), together with Inherently Reflexive Verbs (IRV), are pronominal verbs. LS.ICVs are formed by a full verb combined with one or more non-reflexive clitics (CLI) that represent the pronominalization of one or more complements. LS.ICV is annotated when (a) the verb never occurs without a non-reflexive clitic, e.g. entrarci to be relevant to something (colloquial form), or (b) the LS.ICV and the non-clitic versions have clearly different senses or subcategorization frames.
LS.ICVs represent a specific category for some Romance languages, and they are particularly frequent in the Italian language. It is often challenging to distinguish LS.ICV from IRV, particularly because some clitics may be ambiguous, like se/si which is a polyfunctional clitic pronoun and grammatical marker (and has many functions such as reflexive, reciprocal, impersonal, passivizing, aspectual, middle).
If the CLI has a clear reflexive meaning the VMWE might be an IRV.
We start by listing the various categories of LS.ICVs before providing tests to decide whether to annotate a given occurrence as an LS.ICV.
LS.ICV-specific decision tree
Test LS.ICV.1 - [CL-INHERENT] Inherent clitic
Does the verb only exist with the CLI and never occur without it?
Test LS.ICV.2 - [CL-DIFF-SENSE] - Different sense
Given the same verb without the CLI/CLIs, are all of its meanings clearly different from the inherently clitic form?
Test LS.ICV.3 - [CL-DIFF-SUBCAT] - Different subcategorization frame
Is the subcategorization frame of the simple verb without the CLI different from the subcategorization frame of the LS.ICV?
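Read together with the definition at the start of this section, the three tests can be combined as in the sketch below. This is an assumption about how the answers interact (condition (a), or else condition (b) with either a different sense or a different subcategorization frame), not a transcription of the decision tree itself. The function name is_ls_icv is hypothetical, and the clitic is assumed to be non-reflexive, since a clearly reflexive CLI points towards IRV instead.

```python
def is_ls_icv(
    verb_exists_without_cli: bool,
    senses_clearly_differ: bool,
    subcat_frames_differ: bool,
) -> bool:
    """Combine the answers to LS.ICV.1, LS.ICV.2 and LS.ICV.3.

    Assumption: the candidate clitic is non-reflexive.
    """
    # (a) LS.ICV.1: the verb never occurs without the clitic.
    if not verb_exists_without_cli:
        return True
    # (b) LS.ICV.2 / LS.ICV.3: the simple verb exists, but its senses or
    # its subcategorization frame differ from the form with the clitic.
    return senses_clearly_differ or subcat_frames_differ
```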
Section 6.5
Italian-specific decision tree
For Italian, a language-specific category called inherently clitic verbs (LS.ICV) has been defined. This implies a modified version of the annotation decision tree.
Steps 1-4 are still valid in Italian. But Step 3 should be realized with the decision tree below instead of the generic decision tree.
Test IT.S.1 - [CLITICS-ONLY] Clitics only
Are all lexicalized dependents of the verb clitics?
Section 6.6
Hindi-specific decision tree
For Hindi, LVCs can be formed by a verb and a noun, or by a verb and an adjective which is morphologically identical to an eventive noun. This implies a modified version of the annotation decision tree.
Steps 1-4 are still valid in Hindi. But Step 3 should be realized with the decision tree below instead of the generic decision tree.
Section 7
Annotation management
This section groups the documentation on practical aspects of the annotation campaign management. Some of these aspects are specific to this shared task, such as the edition of examples by language leaders and the use of the annotation platform FLAT. Others are more generic and concern the guidelines in general, such as the FAQ section.
Section 7.1
Frequently Asked Questions (FAQ)
Annotators often face questions and challenging examples. When several annotators ask the same question, we will update the list of frequently asked questions.
However, we suggest that language teams set up another communication platform to deal with questions that are specific to a language. This can take the form of a shared online document, a wiki, a dedicated bug tracking system or mailing list. We also suggest keeping track of decisions taken considering borderline examples (with a list of expressions to which the decision applies). These should be kept in a centralized document or page that all annotators can access.
Whenever you think that a question can also be interesting to other languages, please notify the organizers and we will try to update this page.
Check the glossary entry that defines unexpected change in meaning
In some languages adpositions (pre- or post-positions), clitics and determiners are subject to contractions (i.e. they yield multiword tokens, MWTs). If they are properly split by the tokenizer, only the lexicalized parts of each contraction should be annotated. If you use FLAT for annotating, split contractions are displayed twice: in their folded and in their unfolded version. Only the latter should be subject to annotation, e.g. Jean bénéficie du de le traitement Jean benefits from the treatment, Jean donne du de le grain à moudre à son fils Jean gives grain to grind to his son Jean gives an occasion to act to his son.
Sometimes, however, tokenizers might not handle contraction splitting properly. In this case, a lexicalized component of a VMWE can be merged with an external word:
A similar problem occurs in languages with productive compounding, where a lexicalized component of a VMWE and a free modifier can build up a multitoken word (since compound splitting might not be a standard feature of a tokenizer):
Heisshunger haben to have hot hunger to be ravenously hungry
Yet another related phenomenon concerns acronyms whose spelled-out versions may contain predicative nouns which in the abbreviated versions boil down to single letters:
the book underwent OCR (optical character recognition)
the program carries out a PCA (principal component analysis)
le patient fait un AVC (accident vasculaire cérébral)
Since the current annotation format is token-based, we prohibit correcting tokenization errors and compound splitting by the annotators for the sake of coherence. Therefore the annotation of such contractions, compounds and acronyms finds no fully satisfactory solution in our schema. We propose to annotate a whole MWT each time it contains a word which is part of a VMWE. Annotators should add a textual comment about the mixed status of this MWT:
Heisshunger → MWT containing a lexicalized VMWE Hunger and an additional modifier heiss
A component shared by two or more coordinated VMWEs should be annotated as belonging to both of them.
Such hesitation issues should normally be solved by the structural tests. For instance, consider the German expression sich eine Frage stellen SELF a question put to doubt. It may seem to belong to both IRV, since sich is required only if stellen co-occurs with Frage, and LVC, since Frage keeps its original meaning and stellen brings no additional meaning. However, test S.2 [1DEP] indicates that an expression like this should be annotated as a VID, since the verb has more than one lexicalized syntactic dependent.
Similarly, the French expression avoir peur have fear to be afraid seems to have features of a VID. Unlike most LVCs, it does not allow a determiner *avoir une peur have a fear , except when the noun is modified avoir une grande peur have a great fear . However, test S.4 [CATEG] in the generic decision tree 2, and the LVC-specific decision tree indicate that it belongs to the LVC category.
Candidate VMWEs embedded in other VMWEs should be annotated only if they have a VMWE status also outside the particular context. For instance, the VMWE to let the cat out of the bag should be annotated as a VID, and its embedded VMWE to let out as a VPC.
On the other hand, in the French expression se faire des idées SELF make DET.PL ideas to imagine things which are not true, se faire should not be annotated as IRV, since it is not inherently reflexive as a standalone verb+clitic combination.
Hesitations about a possible LVC status can arise with respect to existential constructions with nouns introducing events or properties (see test LVC.1 [N-PRED]) as in:
Namely, the noun keeps its original sense and the existential verb to be or to have brings no additional meaning. However, a candidate LVC must also pass test LVC.4 [V-REDUC]. This requires the modification of the noun by the verb's subject, which is impossible with impersonal and empty subjects like there. Therefore, such candidates cannot be LVCs.
Note, however, that existential expressions themselves can be VMWEs of the VID type. For instance, in the French example il y a des plaintes it there has complaints there are complaints, two dependents of the verb a has are lexicalized: il it and y there , therefore it is a VID (see test S.2 [1DEP]).
If at least one of the five LVC tests (9 to 13) is not passed, the candidate is not considered an LVC. For the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we admit a definition of an LVC which might seem more restrictive than some linguistic studies usually assume. Thus, we exclude from the LVC scope:
- expressions in which the verb's syntactic subject is not necessarily the noun's semantic subject, like to give courage or to make an impression. These candidates do not pass test LVC.4 [V-REDUC].
- expressions where the lexicalized nominal dependent of the verb is its subject, as in the problem lies in something; these candidates do not pass test LVC.4 [V-REDUC].
- expressions with aspectual verbs, as in to start, to pursue, to stop a walk. These do not pass test LVC.3 [V-LIGHT] since they add (aspectual) semantics to the noun. The only exception is when the noun itself is already aspectual, as in to come into bloom
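The restrictive definition above amounts to a conjunction: a candidate is an LVC only if every LVC test is passed. A minimal sketch, assuming the five tests are the ones named LVC.0 through LVC.4 elsewhere in these guidelines, and treating an unanswered test as not passed (both the names LVC_TESTS and is_lvc and this treatment of missing answers are illustrative choices):

```python
LVC_TESTS = ("LVC.0", "LVC.1", "LVC.2", "LVC.3", "LVC.4")


def is_lvc(answers: dict[str, bool]) -> bool:
    """Return True only if every one of the five LVC tests is passed.

    A test missing from `answers` counts as not passed.
    """
    return all(answers.get(test, False) for test in LVC_TESTS)


# A candidate passing everything, and one failing LVC.3 [V-LIGHT],
# as aspectual verbs such as "to start a walk" do.
all_passed = dict.fromkeys(LVC_TESTS, True)
fails_v_light = {**all_passed, "LVC.3": False}
```

Here is_lvc(all_passed) holds while is_lvc(fails_v_light) does not, whatever the answers to the other tests are.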
Pure operator verbs, i.e. verbs which never have any semantics per se but only carry the grammatical (tense, mood etc.) information, seem to contradict the intuition behind a VMWE. Namely, they usually select a whole semantic class of nouns. For instance to commit selects any negative act (a crime, a suicide, a theft) and to perform selects any activity (a task, an experiment, a miracle). In this sense, their complements resemble open slots and the whole combinations resemble collocations. However, for the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we do include verb+noun combinations with pure operator verbs, such as to commit a crime and to perform a task, into the LVC category. This is because such combinations pass all tests (LVC.0 through LVC.4). We found no other reliable tests which would distinguish such productive cases from less productive ones like to make a decision. In particular, some studies (e.g. Bonial 2014) show that there exist no truly productive light verbs. Therefore, all examples cited here are to be classified as LVCs.
No, the IRV category only includes (some) combinations of a head verb with a reflexive clitic. As indicated in the borderline cases page of the IRV category, other pronouns, whenever lexicalized, trigger the VID category. Recall that whenever more than one dependent of the verb is lexicalized (including or not a reflexive clitic), the VMWE is always categorized as a VID.
The only nominal VMWE variants within our annotation scope are those:
- headed by the gerund stemming from the head verb of the VMWE - taking of the decision, and
- in which a noun stemming from a VMWE is modified by a participle or a relative clause headed by the verb stemming from the same VMWE - the decisions taken yesterday, the decision which he took.
Other nominalizations are excluded:
puesta a punto setting to point set-up
For practical reasons (e.g. compatibility with an existing annotation, or usefulness for a particular application) they can be considered language-specific VMWEs but then a new category should be defined for them, so as to keep the universal and the quasi-universal categories intact.
Once identified in a text, each VMWE is to be assigned to exactly one category. Note that in this version of the guidelines we no longer admit "hesitation labels" (e.g. LVC/VID) used in the pilot annotation. Hesitation can, however, be expressed in a comment and a particular value of the annotator's confidence assigned to a particular VMWE occurrence.
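The rule above (exactly one category, with hesitation moved to a comment and a confidence value) can be sketched as a small record type that rejects hesitation labels. The class name VmweAnnotation, its field names and the 0 to 1 confidence scale are assumptions made for illustration; they do not describe the annotation platform.

```python
from dataclasses import dataclass


@dataclass
class VmweAnnotation:
    category: str             # exactly one category, e.g. "LVC" or "VID"
    comment: str = ""         # free text, e.g. to express hesitation
    confidence: float = 1.0   # assumed scale: 0.0 (unsure) to 1.0 (sure)

    def __post_init__(self) -> None:
        # Hesitation labels such as "LVC/VID" are no longer admitted.
        if "/" in self.category:
            raise ValueError(
                f"hesitation label {self.category!r} is not allowed; "
                "choose one category and use the comment instead"
            )
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Hesitation is recorded in the comment and the confidence value.
annotation = VmweAnnotation("LVC", comment="hesitating with VID", confidence=0.5)
```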
The goal of test LVC.1 is to identify whether a noun is predicative, that is, whether it requires at least one semantic argument. For many classes of abstract nouns, however, it can be tricky to apply the test. We advise listing in a separate document those classes of nouns that pass test LVC.1 in your language. Language teams can also provide links to the documentation of semantic annotation projects such as NomBank for English, which usually include tests and descriptions that help identifying semantic arguments.
We suggest considering that the following categories pass test LVC.1:
Ο Γιάννης έχει συνάχι = ο Γιάννης είναι άρρωστος (αρρώστεια is a hypernym of συνάχι)
Relations:
Ο Γιάννης έχει σχέση με κάποιον = Ο Γιάννης σχετίζεται με κάποιον
Ο Γιάννης έχει επαφές με κάποιον = Ο Γιάννης επικοινωνεί με κάποιον (επικοινωνία is a synonym of επαφή)
Mental content (internal to a cognizer):
Ο Γιάννης έχει ανησυχία = Ο Γιάννης ανησυχεί
Ο Γιάννης έχει μια ιδέα = Ο Γιάννης σκέφτεται (σκέψη is a synonym of ιδέα)
Ο Γιάννης έχει την άποψη = Ο Γιάννης κρίνει (κρίση is a synonym of άποψη)
John has a flu = John is ill (illness is a hypernym of flu)
Relations:
John has contact with somebody = John contacts somebody
John has an affair with somebody = John is involved with somebody (involvement is a synonym of affair)
Mental content (internal to a cognizer):
John has a worry = John worries
John has an idea = John thinks (thought is a synonym of idea)
John has an opinion = John believes (belief is a synonym of opinion)
Miha je v dvomih Miha is in doubts = Miha dvomi Miha doubts
Miha je mnenja Miha is of opinion = Miha meni Miha believes
Miha ima predstavo/pojma Miha has an idea = Miha meni Miha thinks (predstava, pojem are synonyms of idea in this context)
Please notice that events and states that have no semantic arguments do not pass test LVC.1, even if they have verbal/adjectival paraphrases:
Informational content (external to a cognizer): information, news
Informational content (external to a cognizer): informacije, novice information, news
Finally, notice that not every verb + predicative noun combination forms an LVC. Additionally, the verb needs to be "light", not adding semantics to the noun. The remaining LVC tests guarantee this.
Most of the time, it is easy to test whether a determiner is lexicalized by searching alternatives in corpora (or on the web). For instance, the is lexicalized in to kick the bucket because searches for other determiners (this, a, some, three, many, etc.) either do not return any result or return only literal uses of this verb phrase.
However, borderline cases do exist, in which alternatives are rare but possible, especially for LVCs and decomposable VIDs. For instance, while the standard form of the idiom spill the beans forbids some determiners (#spill three/twenty beans), it is possible to find some variation (spill these/many/all/my/his/more/no beans).
We argue that the selection of some determiners (but not all) by a VMWE is comparable to selected prepositions for verbs. Thus, it can be seen as a regular grammatical phenomenon, suggesting that when the determiner varies, then it should not be included in the annotation scope. Possessive pronouns (my, her, their, etc.) and reflexive clitics (myself, herself, themselves, etc.) are exceptions to this rule (see also Section 1.4). Namely, when they are constrained to agree in number and person with the subject (I do my best, *I do your best), they are realized by different lexemes, i.e., strictly speaking, they are not lexicalized. We consider, however, that - with respect to lexicalization - they constitute single lexemes inflecting for number and gender.
Particular language teams may of course adopt their own criteria for annotating partly frozen determiners. Then, these decisions should be documented in language-specific guidelines.
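The reasoning above can be sketched as a check on the set of determiners attested in idiomatic uses of an expression: a single attested determiner is treated as lexicalized, and agreeing possessives are collapsed into one lexeme first. The names POSSESSIVES and determiner_is_lexicalized are hypothetical, the sets passed in come from the annotator's own corpus searches, and collapsing possessives is only appropriate when they must agree with the subject, as in I do my best.

```python
# English possessive determiners, treated as forms of one lexeme ("POSS").
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def determiner_is_lexicalized(idiomatic_determiners: set[str]) -> bool:
    """Return True if exactly one determiner lexeme is attested in
    idiomatic uses of the expression.

    Assumption: possessives in the input agree with the subject, so they
    can be collapsed into a single lexeme before counting.
    """
    normalized = {
        "POSS" if det in POSSESSIVES else det
        for det in idiomatic_determiners
    }
    return len(normalized) == 1


# Determiner sets taken from the examples discussed above.
kick_the_bucket = {"the"}
spill_the_beans = {"the", "these", "many", "all", "my", "his", "more", "no"}
do_ones_best = {"my", "her", "their"}
```

With these sets, the determiner is included in the annotation for kick the bucket and for do one's best, but not for spill the beans.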
It depends. In many Indo-European languages (including Germanic, Romance and Balto-Slavic families), verbal chains using auxiliary and modal verbs are used to express tense, mood, modality and aspect. This is a regular linguistic phenomenon, fully productive, that can be applied to any verb and should not be annotated at all.
On the other hand, some languages have idiomatic compound and serial verbs, that is, VMWEs whose lexicalized components are two verbs, and where one of them does not express tense, mood, modality and/or aspect with respect to the other one. Therefore, we have created a new category in edition 1.1 to annotate these constructions, called multi-verb construction (MVC), covering examples such as:
to make do
vouloir dire want say to mean
voler dire want say to mean
można wytrzymać one can stand the situation is reasonably good
ouvir falar hear speak to know/remember vaguely
The guidelines determine that only lexicalized components should be annotated. Therefore, we suggest that, in such cases, if the NP is compositional, only the head of the NP is included in the scope of the LVC. This may lead to the annotation of odd LVCs that actually never occur by themselves without a modifier. This is not a problem and is already the case for other VMWEs, e.g. the ones that only occur with a determiner, but the determiner is not lexicalized. The only cases where the NP should be included as a whole is if the complement is a non-compositional MWE, so that it would not make any sense to annotate only the head.
κάνω στάση εργασίας to-make stop work.SG.GEN to go on strike, to strike → the expression στάση εργασίας is non-compositional (term)
mener une vie de débauche to have a life of pleasures
faire un faux pas make a false step to commit a faux pas → the expression faux pas is non-compositional
fazer roleta russa to make russian roulette to play russian roulette → the expression roleta russa is non-compositional
ter uma situação financeira/profissional/estável to have a financial/professional/stable situation
Notice that these suggestions also apply to LVCs whose nominal complements are introduced by prepositions (i.e. verb+PP LVCs). As usual, the preposition should be included if it is lexicalized and then the NP introduced by the preposition is analyzed exactly as described above.
If the complex dependent is an acronym, you may want to add the textual comment "PART" to indicate that only part of the full version is lexicalized (generally, the head), just like for contractions and compounds.
Depending on the language, aspect can be realised by various lexical, morphological and syntactic means.
- We consider aspect a morphological feature in the following cases:
- Perfective or continuous aspect introduced by inflection and/or analytical tenses:
- Perfective or imperfective aspect inherent to the verb (independently of its inflected form), recognisable either by a prefix or by an ending:
John was making a presentation
he called her while having a walk
Jan was een presentatie aan het maken Jan was making a presentation
pełnić rolę fulfil.IMPERF a role to play a role
wypełnić rolę fulfil.PERF a role to play a role
wypełniać rolę fulfil.PERF a role to play a role
Taja je postavljala vprašanja Taja was asking questions
ves čas je dajal napačne napovedi he was always giving wrong forecasts
- We consider aspect a semantic feature in the following cases:
- Starting, continuation or completion is expressed by precise verbs which usually modify other verbs:
η Μαρία άρχισε τη συζήτηση Maria started the conversation
ο Γιάννης διέκοψε την κουβέντα John interrupted the discussion
Anthony started his presentation in advance
the weather interrupted the transmission twice
we kept our show regardless of the reactions
de regen onderbrak de wedstrijd the rain interrupted the match
Tomaž je začel svoje predavanje Tomaž started his lecture
Politik je nadaljeval svojo napoved reform the politician continued his forecast about reforms
naredili bomo konec onesnaževanju we will make end to pollution we will put an end to pollution
In Test LVC.3, we verify whether the verb adds "light" semantics to the predicative noun. When aspect is expressed as a morphological feature, such as in the first item above, we consider that the verb is light and test LVC.3 passes. However, when aspect is a semantic feature rather than a morphological feature, test LVC.3 fails and we do not have an LVC.
The previous version (1.0) of the annotation guidelines contained Test 10 [N-SEM], which checked if the noun in an LVC candidate preserves one of its original senses. If it did not, the candidate was not an LVC.
In the current version of the guidelines we have abandoned this test because:
- it proved hard to establish the list of original senses of a noun,
- this test was superfluous with respect to Test LVC.4 [V-REDUC],
- in some verbal idioms (VIDs) the noun also keeps its original sense, so the test can be misleading for the LVC vs. VID distinction.
Section 7.2
Adding new examples in your language
It is often useful to have examples of a phenomenon shown in your own language. Examples in the guidelines are presented as in the template below:
Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. Languages use color codes according to their language groups. See the section on notation for more information.
In order to see the ID of all examples, make sure the ID button is toggled on the header of the current page. Now look at the template above. You should see this ID: 7.2_A_template-mwe. The 7.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
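The three parts of an example ID can also be separated mechanically, which may help when referring to examples in discussions or scripts. The sketch below assumes that every ID follows the pattern of the single ID shown above (section number, underscore, capital letters, underscore, name); the names `ExampleId` and `parse_example_id` are invented for this illustration.

```python
import re
from typing import NamedTuple


class ExampleId(NamedTuple):
    section: str   # section number, e.g. "7.2"
    position: str  # position of the example in the page, e.g. "A"
    name: str      # human-readable identifier, e.g. "template-mwe"


# Assumed pattern, inferred from the ID "7.2_A_template-mwe".
_ID_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)_([A-Z]+)_(.+)$")


def parse_example_id(example_id: str) -> ExampleId:
    """Split an example ID into section, position and name."""
    match = _ID_PATTERN.match(example_id)
    if match is None:
        raise ValueError(f"unrecognised example ID: {example_id!r}")
    return ExampleId(*match.groups())


if __name__ == "__main__":
    print(parse_example_id("7.2_A_template-mwe"))
```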
Editing or adding examples
The shared spreadsheet used to edit examples in previous versions of the guidelines is no longer used; all modifications are made online and are visible immediately. To edit or add examples to the guidelines, you need to create an account on the guidelines 1.3 examples edition platform. You also have to ask Carlos Ramisch or Agata Savary to grant you editing rights for your language.
Once you are logged in, you will see some buttons close to each example.
- The 'copy' button copies the source of the example, and is useful if you want to copy the example of another language and then translate it.
- The 'source' button is always available for languages you have the right to edit, and allows you to edit the example's XML-like source code, as described below.
- The 'edit' button is only shown for examples that follow the formatting rules, and allows you to edit the example using a user-friendly interface.
Instructions to create well formatted examples (or correct the ill-formatted ones in 'source') are available in the example edition instructions.
When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language (use the 'copy' button), and then adapting it to your language. You can then paste the example in your language's 'source' mode. Remember that you should not translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation or not. Therefore, before entering an example, you should always check that it is relevant in the context.
If there is something wrong or suspicious with your example, the interface will show an error or warning message.
If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the example empty or add a n.a. comment.
Examples with tags
Let us analyse the English example below, shown in 'source' mode:
MWEs with <lex>their lexicalized components</lex> in English are indicated like this.
As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.
Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:
- He will take a shower
Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:
- This is not an MWE
Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is <u>not</u> an MWE</nmwe> yields the following:
- This is not an MWE
Latin-script transcription:
You can optionally provide a Latin-script transcription if your language does not use Latin characters.
Latin-script transcriptions must be surrounded by the tags <latin> and </latin>.
For example, the code الدرس <latin>ad-dars</latin> generates the example below. The Latin transcription should always appear after the example in the original script, and before glosses and translations.
- الدرس ad-dars
Gloss icon:
You should also provide English glosses and translation for your examples.
Glosses and translations should always be provided in English, and never in another language.
Glosses must be surrounded by the tags <gl> and </gl>.
Translations must be surrounded by <trans> and </trans>.
English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex> <gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text <latin>transcription (optional)</latin> <gl>the gloss</gl> <trans>the translation</trans>.
- défendre son bifteck defend one's beefsteak to defend one's interests
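The recommended order (original text, optional transcription, gloss, translation) can be enforced by assembling the source string programmatically. The sketch below is only an illustration: the function `build_example` and its parameter names are hypothetical and are not part of the examples edition platform.

```python
from typing import Optional


def build_example(text: str,
                  gloss: Optional[str] = None,
                  translation: Optional[str] = None,
                  latin: Optional[str] = None) -> str:
    """Assemble an example source string in the recommended order:
    original text, <latin> (optional), <gl>, <trans>.

    `text` is expected to already contain its <lex>...</lex> markup.
    """
    parts = [text]
    if latin:
        parts.append(f"<latin>{latin}</latin>")
    if gloss:
        parts.append(f"<gl>{gloss}</gl>")
    if translation:
        parts.append(f"<trans>{translation}</trans>")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_example(
        "<lex>défendre</lex> son <lex>bifteck</lex>",
        gloss="defend one's beefsteak",
        translation="to defend one's interests",
    ))
```

Because the optional parts are appended in a fixed sequence, the transcription can never end up after the gloss, whatever order the keyword arguments are given in.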
Comments:
Some examples are presented followed by an explanation or comment, in normal font (black color). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:
- some words → further details
Newline:
Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they can be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). This tag will be treated by the 'edit' interface to break examples that can be edited separately. For example, the code example 1 <br/> example 2 <br/> example 3 will be rendered as follows:
- example 1
example 2
example 3
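To preview how a source string breaks into separately editable examples, one can split it on the <br/> tag. This is a minimal sketch of the idea and not the platform's own logic; the helper name `split_examples` is an assumption made for illustration.

```python
import re
from typing import List

# Matches <br/> (optionally written <br />) together with surrounding whitespace.
_BR = re.compile(r"\s*<br\s*/>\s*")


def split_examples(source: str) -> List[str]:
    """Split an example source string into its individual examples."""
    return [part for part in _BR.split(source.strip()) if part]


if __name__ == "__main__":
    for example in split_examples("example 1 <br/> example 2 <br/> example 3"):
        print(example)
```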
Inside normal text, you may also use tags such as <i> (italics), <strong> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.
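Since all tags except <br/> come in pairs, a quick local check for unbalanced tags can catch ill-formatted examples before they are pasted into 'source' mode. The following is a rough sketch, assuming simple tags without attributes; it is not the validation performed by the platform, and the function name `unbalanced_tags` is hypothetical.

```python
import re
from typing import List

# Captures: optional leading slash, tag name, optional trailing slash.
_TAG = re.compile(r"<(/?)([A-Za-z]+)\s*(/?)>")


def unbalanced_tags(source: str) -> List[str]:
    """Return a list of problems found; an empty list means tags are balanced."""
    problems = []
    stack = []
    for closing, name, self_closing in _TAG.findall(source):
        if self_closing:
            # Empty elements such as <br/> do not need a closing tag.
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}>")
    # Anything left on the stack was opened but never closed.
    problems.extend(f"unclosed <{name}>" for name in reversed(stack))
    return problems


if __name__ == "__main__":
    print(unbalanced_tags("<nmwe>This is <u>not</u> an MWE</nmwe>"))
    print(unbalanced_tags("He will <lex>take a shower"))
```

A stack is used rather than simple counting so that wrongly nested tags, such as <u><lex></u></lex>, are also reported.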
Section 7.3
Annotation platform FLAT
The annotation will be performed using the online annotation platform FLAT. The documentation of the annotation platform is provided in a separate document. Check the useful links below:
- The FLAT user manual for the PARSEME annotation guidelines version 1.2
- Link to the PARSEME shared task FLAT platform
Section 7.4
Best practices
Annotating VMWEs in text is a hard task. Many tests are semantic and require not only a strong knowledge about the language, but also knowledge of advanced notions in linguistics. As a consequence, ensuring annotation quality and, above all, intra- and inter-annotator consistency, is a challenge. We provide here a set of hints that you can use to try to optimize the annotation effort and ensure the quality of the resulting corpus.
Resources and people
This website only covers the annotation guidelines. Do not forget that many other resources are available on the PARSEME shared task 1.1 website. That website is not for system authors, but for language leaders, annotators and organizers. It contains a lot of useful information, notably the names and contacts of people that can help you, and user manuals for FLAT, for the language leaders, etc. Also, you can use the mailing lists if you need to ask questions that could be relevant for other teams as well. In short, don't be shy to ask if you would like to do something but you're not exactly sure where to start :-)
NotVMWE label
The new FLAT configurations for edition 1.1 allow you to use an optional annotation label called NotVMWE. This is not a new VMWE category, but an auxiliary label which simply means "this is not a VMWE". NotVMWE is an optional and useful label you can use to indicate that something should not be annotated, especially if it is a borderline case. Adding this annotation allows you to add a textual comment saying why you decided not to annotate this construction (e.g. after discussing it with fellow annotators and recording the decision in the list of solved cases).
While you don't need to use this label, we recommend that you use it for challenging/hard cases which, in the end, you decide not to annotate as a VMWE. This kind of annotation will be useful when performing consistency checks. Of course, NotVMWE labels will all be removed in the final released corpora, since this kind of information is irrelevant for shared task participants.
List of solved cases
In edition 1.0, some languages have ensured consistency by keeping a separate shared document (e.g. a Google spreadsheet) where hard/challenging cases were documented. We advise language leaders to implement such a list of solved cases. This allows all annotators to contribute to the discussion of hard cases, and to reach a common decision that can be later applied systematically to all occurrences of the expression and for similar expressions. From our experience, this greatly enhances the satisfaction of annotators and saves some valuable time during the consistency checks. Even for languages that have a single annotator, she/he can keep a personal list of difficult cases and their decisions, to ensure intra-annotator consistency.
Consistency checks
Once all files have been annotated, language leaders will perform the final consistency checks using semi-automatic tools. During these consistency checks, all occurrences of a single expression annotated by all annotators will be shown together. There, language leaders may change annotations performed by individual annotators if they are incoherent with the other annotations. Therefore, do not worry too much if you are unsure about an annotation. Try to be as consistent as possible, but if you do not remember a particular annotation performed earlier, it is not necessary to search through the corpus on FLAT (this is quite time-consuming). If there is some minor inconsistency, it will probably be corrected later by the language leader. But note your decision down on the list of solved cases so that next time you come across the same expression (or a similar one) you do not spend so much time thinking about it.
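The idea behind these checks, showing all occurrences of one expression together and looking for diverging labels, can be sketched in a few lines. This is only an illustration and not the semi-automatic tool used by language leaders: the input format (pairs of lemmas and a label), the normalisation by sorting lemmas, and the label strings are all assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Annotation = Tuple[Tuple[str, ...], str]  # (lemmas of lexicalized components, label)


def inconsistent_expressions(
    annotations: Iterable[Annotation],
) -> Dict[Tuple[str, ...], Set[str]]:
    """Group annotations by expression and return those with several labels."""
    labels_by_expression = defaultdict(set)
    for lemmas, label in annotations:
        # Sort lemmas so that different word orders map to the same key.
        labels_by_expression[tuple(sorted(lemmas))].add(label)
    return {
        expression: labels
        for expression, labels in labels_by_expression.items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    sample = [
        (("take", "shower"), "LVC"),
        (("shower", "take"), "NotVMWE"),
        (("make", "do"), "MVC"),
    ]
    for expression, labels in inconsistent_expressions(sample).items():
        print(expression, sorted(labels))
```

Expressions flagged this way are natural candidates for the list of solved cases.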
Intuition and tradition vs. guidelines
You may sometimes (often) find that the guidelines do not reflect your intuition about a given construction, or that they contradict the linguistic tradition and literature in your language. We understand that this is frustrating, but please, remember that our main objective is achieving universal modelling of MWEs while preserving diversity. Therefore, please refrain from using undocumented criteria (a.k.a. intuition), or tests that are only known/documented in your language.
The guidelines were designed taking feedback from many language teams into account. They are also meant to continuously evolve, and we do count on you to play an active role in this process. Therefore, if you disagree with their current version, please, choose one of the two options:
- Follow the guidelines anyway to ensure the corpus-to-guidelines consistency, but express your criticism (documented with glossed and translated examples in your language), best via Gitlab issues. You may also add comments to those annotations which you would like to modify once the guidelines have been enhanced.
- Create a language-specific section for the guidelines, describing your own tests and decision trees. We will be happy to publish it online.
Inter-annotator agreement
Usually, data annotation campaigns require measuring inter-annotator agreement (e.g. kappa) to verify that the guidelines are clear and that the annotators are well trained. We encourage language teams to measure inter-annotator agreement. However, in the PARSEME shared task, the organizers do not set any hard threshold on the kappa value required to accept your annotations as part of the shared task. This is a collaborative effort, so we do not feel comfortable with making such requirements to language teams.
Furthermore, VMWE annotation is a very hard task so inter-annotator agreement is expected to be low. We recommend that language teams use complementary tools and resources to compensate for the low agreement, such as the list of solved cases and consistency checks mentioned on this page. After the annotation is completed, we may ask you to double-annotate a sample of your data so that we can calculate inter-annotator agreement, for instance, to report it on a corpus description article. But you should not worry too much about this: do your best in trying to understand the guidelines, do not hesitate to suggest improvements, and try to train annotators as much as possible, for instance, with pilot annotations and discussions. This way, you will ensure that the data released in the shared task for your language will be of high quality. And remember you will have the opportunity to improve it incrementally for the next shared task.
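For teams that want to compute agreement themselves, Cohen's kappa for two annotators can be sketched as follows. The sketch assumes that both annotators labelled the same fixed list of candidates, one label per candidate, which is a simplification for VMWE annotation where annotators may also disagree on which spans are candidates at all. The function name and label strings are illustrative.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators over the same list of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["VID", "VID", "NotVMWE", "NotVMWE"]
    annotator_2 = ["VID", "NotVMWE", "NotVMWE", "NotVMWE"]
    print(cohen_kappa(annotator_1, annotator_2))
```

In this small example the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.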
TODO label
We have introduced a new label on FLAT called "{change-me} TODO". This label is a temporary mark-up used to indicate that a given VMWE must be dealt with by a human annotator. It will be used when a corpus is automatically converted and some annotations must be manually checked. For instance, the OTH category from shared task 1.0 disappeared in edition 1.1. Therefore, all VMWEs annotated as OTH in the 1.0 corpora will be automatically converted using the TODO label. This means that all TODO labels must be changed into a valid new category (e.g. VID). In the final annotated corpora, any remaining TODO label will be removed, since this is not actually a VMWE category but just an auxiliary label.
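The life cycle of the auxiliary labels can be summarised in a small sketch: legacy OTH annotations become TODO, and neither TODO nor NotVMWE survives in the released corpus. The code below only illustrates this logic on plain label strings; it does not reflect the actual conversion scripts or the FLAT data format, and it uses the short name "TODO" for the "{change-me} TODO" label.

```python
from typing import Iterable, List

# Auxiliary labels that are not VMWE categories and are dropped on release.
AUXILIARY_LABELS = {"NotVMWE", "TODO"}


def convert_legacy_label(label: str) -> str:
    """Map the 1.0 category OTH to the temporary TODO label."""
    return "TODO" if label == "OTH" else label


def pending_manual_checks(labels: Iterable[str]) -> int:
    """Count annotations that a human annotator still has to recategorise."""
    return sum(1 for label in labels if label == "TODO")


def release_labels(labels: Iterable[str]) -> List[str]:
    """Keep only true VMWE categories for the released corpus."""
    return [label for label in labels if label not in AUXILIARY_LABELS]


if __name__ == "__main__":
    converted = [convert_legacy_label(l) for l in ["VID", "OTH", "NotVMWE", "MVC"]]
    print(converted)
    print(pending_manual_checks(converted))
    print(release_labels(converted))
```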
Existence questions and corpus queries
Some tests ask whether it is possible or impossible to find some attested variant of a candidate. While for many cases this is straightforward (the variant can be easily found), some borderline cases will inevitably occur in which it is hard to tell if a given variant is impossible or just very rare.
Decisions for hard cases like this should not be made based solely on introspection and intuition. In case of doubts, we recommend that annotators:
- check existing lexicons for their languages
- perform corpus queries using any available large raw monolingual corpus
- run web queries, e.g. using Sketch Engine, Linguee or plain Google
- discuss the case with other annotators, reach a decision and mark it in the list of solved cases
In all cases, the list of lexicons, monolingual corpora and/or web platforms to consult should be agreed upon in advance by all annotators.
Section 8
Glossary
Candidate VMWE
A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it is to be annotated as a true VMWE or whether, instead, it was a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.
Collocation
A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:
играя футбол to play football
drastically drop
el diagrama muestra the diagram shows
coger el tren to take the train
przyznać rację to admit right to admit that someone is right
uprawiać sport to practice sports
wzruszać ramionami to shrug one's shoulders
drastično zmanjšati drastically reduce
Cranberry word
A cranberry word is a token that does not have the status of a stand-alone word, has no proper distribution, and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in different contexts, as the underlined words below:
jemanden einen Besuch abstatten
no decir ni chus ni mus → chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
hacer algo a troche y moche → troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
sprawiedliwości stało się zadość justice has been done
Extended nominal phrase
An extended nominal phrase (ENP) is a notion covering, in a universal way, various types of phrases which convey similar lexical relations in morpho-syntactically different ways (prepositions, post-positions, case markers, etc.), depending on the language. Extended NPs include:
- noun phrases, i.e. phrases headed by a noun, with its possible syntactic modifiers/complements
- prepositional phrases, in which a preposition directly governs a noun, or the opposite, depending on the linguistic theory adopted
- noun phrases with case markers
- noun phrases with postpositions
преди всичко before everything
dla wszystkich for everyone
z prawdziwego zdarzenia from a true event genuine
ENP is close to the UD understanding of the nominal phrase.
Particles
Particles are hard to distinguish from homographic prepositions:
ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
to get up a hill
jestem za ustawą I am for the law I am in favor of the law
The fundamental property to capture is that a preposition governs a prepositional group, while a particle functions as an adverbial. In some languages particles can also be homographic with verbal prefixes:
den See umfahren to drive around the lake
Ongelukken kunnen worden voorkomen accidents may be prevented
Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.
Reflexive clitics
Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines on the IRV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which are a relatively small closed class of words:
Semantic argument
A semantic argument of a predicative lexical unit (verb, noun, etc.) is a participant of the situation described by the predicative lexical unit that (a) can be realized as a syntactic dependent of the predicative lexical unit, (b) is semantically mandatory, and (c) is specific to that predicative lexical unit.
- Semantically mandatory participants: a participant is semantically mandatory when it must be mentioned to specify the meaning of the predicative lexical unit. In other words, the realization of the predicative lexical unit implies the existence of its semantically mandatory participants. For instance, a visit cannot take place if there is no visitor or no visitee, courage is a property of a being, a presentation implies the existence of a presenter, of an audience and of a presented topic. Some participants are not semantically mandatory, for instance the addressee is not semantically mandatory for a whisper because one can whisper without an addressee.
We restrict semantic arguments to semantically mandatory participants because we believe that this restriction helps delimit the semantic arguments without resorting to the difficult syntactic argument/adjunct distinction, while not being prejudicial to LVC tests. Notice that semantically mandatory participants do not necessarily occur in a sentence containing the predicative lexical unit, and can sometimes be omitted (e.g. due to coreference or ellipsis).
To define a заем loan one needs to mention two participants: the beneficiary and the source of the benefit. In other words, the existence of a loan implies the existence of its arguments.
To define a presentation one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a presentation implies the existence of its arguments.
To define a opinión opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of a opinión implies the existence of its arguments.
To define a conseil advice one needs to mention two participants: the adviser and the advised person. In other words, the existence of a conseil implies the existence of its arguments.
To define a dochód profit one needs to mention two participants: the patient who benefits and the source of the benefit. In other words, the existence of a benefit implies the existence of its arguments.
To define a opinião opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of a opinião implies the existence of its arguments.
To define a prezentare presentation one needs to mention three participants: the one who presents, the topic of the presentation and the person to whom the topic is presented. In other words, the existence of a prezentare implies the existence of its arguments.
priti v poštev to come into consideration to be considered
imeti mnenje to have an opinion to believe
- Specific participants: some semantically mandatory participants are generic and we do not consider them to be semantic arguments. For instance, the existence of a presentation implies that it occurred in a given time and place, so these are semantically mandatory participants. However, time and place are implicit to any event, and are not specific to the predicative noun presentation. Participants that denote non-specific characteristics of the predicative lexical unit and thus can be interpreted independently of the predicative lexical unit (for a large class of predicative lexical units), such as time, place and manner for most predicates, are not considered as semantic arguments.
Semantic arguments are generally mentioned in the dictionary definition of a predicative lexical unit. One useful source for determining the semantic arguments of a given lexical unit is a semantic lexicon such as FrameNet or PropBank. Our definition of semantic argument is closely related to FrameNet's core frame elements. Language teams are encouraged to use available resources and/or to provide language-specific documentation to help identify semantic arguments.
Subcategorization frame
A subcategorization frame of a verb describes how syntactic arguments are realized as the verb's dependents, for a given sense of the verb. A subcategorization frame indicates morphological and syntactic features of a verb's dependents, namely the required prepositions, postpositions and case markers of the subject, direct and oblique objects. For instance, one subcategorization frame for to return meaning to give back would be:
- return: [NP]subject + [NP]direct-object + [to NP]oblique
- Example: [my sister]subject returned [the book]direct-object [to the library]oblique
Notice that the semantic characteristics of the dependents (a.k.a. selectional restrictions or preferences) are not considered as part of the subcategorization frame. For instance, the fact that the subject is animate (somebody) or inanimate (something) is irrelevant for subcategorization frames. Verbs can have many senses and each sense can have many subcategorization frames. For instance, the verb to return in the same sense can also be used with the subcategorization frames NPsubject + NPdirect-object ([my sister]subject returned [the book]direct-object) and NPsubject + NPoblique + NPdirect-object ([my sister]subject returned [me]oblique [the book]direct-object).
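A subcategorization frame can be thought of as a small data structure: an ordered list of slots, each with a syntactic function, a phrase category and an optional required marker. The sketch below is one possible encoding chosen for illustration only; the class names `Slot` and `Frame` and their fields are assumptions and do not correspond to any PARSEME resource. Note that it deliberately has no field for selectional restrictions such as animacy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    function: str                 # "subject", "direct-object" or "oblique"
    category: str = "NP"          # phrase category of the dependent
    marker: Optional[str] = None  # required preposition, postposition or case marker

    def __str__(self) -> str:
        inner = f"{self.marker} {self.category}" if self.marker else self.category
        return f"[{inner}]{self.function}"


@dataclass(frozen=True)
class Frame:
    verb: str
    slots: Tuple[Slot, ...]

    def __str__(self) -> str:
        return f"{self.verb}: " + " + ".join(str(slot) for slot in self.slots)


# Three frames for the same sense of "to return" (to give back).
RETURN_FRAMES = (
    Frame("return", (Slot("subject"), Slot("direct-object"), Slot("oblique", marker="to"))),
    Frame("return", (Slot("subject"), Slot("direct-object"))),
    Frame("return", (Slot("subject"), Slot("oblique"), Slot("direct-object"))),
)

if __name__ == "__main__":
    for frame in RETURN_FRAMES:
        print(frame)
```

Keeping the frames in a tuple per verb sense mirrors the observation that one sense can license several frames.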
Syntactic argument
Typically, verbal lexical units have dependents that can be syntactic arguments or adjuncts, depending on their status (mandatory/specific or not). For instance, in John walked in the forest yesterday all three dependents (the entity walking, the time and the place) add semantics to the predicate, but time and place can be interpreted independently of the semantics of the verb, and could be omitted. Thus, John is a syntactic argument while the other dependents are syntactic adjuncts. Typically, time and place are considered as syntactic adjuncts, and never as syntactic arguments.
Beyond verbs, nouns, adjectives and adverbs can also have arguments. For example, the noun cause cannot normally appear by itself; rather, one must always talk about the cause of X, with X as the syntactic argument of the noun cause. Similarly, the noun contact has two arguments: the contact of X with Y.
Distinguishing between semantic arguments and adjuncts can be tricky, and we will not go into the details of the polemic argument/adjunct distinction. In addition to usual tests for argument-adjunct distinction described in the linguistic literature, we advise language teams to use language-specific resources (e.g. valency dictionaries) that sometimes encode the syntactic argumental structure of lexical units.
Most of the time, syntactic and semantic arguments coincide, but not always. For instance, in I translated a book., there is no syntactic argument expressing the source and target languages, which are semantic arguments of translate. Therefore, we distinguish both notions in our guidelines. Syntactic arguments describe the linguistic structure of lexical items whereas semantic arguments are related to the conceptual structure of predicates.
Syntactic operator
A syntactic operator is a verb that only bears the grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restricted than the traditional notion of a light verb. Notably, aspectual light verbs (which add aspectual semantics to the complement), as in to start a walk, to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:
Angst haben to have fear
ein Verbrechen begehen to commit a crime
to have fear
to commit a crime
tener miedo
hacer ilusión
een misdrijf plegen to commit a crime
Unexpected change in meaning
An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification which yields an unexpected acceptability or meaning shift, that is, beyond what would be expected by the initial modification. In order to judge whether a shift in acceptability or meaning is unexpected, one can try to apply the same modification to a similar compositional construction, using analogy. For example, book and word have synonyms including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:
the same does not hold for:
That is, the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test VID.2 [LEX] applies and:
is a VMWE.
Similarly, Test VPC.1 [V+PART-DIFF-SENSE] refers to an unexpected change in meaning of the verb stemming from the addition of the particle. We do so by checking if the situation described by the verb with the particle implies the one described without the particle:
Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
to look up into the sky implies to look into the sky (it is not an IVPC)
Ungrammaticality
Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We suppose that ungrammaticality judgement is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).
Section 9
Contact
These guidelines were written by many authors. If you have questions, comments, suggestions, you can contact the people in charge of the PARSEME corpora initiative.
You are welcome to also contribute to this initiative in other ways - see why and how.