Annotation guidelines (version 2.0)
Used by the PARSEME corpora annotated for multiword expressions
Welcome to the official annotation guidelines of the PARSEME corpora version 2.0.
This version extends the annotation guidelines to all syntactic types of multiword expressions. For previous versions, you can check the index of versions. See also what is new in the guidelines version 2.0 as compared to version 1.3.
Here, you'll find detailed definitions, examples and linguistic tests to guide your decision as to whether a given combination in your language is a multiword expression. Use the table of contents on the left to navigate between sections and the header buttons to show/hide examples.
In addition to these general guidelines, language teams may also provide extra documentation, like lists of borderline cases and decisions taken concerning them. They should all be compatible with these general guidelines.
If you spot errors or if something remains unclear after reading the guidelines, please contact us and we'll do our best to correct the problems.
Authors and contributors (alphabetical order)
Chérifa Ben Khelil, Archna Bhatia, Claire Bonial, Marie Candito, Fabienne Cap, Silvio Cordeiro, Kaja Dobrovoljc, Vassiliki Foufi, Polona Gantar, Voula Giouli, Najet Hadj Mohamed, Carlos Herrero, Uxoa Iñurrieta, Mihaela Ionescu, Iskandar Keskes, Alfredo Maldonado, Stella Markantonatou, Verginica Mititelu, Johanna Monti, Joakim Nivre, Mihaela Onofrei, Viola Ow, Carla Parra Escartín, Manfred Sailer, Carlos Ramisch, Renata Ramisch, Monica-Mihaela Rizea, Agata Savary, Nathan Schneider, Ivelina Stonayova, Sara Stymne, Ashwini Vaidya, Veronika Vincze, Abigail Walsh, Hongzhi Xu.
Developers (alphabetical order)
Quentin Barrouyer, Carlos Ramisch, Agata Savary, Baptiste Souche
Table of contents
- 1 Definitions and scope
- 2 Textual annotation scope
- 3 Categories of MWEs
- 4 Annotation process - entry point
- 5 Tests for VERBAL MWEs
- 6 Tests for NOMINAL MWEs
- 7 Tests for ADJECTIVAL and ADVERBIAL MWEs
- 8 Tests for FUNCTIONAL MWEs
- 9 Language-specific tests
- 10 Annotation management
- 11 Glossary
- 12 Contact
Section 1
Definitions and scope
This document aims at formalising idiomaticity in language via guidelines for manual annotation of multiword expressions (MWEs) in running texts. They were defined with several objectives in mind:
- Universality: the typology, terminology and methodology are unified across many languages (currently about 30), while leaving room for truly language-specific features
- Tractability: the cross-linguistic formalisation of idiomaticity should be done in a computationally tractable way
- Reproducibility: the annotation process should be as reproducible as possible.
- The annotation flow follows a decision diagram driven by linguistic tests. For two annotators examining the same MWE candidate, if their answers to the tests are the same, the outcome of the annotation is also the same.
- Semantic non-compositionality is considered as the major property of MWEs to be modeled. From linguistics we know that non-compositionality is a matter of scale but for the sake of tractability annotation decisions must be binary.
- Semantic non-compositionality is hard to test directly, therefore it is approximated by lexical and morpho-syntactic inflexibility.
- Inflexibility tests are partly driven by the syntactic structure, therefore there is strong dependence on the underlying syntactic theory. PARSEME annotation largely relies on the Universal Dependencies for the annotation of morpho-syntax, due to the shared objectives of universality.
Section 1.1
Notation
The notational convention used throughout the document is the following:
- Italic is used to display example sentences and expressions.
- Bold is used to highlight the lexicalized components of a candidate MWE inside an example (positive or negative).
- Underline is used to focus the reader's attention on the important part of an example
- An asterisk (*) precedes ungrammatical examples.
- A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
- Different colors are used to display examples:
- Red is used for counter-examples, that is, expressions which look like MWEs but are not one, whatever the language.
- According to the language, different colors are used for other examples, that is, positive examples of the phenomenon being discussed:
- Shades of green are used for positive examples in Germanic languages.
- Shades of blue are used for positive examples in Romance languages.
- Shades of orange are used for positive examples in Slavic languages.
- Shades of pink are used for positive examples in other language families.
- Examples are preceded by the 2-letter language code in parentheses
- Examples can be shown and hidden using the toggle buttons in the header.
Section 1.2
Words and tokens
While the definition of an MWE inherently relies on the notion of a word, manual annotation is performed on texts which are automatically tokenized. It is therefore important to understand the distinction between words and tokens in the context of MWEs.
A word is a linguistically (notably semantically) motivated unit. The detection of words is, thus, language-dependent and annotation experts should have a clear idea of how to define it for their own language (even if this definition proves hard in general).
See also the UniDive task on harmonizing the definition of a “syntactic word” across languages.
A token is a technical and pragmatic notion, defined according to more or less linguistically motivated clues and depending on the particular tokenization tool at hand. Note that the notion of a token is ambiguous in NLP. It can also mean an individual occurrence of a certain linguistic unit, as opposed to a type, i.e. the set of all surface realisations of a unit. In these guidelines, we refrain from using this second sense.
Tokens should ideally be as close as possible to words. However, in practice - due to the hardness of the (automatic) tokenization task - the relation between tokens and words is not always 1-to-1. The following cases occur:
- A token coincides with a word:
- Several tokens build up one word, like in abbreviations, possessive markers, words with "accidental" separators, inflected or derived forms of foreign names, etc. In this case we speak of a multitoken word (MTW): The pipe symbol '|' indicates token separation in these examples
- One token can contain several words, like in contractions and compounds. In this case we speak of a multiword token (MWT). Identifying MWTs is important because they can be potential candidates for MWEs. However, defining what is a word and a MWT is a hard question and language-specific MWT tests are needed to this end. Examples of MWTs include: See also the representation of MWTs in Universal Dependencies. The precise word forms cannot always be straightforwardly deduced from the MWT containing them and vice versa, as in don't, della, du, etc.
παίρνω perno take
ένας enas a
απόφαση apofasi decision
καλός kalos beautiful beautiful
περί peri about about
ხატავს xatavs draws
ჩვენ čʻven we
გული guli heart
год|. year
Wie geht|'|s How goes it How are you
υπΔρ υποψήφιος διδάκτορας PhD candidate
pp|. pages
Pandora|'|s
a|/|f|. a favor in favor
Rte|. remitente sender
გაერო gaero United Nations Organization, UN
ბ-ნი b|-ni ბატონი, Mister
Pandora|'|s Pandora's
SMS|-|ować to write an SMS
d|-|voastră polite "you"
str|. pages
le|-|to
tweet|-|овање tweet|-|ovanje to write tweets
Apfelbaum = Apfel+Baum apple tree apple tree
al = a+el to+the to the
compárese = compare+se compare SE_PARTICLE be it compared
suicidarse = suicidar+se suicide SELF to commit suicide
jarleku = jar(ri)+leku sit+place seat
b'fhearr = ba+fhearr be.COND better prefer
მაგიდაზე = მაგიდა+ზე magidaze = magida+ze table+on, on the table
appelboom = appel+boom apple tree apple tree
pannenkoek = pan + koek pancake
robiłem=robi+łem do.3.SG.PRES+be.1.SG.PAST.AGLI did
żeśmy = że+śmy that+be.1.PL.AGL that-we
новосадски = ново + садски novosadski = novo + sadski Novi Sad (an adjective from a city name)
While a MWE always contains at least two words, the relation between MWEs and tokens can be twofold:
- A MWE contains several tokens, whether each of them coincides with a word or not:
- A MWE contains one (multiword) token:
прочитам от корица до корица to read from cover to cover (5 words, 5 tokens)
wie geht's (2 words, 4 tokens) how goes it how are you
παίζω στα δάχτυλα pezo sta dachtyla play in-the fingers to know very well (3 words, 4 tokens)
to open Pandora's box (3 words, possibly 5 tokens)
dar por sentado 3 words, 3 tokens to give for seated to take for granted
irse de rositas 3 words, 4 tokens to go_self of little_roses to get off scot free
cavalcare l'onda (3 words, 4 tokens) ride the wave ride the wave
ფარდას ჩამოაფარებს pʻardas čʻamoapʻarebs Will cover it with a curtain 'Will make it invisible to others' (2 words, 2 tokens)
robił|em z igły widły made.3.SG.M1+be.1.SG.AGL a pitchfork out of a needle I made a mountain out of a molehill (4 words, 5 tokens)
cair de pára-quedas to fall with parachute to arrive unprepared in the middle of a situation (3 words, possibly 5 tokens) According to new orthography rules, this word would be written 'paraquedas'. Old spelling may still be found in annotated texts, though.
queixar-se-ia complain-self-would would complain (2 words, possibly 5 tokens)
vreči puško v koruzo throw a rifle in the corn to give up (4 words, 4 tokens)
hedh një sy (3 words, 3 tokens) throw an eye take a look
причати на|памет pričati na|pamet to talk by heart to talk not relying on facts (3 words, 2 tokens)
anfangen at-catch to begin
aanvangen at-catch to begin
Note finally that multitoken words are not considered MWEs since they contain one (multitoken) word only:
Whenever the distinction between a word and a token is judged by a particular language team as hard to tackle, a possible option is to consider these two notions equivalent for the needs of corpus annotation.
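The word/token cases above can be sketched in code. The following is a minimal illustration, not part of the PARSEME tooling: the `Unit` class and the function names are hypothetical, and the token and word counts are entered by hand from the examples in this section (the number of tokens actually depends on the tokenizer used).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A text span with hand-assigned token and word counts (hypothetical structure)."""
    text: str
    n_tokens: int
    n_words: int


def classify(unit: Unit) -> str:
    """Name the token/word relation of a span, following the cases in this section."""
    if unit.n_tokens == 1 and unit.n_words == 1:
        return "token = word"
    if unit.n_words == 1 and unit.n_tokens > 1:
        return "MTW"  # multitoken word: several tokens, one word
    if unit.n_tokens == 1 and unit.n_words > 1:
        return "MWT"  # multiword token: one token, several words
    return "several tokens, several words"


def can_be_mwe(unit: Unit) -> bool:
    """A MWE always contains at least two words, whatever the number of tokens."""
    return unit.n_words >= 2


pandora = Unit("Pandora|'|s", n_tokens=3, n_words=1)
apfelbaum = Unit("Apfelbaum", n_tokens=1, n_words=2)
wie_gehts = Unit("wie geht's", n_tokens=4, n_words=2)

for unit in (pandora, apfelbaum, wie_gehts):
    print(unit.text, "->", classify(unit), "| MWE candidate:", can_be_mwe(unit))
```

With these counts, the multitoken word is ruled out as a MWE candidate, while the multiword token and the multi-token sequence remain candidates that still have to pass the linguistic tests.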
Section 1.3
Multiword expressions
A multiword expression (MWE) is a (continuous or discontinuous) sequence of words with the following compulsory properties:
- It contains at least two component words which are lexicalised, i.e. always realized by the same lexemes. Only these lexicalized components are annotated. For instance in he paid several important visits to the president, we annotate only the components highlighted in bold.
- Its neutral form forms a weakly connected graph, i.e., in its dependency graph, every (lexicalized) component is reachable from every other component, if directions of the dependencies are disregarded. For instance, in the following MWE
the highlighted components do not form a weakly connected graph but this form is not a neutral one. When transforming it to a neutral form
the connectivity condition is fulfilled.
- It shows some degree of orthographic, morphological, syntactic and/or semantic idiosyncrasy with respect to what is considered general grammar rules of a language. This condition is tested by the decision diagrams documented in sections 5 to 9. Collocations, i.e. word co-occurrences whose idiosyncrasy is of statistical nature only (e.g. the graphic shows, drastically drop) are not considered MWEs.
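The connectivity property can be checked mechanically once a dependency tree is available. The sketch below rests on several assumptions: the function name is hypothetical, the condition is read as "the lexicalized components are connected through dependencies that link lexicalized components to each other", and the two toy trees (a simplified UD-style analysis of the English expression to make a decision) are illustrations that do not come from the guidelines.

```python
from collections import deque
from typing import Dict, Set


def is_weakly_connected(heads: Dict[int, int], lexicalized: Set[int]) -> bool:
    """Check that the lexicalized tokens form a connected graph, ignoring arc direction.

    heads maps each token id to the id of its syntactic head (0 for the root).
    Only arcs linking two lexicalized tokens are kept (assumed reading of the condition).
    """
    if not lexicalized:
        return False
    neighbours = {token: set() for token in lexicalized}
    for dependent, head in heads.items():
        if dependent in lexicalized and head in lexicalized:
            neighbours[dependent].add(head)
            neighbours[head].add(dependent)
    # Breadth-first search from an arbitrary lexicalized token
    start = next(iter(lexicalized))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen == lexicalized


# "they made a decision": 1 they, 2 made (root), 3 a, 4 decision
neutral = {1: 2, 2: 0, 3: 4, 4: 2}
# "the decision was hard to make": 1 the, 2 decision, 3 was, 4 hard (root), 5 to, 6 make
marked = {1: 2, 2: 4, 3: 4, 4: 0, 5: 6, 6: 4}

print(is_weakly_connected(neutral, {2, 4}))  # made + decision
print(is_weakly_connected(marked, {2, 6}))   # decision + make
```

In the marked variant the two lexicalized components are only linked through the non-lexicalized adjective, so the check fails; it succeeds on the neutral form, which is the form the condition applies to.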
Probably the most salient property of MWEs is semantic non-compositionality. In other words, it is often impossible to straightforwardly deduce the meaning of the whole unit from the meanings of its parts and from its syntactic structure. For instance, while it is easy to interpret phrases like to kick the ball or to spill some water from the words that compose them, it is almost impossible to guess, without knowing it beforehand, that
However, as non-compositionality is a subjective notion and is hard to test directly, we use inflexibility as a proxy in the tests. Our underlying hypothesis is that MWEs have some degree of semantic non-compositionality that implies limited flexibility.
Depending on the distribution of its neutral form, a MWE can be verbal, nominal, adjectival, adpositional, etc.
Verbal MWEs
A verbal MWE (VMWE) is a multiword expression whose neutral form is such that: (i) it has a distribution of a verb, a verbal phrase or a verbal clause, (ii) its syntactic head is a verb.
Note that reasoning in terms of neutral forms is crucial here. A MWE may occur in a variant whose distribution is non-verbal. But when its neutral form is retrieved, the verbal distribution becomes apparent, and such a MWE is considered verbal.
Conversely, some MWEs derive from VMWEs but their neutral forms are not verbal. Such MWEs are considered deverbal nominal, adjectival or adverbial MWEs:
(a) run-down (apartment) - adjectival MWE deriving from to run down
une mise à disposition the fact of making available - nominal MWE deriving from mettre à disposition make available
porte-feuille carry-sheets wallet - nominal MWE
couru d'avance run in advance forgone/predictable - adjectival MWE
Nominal MWEs
A nominal MWE (NMWE) is a multiword expression whose neutral form has a distribution of a noun.
This was a real wild goose chase a foolish and hopeless search for or pursuit of something unattainable.
W antykwariacie znalazła kilka białych kruków In the antique shop she found a few white ravens In the antique shop she found a few very rare books
It may or may not be headed by a noun:
A major challenge in annotating NMWEs is to distinguish them from proper names and multiword terms. Proper names have a special semantic status because they function as names of entities rather than their descriptions. Proper names may contain MWEs and vice versa but most proper names do not pass the linguistic tests proposed here and thus we do
UN Secretary-General - entity name containing a NMWE
Jego Królewska Mość Król Belgii His Royal Majesty the King of Belgium - entity name containing a NMWE
Multiword terms overlap with MWEs. Examples include:
rok świetlny light year a distance covered by a light ray in 1 year
But many multiword terms do not pass inflexibility tests either and we consider them semantically compositional (i.e. non-MWEs), as in:
Note that some MWEs whose internal structure is the one of a nominal phrase have a distribution of an adverb, an adposition or an adjective, etc. Those should not be annotated as NMWEs but as functional/adjectival/adverbial:
je fais ça toute seule, les doigts dans le nez I do it alone, fingers in my nose I do it easily
Recall that a MWE may be a multiword token. Deciding what is a word is notoriously difficult, especially in languages exhibiting frequent closed compounds, like Germanic languages. Closed compounds (i.e. compounds in which components are spelled together, possibly with some phonological changes on the border of morphemes) can be idiomatic:
or fully compositional:
or partly idiomatic and partly compositional:
We consider closed compounds as containing several words, and submit them to the PARSEME decision diagrams and annotate them as NMWEs if the tests are passed. We hypothesize that, most of the time, it is straightforward to annotators to identify word boundaries in a closed compound. If this is not the case, language-specific rules must be added. Splitting closed compounds directly in the corpus, if they are not split already, is not recommended, so as to keep the tokenization consistent with the underlying morpho-syntactic annotation.
See also the UniDive task on harmonizing the definition of a “syntactic word” across languages.
It happens that only part of a closed compound is idiomatic. For such cases, a UD/PARSEME white paper proposes subtoken spans, e.g.:
This feature is not implemented yet. In the meantime, we suggest annotating the whole token as belonging to the MWE.
We consider that nominal MWEs embrace pronominal MWEs:
I expect no one to come
we love each other
Similarly to functional MWEs (below), pronominal MWEs constitute closed lists of cases, and their inflexibility is hard to test. They are also frequently ambiguous with idiomatic determiners.
I saw a few examples - a DetID
dažs labs šoferis jūtas svarīgs few good driver feels important some drivers feel important - a DetMWE
powtarzał ciągle to samo pytanie he repeated always this the same question he repeated always the same question - a DetMWE
Adjectival and adverbial MWEs
The class of adjectival and adverbial MWEs (AMWEs) includes adjectival idiom (AdjID) and adverbial idiom (AdvID). Those are multiword expressions whose neutral form has a distribution of an adjective or an adverb, respectively.
aiz restēm behind bars in prison - an AdvMWE
średnio na jeża averagely on a hedgehog not great - an AdvMWE
They do not have to be headed by adjectives or adverbs, as in:
Additionally, we cover AMWEs which derive from verbal MWEs but their neutral form has an adjectival/adverbial distribution (see above), rather than a verbal one. The extent of such MWEs is yet unknown.
Functional MWEs
A functional MWE (FuncMWE) is a multiword expression whose neutral form has a distribution of a function word. We consider four subcategories of FuncMWEs:
- determiner idiom (DetID)
- adposition idiom (AdpID)
- conjunction idiom (ConjID)
- interjection idiom (IntjID)
katru otro dienu every second day every other day
przekaż mu te oto słowa transfer him these here words transfer him these words
co do pierwszego pytania what to the first question as to the first question
neskatoties uz not looking at nevertheless
Functional MWEs constitute relatively short closed lists of cases. We recommend establishing such lists for each language and applying them consistently to corpus annotation (while paying attention to possible ambiguity), like in:
I recognized her by the way she was walking.
rozumiesz co do ciebie mówię? do you understand what to you I say? do you understand what I'm telling you?
Of course, we still need criteria to decide which candidates should occur in such lists. But testing functional MWE candidates for non-compositionality is notoriously hard because they contain few content words (nouns, verbs, adjectives or adverbs) and have syntactic structures in which little flexibility is allowed, even with no presence of idiomaticity. The solution is to be consistent with the FuncMWE-specific decision diagram (which is deterministic, whenever the answers to atomic tests remain stable), even if it does not fully conform to our intuitions.
Section 1.4
Neutral forms of MWEs
MWEs occurring in a corpus can have various syntactic structures. For instance, to take someone by surprise can be inflected (they took me by surprise), negated (they did not take me by surprise), passivised (I was taken by surprise), subject to extraction (the surprise by which I was taken). Similarly, a brain washing can be transformed into a structure with a nominal-adpositional modifier (washing of a brain), an extraction (brain whose washing [did not succeed]), etc.
Since the linguistic tests are structure-driven (cf. e.g. structural tests), there is a necessity to neutralize variation before the tests are applied. In this section we introduce definitions answering these needs.
Neutral form
A neutral form (previously called canonical form) of a MWE or a MWE candidate is its least syntactically marked form which preserves its meaning. We consider that:
- a form with a finite verb is less marked than one with an infinitive, a participle, an analytical tense or a modal,
- active voice is less marked than passive and other diathesis alternations,
- a non-negated form is less marked than a negated one,
- a form with an extraction is more marked than one without it,
- a form with an adpositional modifier is more marked than one without it,
- a form with interposed complex determiners and quantifiers is more marked than one without them,
- a form with coordination is more marked than one without it.
she has taken him by surprise - the neutral form is she took him by surprise [just now]
she was taking him by surprise - the neutral form is she took him by surprise [and this happened at the same time as ...]
she wants to take him by surprise - the neutral form is she takes him by surprise [, that's her plan]
będą ją pociągać do odpowiedzialności they will pull her to responsibility they will accuse her - the neutral form is pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her
będą ją pociągali do odpowiedzialności they will pull her to responsibility they will accuse her is a neutral form
chcą ją pociągnąć do odpowiedzialności they want to pull her to responsibility they want to accuse her - the neutral form is pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her
pociągnęli ją do odpowiedzialności they pulled her to responsibility they accused her is a neutral form
bo by pociągnęli ją do odpowiedzialności they would pull her to responsibility they would accuse her is a neutral form
pociągnęliby ją do odpowiedzialności they would pull her to responsibility they would accuse her is a neutral form; only the finite verb in the conditional form is annotated
pociągając ją do odpowiedzialności pulling her to responsibility accusing her - the neutral form is pociągną ją do odpowiedzialności they will pull her to responsibility they will accuse her
w takich warunkach decyzje podejmują się same under such circumstances decisions take themselves on their own under such circumstances no effort is needed to take decisions - the neutral form is w takich warunkach ludzie podejmują decyzje [bez wysiłku]
the brain whose washing did not succeed - the neutral form is [there was] a brain washing[, it did not succeed for a brain]
la decisione che è stata presa è giusta - the neutral form is hanno preso la decisione giusta
nie mieli cienia wątpliwości they didn't have a shade of a doubt - the neutral form is [nie jest prawdą, że] mieli jakąkolwiek wątpliwość it is not true that they had any doubt
nie od razu Kraków zbudowano Cracow was not built at once Rome was not built in a day - this is the neutral form on its own rather than Zbudowali Kraków od razu they built Cracow at once
metoda kija i marchewki the method of a stick and a carrot offer people things in order to persuade them to do something and punish them if they refuse to do it - this is the neutral form on its own rather than metoda kija i metoda marchewki the method of a stick and the method of a carrot
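The markedness criteria listed above can be read as a simple preference ordering. The sketch below is an illustration under explicit assumptions: the feature labels, the counting of marked features as a score and the function names are all hypothetical, and the guidelines themselves define no numeric measure. The variants passed in are assumed to be hand-annotated and to preserve the meaning of the expression; as the last examples show, a negated, passive or coordinated form is itself neutral when the less marked variant loses the idiomatic reading.

```python
from typing import Dict, FrozenSet

# Hypothetical labels, one per markedness criterion listed in this section
MARKED_FEATURES: FrozenSet[str] = frozenset({
    "non_finite_or_analytic_verb",
    "passive_or_other_diathesis",
    "negation",
    "extraction",
    "adpositional_modifier",
    "complex_determiner_or_quantifier",
    "coordination",
})


def markedness(features: FrozenSet[str]) -> int:
    """Count the marked features of a variant (an assumed, simplistic score)."""
    unknown = features - MARKED_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return len(features)


def least_marked(variants: Dict[str, FrozenSet[str]]) -> str:
    """Return the variant with the fewest marked features.

    All variants are assumed to preserve the meaning of the expression.
    """
    return min(variants, key=lambda form: markedness(variants[form]))


variants = {
    "I was taken by surprise": frozenset({"passive_or_other_diathesis"}),
    "they did not take me by surprise": frozenset({"negation"}),
    "the surprise by which I was taken": frozenset(
        {"extraction", "passive_or_other_diathesis"}
    ),
    "they took me by surprise": frozenset(),
}

print(least_marked(variants))
```

Counting features treats all criteria as equally important, which is a design shortcut; in practice the neutral form is determined by the annotator, not by a score.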
Neutral form in MWEs containing deverbal forms
We consider that the existence of deverbal nouns, masdars, adjectives and adverbs in MWEs does not imply syntactic marking. For instance, a wild goose chase, a decision maker and a heartbreaking story are neutral forms on their own. Consequently, they are considered nominal and adjectival MWEs, rather than verbal MWEs. Their connection to the corresponding VMWEs, if any (make a decision and break hearts, in the last 2 cases) is made explicit through their subcategories (NV.VID, NV.IVPC.full and AV.VID, respectively).
Other examples of such cases include:
Wortbruch word-break a promise which has not been kept - this is a neutral form on its own (a deverbal nominal MWE deriving from ein Wort brechen to break a word to fail to keep a promise)
a wild goose chase - this is a neutral form on its own (nominal MWE); it is not deverbal since chase a wild goose is not a VMWE
during take-off and landing - this is a neutral form on its own (a deverbal nominal MWE, here NV.IVPC.full, deriving from take off)
a run-down apartment - this is a neutral form on its own (a deverbal adjectival MWE, here AV.IVPC.full, deriving from run down)
porte-feuille carry-sheets wallet - this is a neutral form on its own (nominal MWE); it is not deverbal since porter des feuilles is not a VMWE
couru d'avance run in advance forgone conclusion - this is a neutral form on its own (adjectival MWE); it is not deverbal since courir d'avance is not a VMWE
la prise en compte the fact of taking into account - this is a neutral form on its own (a deverbal nominal MWE, here NV.VID, deriving from prendre en compte to take into account)
une mise à disposition putting into disposal the fact of making available - this is a neutral form on its own (a deverbal nominal MWE, here NV.LVC.cause, deriving from mettre à disposition to put into disposal to make available)
una storia strappalacrime - this is a neutral form on its own (nominal MWE); it is not deverbal since 'strappare le lacrime' is not a VMWE.
zabawa czyimś kosztem a play at someone else's expense - this is a neutral form on its own (a deverbal nominal MWE, here NV.VID, deriving from bawić się czyimś kosztem to enjoy oneself at someone else's expense)
Note that it is notoriously hard to distinguish deverbal nouns, adjectives and adverbs from verbal inflected forms like gerunds, participles, etc.
she was breaking his heart - verbal MWE (VMWE) vs. heart-breaking story - deverbal adjectival MWE (AV.VID)
ogni volta mi rompi le scatole verbal MWE (VMWE) vs. sei un rompiscatole - deverbal nominal MWE (NV.VID)
łamać serca to break hearts - verbal MWE (VMWE) vs. łamanie serc breaking of hearts - deverbal nominal MWE (NV.VID)
The underlying morpho-syntactic annotation might help in decision making.
Non-unicity of a neutral form
Note that a given MWE type often has more than one neutral form:
In previous versions of these guidelines, a neutral form was called canonical form.
Section 1.5
Lexicalized components and open slots
Just like a single word, notably a verb, the headword of a MWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this MWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise. Similarly, the possessive modifier is compulsory in the NMWE someone's right-hand man.
Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone/someone's is not. The headword of a MWE, in its neutral form, is always considered lexicalized. When it can be replaced by another word, like in to make/take a decision, we consider that these are two different MWEs, although possibly synonymous.
Conversely, a component of a compulsory argument which can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. In the following VMWE examples (cited after Gross 1994), all having the same syntactic structure NP V NP Prep NP, the lexicalized arguments are highlighted in bold:
- Max took the bull by the horns.
- The news took John by surprise.
- Bob took part in the inquiry
- Money burns a hole in Bob’s pocket.
Note on terminology: our definition of lexicalization applies to the component words of a MWE, and not to the whole MWE. This might be counter-intuitive, given the traditional definition of lexicalization as a diachronic process by which a lexeme (word or phrase) acquires the status of an autonomous lexical unit, that is, "a form which it could not have if it had arisen by the application of productive rules" (Bauer 1983, p. 50, apud Lipka et al. 2004, p. 6). In other words, traditionally linguistic studies would use the term "lexicalized" to refer to the whole MWE, as it has idiosyncratic behavior and thus must be listed in the language's lexicon. Our definition, however, stems from computational linguistics and in particular from the parsing literature, in which lexicalized rules refer to rules containing terminal lexemes attached to non-terminal symbols, and a lexicalized grammar is a grammar in which the rules are lexicalized (Manning and Schütze 1999, p. 417; Jurafsky and Martin 2009, p. 507). In this sense, we regard MWEs as syntactic subtrees in which some of the nodes are annotated with the corresponding terminal symbols that are always realized by the same lexeme (i.e. the lexicalized components) and others are non-terminal nodes that can be realized by any lexeme taken from a larger class (i.e. the open slots).
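The view of a MWE as a structure whose nodes are either lexicalized or open slots can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class and the helper functions are hypothetical, the NP V NP Prep NP structure is flattened into a list instead of a real subtree, and the two patterns encode the second and third example of the list above, with the preposition of took part left open as a selected preposition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """One position of a MWE pattern (hypothetical structure).

    lemma is set for a lexicalized component and None for an open slot.
    """
    category: str
    lemma: Optional[str] = None

    @property
    def is_lexicalized(self) -> bool:
        return self.lemma is not None


def lexicalized_lemmas(pattern: List[Node]) -> List[str]:
    """Lemmas of the components that would be annotated."""
    return [node.lemma for node in pattern if node.is_lexicalized]


def open_slots(pattern: List[Node]) -> List[int]:
    """Positions that can be filled by a free lexeme."""
    return [i for i, node in enumerate(pattern) if not node.is_lexicalized]


# "The news took John by surprise": verb, preposition and noun are lexicalized
TAKE_BY_SURPRISE = [
    Node("NP"), Node("V", "take"), Node("NP"), Node("Prep", "by"), Node("NP", "surprise"),
]
# "Bob took part in the inquiry": only the verb and its object are lexicalized
TAKE_PART = [
    Node("NP"), Node("V", "take"), Node("NP", "part"), Node("Prep"), Node("NP"),
]

print(lexicalized_lemmas(TAKE_BY_SURPRISE), open_slots(TAKE_BY_SURPRISE))
print(lexicalized_lemmas(TAKE_PART), open_slots(TAKE_PART))
```

Storing lemmas and not surface forms matches the definition of a lexicalized component as one that is always realized by the same lexeme, whatever its inflection.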
Special case of adpositions
Adpositions have a special status with respect to the notion of lexicalization in verbal MWEs. In the first, second and fourth example above, the prepositions by and in are lexicalized since they introduce lexicalized complements (the horns, surprise and pocket). However, in the third case the preposition in introduces an open slot whose meaning compositionally combines with the meaning of the VMWE took part. We say in this case that the preposition is selected by the VMWE, i.e. it belongs to the valency properties of the verb and is not lexicalized. Other cases include:
In functional MWEs, however, we consider that selected prepositions have a different status: they are lexicalized in FuncMWEs if they are always realized by the same lexemes. This concerns prepositions both preceding and following the headword:
so that
given that
a lot of
in addition to
in spite of
in presence of (in its presence)
au sein de
suite à
lors de
avant que
avant de
alors que
bien que
en l'absence de (en son absence)
à l'époque
à l'époque de
dato ciò
dato che
This difference in considering adpositions as lexicalized in functional MWEs, but not in verbal MWEs, is justified by several factors:
- headwords in functional MWEs usually subcategorize for one preposition
- the whole functional MWE, together with its selected prepositions, can most often be replaced by a single word (which shows the lexicalized character of the whole string), which is not the case with verbal MWEs
- this choice better aligns with the principles of Universal Dependencies, where some of such functional MWEs, together with their selected prepositions, are annotated with the fixed relation
Special case of reflexive clitics
Reflexive clitics in inherently reflexive verbs and possessive pronouns in verbal idioms also have a special lexicalization status (see also the note on more or less frozen determiners). In some languages, the same reflexive clitic or possessive pronoun is used regardless of the person and number, inflecting for case only:
намирам се find se.REFL to be (somewhere)
smiješ se laugh.2.SG self You laugh
smiju se laugh.3.PL self they laugh
znajdujesz się find.2.SG.PRES self you find yourself
znajdują się find.3.PL.PRES self they find themselves
pójdą na swoje they will go on one's own they will establish their own household
pójdziemy na swoje we will go on one's own we will establish our own household
smejiš se laugh.2.SG self You laugh
smejijo se laugh.3.PL self they laugh
радујеш се raduješ se look.2.SG.PRES forward to you look forward to
радује се raduje se look.3.SG.PRES forward to She/He looks forward to
In other languages, reflexive clitics and possessive pronouns agree with the subject and the verb:
ihr wundert euch you.PL wonder.2.PL self.2.PL you wonder
Τα παιδιά έκαναν την πλάκα τους Ta pedia ekanan tin plaka tus The kids made the fun their The kids had fun
tú te quejas you self.2.SG complain You complain
tu te trouves you self.2.SG find you find yourself
je vide mon sac I empty my bag I express my secret feelings
elle vide son sac she empties her bag she expresses her secret feelings
wij vergissen ons we are mistaken self.1.PL we are mistaken
tu te queixas you self.2.SG complain You complain
tu te gândești you Refl.Cl.2sg.Acc. think you are thinking
In this case, the clitic or the pronoun is realized by different lexemes, depending on the number and gender. Strictly speaking, it is not lexicalized. However, we admit that, regardless of the language, the reflexive clitic and the possessive pronoun are each a unique lexeme (with lemma się, se, sich, etc. or swój, son, one's) inflecting for person and number. They are thus lexicalized in inherently reflexive verbs and verbal idioms.
Section 1.6
Multiword expressions versus collocations
Collocations are not considered MWEs in this task and should not be annotated. However, the boundary between both categories is not always easy to define and should be handled with care.
We understand collocations as combinations of words whose idiosyncrasy is purely statistical. In other words, tokens in collocations tend to co-occur with each other more often than expected by chance, but they show no substantial orthographic, morphological, syntactic and (most notably) semantic idiosyncrasy. In this way we oppose MWEs to collocations.
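The idea of tokens co-occurring "more often than expected by chance" is often quantified with pointwise mutual information (PMI). The sketch below is an illustration only and is not part of the annotation procedure: the toy corpus and the function name `pmi` are assumptions made for this example. A high score only flags a statistically salient pair. It says nothing about whether the pair is an MWE.

```python
import math
from collections import Counter

def pmi(sentences, w1, w2):
    """Pointwise mutual information of the adjacent pair (w1, w2).

    `sentences` is a list of token lists. Hypothetical helper for illustration.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    if bigrams[(w1, w2)] == 0:
        return float("-inf")  # the pair never occurs in the corpus
    p_pair = bigrams[(w1, w2)] / sum(bigrams.values())
    p_w1 = unigrams[w1] / sum(unigrams.values())
    p_w2 = unigrams[w2] / sum(unigrams.values())
    # Positive when the pair occurs more often than independence predicts.
    return math.log2(p_pair / (p_w1 * p_w2))

# Toy corpus (an assumption), far too small for real statistics.
corpus = [
    ["we", "take", "the", "bus"],
    ["they", "take", "the", "train"],
    ["the", "graphic", "shows", "a", "trend"],
    ["the", "graphic", "shows", "the", "bus"],
]
print(pmi(corpus, "graphic", "shows"))
print(pmi(corpus, "shows", "bus"))
```

In this toy corpus the pair "graphic shows" gets a positive score, yet under the definition above it remains a collocation because it shows no orthographic, morphological, syntactic or semantic idiosyncrasy.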
Note that other authors understand collocations slightly differently. E.g. for Sag et al. (2002), collocations are any statistically significant co-occurrences, i.e. they include all forms of MWEs. For Baldwin and Kim (2010), collocations form a proper subset of MWEs. According to Mel'čuk (2010), collocations are binary semantically compositional combinations of words subject to lexical selection constraints, i.e. they intersect with what is here understood as MWEs.
Some combinations happen to be very frequent and are perceived as "frozen":
كتاب إشترى buy a book
وجبة قدم serve a meal
the graphic shows
to take a bus
el gráfico muestra the graphic shows
coger el autobús to take the bus
galdera bati erantzun question one-to answer answer a question
autobusa hartu bus take to take the bus
il grafico mostra the graphic shows
prendere un bus to take a bus
entrar em cartaz enter into poster arrive in theaters (for a movie) (the MWE is em cartaz in poster in theaters, the verb just usually collocates with this MWE)
据 报道 according-to report according to what is reported
However, applying regular lexical alternations to them does not markedly impact their meaning.
فطور القدم serve a breakfast
جريدة إشترى buy a newspaper
παίρνω το τραίνο perno to treno take the train
el diagrama muestra the diagram shows
coger el tren to take the train
zalantza bati erantzun doubt one-to answer answer a doubt
trena hartu train take to take the train
il diagramma mostra the diagram shows
o recorde foi quebrado the record was broken
entrar/estar/permanecer/ficar/continuar/ter em cartaz enter/be/remain/stay/continue/have in poster
The difficulty of distinguishing collocations from MWEs lies in the fact that lexical variability is relevant to some MWEs:
имам твърда/дебела глава to have a thick head, to be stubborn and not listen to advice
darse/tomar una ducha give.self/take a shower take a shower
eskola/klasea eman class give to give a class →'eskola' and 'klasea' are synonyms in Basque
zamarznąć na kość/lód/sopel to freeze to bone/ice/icicle to freeze strongly
chutar o balde/pau da barraca to kick the bucket/the tent's stick to act irresponsibly
However, the extent of the vocabulary concerned by this variability is different for collocations and MWEs. Namely, a head verb in a collocation usually selects a whole semantic class for each of its required arguments. For instance, the verb to take
Some light-verb constructions (LVCs) and multi-verb constructions (MVCs), as well as the corresponding deverbal nominal, adjectival and adverbial MWEs (VMWENom, VMWEAdj and VMWEAdv), belong to the gray zone between MWEs and collocations in the sense that some operator (light) verbs seem to select large classes of nouns, as in to make a speech/declaration/remark/etc. However, some studies (e.g. Bonial 2014) show that there is no such thing as truly productive light verbs (e.g. to give a look vs. to give a stare). Therefore, we do include LVCs and MVCs in our annotation scope.
Section 1.7
Multiword expressions versus metaphor
Another phenomenon closely related to MWEs is metaphor. According to Shutova (2010), "a metaphor occurs when one concept is viewed in terms of the properties of the other. In other words it is based on similarity (presence of common characteristics) between two concepts".
Many MWEs, especially idioms, are based on metaphors. For instance, to take the bull by the horns means to address a problem (the bull) starting with its most challenging aspect (the horns). To set the world on fire is to do something extraordinary and get the admiration (set on fire) of other people (the world). To put all one's eggs in one basket means to rely on one particular course of action (a basket) for success rather than giving oneself several possibilities.
However, verbal metaphors are not always MWEs. Consider the newspaper title "simple steps to lift your dark cloud of stress", and the extract of a poem by Wordsworth, cited by Shutova: "and then my heart with pleasure fills, and dances with the daffodils". The metaphorical expressions to lift dark cloud of stress ("to relax") and my heart ... dances with the daffodils ("I am happy") are not semantically compositional. These expressions, however, were probably constructed for the needs of one article/poem only and are not sufficiently established in the common vocabulary to be considered MWEs.
The distinction between MWEs and metaphors is a relatively unstudied and open question. There are few precise tests, other than statistical, which would allow human annotators to resolve it reliably. Gross (1982) gives some clues on the reproducibility and predictability of metaphors. We suggest that the annotators take notes of such cases and discuss them within their communities, both local and international.
Section 2
Textual annotation scope
In this annotation task, all occurrences of all syntactic types of MWEs are to be annotated in the text.
We annotate, as integral parts of MWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated but case suffixes are only annotated if the noun they modify is also lexicalized. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see IVPC tests), while in (HU) döntést hoz valamiről decision-ACC bring something-DEL make a decision, only döntést hoz is annotated, even if the delative case suffix is also lexically determined.
Similarly, auxiliaries and modals accompanying the main verb of a MWE are only annotated if they are themselves lexicalized, not when they simply mark syntactic variants of the MWE. For instance, will is lexicalized, and is to be annotated as such, in even a worm will turn (even a meek person will resist if pushed too far) but not in they will spill the beans.
Both continuous and discontinuous sequences of lexicalized components of MWEs are annotated.
Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or idiomatic verb-particle constructions. Verb+preposition combinations like to rely on somebody and to come across something or to put up with somebody are annotated optionally and experimentally as inherently adpositional verbs (IAVs). On the other hand, prepositions selected by functional MWEs, such as in spite of, according to, etc. are considered lexicalized.
The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a MWE or not. We do not annotate their internal syntactic structure. We do annotate, however, MWEs embedded in other MWEs. For instance, the MWE to make a faux pas contains the embedded MWE faux pas and both are to be annotated as different MWEs. Embeddings are discussed on the pages of some categories, in the "Problematic cases and remarks" sections (e.g. IRVs overlapping with VIDs).
Once identified in a text, MWEs are also to be assigned to exactly one of the categories described in the following sections. We do not admit assigning two different categories to a single MWE in order to express hesitation. A comment and a particular value of the annotator's confidence should be used instead.
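The constraints above (flat token sequences, continuous or discontinuous components, embedded MWEs, exactly one category plus a confidence value and a comment) can be pictured as a small data structure. This is a minimal sketch under stated assumptions: the class name `MweAnnotation`, its fields, the numeric confidence scale and the category labels attached to the example are hypothetical and do not describe the actual corpus format or any annotation decision.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class MweAnnotation:
    """One annotated MWE over a flat, tokenized sentence (illustrative only)."""
    token_indices: Tuple[int, ...]  # 0-based positions of the lexicalized tokens
    category: str                   # exactly one category label per MWE
    confidence: float = 1.0         # hypothetical scale for annotator confidence
    comment: str = ""               # hesitation goes here, not in a second category

    def is_continuous(self) -> bool:
        idx = sorted(self.token_indices)
        return idx == list(range(idx[0], idx[-1] + 1))

    def embeds(self, other: "MweAnnotation") -> bool:
        # True when all tokens of `other` are a strict subset of this MWE's tokens.
        return set(other.token_indices) < set(self.token_indices)

tokens = ["He", "made", "a", "terrible", "faux", "pas"]
# The category labels below are placeholders chosen for the example.
outer = MweAnnotation((1, 4, 5), "LVC.full", confidence=0.8,
                      comment="placeholder category, to be checked with the tests")
inner = MweAnnotation((4, 5), "NID")

print([tokens[i] for i in outer.token_indices])
print(outer.is_continuous(), inner.is_continuous(), outer.embeds(inner))
```

Storing token positions rather than a span keeps discontinuous MWEs and embedded MWEs representable without any internal syntactic structure.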
Section 3
Categories of MWEs
The top level of MWE categories is motivated by a mixture of morphosyntactic and functional criteria, inspired by the classification of syntactic relations in Universal Dependencies, and includes:
- verbal MWEs (VMWEs), with several subcategories (defined and annotated in versions 1.0 to 1.3 of these guidelines)
- nominal MWEs (NMWEs), including nominal idioms and nominal MWEs derived from VMWEs
- adjectival and adverbial MWEs (AMWEs), including adjectival and adverbial idioms, with separate subcategories for those derived from VMWEs
- functional MWEs (FuncMWEs), including multiword determiners, adpositions, conjunctions and interjections
This classification, covering all syntactic types of MWEs, is new in version 2.0 of the guidelines. Previous versions covered verbal MWEs only. For a summary of changes with respect to edition 1.3, see the what's new file.
In practice, to identify and categorize MWEs during manual annotation, one must start at the unique entry point and follow the decision diagrams specific for the distribution of a MWE candidate:
Section 3.1
Categories of verbal MWEs
We distinguish the following categories of verbal MWEs:
- Two universal categories, i.e. valid for all languages participating in the task:
- Light verb constructions (LVCs) with two subcategories:
- LVCs in which the verb is semantically totally bleached (LVC.full)
- LVCs in which the verb adds a causative meaning to the noun (LVC.cause)
حكم أصدر pronounce judgment he pronounced a judgment
държа под контрол to keep under control
eine Rede halten a speech hold to give a speech
(OEG) 𓇋𓁹 𓊨𓏏 𓎡 ꞽr ś.t ⸗k Make (ꞽr) your (⸗k) place (ś.t)! Take your place! (PT 651d, T)
παίρνω μία απόφαση perno mia apofasi take-1SG a decision to decide
δίνω μια εξήγηση dino mia exigisi give.1SG an explanation to explain
ασκώ κριτική asko kritiki to criticise
to give a lecture
hacer una promesa to_make a promise to make a promise
min hartu pain take to hurt oneself
lo egin sleep do to sleep
avoir du courage to have courage
bain triail as extract trial from try
διάνοιαν ἔχειν dianoian ekhein thought.ACC have.INF to have a thought
τιμωρίαν ποιέομαι timо̄rian poieomai punishment.ACC do.1SG I punish
λόγοις χράομαι logois khraomai words.DAT use.1SG I speak
ἐν νῷ ἔχω en nо̄ ekhо̄ in mind.DAT have.1SG I have in mind
ἐν ὀργῃ ἔχω en orgē ekhо̄ in anger.DAT have.1SG I am angry
držati govor hold a speech to give a speech
fare un discorso to_make a speech to give a speech
fare una promessa to_make a promise to make a promise
გავლენას ახდენს gavlenas axdens he/she performs influence he/she affects
ზიანს აყენებს zians aqenebs he/she puts damage he/she harms
pieņemt lēmumu to take a decision to make a decision
ħa deċizjoni took a decision
een toespraak houden a speech hold to give a speech
podjąć decyzję to take a decision
fazer uma promessa to make a promise
a lua o decizie to take a decision to make a decision
imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion
jap mësim give lesson give a lecture
bëj një premtim do a promise make a promise
донети одлуку doneti odluku to bring a decision to take a decision
hålla ett tal hold a speech to give a speech
做 讲座 do speech to give a speech
قيمه أعطى give a value to give a value for something or someone
давам възможност give an opportunity
(OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)
δίνω προτεραιότητα
to grant rights
to give a headache
to provoke the destruction of the building
dar dolor de cabeza to_give pain of head to give a headache
hacer ilusión to_make excitement to make excited/to look forward to
cuir lúcháir ar put joy on give delight to
τιμωρίαν ἀποδίδωμι timо̄rian apodidо̄mi punishment.ACC give.1SG I inflict punishment
ὀργὰς παρασκευάζομαι orgas paraskeuazomai anger.ACC.PL cause.1SG I make angry
δίκην ἐπιτίθημι dikēn epitithēmi justice.ACC impose.1SG I fine (sb)
τιμωρίαν ποιέω timо̄rian poieо̄ punishment.ACC do.1SG I inflict punishment
zadati glavobolju komu to give a headache to someone, izazvati nezadovoljstvo to cause dissatisfaction
dare il mal di testa to_give pain of head to give a headache
dare noia to_give trouble to annoy
nest nelaimi to carry misfortune to bring misfortune
rechten verlenen rights grant to grant rights
nakłada obowiązek na użytkowników put a duty on the users
dać prawo to give the right to grant the right
narazić na straty expose to losses
stawiać komuś cel to put an aim to someone to set a goal to someone
da cuiva bătăi de cap give sb. a hard time
dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)
jap të drejtë give the right grant rights
задати главобољу некоме zadati glavobolju nekome to give a headache to someone to make problems to someone
створити прилику stvoriti priliku nekome create an opportunity
授予 权力 give power to grant power
- verbal idioms (VIDs):
إجتماع عقد tie a meeting to lead a meeting
правя се на дръж ми шапката to behave myself as 'hold my hat' pretend to be naive and innocent
цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
река и отсека to say and cut to say firmly, decisively
schwarz fahren to drive black take a ride without a ticket, in Kraft treten into force step to come into effect, in die Waagschale werfen in the weighing pan throw to bring to bear
einen drauf setzen going one better
(OEG) 𓐣𓂝𓏝 𓃹𓈖𓇋𓋴 𓌃𓅱𓏝 𓈖 𓋹𓈖𓐍𓅱 wč̣ꜥ Wnꞽś mṭw n ꜥnḫ.w Unas (Wnꞽś) shall-separate (wč̣ꜥ) the word (mṭw) for (n) the living (ꜥnḫ.w). Unas shall judge the living (PT 273b, W)
κόβω φλέβες kovo fleves cut veins to be at a complete state of boredom
απορώ και εξίσταμαι wonder1SG.PST and be-amazed1SG.PST to wonder
παίρνω των ομματιών μου perno ton omation mu take the eyes mine to leave (in despair)
χάνω τα αυγά και τα καλάθια chano ta avga ke ta paschalia lose-1SG the eggs and the baskets to be at a complete and utter loss
κόβει το μάτι μου kovi to mati mu cut.3SG the.SG.NOM eye.SG.NOM my to be sharp-eyed
παίρνουν τα μυαλά μου αέρα pernun ta miala mu aera take.3PL the.PL.NOM brain.PL.NOM air.SG.ACC to become arrogant
δεν δίνω του αγγέλου μου νερό den dino tu agelu mu nero not give my angel water to be stingy
to go bananas
fortune favors the bold
to drink and drive
to voice act
to pretty-print
to short-circuit
to tumble dry
hacer de tripas corazón make of intestines heart to pluck up the courage
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
dar gato por liebre to_give cat for hare to rip off, to take for a ride
adarra jo horn play to pull (somebody's) leg, to be kidding
burua hautsi head break to rack one's brains, to think very hard
ikusi eta ikasi see and learn
hortxe dago koska just-there is the-crux that's the crux of the matter
défendre son bifteck defend one's beefsteak to defend one's interests
court-circuiter to short-circuit
ag cur is ag cúiteamh arguing and debating arguing back and forth
περὶ πολλοῦ ποιέομαι peri pollou poeomai above much.GEN do.1SG I hold in high esteem
οἷον τ'ἦν hoion t’ēn of.what.sort.NOM and was.3SG it was possible
δίκην δίδωμι dikēn didо̄mi justice.ACC give.1SG I get punished
mlatiti praznu slamu to beat empty straw to talk aimlessly, mazati komu oči to blur eyes to someone to cheat someone
gettare le perle ai porci to_throw the pearls to the pigs to waste something good on someone who doesn't care about it
andare e venire to_come and go back and forth
corto-circuitare to short-circuit
შიშს ჭამს šišs čams he/she eats horror to be startled, to panic; to be horrified / deeply shocked
უარს აცხადებს uars acʻxadebs He/she declares a refusal to refuse
აღმართს ახვნევინებს aġmartʻs axvnevinebs he/she makes (someone) plow uphill He/she forces their will on others
გაივლის ვინმეს ხელში gaivlis vinmes xelši he/she will pass through someone's hand he/she will go through someone's control or possession
atstiept kājas to stretch one's legs to die
għasfur żgħir qalli a bird small told me to hear something from the grapevine
iqum u joqgħod jump and stay to fidget
het ijs breken ice break to break the ice
rzucać grochem o ścianę throw peas against a wall to try to convince somebody in vain
pluć i łapać to spit and catch to be lazy, to do nothing useful
fazer das tripas coração transform the tripes into heart to try everything possible
pintar e bordar paint and knit to abuse
a trage pe sfoară to pull on rope to fool
a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
ubiti dve muhi na en mah to kill two flies with one strike to achieve two aims at once, spati kot ubit to sleep like dead to sleep soundly
i bie murit me kokë hit the wall with head to try the impossible
i vë flakën to it put flame to cause trouble
држати реч držati reč to hold a word to keep a promise
храбре срећа прати hrabre sreća prati fortune follows the bold fortune favors the bold
китити се туђим перјем kititi se tuđim perjem decorate oneself with someone else's feathers steal someone's thunder / take credit for someone else's accomplishments
吃 闭门羹 eat closed-door-soup to be locked out
哑巴 吃 黄连 dumb-person eat bitter-medicine a dumb person eats bitter medicine, and he cannot speak out the bitterness
- Three quasi-universal categories, valid for some language groups or languages but non-existent or very exceptional in others:
- inherently reflexive verbs (IRV):
усмихвам се to smile
sich bemühen to endeavour, sich enthalten himself contain to abstain
(OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. V 141, 14).
not applicable to Modern Greek
to find oneself in a difficult situation
to help oneself to the cookies
suicidarse to suicide
quejarse to complain
n.a.
se suicider to suicide
se soucier to worry
n.a.
This category does not apply to Ancient Greek.
smijati se to laugh
suicidarsi to suicide
lamentarsi to moan
zich bemoeien to get involved, zich vergissen to be mistaken
bać się to fear SELF to be afraid
se queixar to complain
a se gândi to think
bati se to be afraid, smejati se to laugh, drzniti si to dare to do something
gëzohem rejoice myself to be happy
pendohem repent myself to regret
kujdesem to care myself to take care
бојати се bojati se to be afraid
коцкати се kockati se to gamble
- idiomatic verb-particle constructions (IVPC) with two subcategories:
- fully non-compositional IVPCs (IVPC.full), in which the particle totally changes the meaning of the verb
- semi non-compositional IVPCs (IVPC.semi), in which the particle adds a partly predictable but non-spatial meaning to the verb
not applicable to Bulgarian
er gibt auf he gives up, er wirft ihr das vor he throws her that against he reproaches that to her
μπαίνω μέσα get in get in to go bankrupt
βάζω μπρος vazo bros put forward to start
to do in
n.a.
n.a.
cas chuig turn towards happen to have
This category does not apply to Ancient Greek.
postaviti za to set for to appoint
buttare giù to_throw down to swallow
hij geeft op he gives up
not applicable to Polish
jogar fora This seems to be the only VPC in Portuguese. We annotate it as ID and do not use the VPC category.
n.a.
n.a.
hedh poshtë
n.a.
not applicable to Bulgarian
κάνω πίσω kano piso do back to back off
to eat up
n.a.
tabhair suas give up
This category does not apply to Ancient Greek.
andare avanti to_go forward to move on
opeten to eat up
opdrinken to drink up
n.a.
n.a.
eci para
n.a.
把握 住 机会 grasp hold opportunity to grasp the opportunity successfully → a Chinese Resultative Verbal Construction (RVC)
- multi-verb constructions (MVC):
will sagen want to say that is to say
(MEG) 𓁹𓏏 𓀀 𓈝𓅓𓏏𓂻 𓅓 𓏃𓈖𓏏𓇋𓇋𓏏𓊛 ꞽr.t (⸗ꞽ) šm.t m ḫnt.yt My (⸗i) making (ir.t) of going (šm.t) southwards (m ḫnt.yt) I made a departure southwards. (Sin. B 5-6)
έχω να κάνω echo na kano have to do to cope
έδωσα πήρα edosa pira give.1PST take.1PST I struggled
to let go
to make do
querer decir to_want to_say to mean
?
laisser tomber let fall to give up
vouloir dire want say to mean
?
φθάνουσι ἐρχόμενοι phthanousi erkhomenoi overtake.3PL go.PTC they go first
τυγχάνουσι ἐρχόμενοι tugkhanousi erkhomenoi get.3PL go.PTC they happen to go
može biti can be it is possible
lasciar andare to_let go to unhand
voler dire to_want say to mean
wil zeggen want to say that is to say
laten vallen let fall to give up
leren kennen to learn know to become acquainted
dać komuś żyć to let someone live not to bother someone
można wytrzymać one can stand the situation is reasonably good
querer dizer want say to mean
ouvir falar hear speak to know/remember vaguely
n.a.
n.a.
do të thotë
дај шта даш daj šta daš give what you give to be satisfied with small (from someone)
ићи куда некога ноге носе ići kuda nekoga noge nose to go where one's feet carry someone to go without an aim
排列 成 arrange become to arrange to be
试试 看 try see to try and see
- language-specific categories, defined for a particular language in a separate documentation.
We also introduce an optional experimental category which (if admitted by the given language) is to be considered in the post-annotation step:
- inherently adpositional verbs (IAVs)
излизам със становище come out with a statement
to rely on
mieć do czynienia z czymś to have to do with sth
odwieść kogoś od czegoś to dissuade someone from doing sth
Section 3.2
Categories of nominal MWEs
We distinguish two classes of nominal MWEs (NMWEs):
- Nominal idiom (NID) - a universal category, characterized by lexical, morphological or syntactic irregularity:
(OEG) 𓇓𓏏 𓆤𓏏 nsw - bꞽtꞽ The king of Upper Egypt (𓇓𓏏) and Lower Egypt (𓆤𓏏). The king of Egypt (PT 776a, P) → For the meaning of nsw-bꞽtꞽ see Schenkel, Das Wort für 'König' (von Oberägypten), 1986.
φακός επαφής fakos epafis lens contact.GEN.SG contact lens
a big fish an important person
a hot dog a sandwich with a hot sausage
un pesce grosso
il braccio destro
აბრამის ბატკანი abramis batkani Lamb of Abraham Completely innocent, simple person Lamb of God
ადამის ჟამის adamis žamis Of Adam's time Old, very old person
baltais zvirbulis the white sparrow a person who stands out from the crowd
blinde vink small roll of minced meat, wrapped in a slice of veal or beef
hotdog a sandwich with a hot sausage
zwarte markt black market
biały kruk a white raven a rare thing
kokë e madhe big head an important person
осиње гнездо osinje gnezdo wasps' nest dangerous place
акула бізнесу akula biznesu business shark an agile, goal-oriented person with excellent business skills and undeniable advantages over competitors
біла ворона bila vorona white crow is a person who is different from the rest
об’ємний звук ob’jemnyj zvuk surround sound sound coming from all directions
холодна війна xolodna vijna cold war a period of prolonged tension between countries that did not involve direct military action but included economic and political competition, espionage, etc.
- Pronominal idiom (PronID) - a universal category constituting a closed list of cases:
(OEG) 𓅱𓌡𓏤 𓊪𓈖 𓇋𓅓 𓎡 wꜥ pn ꞽm(.ꞽ) ⸗k this (pn) one (wꜥ) who-is-in (ꞽm(.ꞽ)) you (⸗k). This one who is in you. (PT 254a)
I saw just a few
I expect no one to come
we love each other
je ne suis pas capable de manger quoi que ça soit I am not able to eat what that this be I cannot eat anything
ci amiamo l'un l'altro
viens otrs one other some people
tas pats that self the same
kaut kas something
dažs labs few good somebody; some people
powtarzał ciągle to samo he repeated always this the same he repeated always the same
coś tam jeszcze something there more something more
byłoby to co innego it would be what different it would be something else
Ne duam njëri-tjetrin. We love each other. We love each other.
сам по себи sam po sebi by itself
кохаємо один одного koxajemo odyn odnoho love each other means that two or more people feel deep love, affection and mutual love for each other
сама собою sama soboju by herself of course
- Deverbal nominal MWE (NV) with subcategories corresponding to the categories of VMWEs from which the nominal MWE can be derived:
- universal subcategories:
- Deverbal nominal stemming from an LVC.full (NV.LVC.full)
- Deverbal nominal stemming from an LVC.cause (NV.LVC.cause)
- Deverbal nominal stemming from a VID (NV.VID)
a decision maker - deriving from the LVC.full to make a decision
lēmuma pieņēmējs a decision maker - derives from the LVC.full pieņemt lēmumu to take a decision to make a decision
sianie zgorszenia sowing scandal provoking scandal - derives from the LVC.full siać zgorszenie sow scandal provoke scandal
marrës vendimesh "a decision maker" → from "marr një vendim" ("to make a decision").
пружање подршке pružanje podrške providing support - derives from the LVC.full пружати подршку pružati podršku to provide support
a doubt-raiser - deriving from the LVC.cause to raise doubts
nelaimes nešana bringing of misfortune - derives from the LVC.cause nest nelaimi to bring misfortune
dostarczanie wrażeń delivering of impressions giving impressions - derives from the LVC.cause dostarczać wrażeń deliver impressions give impressions
ngjallës dyshimesh raiser of doubts a doubt-raiser derives from the LVC.cause ngjall dyshime (to raise doubts)
изазивање реакције izazivanje reakcije provoking a reaction - derives from the LVC.cause изазивати реакцију izazivati reakciju provoke a reaction
Wortbruch word-break a promise which has not been kept - derives from the VID ein Wort brechen word break not to keep a promise
(OEG) 𓅓𓎕 𓄣 𓈖 𓇓𓏏𓈖 mḥ ꞽb n(.ꞽ) nsw the-one-who-fills (mḥ) the heart (ꞽb) of (n(.ꞽ)) the king (nsw) The king's confidant (Urk. I 190, 11) => mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) '(My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ)' 'My lord trusted me' → It is an NV.VID.full.
a heart breaker - deriving from the VID to break one's heart
la prise en compte the fact of taking into account - derives from the VID prendre en compte take into account
une mise à disposition the fact of making available - derives from the VID mettre à disposition make available
uno spezzacuori - deriving from the VID spezzare un cuore
kāju atstiepšana stretching of one's legs dying - derives from the VID atstiept kājas to stretch one's legs to die
zabawa czyimś kosztem a play at someone else's expense - derives from the VID bawić się czyimś kosztem to enjoy oneself at someone else's expense
thyerës zemrash breaker of hearts heartbreaker derives from the VID thyej zemrën (break the heart)
долазак на свет dolazak na svet coming to the world birth - derives from the VID dolaziti na svet
одузимање живота oduzimanje života depriving of life deprivation of life - derives from the VID одузети живот oduzeti život take a life
- quasi-universal subcategories:
- Deverbal nominal stemming from an IRV (NV.IRV)
- Deverbal nominal stemming from an IVPC.full (NV.IVPC.full)
- Deverbal nominal stemming from an IVPC.semi (NV.IVPC.semi)
- Deverbal nominal stemming from an MVC (NV.MVC)
cackanie się z przestępcami dealing too mildly with bandits - derives from the IRV cackać się dealing too mildly with someone
a take-off - deriving from the IVPC.full to take off
- optional experimental subcategory:
- Deverbal nominal stemming from an IAV (NV.IAV)
Section 3.3
Categories of adjectival and adverbial MWEs
We distinguish three classes of adjectival and adverbial MWEs (AMWEs, previously also called modifier MWE or ModMWEs):
- Adjectival idiom (AdjID) - a universal category, characterized by lexical, morphological or syntactic irregularity:
(OEG) 𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹𓊹 𓋴𓆓𓄔𓏏𓅓 𓌃𓅱𓂧 𓇋𓍘𓅱 pśč̣.(w)t śč̣m.t mṭw ꞽtꞽ.w The Enneads (pśč̣.(w)t) which-hear (śč̣m.t) the word (mṭw) of the monarch (ꞽtꞽ.w). The Enneads which interrogate the monarch (PT 511c, W)
a well-worn coat
to be up in arms to be very angry
a bottom-up algorithm an algorithm starting from details and moving on to more general principles
un argomento trito e ritrito repeated over and over
una fregatura bella e buona a real fraud
мртав пијан mrtav pijan dead drunk dead drunk
нов новцијат nov novcijat new new brand new
вредан помена vredan pomena worthy of mention worth mentioning
- Adverbial idiom (AdvID) - a universal category, characterized by lexical, morphological or syntactic irregularity:
(OEG) 𓆓𓏏𓇿 𓂋 𓈖𓅘𓎛𓎛 č̣.t r nḥḥ for the linear-eternity (č̣.t) to (r) the circular-eternity (nḥḥ). for ever and ever (PT 414c, W)
by and large generally speaking
par la force des choses by the strength of the things inevitably
tutto sommato everything summed up all in all
ულუკმოდ დარჩენილი ulukmod darčʻenili without a bite left left hungry; left starving
aiz restēm behind bars in prison
droši vien sure only sure enough
zrobić coś raz dwa to do something one two to do something quickly
pod kluczem under the key in prison
очас посла očas posla immediately work in the blink of an eye
зробити для галочки zrobyty dlja haločky to do something just for show to do something perfunctorily
збирати по крихтах zbyraty po kryxtax collecting crumbs assemble something in small, often insignificant parts
останнім часом ostannim časom recently it is a phrase that indicates a certain period of time that has recently ended
- Deverbal adjectival/adverbial MWE (AV) - with subcategories corresponding to the categories of VMWEs from which the AMWE can be derived:
- universal subcategories:
- Deverbal AMWE stemming from an LVC.full (AV.LVC.full)
- Deverbal AMWE stemming from an LVC.cause (AV.LVC.cause)
- Deverbal AMWE stemming from a VID (AV.VID)
żołnierz wzięty do niewoli a soldier taken into custody imprisoned soldier - wziąć do niewoli is an LVC.full
донет закон donet zakon brought law passed law - derives from the LVC.full донети закон doneti zakon to pass a law
- quasi-universal subcategories:
- Deverbal AMWE stemming from an IRV (AV.IRV)
- Deverbal AMWE stemming from an IVPC.full (AV.IVPC.full)
- Deverbal AMWE stemming from an IVPC.semi (AV.IVPC.semi)
- Deverbal AMWE stemming from an MVC (AV.MVC)
a run-down apartment - adjectival MWE deriving from the IVPC.full to run down
- optional experimental subcategory:
- Deverbal AMWE stemming from an IAV (AV.IAV)
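The deverbal subcategory labels listed in Sections 3.2 and 3.3 follow a regular pattern: a prefix (NV for nominal, AV for adjectival/adverbial) followed by the label of the source VMWE category. The sketch below merely enumerates that pattern. The dictionary `VMWE_CATEGORIES` and the function `deverbal_labels` are hypothetical names introduced for illustration and are not part of any PARSEME tool.

```python
# VMWE category labels grouped as in Section 3.1 (language-specific ones omitted).
VMWE_CATEGORIES = {
    "universal": ["LVC.full", "LVC.cause", "VID"],
    "quasi-universal": ["IRV", "IVPC.full", "IVPC.semi", "MVC"],
    "optional experimental": ["IAV"],
}

def deverbal_labels(prefix):
    """Return the deverbal labels for a prefix such as 'NV' or 'AV'."""
    return [
        f"{prefix}.{category}"
        for group in VMWE_CATEGORIES.values()
        for category in group
    ]

print(deverbal_labels("NV"))
print(deverbal_labels("AV"))
```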
Section 3.4
Categories of functional MWEs
We distinguish four classes of functional MWEs (FuncMWEs), all of them universal:
- Determiner idiom (DetID):
I work from home roughly every other day
tas pats cilvēks that self person the same person
katru otro dienu every second day every other day
zadałem sobie to samo pytanie I asked myself this same question I asked myself the same question
przekaż mu te oto słowa transfer him these here words transfer him these words
той чи інший бік toj čy inšyj bik one side or the other one of several
той чи той випадок toj čy toj vypadok this or that case in each of the two options
- Adposition idiom (AdpID):
(OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth (Śtẖ) from Seth (PT 65b, N)
in front of the station
di fronte alla stazione
განზე გაგონილი ganze gagonili heard from the side heard unintentionally
თითზე ჩამოსათვლელი tʻitʻze čʻamosatʻvleli countable on (one’s) fingers a few
līdz pat until even up to; until
hij speelde één grastoernooi ter voorbereiding op het Grand Slam he played one grass tournament in preparation for the Grand Slam
gwarancji nie ma nawet w przypadku arcymistrza there is no guarantee even in the case of a grandmaster
смештај у близини луке smeštaj u blizini luke accommodation in proximity of the port accommodation near the port
під час вечері pid čas večeri during dinner when an action or event is in progress
у межах співпраці u mežax spivpraci within the framework of cooperation within the framework of something
за допомоги друзів za dopomohy druziv with the help of friends using something or someone to achieve a goal
- Conjunction idiom (ConjID):
(OEG) 𓈖 𓈖𓏏𓏏 n-n.tt “for (n) (the fact) that (n.tt) because (PT 716e, T)she was fortunate in that she had friends to help herla cérémonie sera projetée sur grand écran afin que tout le monde puisse suivre the ceremony will be projected on a big screen so that everyone can followlei è fortunata in quanto ha amici che la aiutanovārdnīca, kā arī locījumu tabula a dictionary, as also an inflection table a dictionary as well as inflection tablezmęczony mimo źe dzień się dopiero zaczynał tired although that the day was only beginning tired although the day was only beginningПазите само да не оштетите кип.Pazite samo da ne oštetite kip. Just be careful not to damage the statueдля того, щобdlja toho, ščob in order to with the aim of, in order to, in order to achieve something
не тільки, але йne til'ky, ale j not only, but also not only ... but also ...
чи то…, чи то… čy to…, čy to… either..., or... indicates the possibility of several options, but is not precisely defined
- Interjection idiom (IntjID):
damn it!bon sang! good blood! damn it!mannaggia!ვაი შენს ტყავს!vai šens tqavs! Wow to your skin! You're in trouble! or, Oh, poor you!pie velna! at the devil! Damn it!do diabła! To the devil! Damn it!алал вераalal vera blessing faith congratulationsСлава Богу!Slava Bohu! Thank God! expresses gratitude or relief when something good has happened or danger has passed
До дідька!Do did'ka! Damn it! 1) a lot. 2) used to express dissatisfaction with someone's behavior, actions, deeds, etc. 3) goes away
Дідька лисого!Did'ka lysoho! Damn bald guy! 1) used as a categorical denial of something 2) absolutely nothing
Якщо хочете, …Jakščo xočete, … If you want, … a polite suggestion or invitation to do something
Section 4
Annotation process
We propose the following methodology for MWE annotation:
- Step 1 - identify a candidate, that is, a combination of at least two words which could form a MWE. Recall that a candidate can be composed of only one token if it contains several words (cf. the MWT tests). Find the neutral form of the candidate. The following steps should be applied to this neutral form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
- Step 2 - determine which components of the candidate (in its neutral form) are lexicalized, that is, which components cannot be omitted without the MWE ceasing to occur. Corpus and web searches may be required to confirm intuitions about acceptable variants.
- Step 3 - depending on the syntactic structure of the candidate's neutral form, formally check if it is a MWE using the generic and category-specific decision diagrams and tests described below. Note that the intuitions you used in Step 1 to identify a candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
- Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied once the 3 previous steps are complete, i.e. the IAV annotation overlays the universal annotation.
The unique entry point to Step 3 above is the following test:
Top test - [DIST] - Distribution
What is the distribution of the neutral form of the candidate in the particular context? This can be tested by replacing the MWE candidate with a single word having the given part of speech, and checking if such a replacement, although possibly changing the meaning, does not lead to a loss of grammaticality or acceptability. If such a replacement test passes for a large class of single words of the same POS, the candidate is considered as having the distribution of this POS.
- Determiner, conjunction, adposition or interjection ⇒ Apply the functional MWE tests ⇒ FuncMWE tests positive?
- Annotate with the FuncMWE subcategory determined via the guidelines
- It is not a MWE, exit
- Adjectival or adverbial phrase ⇒ Apply the adjectival and adverbial MWE tests ⇒ AMWE tests positive?
- Annotate with the AMWE subcategory determined via the guidelines
- It is not a MWE, exit
- Verb, verbal phrase or verbal clause ⇒ Apply the verbal MWE tests ⇒ VMWE tests positive?
- Annotate with the VMWE subcategory determined via the guidelines
- It is not a MWE, exit
- Noun or nominal phrase ⇒ Apply the nominal MWE tests ⇒ NMWE tests positive?
- Annotate with the NMWE subcategory determined via the guidelines
- It is not a MWE, exit
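The routing performed by the DIST test can be summarized in a few lines of Python. This is only an illustrative sketch: the distribution labels, the `DIST_TO_SUITE` table and the function names are hypothetical and not part of any PARSEME tool, and the outcome of the tests themselves (`tests_positive`) remains a human annotation decision that is simply passed in.

```python
from typing import Optional

# Hypothetical distribution labels mapped to the test suite named in the guidelines.
DIST_TO_SUITE = {
    "determiner": "FuncMWE",
    "conjunction": "FuncMWE",
    "adposition": "FuncMWE",
    "interjection": "FuncMWE",
    "adjectival": "AMWE",
    "adverbial": "AMWE",
    "verbal": "VMWE",
    "nominal": "NMWE",
}


def suite_for(distribution: str) -> str:
    """Return the name of the test suite to apply for a given distribution."""
    try:
        return DIST_TO_SUITE[distribution]
    except KeyError:
        raise ValueError(f"no test suite for distribution {distribution!r}") from None


def entry_point(distribution: str, tests_positive: bool) -> Optional[str]:
    """Return the MWE family to annotate, or None if the candidate is not a MWE.

    The subcategory within the family is still determined by the
    category-specific guidelines; this sketch only covers the routing.
    """
    suite = suite_for(distribution)
    return suite if tests_positive else None


print(entry_point("verbal", True))    # family whose tests were applied and passed
print(entry_point("nominal", False))  # tests failed: not a MWE
```

Keeping the mapping in a table rather than in nested conditionals makes it easy to compare against the four branches listed above.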
Section 5
Specific tests for categorizing verbal MWEs
Once a candidate VMWE has been pre-identified in steps 1 and 2 of the annotation process and its distribution has been established as verbal, its status as a VMWE is confirmed, and its category assigned, according to the decision diagrams and tests described in the following sections:
- Generic decision tree with structural tests (S)
- Light-verb constructions (LVCs)
- Verbal idioms (VIDs)
- Inherently reflexive verbs (IRVs)
- Idiomatic verb-particle constructions (IVPCs)
- Multi-verb constructions (MVCs)
- Inherently adpositional verbs (IAVs) - optional and experimental
Additionally, language-specific categories (LS) can be defined and tests for them can be used to annotate them in a given language or language group only.
Section 5.1
Generic structural tests for verbal MWEs (S)
Structural tests are simple preliminary tests that help determine the syntactic structure of the VMWE candidate. This structure is needed to select the right category-specific identification tests.
The decision diagram below indicates the order in which the structural tests should be applied when the candidate MWE has a verbal distribution established in the DIST test. The decision diagrams are a useful summary to consult during annotation, but contain very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.
Generic decision tree for verbal MWE candidates
If you are annotating Italian or Hindi, go to the Italian-specific VMWE decision diagram or the Hindi-specific decision diagram, respectively. For all other languages, follow the tree below.
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
- Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
- Annotate as a VMWE of category IRV
- It is not a VMWE, exit
- Particle ⇒ Apply IVPC-specific tests ⇒ IVPC tests positive?
- Annotate as a VMWE of category IVPC.full or IVPC.semi
- It is not a VMWE, exit
- Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
- Annotate as a VMWE of category MVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Extended NP ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
- Annotate as a VMWE of category LVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
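The order of the structural tests in the tree above can be sketched as a small Python function. This is a hedged illustration, not an official implementation: the `Candidate` fields, the category strings and the `passes` callback are all hypothetical names. The callback stands for the annotator's verdict on a category-specific test suite (VID, IRV, IVPC, MVC or LVC), which cannot be automated here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """Answers to the structural tests for one candidate (hypothetical fields)."""
    has_unique_head_verb: bool   # S.1
    n_lexicalized_deps: int      # S.2
    dependent_is_subject: bool   # S.3
    dependent_category: str      # S.4: "reflexive_clitic", "particle", "verb", "extended_np", ...


def categorize_vmwe(c: Candidate, passes: Callable[[str], bool]) -> Optional[str]:
    """Walk the generic decision tree; return a category label or None (not a VMWE)."""

    def vid_or_exit() -> Optional[str]:
        return "VID" if passes("VID") else None

    if not c.has_unique_head_verb:    # S.1 negative
        return vid_or_exit()
    if c.n_lexicalized_deps != 1:     # S.2 negative
        return vid_or_exit()
    if c.dependent_is_subject:        # S.3 positive
        return vid_or_exit()

    cat = c.dependent_category        # S.4
    if cat == "reflexive_clitic":
        return "IRV" if passes("IRV") else None
    if cat == "particle":
        # The IVPC tests further decide between IVPC.full and IVPC.semi.
        return "IVPC" if passes("IVPC") else None
    if cat == "verb":
        return "MVC" if passes("MVC") else vid_or_exit()
    if cat == "extended_np":
        return "LVC" if passes("LVC") else vid_or_exit()
    return vid_or_exit()              # another category


# "to make a face": one head verb, one lexicalized non-subject NP dependent.
make_a_face = Candidate(True, 1, False, "extended_np")
print(categorize_vmwe(make_a_face, lambda suite: suite == "VID"))
```

In the usage example the annotator is assumed to judge the LVC tests negative and the VID tests positive, so the candidate falls through from the LVC branch to VID.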
Test S.1 - [HEAD] - Syntactic head
Does the candidate contain a unique verb functioning as the functional syntactic head of the whole?
- Apply the VID-specific tests
تنلاصبر be patient you get if you stay patient you will get what you want → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
цъфна и вържа → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
leben und leben lassen live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
έδωσε πήρε edose pire gave3SG.PA took3SG.PA he succeeded → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
to pretty-print → this is an unusual case of an adjective modifying a verb
to drink and drive → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
coser y cantar to_sew and to_sing easy as pie, a piece of cake
ikusi eta ikasi see and learn → none of the verbs is clearly the head
ag cur is ag cúiteamh arguing and debating arguing back and forth → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
ἠντεβόλει καὶ ἱκετεύε ēntebolei kai iketeue supplicate.3SG and beseech.3SG he begged and beseeched
žariti i paliti to stoke and to burn to be powerful
vedriti i oblačiti to brighten and to cloud to be powerful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
vivi e lascia vivere live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
stāvēt un krist to stand and to fall to be very sure (of something); to defend with confidence → none of the verbs is clearly the head, they are coordinated
leven en laten leven live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
pluć i łapać to spit and catch to be lazy, to do nothing useful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
pintar e bordar paint and knit to abuse
živi in pusti živeti to live and let live to live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
hyr e dil come and go come and go → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
ведрити и облачити vedriti i oblačiti to brighten and cloud to be very powerful
што не иде не иде što ne ide ne ide what doesn't go, doesn't go don't force something → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
det knallar och går it trots and walks it is OK/as usual → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
- Continue to the next test
ريح للرجليه أسلم he gave his feet to the wind he runs away so fast → أسلم to give is the head and the NP depends on it
гушна букета to hug the bunch of flowers to die → гушна is the head and the NP depends on it
правя на салата to make into salad to scold → правя is the head and the PP depends on iteine Fratze ziehen a grimace pull to make a face → ziehen is the head and the NP depends on it
er gibt auf he gives up → gibt is the head and auf is the particle depending on itκάνω γκριμάτσαkano grimatsa to make grimace to make a face κάνω is the head and the NP depends on it
παίρνω μία απόφασηperno mia apofasi take a decision to make a decision, to decide παίρνω is the head and the NP depends on it
βάζω μπροςvazo bros put forward to start βάζω is the head and μπρος depends on itto make a face → make is the head and the NP depends on it
to give up → give is the head and up is a particle depending on itdar la cara to_put the face face the consequences → dar is the head and the NP depends on it
hacer muecas to_make grimaces to make a face → hacer is the head and the NP depends on it
lan egin work do to work → the verb egin is the head and the NP depends on it
éirigh as rise out of quit → the verb éirigh is the head and the particle as depends on it
χάριν ἔχει kharin ekhei gratitude.ACC have.3SG he is grateful → ἔχει is the head and the NP depends on it
složiti facu make a face to show reaction → složiti is the head and the NP depends on it
fare le linguacce to_make the grimaces → fare is the head and the NP depends on it
far fuori to_make out to kill → fare is the head and fuori is a particle depending on itnaar de bekende weg vragen for the known road ask → vragen is the head and naar de bekende weg is the extended NP depending on itzbijać bąki to smash fartsto fool around, to do nothing useful→ zbijać is the head and the NP bąki depends on it
dać komuś popalićto let someone smoketo make someone's life hard → dać is the head and the infinitive popalić depends on itbater as botas → bater is the head and the NP depends on it
criar vergonha na cara → criar is the head and the two NPs depend on ita face baie to make bath to bath → face is the head and the NP depends on it
a ieși înainte to go forth to greet → ieși is the head and înainte is a particle depending on it
imeti krompir to have potatoes to be lucky → imeti is the head and the NP depends on it
heq dorë remove hand give up → heq is the head and dorë depends on it
обесити нос obesiti nos hang one's nose to feel down → обесити is the head and the NP нос depends on it
седети скрштених руку to sit with arms crossed to be inactive, without initiative → седети is the head and the NP (in the instrumental case) скрштене руке depends on it
att ge upp to give up → ge is the head and upp is the particle depending on it
The aim of this test is to categorize (as VID or as not a VMWE) those candidates which have no single clearly identified head verb. This is necessary because all other tests refer to the single head verb v and its dependents. Note that the test should be applied to the neutral form of each candidate. This is required because a non-neutral variant may contain no verb, or its verb may not be the syntactic head.
قرارال أخذ to make a decision passes the test → variants like هأخذ الذيقرار ال the decison that he made, قراراتال أخذ making decisions , مأخوذةقراراتdecisions made passes the test as wellвземам решение passes the test → variants like решението, което беше взето pass the test as welleine Entscheidung treffen make a decision passes the test → variants like die Entscheidung wurde getroffen the decision was made, die Entscheidung, welche getroffen wurde the decision which was made, das Treffen der Entscheidung the making of the decision pass the test as wellπαίρνω μία απόφαση make a decision passes the test → variants like η απόφαση που πήραμε, πάρθηκε απόφαση, παίρνοντας απόφαση pass the test as wellto make a decision passes the test → variants like the decision which was made, decision-making, the making of the decision pass the test as welltomar una decisión passes the test → variants like la decisión fue tomada, tomando esa decisión, la decisión que tomaron pass the test as wellerabakia hartu decision take to make a decision passes the test → variants like hartutako erabakia the decision (which was) made, erabaki hura hartzea (the fact of) making that decision, erabakiak hartutakoan when the decisions were made pass the test as welldéan comhairle make counsel make a decision passes the test → variants like comhairle a dhéanamh counsel to make to make a decision ag déanamh comhairle at making counsel making a decisionδόξαν ἔχουσιdoxan ekhousi reputation.ACC have.3PL they have a reputation passes the test
δόξαν ἣν ἔνιοι ἔχουσι περὶdoxan hēn enioi ekhousi peri opinion.ACC which some have.3PL about the opinion which some hold about is a variant and passes the testdonijeti odluku make a decision passes the test → variants like odluka donesena tada decision made then pass the test as wellprendere una decisione to_take a decision make a decision passes the test → variants like la decisione è stata presa the decision was made, la decisione, che è stata presa the decision which was made, prendendo la decisione taking the decision pass the test as welleen beslissing nemen to make a decision passes the test → variants like de beslissing werd genomen the decision was made, de beslissing, die genomen werd the decision which was made, het nemen van de beslissing the making of the decision pass the test as wellzbijać bąki to smash fartsto fool around, to do nothing useful passes the test → variants like zbijanie bąków farts smashingfooling around, doing nothing useful, zbijający bąki smashing farts pass the test as welltomar uma decisão make a decision passes the test → variants like a decisão que foi tomada the decision which was made, decisão tomada decision made pass the test as wella lua o decizie make a decision passes the test → variants like decizia care a fost luată the decision which was made, luarea deciziei decision-making pass the test as wellzlomiti komu srce to break someone's heart to hurt someone's feelings bad passes the test → variants like srca, ki jih je zlomil hearts which he has broken (people's) feelings which he hurt bad, lomljenje src breaking (people's) hearts hurting (people's) feelings and nedavno zlomljeno srce recently broken heart pass the test as wellmarr një vendim take a decision to make a decision variants like vendimi që u mor (the decision that was made), marrja e vendimit (decision-making) pass the test as well.донети одлуку doneti odluku to bring a decision to make a decision passes the test → variants like одлука је донета odluka je doneta 
a decision has been made and доношење одлука donošenje odluka decision making pass the test as well
Test S.2 - [1DEP] - Single dependent
Does the VMWE contain exactly one lexicalized (functional) syntactic dependent d of the head verb v?
- Apply the VID-specific tests
لسانهالقطأكل the cat ate his tongueused to talk about someone who was known to talk a lot, then suddenly we see him silent→ two dependents,لسانه his tongue and القط the catна стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → two dependents, на стар краставичар (PP) and краставици (NP)
прочитам от корица до корица to read from cover to cover → two dependents, от корица (PP) and до корица (PP)
правя (нечий) живот черен make someone's life black to ruin someone's life → two dependents, (нечий) живот (NP) and черен (small clause)
die Katze aus dem Sack lassen to let the cat out of the bag → two dependents, die Katze and aus dem Sack
κάνω την καρδιά μου πέτρα kano tin kardia mu petra make the heart mine stone → two dependents, την καρδιά and πέτρα
δίνω τόπο στην οργήdino topo stin orγi give place to anger to hold in one's anger two dependents, τόπο and στην οργήto make ends meet → two dependents, ends and meet
to let the cat out of the bag → two dependents, the cat and out of the bagdejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more → two dependents, con la miel and en los labios
dar gato por liebre to_give cat for hare to rip off, to take for a ride → two dependents, gato and por liebreodolkiak ordainetan eman black-puddings in-exchange give to do something as a response to something somebody has done to oneself (similar to 'what goes around comes around')ići glavom kroz zid to go with head through the wall to be stubborn → two dependents glavom and kroz zidmettere il carro davanti ai buoi to_put the cart in front of the oxen put the cart in front of the horse → two dependents carro and davanti ai buoipūst miglu acīs to blow mist into eyesto lie, to talk nonsense → two dependents, miglu and acīseen kat in de zak kopen to buy a pig in a poke → two dependents kat and in de zakchować głowę w piasek to hide head in sandto pretend not to see a problem → two dependents, głowę head and w piasek in sand
bać się własnego cienia to fear SELF one's own shadowto be very timid → two dependents, się SELF and własnego cienia own shadowtapar o sol com a peneira to hide the sun with a sieve to sugar-coat → two dependentsa da bir cu fugițiito give tribute with fugitives theto disappear→ two dependents, bir and cu fugiții
a- i ieși ochii din cap to his come out eyes the from head to stare→ three dependents, i, which is a non-RCLI, ochii, and din capskrivati glavo v pesekto hide head in sand to pretend not to see a problem → two dependents, glavahead and v pesekin sand
vlečeš me za nosyou are pulling my nose you're pulling my leg → two dependents, meme and za nosmy noseI hedh benzinë zjarrit I throw gasoline on the fire To make a situation worse (aggravate a problem) Two dependents: benzinë and zjarritићи линијом мањег отпора ići linijom manjeg otpora go down the line of less resistanceto take the path of least resistance → two dependents, линијом linijom line and мањег отпора manjeg otpora less resistence
продати рог за свећу prodati rog za sveću to sell a horn for a candle to deceive somebody on purpose → two dependents, рог rog horn and за свећу za sveću for a candle
att sätta sig upp mot någon to sit oneself up against someone to defy someone → two dependents, sig and upp
- Continue to the next test
مثلاً ضرب hit an example to give an example → the single dependent is a noun phrase, مثلاً example
ритам камбаната kick the bell to die → the single dependent is a noun phrase, камбаната
ставам на кайма turn into mince to be destroyed → the single dependent is a prepositional phrase, на кайма
одирам жив skin alive to make someone suffer → the single dependent is a small clause (adjective), жив
eine Fratze ziehen a grimace pull to make a face → the single dependent is a noun phrase, Fratze
in Betracht ziehen to take into consideration → the single dependent is a prepositional phrase, in Betracht
er gibt auf he gives up → the single dependent is a particle, auf
παίρνω σκληρά μέτρα perno sklira metra take hard measures take strict measures → the single dependent is a noun phrase, μέτρα
φέρω βαρέωςfero vareos bring heavily to resent the single dependent is an adverb, βαρέωςto make a face → the single dependent is a noun phrase, face
to take into account → the single dependent is a prepositional phrase, into account
to take turns → the single dependent is a noun, turns
to give up → the single dependent is a particle, uphacer muecas to_make grimmaces to make faces → the single dependent is a noun phrase, muecas
tener en cuenta to_have in account to take into account → the single dependent is a prepositional phrase, en cuentamin eman pain give to hurt (somebody) → the single dependent is a noun phrase, min
kontuan hartu into-account take to take into account → the single dependent is a noun phrase with a postpositional suffix, kontuanbain triail get trial try → the single dependent is a noun, éirigh as rise out of quit → the single dependent is a particleπερὶ πολλοῦ ποιέομαιperi pollou poieomai above much.GEN do.1SG I hold in high esteem → the single dependent is a prepositional phrase
τιμωρίαν ποιέομαιtimо̄rian poieomai punishment.ACC do.1SG I punish → the single dependent is an NPimati osjećaj to have a feeling → the single dependent is a noun, osjećajfare le linguacce to_make the grimaces to make a face → the single dependent is a noun phrase linguacce
prendere in considerazione to take into consideration → the single dependent is a prepositional phrase, in considerazione
egli lo fa fuori he kills him → the single dependent is a particle fuoriatstiept kājasto stretch one's legsto die→ the single dependent is a noun phrase, kājasopgeven to give up → the single dependent is a particle, opbić na alarm to strike on alarmto raise the alarm → the single dependent is a prepositional phrase, na alarm on alarm
cholera wie cholera knowsI have no idea→ the single dependent is the nominal subject choleracometer um crime to commit a crime → one dependenta face fațăto make faceto to deal with→ the single dependent is a noun phrase, față
a ieși înainte → the single dependent is an adverb, înaintegre za it is about → the single dependent is a particle, za
smejati se to laugh → the single dependent is a reflexive clitic, se
imeti mačka to have a hangover → the single dependent is a noun, mačekhedh poshtë Throw down To reject or dismiss the single dependent: poshtë (adverb)ићи као алва ići kao alva go like halva to sell well → the single dependent is a prepositional phrase, као алва kao alva as halva
језик прегризао bite off your tongue do not foresee bad things → the single dependent is the NP језик jezik tongue
att ge upp to give up → the single dependent is the particle upp
The test covers only lexicalized dependents. There may be other, non-lexicalized dependents, which the test ignores. We explicitly call the non-verbal elements dependents rather than arguments or complements because the argument-adjunct distinction is irrelevant here. The outcome of the test is positive if the verb has a single lexicalized dependent, which can be the subject, the direct or indirect object, but also an adverbial complement, adverb, particle, relative clause, etc.
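Test S.2 amounts to counting the lexicalized dependents attached to the head verb while ignoring everything else. The sketch below illustrates this on a minimal token list. The `Token` class and its fields are hypothetical, loosely modelled on a dependency-parsed sentence, and the attachments and lexicalization flags are hand-written for illustration. It assumes that each dependent phrase is represented by a single token attached directly to the verb (for example, the noun of a prepositional phrase), which may differ from the syntactic representation used in a given corpus.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    idx: int           # 1-based position in the sentence
    form: str
    head: int          # idx of the syntactic head, 0 for the root
    lexicalized: bool  # True if the token belongs to the candidate's lexicalized part


def lexicalized_dependents(tokens: List[Token], verb_idx: int) -> List[Token]:
    """Lexicalized tokens attached directly to the head verb."""
    return [t for t in tokens if t.head == verb_idx and t.lexicalized]


def passes_s2(tokens: List[Token], verb_idx: int) -> bool:
    """S.2 is positive when the verb has exactly one lexicalized dependent."""
    return len(lexicalized_dependents(tokens, verb_idx)) == 1


# "took my remark into account": "remark" is a dependent of the verb but is not lexicalized.
take_into_account = [
    Token(1, "took", 0, True),
    Token(2, "my", 3, False),
    Token(3, "remark", 1, False),
    Token(4, "into", 5, True),
    Token(5, "account", 1, True),
]

# "let the cat out of the bag": two lexicalized dependents, "cat" and "bag".
let_the_cat_out = [
    Token(1, "let", 0, True),
    Token(2, "the", 3, True),
    Token(3, "cat", 1, True),
    Token(4, "out", 7, True),
    Token(5, "of", 7, True),
    Token(6, "the", 7, True),
    Token(7, "bag", 1, True),
]

print(passes_s2(take_into_account, 1))  # single lexicalized dependent: continue to S.3
print(passes_s2(let_the_cat_out, 1))    # two lexicalized dependents: apply the VID tests
```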
Test S.3 - [LEX-SUBJ] - Lexicalized subject
Is the single lexicalized (functional) syntactic dependent d of the head verb v its subject?
- Apply the VID-specific tests
أوزارها الحرب وضعت the war put its weights the war is over →الحرب is the subject of وضعتчашата преля the glass overflowed this is the last straw → чашата is the subject of преляein kleines Vöglein hat mir gezwitschert a little bird told meμου είπε ένα πουλάκιmu ipe ena pulaki me told a little-bird a little bird told me → a little bird is the subject of tolda little bird told someone → a little bird is the subject of toldha llegado tu hora has arrived your time your time has come → tu hora is the subject of ha llegado
me lo ha dicho un pajarito it to_me has told a little_bird a little bird has told me → un pajarito is the subject of ha dichotxoritxo batek esan → txoritxo batek is the subject of esanptičica mi je šapnula a little bird whispered to me → ptičica is the subject of šapnulame lo ha detto l'uccellino a little bird told me → l'uccellino is the subject of ha dettogalva kūp the head is steamingto do something with great mental effortboontje komt om zijn loontje he that mischief hatches, mischief catcheslicho wie devil knowsI have no ideaa sua hora chegou your time has arrived your time has come
um passarinho me contou que ... a little-bird me.DAT told that ... little bird told me that...a șoptit o păsăricăwhispered a bird little a little bird told someonesrce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something → srce heart is the subject of pade falls , sekira pade v med komu (someone's) hatchet falls in honey one gets lucky → sekira hatchet is the subject of pade fallsMë zuri koka My head caught me I got a headache Koka (head) is the single lexicalized dependent, functioning as the subject of the verb zuri (caught).иде некоме карта ide nekome karta the card goes for someone to have luck → карта is the subject of иде
пасти некоме камен са срца pasti nekome kamen sa srca a stone falls from one's heart to feel relieved → камен is the subject of пасти
- Continue to the next test
زيارة ب قام he did with visit to make a visit→ زيارة is the object of قامобичам чашката love the glass to be an alcoholic
вземам назаем take in loan to borrow
намирам се find SELF to be situated
κάνω μια ευχή kano mia efchi do a wish to make a wish → μία ευχή is the object of κάνω
to make a wish → a wish is the object of make
pedir un deseo to_ask a wish to make a wish → un deseo is the object of pedir
hitz eman → hitz is the object of eman
λόγοις χράομαι logois khraomai word.DAT use.1SG I speak → λόγοις is the object of χράομαι
napraviti prekršaj to make an offense → prekršaj is the object of napraviti
dare spettacolo to_make a scene → spettacolo is the object of dare
een toespraak houden → toespraak is the object of houden
bać się fear SELF to be afraid
chodzić prostą drogą to go (on) a straight road.INST to avoid complications
zacząć od zera to start from zero to start from scratch
plouă cu găleata rains with bucket-the It rains heavily → cu găleata is the adverbial of plouă
imeti glavo na ramenih to have head on shoulders to be sensible → glava head is the object of imeti have
marr hua take loan to borrow → hua (loan) is the single lexicalized dependent, functioning as the object of the verb marr (take)
тврдити пазар tvrditi pazar to secure shopping to pretend not to be interested in order to gain more → пазар is the object of тврдити
обрати бостан obrati bostan to pick melon to be ruined → бостан is the object of обрати
This test captures the fact that VMWEs with lexicalized subjects always belong to the VID category. Note that the test should be applied to the neutral form of a VMWE. This is required because there may be no verb or the verb may not be the syntactic head in a non-neutral variant.
Test S.4 - [CATEG] - Category of the dependent
What is the morphosyntactic category of the (functional) dependent d that co-occurs with the head verb v?
- Reflexive clitic - apply IRV tests. If the outcome is negative, discard the VMWE candidate.
- Particle (as opposed to an adposition) - apply IVPC tests. If the outcome is negative, discard the VMWE candidate.
- Verb with no lexicalized dependent - apply MVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
не искам и да чуя don't want to even hear to oppose strongly → и да чуя is a VPwill sagen want to say that is to sayέχω να κάνωhave to doconcernto let go
to make doquerer decir to_want to_say to meann.a.laisser tomber let fall to give up
vouloir dire want say to meanτυγχάνουσι ἐρχόμενοιtugkhanousi erkhomenoi get.3PL go.PTC they happen to gopustiti koga živjeti to let someone live not to bother someone, znati raditi to know to work to be capablelasciar andare to_let go to unhand
voler dire want say to meanwil zeggen want to say that is to saydać komuś żyćto let someone livenot to bother someone
można wytrzymać one can stand the situation is reasonably good
querer dizer want say to mean
ouvir falar hear speak to know/remember vaguely
n.a.
n.a.
може бити može biti can be it is possible though unlikely
- Adposition (preposition or postposition, as opposed to a particle) - in step 3 of the annotation process, adpositions are not annotated unless they introduce a lexicalized dependent. Adpositions are covered optionally and experimentally in the post-annotation step (step 4), following the inherently adpositional verb (IAV) guidelines.
разчитам на to rely on
излизам със to come out with
Modern Greek does not have IAV expressions.
to come across
to rely on
confiar en to_trust in to trust in
entender de to_understand of to know about
n.a.
This category does not apply to Ancient Greek.
izlaziti s kim to go out with someone
confidare su to_trust in to trust in
intendersi di to_understand of to know about
behoren tot to belong to
conta pe count on
n.a.
- Extended nominal phrase (possibly including modifiers, prepositions, postpositions or case markers) - apply LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
زيارة ب قام make a visit → ب زيارة is a noun phrase composed of preposition and a nounритам камбаната kick the bell to die → камбаната is a noun phrase composed of a single noun
давам зелена светлина give green light to allow → зелена светлина is a noun phrase composed of an adjective and a noun
ставам на кайма turn into mince to be destroyed → на кайма is a prepositional phrase composed of a preposition governing a noundie Nase rümpfen the nose wrinkle turn up one's nose at sth. → die Nase is a noun phrase composed of a determiner and a noun
in Kraft treten intoκάνω μία ευχήkano mia efchi make a wish to make a wish → μία ευχή is a noun phrase composed of a determiner and a noun
δίνω εξηγήσειςdino exigisis give explanations to explain → εξηγήσεις is a noun phrase composed of a single plural nounto make a wish → a wish is a noun phrase composed of a determiner and a noun
to take turns → turns is a noun phrase composed of a single plural nounpedir un deseo →un deseo is a noun phrase composed of a determiner and a noun
entrar en vigor→en vigor is a prepositional phrase composed of a preposition and a nounkontuan hartu into-account take to take into account → the NP, kontuan, is composed of a noun (kontu), a determiner (a) and a postposition (-n)
urratsak egin steps do to take steps → the NP, urratsak, is composed of a single plural noun (urrats+ak)
τὴν ἴσην χάριν αποδίδωμι tēn isēn kharin apodidо̄mi the same gratitude.ACC give.1SG I show the same gratitude → τὴν ἴσην χάριν is an NP composed of a DP and an adjective
doći do zaključka to come to conclusion, to conclude → do zaključka to conclusion is a prepositional phrase composed of a preposition governing a noun
prendere in considerazione take into account → in considerazione is a prepositional phrase composed of a preposition and a noun
rompere il silenzio to break the silence → il silenzio is a noun phrase composed of an article and singular noun
mettere radici → radici is a noun phrase composed of a single plural nouneen wandeling maken to take a walk → een wandeling is a noun phrase composed of a determiner and a noun
te koop zetten to put for sale → te koop is an extended noun phrase composed of a preposition and a noun
in aanmerking komen in comment come to qualify → in aanmerking is an extended noun phrase composed of a preposition and a nounpodjąć decyzjęto take a decision→ decyzję decision is a nominal phrase composed of a single noun
chodzić prostą drogą to go (on) a straight road.INST to avoid complications → prostą drogą (on) a straight road is a noun phrase composed of an adjective and a noun (in the instrumental)
bujać w obłokach to swing in the clouds to fantasize → w obłokach in the clouds is a prepositional phrase composed of a preposition and a noun
tomar banho to take a shower → banho is a noun phrase composed of a single noun
a rupe tăcerea to break silence the to start talking → tăcerea is a noun phrase composed of a single noun
a face baie to do bath to take a shower → baie is a noun phrase composed of a single noun
biti v dvomih to be in doubts to doubt → v dvomih in doubts is a prepositional phrase composed of a preposition governing a noun
klicati jelene to call deer to vomit → jeleni deer is a noun phrase composed of a single plural noun
узети маха uzeti maha to take swing/moment to spread → маха maha swing/moment is a nominal phrase composed of a single noun
дати часну реч dati časnu reč to give an honorable word to promise firmly → часну реч časnu reč honorable word is a noun phrase composed of an adjective and a noun (in the accusative)
пасти на ум некоме pasti na um nekome to drop on one's mind to get an idea → на ум na um on mind is a prepositional phrase composed of a preposition and a noun
- (Hindi-specific) Adjective which is morphologically identical to an eventive noun: Apply the LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
- Adjective: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
излизам сух от водата to come out dry from the water to avoid taking responsibility
одирам жив skin alive to make someone suffer
гоня дивото chase the wild.ADJ to take risks → дивото is a substantiverot sehen to see redτα βάφω μαύρα them-NE.PL.ACC paint-1.SG black-NE.PL.ACC be very sadto stand firm, to see redme las vi negras me the saw black I saw myself in trouble
ponerse negro put.self black to get/become irritated
poner verde put green to criticise (someone)zuriak eta beltzak aditu white and black hear to hear all sorts of thingsvoir rouge to see red to be very angryostati svoj to stay one's own to be consistentvedere nero to see blackblauw zien van de kou to be blue/perished with the cold
zwartrijden black drive to take a ride without a ticketzrobić swojeto do one's ownto do what one is supposed to dopensar grande to think biga vedea roșu to see red
a o face lată to CL.ACC make wide to party
narediti svoje to do one's own to do what one is supposed to do
бити зелен biti zelen to be green to be young, inexperienced
- Adverb: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
изваждам наяве take out in the open to uncover
хващам натясно catch in a tight place to coerce, to pressureφέρω βαρέωςfero vareos bring heavily to resentto get wellcaer bien fall well to be liked byalferrik galdu uselessly get-lost to ruin, to spoilκαλῶς εἶχενkalо̄s eikhen beautifully have.IMPF.3SG he was welldobroproći to go well to be successfulfare passi avanti to_make steps forward to make progressbeter worden to get wellchcieć dobrze to want wellto have good intentions
robić komuś dobrze to do someone.DAT wellto please someone
źle/marnie skończyć badly finishto come to a bad endcair bem fall well to be appropriatea se face bine to himself make well to get well
a face bine to make well to help
obrniti se na bolje to turn for better to be better, iti predaleč to go too far to demand too much or to do something inappropriate
добро доћи dobro doći to come well to be useful
боље рећи bolje reći to say better to say in other words, more precisely - Pronoun: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
мързи ме (it feels) lazy me.ACC to be lazyτα καταφέρνωta kataferno them achieve to make it
την πατάωtin patao her step-on to failto make itjugársela play.self.it to risk itelkar hartu each-other take to get on with somebody, to agreesuarekin jolasean ibili with-fire playing be to play with firele faire it make to be enough/successfulfarcela to make it to managehet maken it make to be successfulNo example found in Polishdá-lhe João! give to him/her, João! show them what you got, João!a o coti CL.ACC.F.3SG turn to turnwith the non-anaphoric feminine clitic 'o' functioning as an expletiveimeti ga pod kapo to have him under one's hat to be drunk, mahniti jo to hit her to start going (somewhere)n.a. - Verb with lexicalized dependents including fully lexicalized clauses: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
السيف العذل سبق The sword preceded the blame said when someone does something without thinking and regrets it
не мога да кажа две думи на кръст cannot say two words on a cross to not be able to speak or express oneself
правя сам да си говори make someone talk to himself to drive someone crazyανοίγω τον ασκό του Αιόλουopen the bag of Aeolus open the bag of Aeolus to open the floodgates
και οι τοίχοι έχουν αυτιάke i tichi echun aftia and walls have ears everyone might be listeningto make ends meet, to know on which side the bread is buttered
hacer de tripas corazón make of intestines heart to pluck up the courage
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
dar gato por liebre to_give cat for hare to rip off, to take for a riden.a.okretati se kako vjetar puše to turn how the wind blows to be inconsistentsbarcare il lunario to_land the living to make ends meet
non avere peli sulla lingua do not have hair on the tongue to be outspokenlachen als een boer die kiespijn heeft laughing on the other side of his/her face/mouthwiedzieć, co w trawie piszczy to know what in the grass squeaks to know what is going on, to be well informedvedeti, koliko je ura to know what the time it is to realize the truthзнати у ком грму лежи зец знати у ком грму лежи зец I know in which bush the rabbit lies to know what is going on, to be well informed - Other: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
Arabic does not have IRV expressionsстрахувам се fear myself.REFL to be afraid
радвам се feel joy myself.REFL to feel joysich wundern to wonder, sichschämen to be ashamed. Modern Greek does not have IRV expressionshelp yourself to the apples
I found myself in a difficult situationsuicidarse to suicide, quejarse to complainn.a.se suicider to suicide, s'évanouir to faint–– This category does not apply to Ancient Greek.čuditi se to wonder, penjati se to climbsuicidarsi to suicide, vergognarsi to be ashamedzich vergissen to be mistaken, zich schamen to be ashamedbać się fear SELFto be afraidsuicidar-se to suicide, queixar-se to complaina se sinucide to commit suicide with obligatory ACC reflexive clitic
a se holba to stare with obligatory ACC reflexive cliticčuditi se to wonder, smejati se to laugh, onesvestiti se to faintmërzitem bore myself get bored kujtohem remember myself rememberзнојити се znojiti se sweat SELFto sweat
откравити се otkraviti se to melt SELFto relax, to cheer upBulgarian does not have VPC expressionsanfangento begin, er fängt anhe begins, er hat angefangen he has begun → in German, VPCs may occur separated or within one word, we annotate all occurrences!
ich schlage vor I proposeπαίρνω μπροςperno bros take forward to get startedto give up, to look forward ton.a.n.a.–– This category does not apply to Ancient Greek.biti na to be onto to look likefar fuori to_make out to kill, lo fa fuorihe kills him , lo ha fatto fuori he killed himaanvangento begin, iets vangt aansth begins → in Dutch, VPCs may occur separated or within one word, we annotate all occurrences!
ik stel voor I propose
Polish does not have IVPC expressions
jogar fora to-throw outside to discard, throw away
Romanian does not have VPC expressions
n.a.
Albanian does not have VPC expressions.
n.a.
The aim of this test is to determine which category-specific identification tests should be applied. Note that the test should be applied to the neutral form of a VMWE candidate. This is required because, in a non-neutral variant, there may be no verb or the verb may not be the syntactic head.
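The routing described in the bullets above can be sketched as a small lookup. This is a minimal illustration only: the dictionary keys, the `BATTERIES` table and the `decide` helper are hypothetical names, and only the dependent types listed in the bullets above are modelled (extended nominal phrase, Hindi-specific adjective, adjective, adverb, pronoun, verb with lexicalized dependents, other).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical labels for the dependent types named in the bullets above.
# Each maps to the ordered list of test batteries to try.
BATTERIES: Dict[str, List[str]] = {
    "extended_np": ["LVC", "VID"],
    "hindi_eventive_adjective": ["LVC", "VID"],
    "adjective": ["VID"],
    "adverb": ["VID"],
    "pronoun": ["VID"],
    "verb_with_lexicalized_dependents": ["VID"],
    "other": ["VID"],
}


def decide(dependent_type: str,
           run_battery: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try each battery in order; return the first category found.

    `run_battery` stands in for the annotator applying a test battery:
    it returns a category label on a positive outcome and None on a
    negative one. Returning None here means the candidate is discarded.
    """
    for battery in BATTERIES[dependent_type]:
        category = run_battery(battery)
        if category is not None:
            return category
    return None
```

For an extended nominal phrase, `decide` first consults the LVC battery and falls back to the VID-specific tests only if the LVC outcome is negative, mirroring the order given in the text.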
Section 5.2
Light verb constructions (LVC)
Light verb constructions (LVC) constitute a universal category. We retain the following key characteristics:
- They are formed by a verb v and a (single or compound) noun n, which either directly depends on v (and possibly contains a case marker or a postposition), or is introduced by a preposition.
In the case of Hindi, the noun can be replaced by an adjective which is morphologically identical to an eventive noun. If you annotate Hindi, wherever this page refers to the noun, read "the noun or the adjective".
إتخذ إجراء make action → verb + direct-object noun
قام بزيارة make a visit → verb+prepositional-object noun
أدى التحية العسكرية do the military salutesalute →verb+ composed nounвземам решение to make a decision
държа под контрол to keep under controlzum Einsatz kommen to the use come to be called into action
eine Rede halten a speech hold to give a speech(OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)παίρνω μία απόφαση perno mia apofasi make a decision to decide verb + direct-object noun
δίνω στα νεύραdino sta nevra give to-the nerves cause to be nervous verb + prepositional-object noun
έχω στην κατοχή μουecho stin katochi mu have.1SG to-the possession my to possess verb + prepositional-object nounto give a lecture → verb + direct-object noun
to come into bloom → verb + prepositional-object noun
to make a high five → verb + compound nounhacer una promesa make a promise to make a promise
poner en peligro put in danger endanger, jeopardise→ verb + prepositional-object noun
tener dolor de cabeza have pain of head to have a headache → verb + compound nounlan egin work do to work, aurrera egin front-to do to go aheadfaire une présentation make a presentation → verb + direct-object noun
procéder à une analyse proceed to an analysis to make an analysis → verb + prepositional-object noun
faire un faux pas make a faux-pas → verb + compound nounἐν ὀργῃ ἔχωen orgē ekhо̄ in anger.DAT have.1SG I am angry
τιμωρίαν ποιέομαιtimо̄rian poieomai punishment.ACC do.1SG I punishstupiti na snagu step into force come into force
držati predavanje to hold a speech to give a speechchiamare in causa to_call in cause to single out
fare una passeggiata to_make a walk to have a walkeen toespraak houden a speech hold to give a speech→ verb + direct-object noun
in bloei staan in bloom stand to be in bloom→ verb + prepositional-object nounodnieść sukces carry-away success to be successful
mieć wyrzuty sumienia to have reproaches of conscience to blame oneself
wykonać rzut karny to perform a penalty kickfazer um aborto to make an abortion → verb + direct-object noun
estar com fome be with hunger to be hungry → verb + prepositional-object noun
fazer uma mesa redonda make a table round to have a round table (discussion) → verb + compound nouna duce dorul to carry yearning.the to miss somebody
a da divorț to give divorce to divorce
a da în clocot to give in boil to come to the boil
a da în fiert to give in boil to come to the boilbiti v dvomih to be in doubts → verb + prepositional-object noun, to doubt
imeti predavanje to give a lecture → verb + direct-object nounдати на знање dati na znanje give on knowledge to inform
поднети жалбу podneti žalbu to submit an appeal to file a complaint
- The (single or compound) noun n is predicative and refers to an event (e.g. decision, visit) or a state (e.g. fear, courage). Predicative nouns are nouns that have semantic arguments, that is, they express predicates whose meaning is only fully specified by their semantic arguments:
قرار أخذ make a decision → noun refers to an event, there are 2 arguments: a decider and a decision
كلمةألقى to give a word → noun refers to an event, there are 2 arguments: the talker and the speech
вземам решение to make a decision → noun refers to an act or event
давам съгласие to give permission → noun refers to an act or event
имам притеснения to have concerns → noun refers to a feeling or state
имам готовност to be ready → noun refers to a feeling or stateeine Entscheidung treffen to make a decision → noun refers to an event
Angst habento have fear→ noun refers to a state(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W)παίρνω μία απόφασηperno mia apofasi take decision to decide → noun refers to an event
κάνω βόλταkano volta make walk to walk → noun refers to an event
έχω αγωνίαecho agonia have anxiety to be anxious → noun refers to a state
κάνω κουράγιοkano kuragio make courage to be courageous → noun refers to a stateto make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
to pay a visit → noun refers to an event, there are 2 arguments: a visitor and a visited place/person
to have fear→ noun refers to a state, there are 2 arguments: somebody who is afraid and something frightening
to have courage → noun refers to a state, there is 1 argument: the courageous person
dar un consejo give an advice to give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
tener valor to have courage → noun refers to a state, there is 1 argument: the courageous person
negar egin cry do to cry → noun refers to an act or event
lo egin sleep do to sleep → noun refers to a state
donner un conseil give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
avoir du courage to have courage→ noun refers to a state, there is 1 argument: the courageous personμου εἰς τὴν γνώμην εἰσῄειmou eis tēn gnо̄mēn eisēei I.GEN into the opinion.ACC come.into.IMPF.3sg it came to my mind noun refers to a state
ἐξέτασιν ποιέομαιexetasin poieomai inspection.ACC do.1SG I inspect noun refers to an eventdonijeti odluku to bring a decision to make a decision → noun refers to an event
imati osjećajto have feeling→ noun refers to a statefare una domanda → noun refers to an event
avere paura, avere coraggio → noun refers to a stateeen beslissing nemen to make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
moed hebben to have courage→ noun refers to a state, there is 1 argument: the courageous personprowadzić rozmowy to lead conversations to lead negotiations→ the noun refers to an event
mieć rację to have rightto be right→ the noun refers to a statefazer uma prece to make a prayer → noun refers to an event, there are 2 arguments: the prayer and the thing she/he prays for
ter sintomas to have symptoms → noun refers to a state, there are two arguments: the person having symptoms and the disease causing these symptomsa lua o decizie to make a decision, a face o vizită to pay a visit→ noun refers to an event
a avea curaj → noun refers to a statebiti v dvomih to be in doubts to have doubts → noun refers to a state
imeti predavanje to give a lecture → noun refers to an eventkam frikë
kam kurajëдонети одлуку doneti odluku to bring a decision to make a decision (to decide) → the noun refers to an event
имати право imati pravo to have right to be right → the noun refers to a state
- We retain two sub-categories of verbs, which define two sub-categories of LVCs:
- The verb v is "light" in that it contributes to the meaning of the whole only by bearing morphological features: person, number, tense, mood, as well as morphological aspect. This implies that v's syntactic subject is n's semantic argument. In this case, we annotate the construction as LVC.full.
نصيحةأسدى to weave an advice to give advice
تاريخالصنع fabricate the history to make history
إستراتيجية ال وضع put a strategy to make a strategyдавам изявление give a statement to make a statement
нанасям щети spread damages to cause damages(OEG) 𓇋𓁹 𓊨𓏏 𓎡 ꞽr ś.t ⸗k Make (ꞽr) your (⸗k) place (ś.t)! Take your place! (PT 651d, T)κάνω μία παρουσίασηkano mia parusiasi make presentation to present
κάνω επίσκεψηkano episkepsi make visit to pay a visit, to visit
παίρνω απόφασηperno apofasi take decision to decideto make a presentation
to pay a visit
to have rights
to have a headache
to carry out a destructiondar un paseo give a walk to go for a walk
tener valor to have courage
tener dolor de cabeza have pain of head to have a headachefaire une présentation to make a presentation
faire une visite to make a visit
avoir le droit to have the right
avoir un mal de tête to have a headacheἐλπίδα / ἐλπίδας ἔχωelpida / elpidas ekhо̄ hope.SG / hope.PL have.1SG I have hope(s)napraviti pogrešku to make a mistakefare una presentazione to make a presentation
fare una visita to make a visit
avere il diritto to have the right
avere un mal di testa to have a headacheeen presentatie geven to give a presentation
een bezoek brengen to make a visit
onder stress staan under stress stand to be stressedodnieść sukces carry-away success to be successful
mieć rację to have rightto be right
cierpieć na anemię to suffer from anemiarealizar uma apresentação to make a presentation
fazer uma visita to make a visit
ter um direito to have a right
ter dor de cabeça have pain of head to have a headachea face o prezentareto make a presentation
a face o vizită to pay a visitimeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion, biti v pomoč to be in help to be helpful, delati razlike to make differences to differentiatejap një shfaqje
kam dhimbje kokeвршити претрес vršiti pretres to do a search to conduct a search
имати право imati pravo to have right to be right
- The verb v is "causative" in that it indicates that the subject of v is the cause or source of the event or state expressed by n. In other words, the noun has semantic arguments expressed as non-subject elements in the sentence, and the subject of the verb brings additional information, indicating the cause or source of the event/state. In this case, we annotate the construction as LVC.cause. These constructions are expected to be less idiomatic than other VMWEs and can be understood as complex predicates with a causal support verb.
حربالأعلن to declare war
حقوق أعطى to give rights
أملأعطىto give hopeдавам възможност to give an opportunity
нося късмет to bring luck(OEG) 𓏙 𓍿 𓌸𓂋𓅱𓏏 𓏏𓏏𓇋 𓅓 𓄡𓏏𓏤 𓊹 𓎟 č̣i̯ ⸗č mrw.t Ttꞽ m ẖ.t nčr nb You (⸗č) should-give (č̣i̯) the love (mrw.t) of Teti (Ttꞽ) into (m) the body (ẖ.t) of every (nb) god (nčr). You should instil love for Teti into the belly of every god. (PT 739c, T)δίνω ικανοποίησηdino ikanopiisi give satisfaction to satisfy
προκαλώ καταστροφή cause destruction
δίνω χαράdino chara give joy to make happyto grant rights
to give a headache
to provoke a reactiondar derecho to grant the right
dar vértigo give vértigo to make dizzy
causar un accidente to provoke an accidentdonner le droit to grant the right
donner le vertige give the vertigo to make dizzy
provoquer un accident to provoke an accidentἐλπίδα / ἐλπίδας παρέχωelpida / elpidas parekhо̄ hope.SG / hope.PL give.1SG I make hope(s)dati mogućnost to give an opportunitydare il diritto to grant the right
dare le vertigini to_give the vertigo to make dizzy
causare un incidente to provoke an accidentrechten verlenen to grant the right
een ongeluk veroorzaken to provoke an accidentto sprawia nam kłopot this causes us trouble
nakłada obowiązek na użytkowników put a duty on the users
dać prawo to give the rightto grant the right
narazić na straty expose to losses
stawiać komuś celto put an aim to someone to set a goal to someonedar o direito to grant the right
dar tontura give vertigo to make dizzy
provocar um acidente to provoke an accidenta da dureri de cap to give pains of head to give a headachedati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)provokoj një debat
bëj aksidentизнети мишљење izneti mišljenje to take out one's opinion to state one's opinion
задати главобољу zadati glavobolju to cause a headacheto give a headache
The following decision tree should be applied to decide whether a candidate should be annotated as a LVC.full, LVC.cause or none.
LVC-specific decision tree:
- Apply test LVC.0 - [N-ABS: Is the noun abstract?]
  - If no: It is not an LVC, exit
  - If yes: Apply test LVC.1 - [N-PRED: Is the noun predicative?]
    - If no: It is not an LVC, exit
    - If yes: Apply test LVC.2 - [V-SUBJ-N-ARG: Is the subject of the verb a semantic argument of the noun?]
      - If yes: Apply test LVC.3 - [V-LIGHT: The verb only adds meaning expressed as morphological features?]
        - If no: It is not an LVC, exit
        - If yes: Apply test LVC.4 - [V-REDUC: Can a verbless NP-reduction refer to the same event/state?]
          - If no: It is not an LVC, exit
          - If yes: It is an LVC.full
      - If no: Apply test LVC.5 - [V-SUBJ-N-CAUSE: Is the subject of the verb the cause of the noun?]
        - If no: It is not an LVC, exit
        - If yes: It is an LVC.cause
Note: test 10 [N-SEM] from the previous version of the guidelines (1.0) was considered unnecessary and has been abandoned in the current version of the guidelines.
Note: LVC tests are often hard to apply. If you hesitate at some intermediary test, continue to the next one, since the last tests of LVC.full and LVC.cause will help you reach your final decision.
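The LVC-specific decision tree can be sketched as a short function. This is an illustration under stated assumptions, not part of the guidelines: the `LvcAnswers` container and the `classify_lvc` function are hypothetical names, and the yes/no branching is inferred from the definitions of LVC.full (the verb's subject is a semantic argument of the noun) and LVC.cause (the verb's subject is the cause or source) given earlier in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LvcAnswers:
    """Annotator's yes/no answers to tests LVC.0 to LVC.5 (hypothetical container)."""
    n_abs: bool                   # LVC.0 [N-ABS]: is the noun abstract?
    n_pred: bool                  # LVC.1 [N-PRED]: is the noun predicative?
    v_subj_n_arg: bool            # LVC.2 [V-SUBJ-N-ARG]
    v_light: bool = False         # LVC.3 [V-LIGHT]
    v_reduc: bool = False         # LVC.4 [V-REDUC]
    v_subj_n_cause: bool = False  # LVC.5 [V-SUBJ-N-CAUSE]


def classify_lvc(a: LvcAnswers) -> Optional[str]:
    """Return "LVC.full", "LVC.cause", or None when the candidate is not an LVC."""
    if not a.n_abs:       # LVC.0 fails: exit
        return None
    if not a.n_pred:      # LVC.1 fails: exit
        return None
    if a.v_subj_n_arg:    # LVC.2 passes: continue with LVC.3 and LVC.4
        if a.v_light and a.v_reduc:
            return "LVC.full"
        return None
    # LVC.2 fails: the only remaining option is the causative branch (LVC.5)
    return "LVC.cause" if a.v_subj_n_cause else None
```

Following the note above about hesitation, an annotator who is unsure at an intermediary test would record `True` for it and let the final test of the branch (LVC.4 or LVC.5) settle the decision.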
Test LVC.0 - [N-ABS] Noun is abstract
Is the noun n abstract?
- continue to next test
... قرار decision ، علم science ، أمل hope ، إجتماع meetingпроблем problem, възможност opportunity, изявление statement, план plan(OEG) 𓈖𓂋𓃭𓅱 nr.w fear fear (PT § 302c-d, W)απουσίαapusia absence
θυμόςthimos anger
αγάπηaγαpi love
δυσκολίαδiskolia difficulty
υπόσχεσηiposchesi promise
παρουσίασηparusiasi presentation
εμφάνισηemfanisi appearancepriority, anger, love, opinion, difficulty, speech, presentation, birthpaseo walk, derecho right, ilusión excitement, fe faith, duelo griefpas step, édition edition, discours speech, explication explanation, lute fightὀργή orgē anger anger
τιμωρίαtimо̄ria punishment punishment
πίστιςpistis trust trustproblem problem, mogućnost opportunity, ideja ideapriorità priority, rabbia anger, amore love, opinione opinion, difficultà difficulty, discorso discourse, presentazione presentation,所有possession, 検討examination, 名誉会長honorary chairmanliefde love, mening opinion, strijd fightkłopot problem, wysokość height, praca work, prawo right, zysk profitprioridade priority, festa party, fé faith, nascimento birth, distinção distinction, problema problem, gol goal (soccer)răspuns answer, prezentare presentationdvom doubt, mnenje opinion, ime name, vloga role, odločitev decisiondëshirë, mendim, vështirësi, fjalim, përparësi, zemërimмишљење mišljenje opinion, претрес pretres search, побуна pobuna rebellion, одлука odluka decision - it is not an LVC
طاولة table، ورقة paper، شخص person ، يد handправя торта to make a cake → a cake is a physical entity (not abstract)
давам пари to give money → money is a physical entity (not abstract)
подавам ръка to give out handto help in a difficult situation → hand is a physical entity (not abstract)(OEG) 𓊹 nčr god god (PT 460a-b, W)καρέκλα karekla chair , τραπέζι trapezi table , χέρι cheri hand , άνθρωπος anθropos humanchair, keyboard, hand, personmesa table, silla chair, mano hand, foto picture,aulki, teklatu, esku, pertsonachaise chair, clavier keyboard, main hand, personne personπαῖςpais child child
οἶκοςoikos house house
ἀγορά agora market square market squarestol table, ruka hand, kruna crownsedia chair, tastiera keyboard, mano hand, persona person家house, 車car, 家族familystoel chair, hand hand, persoon personzłożyć kartkę to fold a sheet→ a sheet is a physical entity (not abstract)
złożyć broń to lay down arms→ arms is a physical entity (not abstract)
bić pianę to beat foamto exaggerate about a problem→ foam is a physical entity (not abstract)
wystawić fakturę to issue a bill→ a bill is a physical entity (not abstract)
mieć brata to have a brother→ a brother is a physical entity (not abstract)cadeira chair, teclado keyboard, mão hand, pessoa person, pedra rockscaun chair, pian pianooseba person, mačka cat, kapa hat, avtomobil car, roka handkarrige, tastierë, dorë, njeriизнети јело izneti jelo to take out a dish→ a dish is a physical entity (not abstract)
Some concrete nouns may be predicative (test LVC.1). For instance, a relational noun such as daughter is semantically incomplete without its argument: daughter of X, so daughter is predicative. However, concrete predicative nouns should not pass test LVC.0.
Some nouns may have both concrete and abstract interpretations. For instance, money is concrete when it refers to banknotes (paper money, bills): I didn't have money so I paid by credit card. However, money is abstract when referring to a conventional value used in transactions between people: He spent a lot of money in the mall. If one cannot be sure that the noun is used in its concrete interpretation, test LVC.0 passes.
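The rule for ambiguous nouns amounts to a three-valued judgment in which only a confident concrete reading fails the test. A minimal sketch, with the `Concreteness` enum and the `passes_lvc0` function as hypothetical names introduced for illustration:

```python
from enum import Enum


class Concreteness(Enum):
    """Annotator's judgment of the noun's reading in context (hypothetical labels)."""
    ABSTRACT = "abstract"
    CONCRETE = "concrete"
    UNSURE = "unsure"


def passes_lvc0(reading: Concreteness) -> bool:
    """Test LVC.0 fails only when the noun is clearly used in its concrete reading."""
    return reading is not Concreteness.CONCRETE
```

Under this sketch, money in "He spent a lot of money in the mall" would be judged `ABSTRACT` and pass, while money referring to banknotes would be judged `CONCRETE` and fail. A case the annotator cannot resolve is `UNSURE` and passes.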
Test LVC.1 - [N-PRED] Noun is predicative
Does the noun n have at least one semantic argument, implying that it is a predicative noun?
- continue to next test
إجتماععقد tie a meeting to lead a meeting → event with 2 arguments: the meeting and the person who organizes the meeting
حوار أجرى make a discussion → event with 2 arguments: the discussion and the person who contributes to the discussion
поставям акцент to emphasize → event, with two arguments: the agent and the object being emphasized
имам право → property, with one semantic argument: the possessor of the propertyeinen Besuch abstatten to pay a visit → event, with two arguments: the visitor and the visitee
Angst haben to have fear → property with one semantic argument: the entity having fear
einen Blick auf etwas werfen a glance at sth. throw to take a glance at sth → an event with two arguments the entity glancing and the entity glanced at(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → property with one semantic argument: the entities having fear.κάνω μία επίσκεψη kano mia episkepsi to-make a visit pay a visit, visit → event, with two arguments: the visitor and the visitee
έχω τη δυνατότητα echo ti δinatotita have.1SG the ability to be able → property, with two core semantic arguments: the entity having the ability and the object of the ability
έχω μίσος echo misos have hate to hate → state, with two arguments: the entity being in the state of hating and the entity hated
βγάζω λόγο vγazo loγo take-out.1SG speech to make a speech → event, with one obligatory argument: the entity making the speech
παίρνω απόφασηperno apofasi take decision to decide event, with two arguments: the entity taking the decision and the decisionpay a visit → event, with two arguments: the visitor and the visitee
have strength → property, with one semantic argument: the entity having strength
take a glance at something → event, with two arguments: the entity glancing and the entity glanced at
make a contribution → event, with two arguments: the contributor and the beneficiary (notice that contribution could refer to both the event and the thing being contributed, but we always prefer the former reading when possible)hacer una visita make a visit to pay a visit → event, with two arguments: the visitor and the visitee
tener valor to have courage → property, with one semantic argument: the entity having courage
echar un vistazo a algo give a glance to something to take a quick look at something → event, with two arguments: the entity glancing and the entity glanced atbisita egin visit do to pay a visit → event with two arguments: the visitor and the visitee
itxaropena ukan hope have to hope, to have hope → event with one single argument: the person who hopesavoir du courage to have courage→ state(property), with one argument: the entity having courageπροσέχω τὸν νοῦνprosekhо̄ ton noun hold.to.1SG the thought I pay attention (to sth/sb) → an event with two arguments the entity paying attention and the entity paid attention to
ἐν ὀργῃ ἔχωen orgē ekhо̄ in anger.DAT have.1SG I am angry → property with one semantic argument: the entity being angryimati osjećaj to have a feeling → property with one semantic argument: the entity having feeling
otići u posjet to go to a visit to someone to pay a visit → event, with two arguments: the visitor and the visiteefare una visita → event, with two arguments: the visitor and the visitee
avere forza → property, with one semantic argument: the entity having strength
dare uno sguardo a qualcosa → event, with two arguments: the entity glancing and the entity glanced at評価するevaluation.makeevaluate
評価を得るevaluation.acc obtainobtain an evaluationeen bezoek brengen to pay a visit → event, with two arguments: the visitor and the visiteezłożyć wizytę to submit a visitto pay a visit→ event, with two arguments: the visitor and the visitee
złożyć skargę to submit a complaintto make a complaint → event, with two arguments: the complaining person and the one he/she complains about
mieć prawo to have the right→ state, with two arguments: the person having the right and the thing (s)he has the right to
budzić zastrzeżenia to wake-up reservations to provoke reservations → state, with two arguments: the person having reservations and the object of the reservationster fome to have hunger to be hungry → property, with one argument: the entity that is hungry
ter idade para fazer algo to have age (to do something) to be old enough (to do something) → state, with one argument: the entity that is old enough
In PT, we consider that the following classes of predicative nouns pass the test: diseases (gripe, trombose, infarto), physical sensations (fome, sede, sono), emotions (medo, paixão, nojo), cognitive entities internal to the cognizer (ideia, opinião, preocupação), characteristics (coragem, teimosia, fraqueza), relations (contato, conflito, amizade) and nouns expressing communication or speech acts (conversa, discussão, briga, conselho).a face o vizită to make a visit to pay a visit → event, with one argument: the entity that visits
a avea curaj to have courage → property, with one semantic argument: the entity having courageimeti predavanje to give a lecture → event, with two arguments: a lecturer and the people who are attending the lecturejap një kontribut
kam fuqi
i hedh një shikimподнети жалбу podneti žalbu to submit an appeal to file a complaint → event, with two arguments: the complaining person and the one he/she complains about
имати право imati pravo to have the right → state, with two arguments: the person having the right and the thing (s)he has the right to - it is not an LVC
كتابه أحمد أعطى gave Ahmed his book Ahmed gave his book → the nounكتاب is a physical entity that does not pass test LVC.0, even though أحمد could be considered its semantic argument
إعصارًا أحمدشهد Ahmed experienced a tornado→ the noun إعصارًا tornado is an event, but has no semantic argumentsИван хвърли боклука Ivan threw out the garbage → physical entity (not event/state)Joe macht einen Kuchen→physical entity (not event/state), even though Joe could be considered a semantic argument(OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire (PT 376b, W)Ο Γιάννης παίρνει τα ρούχα τουO Yanis perni ta rucha tu The John take.3SG the clothes his → the noun is a physical entity (not event/state) that does not pass test LVC.0
Ο Γιάννης έχει ωραίο σπίτιO Γianis echi oreo spiti The John has nice house → the noun is a physical entity (not event/state) that does not pass test LVC.0Joe makes a cake → the noun is a physical entity that does not pass test LVC.0, even though Joe could be considered its semantic argument
Joe experienced a tornado → the noun is an event, but has no semantic arguments
Joe has a lot of money → the noun is abstract and Joe could be considered its semantic argument, but we consider that money (as well as other goods such as cars and bananas) can exist independently of a possessor, so the possessor (owner) should not be considered a semantic argument of money
Ana tiene una bicicleta Ana has a bicycle → noun is not abstract, so it does not pass test LVC.0
Ana hace una foto Ana takes a picture → noun is not abstract, so it does not pass test LVC.0
pastela egin cake make to make a cake → physical entity (not event/state)
Anna a un vélo Anna has a bicycle → noun is not abstract, so it does not pass test LVC.0
Anna affronte la tempête Anna faces the storm → noun is abstract but has no argumentsἔχει δύναμιν καὶ πεζὴν καὶ ἱππικην καὶ ναυτικήνekhei dunamin kai pezēn kai hippikēn kai nautikēn have.3SG force.ACC and on.foot.ACC and on.horseback.ACC and naval.ACC he has an (army force) on foot, on horseback, and at sea → the noun is a physical entity (not event/state)Ivan ima olovku Ivan has a pencil → noun is not abstract, so it does not pass test LVC.0Joe fa un dolce → physical entity (not event/state), even though Joe could be considered its semantic argument
Joe ha vissuto un tornado → event, but has no semantic argumentJan maakt een taart→physical entity (not event/state), even though Jan could be considered a semantic argumentprzetrwać burzę to survive a storm → burza storm has no semantic arguments although it is abstractquebrar a cabeça to break one's head to rack one's brain → physical entity, does not pass test LVC.0
In PT, we consider that the following classes of abstract nouns do not pass this test: informational content that do not require agents (informações, notícias), natural phenomena (chuva, neve, tornado).Joe a făcut o prăjiturăJoe made a cake → physical entity (not event/state), even though Joe could be considered its semantic argumentJanez ima avto → the person that has a car could be considered as a semantic argument, but the car is not an event or a stateJoe bën një ëmbëlsirë
Joe ka shumë para
преживети земљотрес preživeti zemljotres to survive the earthquake → земљотрес zemljotres earthquake has no semantic arguments although it is abstract
We only retain nouns n that have at least one semantic argument, which we define as a semantically mandatory and specific participant of the event or state expressed by the predicative noun.
Sometimes, it might be useful to consider verbs and adjectives derivationally related to the noun to reason about its semantic arguments.
Test LVC.2 - [N-SUBJ-N-ARG] Verb's subject is noun's semantic argument
Is the subject of the verb a semantic argument of the noun? In other words, is the verb linking the predicative noun to one of its semantic arguments that occurs as the subject of the verb?
- YES: continue to the next test
- NO: go to test LVC.5
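As a rough illustration of test LVC.2, the check can be sketched as a lookup of the subject's role among the noun's semantic arguments. This is only a sketch: the `NOUN_ARGUMENTS` table, the role labels and the function names are hypothetical, and the role inventories are simplified from the examples in these guidelines. In practice the decision remains the annotator's.

```python
# Hypothetical role inventories, simplified from the examples in the guidelines.
NOUN_ARGUMENTS = {
    "walk": {"walker"},
    "lecture": {"presenter", "audience", "topic"},
    "information": {"topic"},
}


def lvc2_subject_is_argument(noun: str, subject_role: str) -> bool:
    """Test LVC.2: does the verb's subject fill a semantic role of the noun?"""
    return subject_role in NOUN_ARGUMENTS.get(noun, set())


def lvc2_outcome(noun: str, subject_role: str) -> str:
    """Map the answer to test LVC.2 onto the next step of the procedure."""
    if lvc2_subject_is_argument(noun, subject_role):
        return "continue to next test"
    return "go to test LVC.5"


# "Max takes a walk": Max is the walker, an argument of "walk".
print(lvc2_outcome("walk", "walker"))
# "The student interrupted the lecture": a lecture has no interrupter role.
print(lvc2_outcome("lecture", "interrupter"))
```

The lookup makes one point of the test explicit: a participant counts only if the noun itself requires it, which is why an interrupter or an information provider fails.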
لصديقه نصيحة أحمد قدم gave Ahmed an advice to his friend Ahmed gave an advice to his friend → أحمد Ahmed is the subject of the verb and a semantic argument (Advicer ) of the nounИван изнесе доклад Ivan presented a report → Иван is the subject of the verb and a semantic argument (agent) of the activity
Президентът получи покана за посещение в Германия The president received an invitation to visit Germany → Президентът president is the subject of the verb and a semantic argument (the receiver) of the invitation
Президентът получи награда Тhe president received an award→ Президентътpresident is the subject of the verb and a semantic argument (the receiver) of наградаaward(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → Unas is the subject of the verb and a semantic argument (the contender) of the noun.ο Γιάννης έκανε μία παρουσίαση στο αφεντικό τουO Yanis ekane mia parusiasi sto afentiko tu The John made a presentation to-the boss his John made a presentation to his boss ο Γιάννης is the subject of the verb and a semantic argument (the presenter) of the noun παρουσίαση
Ο Γιάννης πρόβαλε αντίσταση στις αρχέςo γianis provale antistasi stis arches The John presented resistance to the authorities John resisted to the authoritiesJohn made a presentation to his boss → John is the subject of the verb and a semantic argument (the presenter) of the nounMaría dio un paseo María went for a walk → María is the subject of the verb and a semantic argument (the walker) of the nounMax fait une promenade Max takes a walk → Max is the subject of the verb and a semantic argument (the walker) of the nounΚῦρος ἐξέτασιν ποιεῖται τῶν Ἑλλήνων καὶ τῶν βαρβάρωνKuros exetasin poieitai tо̄n Hellēnо̄n kai tо̄n barbarо̄n Cyrus inspection.ACC do.1SG the.GEN Greeks.GEN and the.GEN barbarians.GEN Cyrus inspected the Greeks and the barbariansHelena je otišla u posjet prijateljici Helena payed a visit to a friend → Helena is the subject of the verb and a semantic argument (the visitor) of the visit
Susjed je dobio dozvolu za gradnju Neighbour received a permission for construction → Neighbour is the subject of the verb and a semantic argument (the receiver) of the permission
彼が聴衆から高い評価を受けた(こと) he.nom audience.source high evaluation.acc received (the fact) He received a high evaluation from the audience → The subject is the recipient of praise
聴衆が彼を高く評価した(こと) audience.nom he.acc highly evaluation.made (the fact) The audience gave him high praise → The subject is a 'praiser'
Max maakte een wandeling Max took a walk → Max is the subject of the verb and a semantic argument (the walker) of the noun
Jan złożył wizytę Marii Jan paid a visit to Maria → Jan is the subject of the verb and a semantic argument (the visitor) of the visit
Piotr dostał pozwolenie na budowę Piotr received a permission for construction → Piotr is the subject of the verb and a semantic argument (the receiver) of the permission
Beata ma marzenia o spokoju Beata has dreams about peace → Beata is the subject of the verb and a semantic argument (the possessor) of the dreams
wyborcy ponoszą za to winę the electorate bears the responsibility for this→ wyborcy electorate is the subject of the verb and a semantic argument (the agent) of the guilt
ustawa budzi zastrzeżenia the law wakes-up reservationsthe law raises reservations→ ustawalaw is the subject of the verb and a semantic argument (the theme) of zatrzeżeniareservationsFelipe tomou dois banhos Felipe took two showers → Felipe is the subject of the verb and a semantic argument (the person taking a shower) of the nounIon i-a făcut o prezentare șefului său Ion made a presentation to his boss→ Ion is the subject of the verb and a semantic argument (the presenter) of the nounIn Janezovo predavanje o slovenski kulturi za študente prevajalstva, the 3 syntactic arguments are expressed as a modifier with a possessive marker (Janezovo Janez's) and prepositional phrases (o slovenski kulturi on Slovene culture and za študente prevajalstva for students of translating )Бранко је добио постављење Branko je dobio postavljenje Branko was appointed to a position → Branko is the subject of the verb and a semantic argument of the appointment (receiver)
Јелена је Бранку узвратила посету Jelena je Branku uzvratila posetu Jelena returned Branko's visit. → Jelena is the subject of the verb and a semantic argument of the visit (visitor)خطاب المراسل ال قاطع The journalist has interrupted the speech → المراسل The journalist that is , the subject of the verb, is not a semantic argument of خطاب the speech , since a speech does not necessarily have an interrupterПриятелят на Мария прекъсна нейния доклад Maria's friend interrupted her report→ Maria's friend, that is, the subject of the verb, is not a semantic argument of the report, since a report does not necessarily have an interrupter(OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire. (PT 376b, W) → the passive verb form (w)ṭ(.w) is linking its subject (śnčr) with an adverbial argument (ḥr śč̣.t)το αφεντικό του Γιάννη διέκοψε την παρουσίασή του John's boss interrupted his presentation → το αφεντικό του Γιάννη (John's boss), that is, the subject of the verb διέκοψε, is not a semantic argument of the noun predicate παρουσίαση presentation, since a presentation does not necessarily have an interrupterJohn's boss interrupted his presentation → John's boss, that is, the subject of the verb, is not a semantic argument of the presentation, since a presentation does not necessarily have an interrupter
The report provides information about the economy → information only has one argument, which is its topic. The provider/source of an information is not one of its semantic arguments.El periodista interrumpió el discurso The journalist interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
El informe facilita información clave the report provides crucial information → information only has one argument, which is its topic. The provider/source of an information is not one of its semantic arguments.Le journaliste a interrompu le discours The journalist has interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
Le rapport fournit des informations cruciales the report provides crucial information → information only has one argument, which is its topic. The provider/source of an information is not one of its semantic arguments.ὁ δὲ ἐμπιμπλὰς ἁπάντων τὴν γνώμην ho de empimplas hapantо̄n tēn gnо̄mēn he satisfy.PTC all.GEN the.ACC expectation.ACC he, having satisfied everyone’s expectation → the subject of the verb (he) is not the subject of the noun (all)Učenici su prekinuli le predavanjeStudents have interrupted the lecture → Students, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter演奏が彼に聴衆の高い評価をもたらした(こと)performance.nom he.dat audience.gen high evaluation.acc brought (the fact)His play brought him a high evaluation from the audienceDe journalist heeft de toespraak onderbroken The journalist has interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupterMarek dał mi prawo wyboru Marek gave me the right to choose→ Marek is the subject of the verb and but not a semantic argument of the right (a right usually does not need to be grated)
Incydent ten podważył zaufanie wyborców do kandydata This fact undermined the electorate's confidence in the candidate → Incydent event is the subject of the verb but not a semantic argument of the confidence
komisja przeprowadziła wybory the committee carried out the vote→ komisja committee is the subject of the verb but not a semantic argument of wybory vote, which only requires the voters and the matter of the voteO jornalista interrompeu a inauguração The journalist has interrupted the inauguration → The journalist, that is, the subject of the verb, is not a semantic argument of an inauguration, since an inauguration does not necessarily have an interrupter
O relatório traz informações polêmicas the report provides polemic information → information only has one argument, which is its topic. The provider/source of an information is not one of its semantic arguments.To define a predavanje lecture one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a lecture implies the existence of its arguments.Демонстранти су прекинули говор Demonstranti su prekinuli говор Protesters interrupted the speech→ Protesters are the subject of the verb but not a semantic argument of the speech (a speech does not necessarily have an interrupter)
комисија је спровела гласање komisija je sprovela glasanje the committee carried out the vote→ комисија komisija committee is the subject of the verb but not a semantic argument of гласање glasanje vote, which only requires the voters and the matter of the voteIt is not always easy to determine if the verb's subject is an argument of the noun. You can use the former syntactic version of this test to verify your intuitions.
Test LVC.3 - [V-LIGHT] Verb with light semantics
Is v semantically light, that is, is the semantics that v adds to n restricted to: (i) what stems from its morphological features (e.g. future, plural, perfective aspect, etc.), (ii) pointing at the semantic role of n played by v's subject?
- YES: continue to the next test
- NO: it is not an LVC
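One way to picture test LVC.3 is as a subset check: the verb is light only if everything it contributes falls within the two permitted kinds of meaning. The sketch below assumes the annotator has already listed what the verb adds. The labels (`"morphological"`, `"subject_role"`, `"aspectual"`, `"communication"`) and the function names are hypothetical, chosen for illustration only.

```python
from typing import Iterable

# Assumed labels for the two kinds of meaning a light verb may add:
# (i) morphological features, (ii) pointing at the role of the verb's subject.
LIGHT_CONTRIBUTIONS = frozenset({"morphological", "subject_role"})


def lvc3_is_light(contributions: Iterable[str]) -> bool:
    """Test LVC.3: the verb is light if it adds nothing beyond the allowed kinds."""
    return set(contributions) <= LIGHT_CONTRIBUTIONS


def lvc3_outcome(contributions: Iterable[str]) -> str:
    """Map the answer to test LVC.3 onto the next step of the procedure."""
    if lvc3_is_light(contributions):
        return "continue to next test"
    return "it is not an LVC"


# "make a decision": the verb only carries tense and marks the subject's role.
print(lvc3_outcome({"morphological", "subject_role"}))
# "begin a speech": the verb also adds an aspectual meaning.
print(lvc3_outcome({"morphological", "subject_role", "aspectual"}))
```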
قرار أخذ take a decision → أخذ take adds no meaning to قرار decision besides that of performing an activity
معروف قدم present a favor to give a favor → قدم to give adds no meaning to معروف favor besides that of performing an activity
زيارةبقام to do a visit to pay a visit → قام to do adds no meaning to visit زيارة besides that of performing an activityвземам решение make a decision → вземам adds no meaning to решение decision besides that of performing an act
държа реч to make a speech → държа adds no meaning to реч besides that of performing an act
поемам отговорност to take responsibility → поемам adds no meaning to отговорност besides that of having a propertyeine Entscheidung treffen a decision meet to make a decision → treffen adds no meaning to Entscheidung besides that of performing an activity
Angst haben to have fear → haben adds no meaning to Angst besides that of having a property.(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn) Unas instilled fear in them. (PT § 302c-d, W) → (w)ṭ.n adds no meaning to fear (nr.w) besides that of performing an action.κάνω μία βόλτα take a walk→ κάνωmake adds no meaning to βόλτα walkbesides that of performing an activity
παίρνω μία απόφαση → παίρνω take adds no meaning to απόφαση decision besides that of performing an activity
δίνω μία απάντηση → δίνω give adds no meaning to the noun απάντηση besides that of performing an activity
διενεργώ έλεγχο perform a check → διενεργώ perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
διαπράττω ένα έγκλημα → διαπράττω commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
ασκώ δριμεία κριτική → ασκώ commit adds no meaning to the noun κριτική besides that of performing a cognitive activity
νιώθω πολύ άγχος → νιώθω feel adds no meaning to άγχος besides that of being in a mental state
έχω άγχος have anxiety → έχω have adds no meaning to άγχος anxiety besides that of being in a mental state
προβαίνω σε καταγγελία to make a complaint, to complain → προβαίνω make adds no meaning to καταγγελία complaint besides that of performing an activity
take a walk → take adds no meaning to walk besides that of performing an activity
make a decision → make adds no meaning to decision besides that of performing an activity
have fear → have adds no meaning to fear besides that of having a property
perform a check → perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
commit a crime → commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
pay a visit → the verb in its usual sense means 'to spend some money on a visit', but here it is not used in this sense and does not add any semantics to the "visiting" event
deliver a speech → the verb in its usual sense means 'to move from one place to another', but here it is not used in this sense and does not add any semantics to the "speech" event
undergo a surgery → undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgerydar un paseo to take a walk → dar adds no meaning to paseo besides that of performing an activity
tomar una decisión to make a decision→ tomar adds no meaning to decisión besides that of performing an activity
tener miedo to have fear → tener adds no meaning to miedo besides that of having a propertyusain egin smell do to smell, to sniff → the verb egin adds no meaning to the noun usain besides that of performing an activity
lo egin sleep do to sleep → the verb egin adds no meaning to the noun lo besides that of performing an activityils ont du courage they have some courage → have adds no meaning to courage besides that of having a property
ils reçoivent l’ordre de partir they receive the order of leaving they are ordered to leave → receive adds no meaning to order besides indicating that the subject is the recipient of the order
il a subi une intervention chirurgicale he has undergone an intervention surgery he underwent surgery → undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgeryγνώμην ἔχεινgnо̄mēn ekhein opinion.ACC have.INF to have an opinion → ἔχειν adds no meaning to γνώμην besides that of having a property.
τιμωρίαν ποιέομαιtimо̄rian poieomai punishment.ACC do.1SG I punish → ποιέομαι adds no meaning to τιμωρίαν besides that of performing an activityimati hrabrost to have courage → imati have adds no meaning to hrabrost courage besides that of having a property
donijeti odluku to make a decision → donijeti in its usual sense means 'to bring', but here it is not used in this sense and does not add any semantics to the event
fare una passeggiata → fare adds no meaning to passeggiata besides that of performing an activity
prendere una decisione → prendere adds no meaning to decisione besides that of performing an activity
avere paura → avere adds no meaning to paura besides that of having a property
eseguire un controllo → eseguire is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
commettere un crimine → commettere is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
fare una visita → the verb in its usual sense means 'make', but here it is not used in this sense and does not add any semantics to the "visiting" event
fare un discorso → the verb in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the "speech" event子が親に愛情を持つ child.nom parent.dat affection.acc have The child has affection for his parent(s) → 持つ does not add meaning to 愛情 besides that of having a propertyeen beslissing nemen a decision take to make a decision → nemen adds no meaning to beslissing besides that of performing an activity
een wandeling maken to take a walk → maken adds no meaning to wandeling besides that of performing an activity
schrik hebben to have fear → hebben adds no meaning to schrik besides that of having a propertyoddać hołd to give-back tributeto pay tribute → oddać give-back adds no meaning to hołdtribute besides that of performing an activity
wystąpić z wnioskiem to stand out with a proposal to put forward a motion → wystąpić z stand out with adds no meaning to wniosekmotion besides that of performing an activitymover uma ação judicial to move a lawsuit to sue → to move adds no meaning to lawsuit besides that of performing an activity
apresentar uma lesão present a lesion to have a lesion → to present adds no meaning to lesion besides that of having a property
estar com medo be with fear to be afraid → to be with adds no meaning to fear besides that of being in a state
a avea curaj to have courage → avea adds no meaning to curaj besides that of having a property
a lua o decizieto make a decision → lua adds no meaning to decizie besides that of performing an activityJanez ima predavanje Janez lectures → Janez is the subject of the verb and a semantic argument of the noun (the lecturer)одати почаст odati počast give away tributeto commemorate/pay tribute → одати odati give away adds no meaning to почаст počast tribute besides that of performing an activity
изрећи казну izreći kaznu to pronounce a sentence → изрећи izreći to pronounce adds no meaning to казну kaznu sentence besides that of performing an activityإنتباه شد grab attention to get attention → شد to grab / to attract indicates that the attention startsзапочвам играта start the game, start playing → започвам start adds an aspectual meaning to the nouneine Rede beginnen to begin a speech → beginnen adds an aspectual meaning to the noun Rede(OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire. (PT 376b, W) → (w)ṭ(.w) expresses the action of setting incense.ξεκινάω μία προσπάθειαxekinao mia prospaθια start a trial → ξεκινάω adds an aspectual meaning to the nounto start a walk → start adds an aspectual meaning to the nouncomenzar un discurso to begin a speech → comenzar adds an aspectual meaning to the noun discursooinez hasi foot-by start to start walking → the verb hasi adds an aspectual meaning to the noundonner du courage to give courage → donner indicates the source of the courage (this would not pass test LVC.2)
donner son avis to give one's opinion→ donner adds the information that the opinion is communicated
Ce fait attire l'attention de la justice This fact attracts the attention of the justice → attirer indicates the attention startsἄρχειν τοῦ λόγουarkhein tou logou start the speech to begin speaking → ἄρχειν adds an aspectual meaning to the noun λόγου
πολέμου παύσασθαιpolemou pausasthai war end to stop fighting → παύσασθαι adds an aspectual meaning to the noun πολέμουpočeti igru start the game → početi start adds an aspectual meaning to the nouncominciare un ballo to start a dance → cominciare adds an aspectual meaning to the noun ballo子が手に荷物を持つ child.nom hand.loc luggage.acc have The child holds luggage in his hand(s) → 持つ indicates the act of holding an object ; it alternates with other verbs of holding, such as 抱えるeen toespraak beginnen to begin a speech → beginnen adds an aspectual meaning to the noun toespraakwymierzyć sprawiedliwośćto measure justiceto do justice→ wymierzyćmeasure adds an aspectual meaning to sprawiedliwośćjustice, this expression still passes VID tests
przejść na emeryturęto cross to retirementto take retirement→ przejść adds an inchoative (change-of-state) meaning to the noun
dopełnić obowiązkuto fulfill one's duty→ dopełnićfulfill adds a fulfillment meaning to obowiązekdutyentrar com uma ação judicial to enter with a lawsuit to file a lawsuit → to enter adds an aspectual meaning to the noun
dar uma opinião to give an opinion → to give adds the meaning of communication, which is not present in the noun itself (one can ter uma opinião to have an opinion without communicating it).
a începe munca to start work the to start working → începe adds an aspectual meaning to the noun
Študent je prekinil njegovo predavanje The student has interrupted his lecture → The student, that is, the subject of the verb, is not a semantic argument of the lecture, since a lecture does not necessarily have an interrupter
отићи у пензију otići u penziju to leave to retirement to take retirement → отићи otići adds an inchoative (change-of-state) meaning to the noun
испунити дужност ispuniti dužnost to fulfill one's duty → испунити ispuniti fulfill adds a fulfillment meaning to дужност dužnost duty
Note that this light semantics of the verb is either usual for that verb (i.e. the verb is a pure syntactic operator, like commit, perform), or occurs in the context of the particular noun (e.g. for pay in to pay a visit). Both types of verbs pass the test.
In our view of LVCs, we do not require a light verb to be "bleached", as it is sometimes described in the literature. We simply do not take into account the relation between the verb's use as a light verb and its other uses. While the specific meanings added by light verbs to predicative nouns have been extensively studied and described (e.g. by Miriam Butt and Tafseer Ahmed), we do not adopt any fine-grained classification here. If you are in doubt about a verb's "lightness", proceed to the next test: if you can evoke the same event/state without using the verb, then the verb is considered light.
Test LVC.4 - [V-REDUC] - Verb reduction
Try to build an NP without the verb, in which v's subject s becomes n's dependent. You might need to test several prepositions (of, by, for, from), possessives (my, her, somebody's), postpositions, case markers, as long as you use no verb. Can this verbless NP refer to the same event or state as the candidate v+n construction does?
- YES: annotate as LVC.full
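The chain of tests LVC.2 to LVC.4 can be summarised as a small decision function. This is a minimal sketch, assuming the candidate has already passed the earlier tests on the noun and that the annotator supplies the three yes/no answers. The function name, parameter names and returned strings are hypothetical. Test LVC.5 is not modelled here and is only named as the next step.

```python
def classify_lvc_candidate(
    subject_is_noun_argument: bool,  # answer to test LVC.2
    verb_is_light: bool,             # answer to test LVC.3
    verbless_np_same_event: bool,    # answer to test LVC.4
) -> str:
    """Walk through tests LVC.2, LVC.3 and LVC.4 in order."""
    if not subject_is_noun_argument:
        # LVC.2 failed: the procedure branches to test LVC.5.
        return "go to test LVC.5"
    if not verb_is_light:
        # LVC.3 failed.
        return "not an LVC"
    if not verbless_np_same_event:
        # LVC.4 failed: the verbless NP does not refer to the same event/state.
        return "not an LVC"
    return "LVC.full"


# "Paul had a walk" -> "Paul's walk": all three tests pass.
print(classify_lvc_candidate(True, True, True))
# "John's boss interrupted his presentation": the subject is not an argument.
print(classify_lvc_candidate(False, True, True))
# "to begin a speech": the verb adds an aspectual meaning.
print(classify_lvc_candidate(True, False, True))
```

The early returns mirror the order of the tests: a later test is consulted only when every earlier one has been passed.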
دورا يلعب أحمد Ahmed plays a role → دور أحمد Ahmed's role
تحقيق أحمد ب قام Ahmed made an inquiry → تحقيق أحمد Ahmed's inquiryИван пое отговорност Ivan took responsibility → отговорността на Иван — both refer to the same property/event
Иван взе решение Ivan made a decision → решението на Иван — both refer to the same property/eventPaul hat eine Rede gehalten Paul has given a speech → Paul's speech both refer to the same speech event
Ich habe ihm einen Besuch abgestattet I have paid him a visit → mein Besuchmy visit both refer to the same visiting event(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W) → (*) nr.w ⸗f m ꞽb ⸗śn His fear (is) in their hearts — both refer to the same fearing event.Ο Γιάννης έκανε μία παρουσίασηO Yanis ekane mia parusiasi John made a presentation John's presentation --> both refer to the same presenting event
Η Μαρία έδωσε μία υπόσχεσηI Maria edose mia iposchesi Maria gave a promise Maria promised Η υπόσχεση της Μαρίας --> --> both refer to the same promising eventPaul had a walk → Paul's walk — both refer to the same walking event
I paid him a visit → my visit to him — both refer to the same visiting event
Hester gave birth to Pearl → Pearl's birth to Hester — both refer to the same birthing event (note that the key criterion is that Hester, the subject of the verb, is a (prepositional) dependent of birth in the paraphrase)
The party gave priority to senior members → the priority of senior members for the party — both refer to the same prioritization eventPedro dio un paseo Pedro gave a walk Pedro took a walk → el paseo de Pedro Pedro's walk— both refer to the same walking event
El capitán da la orden de partir The captain gives the order to leave The captain orders to leave → la orden del capitán de partir The captain's order to leave
Pellok bisita egin zidan → Pelloren bisita -- both refer to the same visiting event
Paul a fait une enquête Paul made an inquiry → L'enquête de Paul Paul's inquiry
Paul procède à une perquisition Paul makes a search→ La perquisition de/par Paul the search of/by Paul
Le général donne l'ordre de partir The general gives the order to leave The general orders to leave → l'ordre du général de partir The general's order to leave
Les soldats reçoivent l'ordre de partir The soldiers receive the order to leave The soldiers are ordered to leave→ l'ordre aux soldats de partir The order to the soldiers to leave
Jean souffre de troubles psychiques John suffers from psychic troubles → Les troubles psychiques de Jean John's psychic troubles
Jean présente une hypersensibilité John presents a hypersensibility John has a hypersensibility→ l'hypersensibilité de Jean John's hypersensibility
Paul reçoit des menaces de (la part de) Pierre Paul receives threats from (the part of) Peter Paul is threatened by Peter → les menaces de Pierre à Paul Peter's threats to Paul
Ce médicament présente un risque This medicine presents a risk This medicine poses a risk → le risque de ce médicamentthis medicine's risk
Ce fait attire l'attention de la justice This fact attracts the attention of the justice → l'attention de la justice pour/sur ce fait the attention of the justice on/about this factΚῦρος ἐξέτασιν ποιεῖταιKuros exetasin poieitai Cyrus inspection.ACC do.3SG Cyrus inspected → ἐξέτασιν (τοῦ Κύρου) refers to the same eventIstraživač je donio zaključak The researcher made a conclusion → njegov zaključak his conclusion both refer to the same eventPaolo ha fatto una conquistaPaul made a conquer→ la conquista di Paolo
Il generale dà l'ordine di partire The general gives the order to leave The general orders to leave → l'ordine di/da parte del generale di partire
Paolo riceve delle minacce da (parte di) Piero → le minacce di Piero a Paolo
聴衆が彼を高く評価する audience.nom he.acc highly evaluation.make The audience highly praised him → 聴衆の彼の高い評価 the high evaluation of him by the audience
子が親に愛情を持つ child.nom parent.dat affection.acc have The child has affection for his parent(s) → 子の親への愛情 child.gen parent.dir.gen affection
Paul heeft een toespraak gehouden Paul has given a speech → Paul's toespraak both refer to the same speech event
Obecni oddali hołd poległym The present gave-back tribute to the fallen The audience paid tribute to the fallen → hołd obecnych the tribute of the audience
Jan miał na myśli Marię Jan had on thought Maria Jan meant Maria → myśl Jana Jan's thought
Jan otrzymał wymówienie Jan received a dismissal → wymówienie dla Jana dismissal for Jan
Inwestycja przynosi zyski the investment brings profit→ zyski z inwestycji profit from the investmentJoão cometeu um deslize → o deslize do João — both refer to the same event
O jogador cobrou um pênalti the player charged a penalty kick the player took a penalty kick → o pênalti do jogador the player's penalty kick — both refer to the same event
João tem consciência do perigo John has conscience of the danger John is aware of the danger → a consciência do João sobre o perigo John's awareness of the danger — both refer to the same state
João recebeu a remuneração John received the remuneration → a remuneração do João John's remuneration — both refer to the same event
O paciente recebeu a visita dos familiares The patient received the visit of the relatives → a visita dos familiares ao paciente the visit of the relatives to the patient — both refer to the same event
João apresenta lesões John presents lesions → as lesões do João John's lesions — both refer to the same statePaul a făcut o plimbarePaul had a walk → plimbarea lui Paul Paul's walk — both refer to the same walking event
i-am făcut o vizită I paid him a visit → vizita mea — both refer to the same visiting eventimeti dvome to have doubts to doubt → imeti have adds no meaning to dvomi doubts besides that of having a property
delati razlike to make differences to differentiate → delati in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to eventПрофесор држи предавање Profesor drži predavanje The professor is holding a lecture→ професорово предавање profesorovo predavanje The professor's lecture
Овај лек представља ризик Ovaj lek predstavlja rizik this drug presents a risk this drug poses a risk → ризик од овог лека rizik od ovog leka risk of this drug this drug's risk - it is not an LVC
في عام 2001 النور رأى أفاد التقرير بأن برنامج الصحة The report states that the Health Programme saw the light in 2001 The report states that the Health Programme began with its current components in 2001 → نور برنامج الصحة# the light of health program - one cannot remove the verb because the sense of communication is lost, so the verb is not light. As a consequence, the verbless NP ( نور برنامج الصحة the light of health program ) fails to refer to the original event ( رأى برنامج الصحة النور ) the health program saw the light ( started )
Иван хвърли поглед на вестника Ivan threw a glance at the newspaper → #погледът на Иван върху вестника - different semantics; and requires a different preposition
Paul hat einen guten Eindruck gemacht Paul has made a good impression → #Paul's Eindruck auf seine Freunde Paul's impression on his friends has a different semantics
(OEG) 𓂧𓈖 𓃹𓈖𓇋𓋴 𓌴𓐙𓂝𓏏 (w)ṭ.n Wnꞽś mꜣꜥ.t Unas set Right Unas set Right (PT 265c, W) → (*) mꜣꜥ.t Wnꞽś 'Unas's Right' fails to refer to the original event (Unas set Right).
ο Παύλος πήρε νέα από τον αδερφό του O Pavlos pire nea apo ton aδerfo tu The Paul take.3PST news from his brother → #Τα νέα του Παύλου από τον αδερφό του Paul's news from his brother - one cannot remove the verb because the sense of communication is lost, so the verb is not light. As a consequence, the verbless NP (τα νέα του Παύλου) fails to refer to the original event (Ο Παύλος πήρε νέα)
Paul got news from his brother → #Paul's news from his brother - one cannot remove the verb because the sense of communication is lost, so the verb is not light. As a consequence, the verbless NP (Paul's news) fails to refer to the original event (Paul got news)
Juan recibió la noticia de su hermano Juan got the news from his brother → #La noticia de Juan - one cannot remove the verb because the sense of communication is lost, so the verb is not light. As a consequence, the verbless NP (la noticia de Juan) fails to refer to the original event (Juan recibió una noticia)
Hizlariak interesa piztu zuen Speaker interest switched-on The speaker awakened interest → #Hizlariaren interesa, #the speaker's interest - different semantics
Son comportement porte une atteinte grave à l'honneur des soldats His behaviour seriously jeopardises the soldiers' honour → #l'atteinte de son comportement the jeopardy of his behaviour
ἡ γυνὴ πίστιν ἔλαβε hē gunē pistin elabe the woman assurance get.AOR.3SG the woman got an assurance → πίστις τῆς γυναικός 'the woman's assurance' fails to refer to the original event (the woman got an assurance)
Petar je dobio poruku od direktora Petar received message from his boss → #Petar's news from his boss - one cannot remove the verb because the sense of communication is lost, so the verb is not light. As a consequence, the verbless NP (Petar's message) fails to refer to the original event (Petar received message)
Paul kreeg nieuws van zijn broer Paul got news from his brother → #Pauls nieuws van zijn broer - one cannot remove the verb because the sense of communication is lost, so the verb is not light. As a consequence, the verbless NP (Pauls nieuws) fails to refer to the original event (Paul kreeg nieuws)
Michael Phelps pobił rekord sprzed 2 tysięcy lat Michael Phelps broke the record from 2 thousand years ago → #Michael Phelps' record
Ulica nosi imię sławnego poety The street carries the forename of a famous poet The street carries the name of a famous poet.→ imię ulicy the forename of the street
Adam jest tego samego zdania Adam is of the same opinion Adam has the same opinion → #zdanie Adama Adam's opinion refers to the contents of his opinion, not to the fact of having an opinion
O jogador cobrou uma falta the player charged a foul the player took a free kick → a falta do jogador the player's foul - the focus changes from taking a free kick to being one of the parties involved in a foul (it's a VID)
O jogador provocou uma lesão the player provoked a lesion → a lesão do jogador the player's lesion - in the reduced NP, the focus changes from hurting somebody else to getting hurt
O músico apresenta suas composições the musician presents his compositions → as composições do músico the musician's compositions - the reduced NP does not keep the sense of presenting, so it does not refer to the same event as the verbal construction
Paul a făcut o impresie bună Paul made a good impression → #Impresia lui Paul despre soția sa Paul's impression on his wife - different semantics
to začeti predavanje to begin a lecture → začeti to begin adds an aspectual meaning to the noun
Бранко је оборио рекорд у трци на 100 метара Branko je oborio rekord u trci na 100 metara Branko broke the record in the 100m race → #Бранков рекорд #Brankov rekord
This test has a simple formulation but its application has some important subtleties which are central to our definition of the LVC.full category. The goal of this test is to keep only constructions in which the predicative noun is an event or state, excluding "gray-zone" predicates.
First, if it is not possible to build an acceptable NP where the verb v's subject s becomes a dependent of the noun n, e.g. using any preposition, postposition and/or case marker, this means that the verb is not light, and the construction cannot be annotated as LVC.full. This may remove constructions in which there is control, that is, both the noun and the verb share the same subject. However, control is not sufficient to characterize an LVC.full. In other words, LVC.4 fails, the verb is not completely light, and you cannot annotate the construction as LVC.full, even if intuitively it resembles an LVC.full due to control:
العمل قرار أحمد أخذ Ahmed made a decision to work → قرار أحمد بالعمل the decision of Ahmed for work is unacceptable
Paul a l'air de dormir Paul has the air of to-sleep Paul seems to be sleeping → *l'air de dormir de Paul is unacceptable
Paul a eu l'occasion de dormir Paul has had the opportunity to sleep Paul had the opportunity to sleep → *l'occasion de Paul de dormir is unacceptable
Zdravnik je postavil diagnozo The doctor made a diagnosis → njegova diagnoza his diagnosis; both refer to the same event
Politik je dal napoved The politician made a forecast → njegova napoved his forecast; both refer to the same event
Second, the fact that the NP is acceptable does not suffice to characterise an LVC.full. The NP version in which the verb was omitted, if acceptable, must also evoke the same event or state as the LVC. Here are some tricky examples and some recommendations about how to interpret them:
جديدة اجراءت الشركة أخذت the company took new procedures → the NP الاجراءت الجديدة new procedures is acceptable, and the "الاجراءت" "procedures" seem to refer to new procedures, so it is acceptable to annotate this as LVC.full
Имам по-голям брат I have an elder brother → моят брат my brother refers to one member of the relation, and not to the state of brotherhood between both actants
отправих покана към приятелите си I sent an invitation to my friends → покана invitation can be interpreted both as the act of inviting and as its contents; because of the first reading we count this candidate as LVC.full
Η Μαρία έχει έναν αδελφό I Maria echi enan aδelfo Maria have.3SG a brother → Ο αδελφός της Μαρίας is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
Η Μαρία έστειλε ένα γράμμα Maria send.03.SG a letter → Το γράμμα της Μαρίας refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
Η Μαρία έχει την άποψη i maria echi tin apopsi Maria has the opinion Maria believes and more generally, cases of έχω + a noun referring to the state of having a mental content (άποψη, γνώμη, πεποίθηση) → η άποψη της Μαρίας is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
Η Μαρία έδωσε την υπόσχεση i maria eδose tin iposchesi the maria give.3.PST the promise Maria promised and more generally, cases of δίνω + a noun referring to a speech act (υπόσχεση, διαταγή, απάντηση, κατάθεση) → Η υπόσχεση της Μαρίας refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Η Μαρία πήρε μία απόφαση I maria pire mia apofasi The Maria take.03.PR a decision Maria decided → απόφαση can refer to the deciding event (μία δύσκολη απόφαση) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
Mary has a brother → Mary's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
Mary sent a letter → Mary's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
Mary has an opinion and more generally, cases of have + a noun referring to the state of having a mental content (opinion, belief) → Mary's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
Mary made a speech and more generally, cases of make + a noun referring to a speech act → Mary's speech refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Mary made a decision → decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
María tiene un hermano María has a brother → el hermano de María María's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
María envió una carta María sent a letter → La carta de María María's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
María dio un discurso María made a speech and more generally, cases of dar + a noun referring to a speech act → el discurso de María refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
María tomó una decisión María made a decision → decisión decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
la compagnie a pris des mesures d'économie the company took some measures of saving the company took cost-saving measures → the NP les mesures d'économie de la compagnie is acceptable; the semantic equivalence is difficult to judge, but the "measures" seem to refer to cost-saving actions, so it is acceptable to annotate this as LVC.full
εἶχε τὴν ἀδελφὴν Σιτάλκης eikhe tēn adelphēn Sitalkēs have.3SG the sister.ACC Sitalkes Sitalkes had a sister → ἀδελφὴν is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
οὐκ ἂν ἐπιστολὴν ἔπεμπον ouk an epistolēn epempon not PRT letter.ACC send.3pl they would not have sent a letter → ἐπιστολήν refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
τὴν γὰρ γνώμην εἶχε tēn gnо̄mēn eikhe the thus opinion have.3SG he thus held the opinion and more generally, cases of have + a noun referring to the state of having a mental content → γνώμην is ambiguous between the fact that he has an opinion and the content of his opinion. We recommend that these cases should be annotated as LVC.full
ὁ δὲ Σιτάλκης πρός τε τὸν Περδίκκαν λόγους ἐποιεῖτο ho de Sitalkēs pros te ton Perdikkan logous epoieito the Sitalkes to also the Perdikkas speech.ACC do.3SG Sitalkes spoke to Perdikkas and more generally, cases of make + a noun referring to a speech act → λόγους refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Marie neemt een beslissing → decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
mam starszego brata I have an elder brother → mój brat my brother refers to one member of the relation, and not to the state of brotherhood between both actants
Maria wysłała wiadomość Maria sent a message → wiadomość Marii Maria's message refers to the contents of the message sent by Maria, rather than to the sending event itself
Maria jest zdania, że Mary has the opinion that... → zdanie Marii Mary's opinion refers to the content of the opinion, and not to the state of having an opinion
miał na celu awans He had promotion on the aim His aim was a promotion → jego cel refers to the aim itself, and not to the state of having an aim
ta partia w wyborach miała większość this party had a majority in the elections → #większość tej partii the majority of the party provokes a considerable shift in meaning
złożył zeznania na policji he gave testimony at the police station → jego zeznania can be interpreted both as the act of testimony and as its contents; because of the first reading we count this candidate as LVC.full
Mojca je dala Tini priložnost Mojca gave Tina an opportunity → #Mojčina priložnost Mojca's opportunity has a different meaning; if the verb is removed, the original meaning is lost, so the verb is not light.
Марија је послала поруку Marija je poslala poruku Marija sent a message → Маријина порука Marijina poruka Marija's message refers to the contents of the message sent by Marija, rather than to the sending event itself
Finally, some nouns, especially nominalisations, are ambiguous between events and their participants. For instance, a construction may be an event (the construction of the bridge took 2 years) or its result (this bridge is a spectacular construction). In that case, if the verbless NP can refer to the event, then you should prefer this reading over the "participant" interpretation. For example, in John made a construction, you may ask if John's construction refers to the construction event or to its result. In this case, it can refer to the event, so it should be annotated as LVC.full.
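The three steps of test LVC.4 described above can be summarised as a small decision procedure. The following Python sketch is illustrative only: the `ReducedNP` record and its two fields are hypothetical names for judgments that a human annotator supplies, and they are not part of any PARSEME tool.

```python
from dataclasses import dataclass


@dataclass
class ReducedNP:
    """Annotator judgments about the verbless NP (hypothetical record)."""
    acceptable: bool          # can the verb's subject become a dependent of the noun?
    can_refer_to_event: bool  # can the NP evoke the same event or state as the full construction?


def passes_lvc4(np: ReducedNP) -> bool:
    """Return True if the candidate passes test LVC.4 under the assumptions above."""
    # First: without an acceptable verbless NP, the verb is not light.
    if not np.acceptable:
        return False
    # Second and third: the NP must evoke the same event or state. For nouns that
    # are ambiguous between event and participant, the event reading is preferred,
    # so it is enough that the event reading is available.
    return np.can_refer_to_event


# Judgments taken from the examples in this section.
made_a_decision = ReducedNP(acceptable=True, can_refer_to_event=True)   # Mary's decision
sent_a_letter = ReducedNP(acceptable=True, can_refer_to_event=False)    # Mary's letter
avoir_l_air = ReducedNP(acceptable=False, can_refer_to_event=False)     # *l'air de dormir de Paul

for candidate in (made_a_decision, sent_a_letter, avoir_l_air):
    print(passes_lvc4(candidate))
```

The sketch deliberately keeps the linguistic judgment outside the code: deciding whether an NP is acceptable or evokes the same event remains the annotator's task.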
Test LVC.5 - [V-SUBJ-N-CAUSE] Verb's subject is noun's cause
Is the subject of the verb expressing the cause of the predicate expressed by the noun? In other words, does the verb bring an additional participant to the scene, representing the source or cause of the event or state referred to by the noun?
- YES: annotate as LVC.cause
- NO: it is not an LVC
حقوق أعطى to give rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
Иван даде възможност на Мария да представи картините си Ivan gave Maria the opportunity to present her paintings → Ivan is not a semantic argument of възможност opportunity but he is the cause of the opportunity
(OEG) 𓇋𓅱 𓂧𓈖 𓃹𓈖𓇋𓋴 𓈖𓂋𓃭𓅱 𓆑 𓅓 𓄣 𓋴𓈖 ꞽw (w)ṭ.n Wnꞽś nr.w ⸗f m ꞽb ⸗śn Unas (Wnꞽś) put ((w)ṭ.n) his (⸗f) fear (nr.w) in (m) the heart (ꞽb) of them (⸗śn). Unas instilled fear in them. (PT § 302c-d, W) → The subject of the verb is the cause of the event referred to by the predicative noun.
δίνω ικανοποίηση δino ikanopiisi give satisfaction to satisfy → the subject of the verb δίνω is the cause for the emotion denoted by the predicative noun ικανοποίηση and experienced by its complement
to grant rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provoke, which is a prototypical causative verb. Here, the subject is not the agent of destruction, but its cause. Notice that if the sentence was the explosion provoked the destruction of the building, then the construction would be an LVC.full
residents seek to build consensus on the development of the territory → the semantic argument of consensus is the topic on which everybody agrees, the subject of build consensus expresses an external participant responsible for the consensus to exist.
otorgar derechos to grant rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
dar dolor de cabeza → X has a headache, the cause of the headache, indicated as the subject of dar is not a semantic argument
la nueva ley provocó la destrucción del edificio the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provocar to provoke, which is a prototypical causative verb. Here, the subject is not the agent of destrucción destruction, but its cause. Notice that if the sentence was la explosión provocó la destrucción del edificio the explosion provoked the destruction of the building, then the construction would be an LVC.full
τιμωρίαν ποιέω timо̄rian poieо̄ punishment.ACC do.1SG I inflict punishment → the subject of the verb is the cause of the event referred to by the noun
zadati glavobolju to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
質の高い演奏が彼に聴衆の高い評価をもたらした(こと) quality.gen high performance.nom he.dat audience.gen high evaluation.acc brought (the fact) His high-quality play earned him high praise from the audience → The subject is the cause of the 'high praise' from the audience
hoofdpijn geven → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
Marek dał mi prawo wyboru Marek gave me the right to choose → Marek is not a semantic argument of prawo right but he is the cause of the right
dać podstawy prawne to give legal foundation
nakładać na kogoś powinność to put a duty on sb.
narazić kogoś na straty to expose someone to losses
stawiać komuś cel to set an aim to someone
ślady krwi wzbudziły podejrzenia policji the traces of blood aroused the suspicion of the police
Bombardamentul a provocat moartea multor civili. The bombing provoked the death of many civilians. → Many civilians (mulți civili) died and their death (moarte) was provoked by the bombing (bombardamentul)
Борко је Марији задао бриге Borko je Mariji zadao brige Borko gave to Marija worries Borko worried Marija → Marija has worries, the cause of the worries, indicated as the subject of задао zadao give is not a semantic argument of бриге brige worries
إنطباع أعطى → the subject of أعطى to give is not what is causing إنطباع the impression
Този инцидент подрони авторитета на кандидата This incident undermined the authority of the candidate → Инцидентът incident is neither a semantic argument of the authority nor its cause
(OEG) 𓂧 𓊹𓋴𓍿𓈒 𓁷 𓋴𓆓𓏏𓊮 (w)ṭ(.w) śnčr ḥr śč̣.t The incense (śnčr) was-put ((w)ṭ(.w)) on (ḥr) the fire (śč̣.t). The incense was set on the fire. (PT 376b, W) → The subject of the passive verb form is not the cause of the event.
παίρνω απάντηση perno apantisi to take an answer to receive an answer → the subject of παίρνω is not what is causing a reply
to relieve a headache → the subject of relieve is not what is causing a headache
to give birth → tricky case, since the subject of give actually is a semantic argument of birth, so it cannot be its cause. This construction must be annotated as VID (it does not pass test LVC.4 either).
excessive heat provokes fire → even though provoke prototypically expresses a cause, in this case fire is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
calmar un dolor de cabeza to relieve a headache → the subject of calmar to relieve is not what is causing a headache
dar a luz to give birth → tricky case, since the subject of dar to give actually is a semantic argument of a luz, so it cannot be its cause. This construction must be annotated as VID (it does not pass test LVC.4 either).
un calor excesivo provoca incendios excessive heat provokes fires → even though provocar prototypically expresses a cause, in this case incendios is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
συγγνώμης τυγχάνει suggnо̄mēs tugkhanei pardon.GEN get.3SG he gets pardoned → the subject of the verb is not the cause of the event referred to by the noun
de hoofdpijn verlichten → the subject of verlichten is not what is causing a headache
Incydent ten podważył zaufanie wyborców do kandydata This incident undermined the electorate's confidence in the candidate → Incydent incident is neither a semantic argument of the confidence nor its cause (it is the opposite of the cause)
komisja przeprowadziła wybory the committee carried out the vote → komisja committee is neither a semantic argument of wybory vote nor its cause
mocny zapach uśpił czujność psów the strong scent lulled the vigilance of the dogs → the scent is the opposite of the cause of vigilance
căldura excesivă provoacă incendii → even though provoca provoke prototypically expresses a cause, in this case incendiu fire is not a predicate and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
Marija ima brata Marija has a brother → Marijin brat Marija's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
Marija je poslala pismo Marija sent a letter → Marijino pismo Marija's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
Marija ima mnenje Marija has an opinion and more generally, cases of imeti to have + a noun referring to the state of having a mental content (mnenje, predstava, dvom opinion, idea, doubt) → Marijino mnenje Marija's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
Marija je postavila vprašanje/trditev Marija posed a question/statement and more generally, cases of postaviti make + a noun referring to a speech act → Marijino vprašanje Marija's question refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
Борко ће Марију ослободити брига Borko će Mariju osloboditi briga Borko will free Marija of her worries. → Borko, the subject of ослободити osloboditi relieve is not what is causing бриге brige worries
Constructions annotated as LVC.cause involve:
- verbs that are typically used to express the cause of predicative nouns in general (e.g. cause, provoke), or
- verbs that are only used to express the cause of particular predicative nouns (e.g. grant in to grant a right).
When the construction involves a typically causative verb (e.g. cause, provoke), it might seem counter-intuitive to annotate it as a VMWE because it looks perfectly regular and presents no VMWE idiosyncrasy. However, it turned out to be difficult to distinguish idiosyncratic from regular LVC.cause constructions, so both should be annotated, as for LVC.full. In other words, LVC.cause constructions are treated as complex predicates with a causal support verb regardless of their compositionality, even though some of them are fully compositional.
Typically causative verbs (e.g. cause, provoke) can sometimes be light. In this case, according to the LVC decision tree, LVC.full has priority over LVC.cause. For instance, the announcement provoked an unexpected reaction should be annotated as LVC.full and not LVC.cause, although provoke is a typically causative verb. Indeed, reaction has two arguments (reaction of X to Y), one of which is the subject of the verb (test LVC.2 passes). In other words, typically causative verbs may be used in either LVC.full or LVC.cause, depending upon whether the cause subject of the verb is a normal, canonical argument to the predicative noun (LVC.full) or an "external" non-canonical cause (LVC.cause).
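The priority of LVC.full over LVC.cause can be pictured as an ordered check. The sketch below is a simplification and not the full LVC decision tree: the three boolean parameters are hypothetical stand-ins for the annotator's answers to the noun-predicativity test, the light-verb tests, and test LVC.5.

```python
from typing import Optional


def classify_lvc(noun_is_predicative: bool,
                 passes_light_verb_tests: bool,
                 subject_is_external_cause: bool) -> Optional[str]:
    """Return 'LVC.full', 'LVC.cause' or None (assumed, simplified ordering)."""
    # A non-predicative noun rules out both LVC categories.
    if not noun_is_predicative:
        return None
    # LVC.full is checked first, so it wins even with a typically causative verb.
    if passes_light_verb_tests:
        return "LVC.full"
    # Only then is the subject considered as an external, non-canonical cause.
    if subject_is_external_cause:
        return "LVC.cause"
    return None


# "the announcement provoked an unexpected reaction": the subject is a canonical
# argument of "reaction", so the construction is LVC.full despite the causative verb.
print(classify_lvc(True, True, True))
# "the new law provoked the destruction of the building": external cause only.
print(classify_lvc(True, False, True))
# "excessive heat provokes fire": "fire" is not predicative.
print(classify_lvc(False, False, True))
```

Ordering the checks this way mirrors the statement that a typically causative verb may appear in either category, with the status of the subject deciding between them.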
Some verbs could be considered causative, but their interpretation goes beyond purely indicating the cause of the event/state. Therefore, you should NOT annotate as LVC.cause constructions involving:
- verbs which encode a manner of causation:
to call a meeting entails communication to schedule the meeting
to hold a meeting entails leadership
to organize classes entails preparation
συνἠγαγεν ἐκκλησίαν sunēgagen ekklēsian lead.together.3SG meeting.ACC he held a meeting entails leadership
- verbs which encode modality:
to allow dialogue entails permission
to foster dialogue entails assistance
to require dialogue entails necessity
- aspectual verbs whose subject is a semantic argument of the noun:
αρχίσαμε τη συζήτηση archisame ti syzitisi we started the conversation
τελειώσαμε τη συζήτηση teliosame ti sizitisi finished.01.PST the conversation We finished the conversation
we started the meeting
we ended the meeting
we continued the meeting
ἄρχειν τοῦ λόγου arkhein tou logou start the speech to begin speaking
we begonnen de vergadering we started the meeting
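The exclusions listed above amount to a filter applied before a construction is considered for LVC.cause. Here is a minimal sketch, assuming the annotator has already labelled what the verb contributes beyond causation. The label names (`manner`, `modality`, `aspect`) are hypothetical shorthand for the three bullet points and are not official PARSEME tags.

```python
# Hypothetical labels for meaning that the verb adds beyond pure causation.
EXCLUDED_CONTRIBUTIONS = frozenset({"manner", "modality", "aspect"})


def eligible_for_lvc_cause(extra_contributions):
    """True if the verb adds none of the excluded meanings.

    Eligibility only means the construction may still be tested with LVC.5;
    it does not by itself make the construction an LVC.cause.
    """
    return not (set(extra_contributions) & EXCLUDED_CONTRIBUTIONS)


# Labels assigned by hand to examples from this section.
examples = {
    "to give a headache": [],
    "to call a meeting": ["manner"],
    "to allow dialogue": ["modality"],
    # Aspectual verb whose subject is a semantic argument of the noun.
    "we started the meeting": ["aspect"],
}

for construction, extras in examples.items():
    print(construction, eligible_for_lvc_cause(extras))
```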
Problematic cases and remarks
Syntactic variants
The (single or compound) noun n functions as a regular syntactic dependent, so LVCs exhibit regular syntactic variants.
قرار أخذ make a decision → المدير أخذه الذي القرار the decision that was made by the director
взема решение → решението, което президентът взе the decision that the president made
eine Entscheidung treffen → die Entscheidung die der Direktor zu treffen hatte.
παίρνω μία απόφαση → η απόφαση που πρέπει κάποιος να πάρει. perno mia apofasi → i apofasi pu prepi kapios na pari take a decision → the decision one has to take to make a decision, the decision I have to make
make a decision → the decision that the director has to make.
tomar una decisión → la decisión tomada por la directora.
erabaki bat hartu decision one take to make a decision → zuzendariak hartutako erabakia director taken decision the decision (which was) made by the director
prendre une décision → la décision prise par la directrice.
δόξαν ἣν ἔνιοι ἔχουσι περὶ τῆς Νικοφήμου οὐσίας doxan hēn enioi ekhousi peri tēs Nikophēmou ousias opinion which some have.3PL about the Nicophemos' property the opinion which some hold about Nicophemus' property is a syntactic variant
δόξαν ἔχουσι doxan ekhousi opinion.ACC have.3PL they hold an opinion is the canonical form
donijeti odluku to make a decision → odluka koju je morao donijeti direktor the decision that the director had to make
prendere una decisione → la decisione che il direttore ha dovuto prendere.
een beslissing nemen → de beslissing die de directeur moet nemen.
wziąć udział to take participation.ACC to take part → wzięcie udziału taking.GER participation.GEN taking part, biorący udział taking.PART participation.ACC taking part
tomar banho take shower → o banho que eu tomei estava bom the shower which I took was good
a lua o decizie to make a decision → decizia pe care directorul trebuie să o ia the decision that the director has to make.
dati ime nekomu to give (somebody) a name to name (somebody) → the object receives a name and this action implies that as a result he/she is named. Therefore the person who gives a name causes something to be named. The subject of the verb is not a semantic argument of the noun.
narediti konec nečemu to make an end (to something) to end (something) → the result of this action is that something is finished, which is caused by the subject of narediti to make
задати некоме бриге zadati nekome brige to give worries to sb. to worry sb. → бриге које је Борко задао Марији brige koje je Borko zadao Mariji The worries Borko gave to Marija
All LVC tests should be applied to a neutral form. If the neutral form is not totally syntactically unmarked (for instance, it must be in the passive voice), this is an indication that the target construction might not be an LVC, but a verbal idiom instead.
Selection of the verb
In many cases of LVCs, it can be said that there is some degree of selection of the verb by the noun.
جولة ب قام make a walk vs سباق ب قام make a race
вземам решение to make a decision vs *вземам отговорност to take responsibility
имам право to be right vs *притежавам право
eine Entscheidung treffen a decision meet make a decision vs. *eine Entscheidung machen a decision make vs. *einen Beschluss treffen a resolution meet
κάνω διάλειμμα vs. #παίρνω διάλειμμα
παίρνω απόφαση vs. #κάνω απόφαση
have a walk vs *have a race
run a race vs *run a walk
tomar una decisión take a decision make a decision vs *dar una decisión give a decision but darse/tomar una ducha give.self/take a shower
pauso eman step give to take a step vs. ?pauso egin step do
bisita egin visit do to pay a visit vs. bisita eman visit give
faire une marche make a walk take a walk vs *procéder à une promenade perform a walk but faire/procéder à une enquête make/perform an inquiry
χάριν δίδωμι kharin didо̄mi gratitude.ACC give.1SG I show gratitude vs. #χάριν ποιέομαι
postaviti pitanje to put a question to pose a question vs *postaviti odgovor
prendere una decisione take a decision make a decision vs. *fare una decisione make a decision vs. *prendere una conclusione take a conclusion
een wandeling maken a walk make to take a walk vs. *een race maken a race make
wziąć udział to take participation vs. *pobrać udział
mieć rację to have right to be right vs. *posiadać rację to possess right
fazer uma prece to make a prayer vs. *dar uma prece to give a prayer but fazer/dar uma caminhada to make/give a walk
a da divorț to give divorce to divorce vs. *a oferi divorț
dati nasvet to give advice → the subject of dati give cannot cause advice
имати право imati pravo to have right to be right vs. *поседовати право *posedovati pravo to possess right
Yet some regularities exist. For example, large classes of nouns function with have (e.g. +property) or commit (+negative achievement). Therefore, we chose not to retain the selection of the verb as a criterion for LVC categorization. Instead, the decision tree should be applied to decide whether a candidate should be annotated as LVC.
Many authors distinguish support verbs from light verbs; still others differentiate between true light verbs and vague action verbs.
On the one hand, we take a narrower scope than what is usually considered in the literature by ignoring aspectual support verbs (except when aspect is morphological). We believe that aspectual verbs do contribute an additional (change of state) meaning to the expression, and most of the time they are completely productive, not forming interesting VMWEs. For instance, for the predicative noun walk, we will consider the light verb to have, but not the aspectual verbs to start, to pursue, to stop a walk. Thus, to have a walk is an LVC.full. Note that for some nouns such as bloom, which are themselves inchoative, we do consider to come into bloom as LVC.full, as both the verb and the noun are inchoative, so the verb does not add any semantics to the noun.
On the other hand we take a broader scope than what is usually considered in the literature by taking in cases in which the verb has light semantics per se (it only bears morphology, such as the tense and mood, in any case), which hence cannot be described as "bleached" as is usually said of support verbs. For instance, whereas to pay does not have its usual meaning in to pay a visit, it cannot really be said that commit does not have one of its meanings in commit a crime (note that commit can be used with any negatively charged achievement noun, e.g. suicide, crime, fraud, felony...). Nonetheless, we annotate to commit a crime as LVC.full since it passes all tests.
One test often used in the literature is the existence of a morphologically related verb or adjective that means the same as the LVC. For instance, to make a visit is equivalent to to visit, to have an illness is equivalent to to be ill. Note however that it is neither sufficient nor compulsory:
- some LVCs have no derivationally-related equivalents, such as to have a flu, to have faith and to commit a crime;
- some constructions that are not LVCs do have a derivationally-related equivalent such as to write an email and to email;
- some LVCs have derivationally-related equivalents that do not mean the same as the LVC, such as to make a face and to face, or that have different argumental structure from the LVC, such as to have a problem and to be problematic.
Nonetheless, it might be useful to reason about the derivationally-related equivalents to decide whether a noun is predicative in test LVC.1. Therefore, here are some questions that might help decide on the predicative nature of the noun in the LVC candidate:
Verb paraphrase: Is the abstract noun derivationally related to a verb with the same semantics? Then, there is probably a semantic argument, which coincides with the subject of the verb, so test LVC.1 passes:
القرار أحمد أخذ Ahmed made a decision = أحمد قرر Ahmed decidedвземам решение to make a decision = решавам to decide
правя грешка to make a mistake = греша/сгрешавам to make a mistakeο Γιάννης παίρνει μία απόφαση John makes a decision = ο Γιάννης αποφασίζει John decides
ο Γιάννης κάνει ένα ταξίδι John makes a trip = o Γιάννης ταξιδεύει
ο Γιάννης έχει θάρρος John has courage = ο Γιάννης είναι θαρραλέος John is courageous → and, more generally, characteristics and attributes
ο Γιάννης έχει πείνα/δίψα John has hunger/thirst = ο Γιάννης πεινάει/διψάει John is hungry/thirsty → and, more generally, physical sensations
ο Γιάννης έχει πάθος/φόβο/θυμό John has passion/fear/anger = ο Γιάννης παθιάζεται/φοβάται/θυμώνει John is passionate/afraid/angry → and, more generally, feelings, emotions, statesJohn makes a decision = John decides
John has a walk = John walksJuan toma una decisión Juan makes a decision = Juan decide Juan decides
Juan da un paseo Juan takes a walk = Juan pasea Juan walksJonek erabakia hartu du = Johen erabaki du John decision-the taken has = John decided has John has made a decision = John has decidedπορείαν ποιέομαιporeian poieomai march.ACC do.1SG I march = πορεύομαιIvan donosi odluku Ivan takes decision = Ivan odlučujeIvan decides
Janica jeodnijela pobjedu Janica carried away a win = Janica je pobijedila Janica wonJohn neemt een beslissing John makes a decision = John beslist John decidesJan podejmuje decyzję John takes decision = Jan decyduje John decides
Ewa odniosła zwycięstwo Eva carried away a victory = Ewa zwyciężyła Eva wonIon ia o decizie John makes a decision = Ion decide John decidespostaviti vprašanje to pose a question → vprašanje, ki ga je moral postaviti the question that he had to poseМарко је донео одлуку Marko je doneo odluku Marko brought a decision Marko made a decision = Марко је одлучио Marko je odlučio Marko decided
Марко је узео учешће Marko je uzeo učešće Marko took participation = Марко је учествовао Marko je učestvovao Marko participated
Adjective paraphrase: Is the abstract noun derivationally related to an adjective with the same semantics? Then, there is probably a semantic argument, which coincides with the noun that is modified by the adjective, so test LVC.1 passes.
شجاعة أحمد ال يملك Ahmed has the courage = شجاع أحمد Ahmed is courageousимам смелост to have courage = съм смел to be courageous
нямам търпение to not have patience = съм нетърпелив to be impatient
нося отговорност to carry responsibility = съм отговорен to be responsibleο Γιάννης έχει θάρρος = ο Γιάννης είναι θαρραλέοςo Yanis echi θaros = o Yanis ine θaraleos → and, more generally, characteristics and attributes
Ο Γιάννης έχει δύναμη = Ο Γιάννης είναι δυνατόςO Γianis echi δinami = O Γianis ine δinatos → and, more generally, characteristics and attributesJohn has courage = John is courageous → and, more generally, characteristics and attributes
John has hunger/thirst = John is hungry/thirsty → and, more generally, physical sensations
John has passion/fear/anger = John is passionate/afraid/angry → and, more generally, feelings and emotions
John has problems/difficulties = Something is problematic/difficult for John → and, more generally, statesJuan tiene miedo Juan has fear = Juan es miedoso Juan is easily scared → and, more generally, characteristics and attributes
Juan tiene hambre Juan has hunger = Juan está hambriento Juan is hungry → and, more generally, physical sensations
Anek itxaropena du = Ane itxaropentsu dago Ane hope has = Ane hopeful is Ane has hope = Ane is hopeful → and, more generally, characteristics and attributes
Anek = Ane gosetuta Ane hunger has = Ane hungry is Ane has hunger = Ane is hungry→ and, more generally, physical sensationsνοῦν ἔχωnoun ekhо̄ sense.ACC have.1SG I am sensible = ἔννοοςimati strpljenja to have patience = biti strpljiv to be patient
nositi odgovornost to carry responsibility = biti odgovoran to be responsibleJohn heeft moed John has courage = John is moedig → and, more generally, characteristics and attributesmieć odwagę to have courage = być odważnym to be courageous
mieć straty to have losses = być stratnym to have lost sth
mieć sens to have a sense to make sense = być sensownym to be reasonable
avea curaj to have courage = fi curajos to be courageous
имати храбрости imati hrabrosti to have courage = бити храбар biti hrabar to be courageous
Synonym verb/adjective paraphrase: Does the abstract noun have a synonym/hypernym derivationally related to a verb or adjective with the same semantics? Then, the questions above can be applied to the synonym verb/adjective.
Иван и Мария постигнаха консенсус Ivan and Maria reached a consensus = Ivan and Maria agreed → consensus has no corresponding verb or adjective, but agreement is a synonymέχω τη γνώμη echo ti gnomi I have the opinion I think = πιστεύω → γνώμη has no corresponding verb or adjective, but πίστη,άποψη are synonymsJohn and Mary reach a consensus = John and Mary agree → consensus has no corresponding verb or adjective, but agreement is a synonym
John has a chance to do something = John is likely to do something → chance has no corresponding verb or adjective, but likelihood is a synonym
Anek min eman dio Joni = Anek Jon mindu du Ane pain given has to-Jon = Ane Jon hurt has Ane has hurt Jon
Radnici i uprava postigli su konsenzus workers and management reached consensus = Radnici i uprava su se dogovorili workers and management agreed → konsenzus consensus has no corresponding verb or adjective, but dogovor agreement is a synonym
mieć 190 cm wzrostu to have 190 cm of height to be 190 cm tall = mierzyć 190 cm to measure 190 cm to be 190 cm tall
dokonać inwazji to perform an invasion = wtargnąć to invade
da voie = permite
Маја има шансе да нешто уради Maja ima šanse da nešto uradi = Маја може нешто да уради Maja može nešto da uradi → шанса šansa has no corresponding verb or adjective, but моћи/могућност moći/mogućnost is a synonym
The existence of a related verb is not a definitive test, but a hint that the noun is probably predicative. Since determining whether a noun is predicative is tricky, we advise language teams to provide additional documentation and examples for borderline cases.
The previous version of the guidelines had a syntactic test which you can still use to verify if the verb's subject is an argument of the noun. However, this test was considered hard to apply in the previous guidelines, and is not mandatory anymore.
The syntactic test consists of trying to add the semantic argument as a complement of the noun in the presence of the verb. In other words, does the noun n, in the presence of v, prohibit at least one syntactic argument a which it normally licenses in the absence of v?
An alternative formulation for this test is the following: Let s be the subject of v, and let r be the semantic role that s plays with respect to the noun n. Is it prohibited for r to be realized both by s and by a syntactic argument a of n, except when a is in the whole–part relation with s?
الميزانية قرار الوزير أخذ + قرار الحكومة في الميزانية → في الميزانية قرارالحكومة أخذ الوزير — الوزير the decider cannot be a modifier of decision قرار
Петър Стоянов взе решението да подпише договора Petar Stoyanov made the decision to sign the contract + решението на президента да подпише договора → *Петър Стоянов взе решението на президента да подпише договора — the noun cannot be modified by the person performing the act/event (which is the subject)
Die Königin hat dem Premierminister einen Besuch abgestattet the Queen has paid a visit to the Prime Minister + ein Besuch der Dame beim Premierminister a visit of the Lady to the Prime Minister → *Die Königin hat einen Besuch der Dame beim Premierminister abgestattet *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
Paul hat eine Entscheidung über das Budget getroffen Paul made a decision on the budget + die Entscheidung des Rates über das Budget the council's decision on the budget → *Paul traf die Entscheidung des Rates über das Budget *Paul made the committee's decision on the budget — the decision maker cannot modify decisionο πρωθυπουργός έκανε επίσημη επίσκεψη στον Αμερικανό πρόεδροo proθypurgos ekane episimi episkepsi ston amerikano proedro + η επίσκεψη του πρωθυπουργού στον Αμερικανό πρόεδρο
ο πρωθυπουργός έκανε επίσημη επίσκεψη του υπουργού στον Αμερικανό πρόεδρo proθypurgos ekane episimi episkepsi tu ypurgu ston amerikano proedro — the visitor cannot be a modifier of επίσκεψηThe Queen paid a visit to the Prime Minister + a visit of the Lady to the Prime Minister → *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
Paul made a decision on the budget + the committee's decision on the budget → *Paul made the committee's decision on the budget — the decision maker cannot modify decision
Paul had a discussion with Mary+ Peter's discussion → *Paul had Peter's discussion with Mary
Bjarnson scored a goal + Arnason's goal → *Paul scored Arnason's goal but Paul scored the goal of Iceland — the scoring entity can only modify goal in the last case, when they are part of the Iceland teamLa reina hizo una visita al primer ministro The Queen paid a visit to the prime minister + una visita de la primera dama al primer ministro a visit of the first Lady to the prime minister→ *La reina hizo una visita de la primera dama al primer ministro The Queen paid a visit of the first lady to the first minister— the visitor cannot be a modifier of visita
Pablo tomó una decisión con respecto al presupuesto Pablo made a decision on the budget + la decisión del comité con respecto al presupuesto the committee's decision on the budget→ *Pablo tomó la decisión del comité con respecto al presupuesto Pablo made the committee's decision on the budget— the decision maker cannot modify decisiónIkasleek arreta jarri zioten irakasleari +lagunen arreta The-students attention put to-the-teacher + friends' attention The students paid attention to the teacher + their friends' attention → *Ikasleek lagunen arreta jarri zioten irakasleari The students paid their friends' attention to the teacher — the person paying attention cannot be a modifier of arretaLa ministre a rendu une visite aux victimes + la visite de la ministre aux victimes → *La ministre a rendu une visite du président aux victimes — the visitor cannot be a modifier of visite
Bjarnson a marqué un but + le but d'Arnason → *Paul a marqué le but d'Arnason but Paul a marqué le but de l'Islande — the scoring entity can only modify but (goal) in the last case, when they are part of the Iceland teamUčiteljica je donijela odluku u vezi s izletom The teacher made a decision regarding the excursion + učenikova odluka u vezi s izletom pupil's decision regarding the excursion → *učiteljica je donijela učenikovu odluku u vezi s izletom — the decision maker cannot modify decisionIl primo ministro ha preso la decisione di dimettersi the Prime Minister decided to resign + le dimissioni del governo the resignation of the government → *Il primo ministro ha preso la decisione del governo di dimettersi — the resigner cannot be a modifier of resignationDe koningin heeft de premier een bezoek gebracht the Queen has paid a visit to the Prime Minister + een bezoek van de dame aan de premier a visit of the Lady to the Prime Minister *De koningin heeft een bezoek van de dame aan de premier gebracht*The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visitPaweł złożył rezygnację ze stanowiska dyrektora Paweł submitted a resignation from the position of the director Paweł tendered his resignation from the director position + rezygnacja Piotra→ *Paweł złożył rezygnację Piotra ze stanowiska dyrektora Paweł tendered Piotr's resignation from the director position - the resignation cannot be modified by the resigning person
Paweł prowadzi rozmowy → *Paweł prowadzi rozmowy Piotra Paweł leads Piotr's talks , Paweł prowadzi rozmowy komisji Paweł leads the talks of the commission - the discussing entity komisjacommission can only modify rozmowytalks if Paweł belongs to the commission.
Jan otrzymał wymówienieJan received a dismissal + wymówienie dla Pawła dismissal for Paweł → *Jan otrzymał wymówienie dla PiotraJoão está tomando banho John is taking shower + o banho do Pedro Pedro's shower → *João está tomando o banho do Pedro — the bath cannot be modified by a bath taker
Pedro sofreu prejuízo com a compra Pedro suffered financial loss with the purchase + o prejuízo do José José's financial loss → *Pedro sofreu o prejuízo do José com a compra — the financial loss cannot be modified by the affected entity
A Maria fez um aborto Maria made an abortion + o aborto da Joana Joana's abortion → #A Maria fez o aborto da Joana — the noun cannot be modified by another patient
O médico realizou o parto com sucesso The doctor performed the childbirth with success + o parto do Dr. Pedro Dr. Pedro's childbirth → *O médico realizou o parto do Dr. Pedro com sucesso — the childbirth could be modified by the mother (patient) but not by another doctor (agent).
Paul a dat sfaturi surorii sale Paul gave advice to his sister + sfatul lui Petre Peter's advice → Paul a dat sfatul lui Petre surorii sale Paul gave Peter's advice to his sister — sfatul the advice cannot be modified by its author
Aleš si dela skrbi Aleš makes worries Aleš has worries = Aleš je zaskrbljen Aleš is worried → and, more generally, feelings and emotions
Борко је водио расправу с Маријом Borko je vodio raspravu s Marijom Borko led a discussion with Marija Borko had a discussion with Marija + Петрова расправа + Petrova rasprava Peter's discussion → *Борко је водио Петрову расправу с Маријом *Borko je vodio Petrovu raspravu s Marijom Borko had Peter's discussion with Marija
The rationale for this test is that a semantic argument of n cannot be realized as n's syntactic dependent, since it is already realized as v's syntactic dependent instead (usually as v's subject). For instance, the noun visit takes two semantic arguments, the visitor and the visited entity, as in the visit of the Queen to the Prime Minister. When used in to pay a visit, the visitor semantic argument is realized as the subject of to pay (The Queen paid a visit to the Prime Minister), and cannot be realized at the same time within the NP headed by visit (*The Queen paid a visit of the Lady to the Prime Minister).
Note that the syntactic formulation may be tricky to apply. It is sometimes possible to add the semantic argument as a complement of the noun in the presence of the verb, if we change the interpretation of the argument (and thus its thematic role). For instance, even though the construction John took Luke's decision may be acceptable, the interpretation would be comparative (John took a decision that Luke should have taken). Therefore, the test passes since the verb is still connecting a predicate (decision) to its argument (John, the decider).
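The syntactic test can be pictured as a small decision over annotator judgments about one probe sentence. The following is a minimal sketch only: the names `Probe` and `syntactic_test` are hypothetical and belong to no PARSEME tool, the three fields are judgments a speaker must supply, and treating the whole–part exception as an "uninformative" probe is this sketch's reading of the formulation above.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """A probe sentence: the LVC candidate plus an extra argument a on the noun.

    Every field is a human judgment, not something the code computes.
    """
    acceptable: bool               # is the probe sentence acceptable at all?
    role_preserved: bool           # does a keep the role r played by the subject s?
    whole_part_with_subject: bool  # is a in a whole-part relation with s?


def syntactic_test(probe: Probe) -> str:
    """Return 'pass', 'fail' or 'uninformative' for a single probe (hypothetical helper)."""
    if probe.whole_part_with_subject:
        # Explicit exception in the test: such arguments are allowed.
        return "uninformative"
    if not probe.acceptable:
        # The noun cannot take a while s already realizes r.
        return "pass"
    if not probe.role_preserved:
        # Acceptable only under a changed interpretation of a.
        return "pass"
    return "fail"


# *The Queen paid a visit of the Lady to the Prime Minister
visit = Probe(acceptable=False, role_preserved=True, whole_part_with_subject=False)
# Paul scored the goal of Iceland (Paul is part of the team)
goal = Probe(acceptable=True, role_preserved=True, whole_part_with_subject=True)
# John took Luke's decision (acceptable only with a changed interpretation)
decision = Probe(acceptable=True, role_preserved=False, whole_part_with_subject=False)

results = [syntactic_test(p) for p in (visit, goal, decision)]
print(results)
```

The sketch only records the outcome of the judgments; deciding acceptability and role preservation remains the hard, language-specific part.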
Section 5.3
Verbal idioms (VID)
Verbal idioms constitute a universal category. A verbal idiom (VID) has at least two lexicalized components including a head verb and at least one of its dependents. The dependent can be of different types. Here are some examples:
- Subject
- Direct object
- Circumstantial or adverbial complement
أسهم ال ارتفعت stock soaredброят му се ребрата be counted someone's (possessive pronoun) ribs (someone) to be very thin and skinnyein kleines Vöglein hat mir gezwitschert a little bird told me(OEG) 𓄫 𓄣 𓎡 𓆓𓏏𓇿 ꜣw ꞽb ⸗k č̣.t Your heart (ꞽb) shall-be-long (ꜣw) eternally (č̣.t). You shall be glad eternally. (Borchardt 1907: 80, fig. 55)μου είπε ένα πουλάκιmu ipe ena pulaki me told a little-bird a little bird told me
κόβει το μάτι μου kovi to mati mu cut the eye my to noticea little bird told someonetu hora ha llegado your time has arrived your time has comeἐὰν θεὸς ἐθέλῃean theos ethelē if god want.3SG if possibleun uccellino disse a qualcuno気がつく mind.nom touch notice/realise
気分が晴れる feeling.nom clear feel bettergalva kūp the head is steaming knowsto do something with great mental effortboontje komt om zijn loontje he that mischief hatches, mischief catcheslicho wie devil knowsI have no ideaa sua hora chegou your time has arrived your time has comea șoptit o păsăricăwhispered a bird little a little bird told someonesrce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something , sekira pade v med komu (someone's) hatchet falls in honey one gets luckyђаво да некога носи đavo da nekoga nosi may the Devil carry someone to hell with someone
Бог некога погледао Bog nekoga pogledao God looked at someone to be lucky
ђаво је умешао прсте đavo je umešao prste the Devil mixed in his fingers an unfavorable outcome
пао некоме мрак на очи pao nekome mrak na oči darkness fell on someone's eyes to blow a fuseإجتماع أحيى revived a meeting to lead a meetingгушна букета hug the bunch of flowers to dieer hat den Schuss nicht gehört he did the shoot not hear it takes him a long(er) time to understand sth(OEG) 𓐣𓂝𓏝 𓃹𓈖𓇋𓋴 𓌃𓅱𓏝 𓈖 𓋹𓈖𓐍𓅱 wč̣ꜥ Wnꞽś mṭw n ꜥnḫ.w Unas (Wnꞽś) shall-separate (wč̣ꜥ) the word (mṭw) for (n) the living (ꜥnḫ.w). Unas shall judge the living (PT 273b, W)κάνω σεφτέkano sefte
λαμβάνω μέροςtake part
κρατάω τα μπόσικα kratao ta bosika
to kick the bucket
estirar la pata to stretch the leg kick the bucket
δίκην λαμβάνω dikēn lambanо̄ justice.ACC take.1SG I punish
tirare le cuoia
空気を読む atmosphere.acc read read the situation
atstiept kājas to stretch one's legs to die
het ijzer smeden als het heet is iron forge while hot make hay while the sun shines; strike while the iron is hot
udać Greka to pretend to be a Greek to pretend not to understand
bater as botas to hit the boots to die, abrir mão de algo to open hand (of something) to give up (on something)
ustreliti kozla to shoot the goat to say or do something stupid
дизати нос dizati nos to raise one's nose to be haughty
добити ногу dobiti nogu to get a leg to get dumped
држати банку držati banku to hold a bank to dominate the conversationالحديد وهو ساخن ضرب hit the iron and it is hot strike while the iron is hotудрям в гръб hit in the back to stab in the back
правя сам да си говори make (someone) to talk to himself to drive (someone) crazyetwas wie warme Semmeln verkaufen sth. like warm bread rolls to sell sth. fast and easy(OEG) 𓁹 𓂋 𓄣 𓎡 ꞽr (⸗ꞽ) r ꞽb ⸗k (I) (⸗ꞽ) shall-do (ꞽr) according-to (r) your (⸗k) heart (ꞽb). I will do what you want. (Duell 1938: pl. 162)φέρω βαρέωςfero vareos bring heavily resentto take something with a pinch of salt, to sell like hotcakes, to strike while the iron is hot, to come off with flying colorscoger algo con pinzas to hang something with pegs take something with a pinch of saltεἰς χεῖρας ἐλθεῖνeis kheiras elthein into hand.PL go.INF to surrenderprendere qualcosa con le pinze
battere il ferro finché è caldo気になる mind.dat become be on one's mindpalaist vējā to let go in the windto wasteiets met een korreltje zout nemen to take something with a pinch of saltwiercić komuś dziurę w brzuchu to drill a hole in one's bellyto intrusively solicit someone, to insist too muchlevar em conta to bring in account to take into account
ir ao ar go to the air to go on aira lua în considerare to bring in account to take into accountspati kot ubit to sleep like dead to sleep soundlyпродаје се као алва prodaje se kao alva to sell like halva to sell well
ударити на велика звона udariti na velika zvona to bang on big bells to spread the news
бити као запета пушка biti kao zapeta puška to be like a tense rifle to be ready for action
It is often challenging to distinguish VIDs from other VMWE categories if only one dependent of the head verb is lexicalized. The VMWE categorization depends on the category of this dependent:
- Reflexive clitic or particle: the VMWE is either an IRV (reflexive pronoun) or an IVPC (particle), never a VID.
- Verb with no lexicalized dependent: fine-grained tests need to be applied in order to discriminate between an MVC and a VID. See the section on Structural tests.
- Extended nominal phrase: fine-grained tests need to be applied in order to discriminate between an LVC and a VID. See the section on Structural tests.
With a dependent of any other category, the VMWE is always a VID, including the following:
- Adjectival phrase
- Verb with lexicalized dependents
- Relative clause
- Non-reflexive pronoun
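The routing for candidates with exactly one lexicalized dependent can be summarized as a lookup. This is a minimal sketch under stated assumptions: the enum `Dep` and the function `candidate_categories` are hypothetical names, the category labels are copied from the lists above, and where two labels are returned the structural tests still have to decide between them.

```python
from enum import Enum
from typing import Tuple


class Dep(Enum):
    """Category of the single lexicalized dependent (hypothetical labels)."""
    REFLEXIVE_CLITIC = "reflexive clitic"
    PARTICLE = "particle"
    BARE_VERB = "verb with no lexicalized dependent"
    EXTENDED_NP = "extended nominal phrase"
    ADJECTIVAL_PHRASE = "adjectival phrase"
    VERB_WITH_DEPENDENTS = "verb with lexicalized dependents"
    RELATIVE_CLAUSE = "relative clause"
    NON_REFLEXIVE_PRONOUN = "non-reflexive pronoun"


def candidate_categories(dep: Dep) -> Tuple[str, ...]:
    """Return the VMWE categories still possible for a one-dependent candidate."""
    if dep is Dep.REFLEXIVE_CLITIC:
        return ("IRV",)            # never a VID
    if dep is Dep.PARTICLE:
        return ("IVPC",)           # never a VID
    if dep is Dep.BARE_VERB:
        return ("MVC", "VID")      # structural tests decide
    if dep is Dep.EXTENDED_NP:
        return ("LVC", "VID")      # structural tests decide
    return ("VID",)                # any other category is always a VID


for dep in Dep:
    print(f"{dep.value}: {' or '.join(candidate_categories(dep))}")
```

The sketch deliberately covers only the single-dependent case described so far and says nothing about how the dependent's category is identified in the first place.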
постигам своето to achieve one's ownto have it my wayschwarz fahren to drive black to take a ride without a ticketκάνω αρπαχτή
κρατάω πισινήto come clean, to stand firmjugar sucio to play dirty to play dirtyuscirne pulitiうまくいく good.ly go go wellpanākt savu to achieve one's ownto have it my wayzwartrijden to drive black to take a ride without a ticketzrobić swoje to do one's ownto do what one is supposed to do
tykać cudze to touch someone else'sto take something that does not belong to you
dopiąć swego to button up one's ownto fulfill one's plansto jogar sujo to play dirtya juca murdar to play dirtybiti zelen od zavisti to be green with envyбити зелен biti zelen to be green to be a greenhorn/to be inexperiencedне мога две думи на кръст да кажа I cannot say two words crossing each other to be unable to speak or express oneself → две думи на кръст да кажа is a clause
правя сам да си говори make someone talk to himself to drive someone crazy → сам да си говори is a clauseέπεσε να πεθάνειto make ends meetfar quadrare i contide handen ineenslaan hands joined hit join forcesсастављати крај с крајем sastavljati kraj s krajem to join one end to another to make ends meet → крај са крајем kraj sa krajem is a clauseще видиш откъде изгрява слънцето you will see where the sun rises from(angrily) you will get what you deserve, you will be punishedwissen wo es langgeht to know where things are heading to know on which side one's bread is butteredδεν ξέρω πούν παν τα τέσσεραto know on which side the bread is butteredsaber de qué pie cojea to know of which foot (he/she) limps to know someone inside outnon sapere da quale parte stareparādīt, kur vēži ziemo to show where crayfish hibernateto cause someone the unpleasantness he deservesde klok horen luiden, maar niet weten waar de klepel hangt to hear the bell ring, but not know where the clapper hangs ≈to not know the details of somethingwiedzieć, skąd wieje wiatr to know where wind blows fromto know on which side your bread is buttered, to know how to take advantage of the situationsaber onde pisar know where to-step to know the way to succeed in something
mostrar com quantos paus se faz uma canoa show with how many sticks one makes a canoe to punish or take revenge
a ști cu ce se mănâncă to know with what CL.Refl. eats to know what it is about
vedeti koliko je ura to know what time it is to realize the truth
знати у ком грму лежи зец znati u kom grmu leži zec to know in which bush the rabbit lies to know what the main problem is
не знати где је некоме глава ne znati gde je nekome glava not to know where one's head is to be out of one's mind
дај шта даш дај šta daš give what you give be satisfied with anything that is given to youвтасахме я we proved it.FEM (as in bread: raise in volume due to yeast) to fall into a difficult situationes gibt it gives there isτα καταφέρνωta kataferno
την πατάωtin pataoto make itl'emporter to take it away to winle ha preseEj tu galīgi!Go you ultimately! Go to hell!het eens zijn it agreed be to agreePolish does not seem to have this type of VMWEsdá-lhe João! give to him/her, João! show them what you got, João!a o șterge to her delete to fly the coop
a o întinde to her extend to fly the coop → synonymous expressions with the non-anaphoric feminine ACC personal clitic 'o' functioning as an expletive
ucvreti jo to escape her to escape something/someone by running
n.a.
Sentential expressions with no open slots, such as proverbs and conventionalized sentences, are included in the scope of VIDs.
تجري الرياح بما لا تشتهي السفن Winds blow counter to what ships desireкраставите магарета се надушват отдалече the itchy donkeys smell each other from afaralike people are attracted to each otherRom wurde nicht an einem Tag erbaut Rome was not build in a day wer A sagt muss auch B sagen who says A must also say B you must finish what you startστο σπίτι του κρεμασμένου δεν μιλάνε για σχοινίin-the house the.GEN hunged-man.GEN not speak.03.PL about ropeRome was not built in a day
Fortune favors the bold
The pleasure is mine
I beg your pardon!Roma no se construyó en un día Rome was not build in a day donde dije digo, digo Diego where said.I said, say.I Diego to do or give something and then take it back, to retract oneselfσυνῄδη οὐδὲν ἐπισταμένῳsunēdē ouden epistamenо̄ know.PLP.1SG nothing know.PTC I know that I know nothingRoma non è stata costruita in un giorno
La fortuna aiuta gli audaci
Il piacere è mioRīga nekad nebūs gatava Riga will never be ready (made)de klok horen luiden, maar niet weten waar de klepel hangt to hear the bell ring, but not know where the clapper hangs ≈to not know the details of somethingtrafiła kosa na kamień met the scythe a stonesomeone rude/dishonest came across someone else who used similar methods against him/herquem vê cara não vê coração who sees face doesn't see heart a person can lie/omit his/her feelingsUrciorul nu merge de multe ori la apă Pitcher-the not goes of many times at water The pitcher goes so often to the well that it is broken at lastPočasi se daleč pride more haste less speed
Po toči zvoniti je prepozno there is no use ringing the bells after hail it is too late
нашла крпа закрпу našla krpa zakrpu a rag found a patch to find one's other half
било па прошло bilo pa prošlo happened and it's done let bygones be bygones
рекла казала rekla kazala said and told hearsay
If more than one dependent of the head verb is lexicalized, then the candidate VMWE is always classified as a VID.
لسانه القط أكل the cat ate his tongueзаравям глава в пясъка to hide head in sandto pretend not to see a problemdie Katze aus dem Sack lassen to let the cat out of the bagβάζω λάδι στη φωτιά vazo ladi sti fotia put oil to-the firemake things worse
κάνω τη ζωή ποδήλατοkano ti zoi poδilato make.1SG the life bicycle to tortureto let the cat out of the bag, to cut a long story short, to call it a dayhacer de tripas corazón make of intestines heart to pluck up the courage
dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
dar gato por liebre to_give cat for hare to rip off, to take for a ridese faire des idées to make SELF ideas to imagine something false,s'en aller to go SELF from there to leave,il y a it has there there is話を大きくする story.acc big make exaggeratepirkt kaķi maisā to buy a cat in a bagto agree to something without knowing the necessary informationeen kat in de zak kopen to buy a pig in a poke → two dependents kat and in de zakchować głowę w piasek to hide head in sandto pretend not to see a problemtapar o sol com a peneira to hide the sun with a sieve to sugar-coata da bir cu fugiții to give tribute with fugitives.the to back awaybeseda mi je ostala v grlu word got stuck in my throat I am speechlessбежати главом без обзира bežati glavom bez obzira to run away mindlessly to bolt
бежати као ђаво од крста bežati kao đavo od krsta to run away like Satan from a cross to run like a bat out of hell
забити главу у песак zabiti glavu u pesak to stick your head in the sand to bury your head in the sand
ићи линијом мањег отпора ići linijom manjeg otpora to go with the line of least resistance to take the path of least resistanceatt sätta sig upp mot någon to sit oneself up against someone to defy someone
att dra sitt strå till stacken to draw one's straw to stack.the to contribute (in a small way)
Cases where there is no single clearly identifiable head verb, because of coordinated verbs or of an irregular syntactic structure, are also covered by the VID category.
اصبر تنل be patient you get be patientцъфна и вържа to blossom and give fruit (usually sarcastically) to prosperleben und leben lassen live and let liveέδωσε πήρεto drink and drivecoser y cantarto_sew and to_singeasy as pie, a piece of cakeἠντεβόλει καὶ ἱκετεύεēntebolei kai hiketeue supplicate.3SG and beseech.3SG he begged and beseechedleven en laten leven live and let livepluć i łapać spit and catch to be lazy, to do nothing useful
coś kogoś ani ziębi, ani grzeje something neither cools nor warms someonesomeone is indifferent to something
bądź tak dobry i zrób coś be so good and do something be so good as to do something
pintar e bordar paint and knit to abuse
a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
seamănă, dar nu răsaresow.3SG (homonym of resemble), but not sprout.3SGnot to resembleživi in pusti živeti live and let liveни лук јео ни лук мирисао ni luk jeo ni luk mirisao neither ate nor smelled an onion to be innocent
нити смрди нити мирише niti smrdi niti miriše neither stinks nor has a nice scent neither good nor badn.a.to voice act
to pretty-print
to short-circuit
to tumble dry
n.a.
court-circuiter to short-circuit
n.a. there are no cases of compound hyphenated verbs in RO
n.a. there are no cases of compound hyphenated verbs in SL
рекла-казала rekla-kazala said and told hearsay
In case of several lexicalized dependents, special care must be taken to identify and also annotate embedded VMWEs.
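One way to picture embedding is to store each annotated VMWE as the set of token positions of its lexicalized components; an embedded VMWE is then a proper subset of the one that contains it. The sketch below is illustrative only: the list-of-pairs layout and the helper `embedded_pairs` are assumptions, not the corpus format, and marking every token of to let the cat out of the bag as lexicalized is likewise an assumption.

```python
from typing import FrozenSet, List, Tuple

tokens = ["let", "the", "cat", "out", "of", "the", "bag"]

# (category, positions of lexicalized components); hypothetical representation.
annotations: List[Tuple[str, FrozenSet[int]]] = [
    ("VID", frozenset({0, 1, 2, 3, 4, 5, 6})),  # to let the cat out of the bag
    ("VPC", frozenset({0, 3})),                 # to let out
]


def embedded_pairs(
    anns: List[Tuple[str, FrozenSet[int]]]
) -> List[Tuple[str, str]]:
    """List (inner, outer) category pairs where inner is a proper subset of outer."""
    pairs = []
    for inner_label, inner in anns:
        for outer_label, outer in anns:
            if inner < outer:  # proper-subset test on frozensets
                pairs.append((inner_label, outer_label))
    return pairs


pairs = embedded_pairs(annotations)
print(pairs)
for label, positions in annotations:
    print(label, [tokens[i] for i in sorted(positions)])
```

Both the outer idiom and the embedded expression are kept as separate annotations, which mirrors the instruction to annotate embedded VMWEs as well.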
страхувам се от собствената си сянка to fear SELF from one's own shadow to get easily scared → contains the IRV страхувам се to fear SELF to be afraid
einen Plan aufstellen to set up a plan to draw up a plan → contains the VPC aufstellen to set up
to let the cat out of the bag → contains the VPC to let out
hacerse ilusiones make.self hopes to get your hopes up → contains the IRV hacerse
se faire des idées to make SELF ideas to imagine something false → contains the non-VMWEs se faire and faire des idées
een plan opstellen to set up a plan to draw up a plan → contains the VPC opstellen to set up
bać się własnego cienia to fear SELF one's own shadow to be very timid → contains the IRV bać się to fear SELF to be afraid
virar-se nos trinta turn-RCLI in-the thirty to get by → contains the synonymous IRV virar-se to get by ≠ virar to turn/become
a da cărțile pe față to give cards.the on face to reveal one's true intentions → contains the ID a da pe față to reveal
a-și da arama pe față to give his/her copper.the on face to reveal his/her true (evil) nature → this is even more complicated since, besides the ID a da pe față, the IRV has to be annotated as well - a three-level embedding
delati se norca iz koga to make RCLI fool of someone to make fun of someone → contains the IRV delati se to make oneself to pretend
бојати се сопствене сенке bojati se sopstvene senke to fear SELF one's own shadow to be afraid of one's own shadow → contains the IRV бојати се to fear SELF to be afraid
Idioms whose head verb is the copula (to be) can pose special challenges because their complements may be (nominal, adjectival, etc.) MWEs themselves. In this task, we consider constructions with a copula to be VMWEs only if the complement does not retain the idiomatic meaning when used without the verb.
съм с единия крак в гроба be with one leg in the grave to be close to death → idiom because #с единия крак в гроба with one leg in the grave loses the meaning
съм на червено be on red to be in debt → non-VMWE because the copula can be omitted, as in в края на месеца винаги оставам на червено at the end of the month I always get into debt
sei kein Frosch be no frog be no chicken → idiom because #kein Frosch no frog loses the meaning
to be dying for → idiom because #dying loses the meaning of wanting something
to be somebody → idiom because #somebody loses the meaning of being important or successful
it is double Dutch to me → non-VMWE because the copula can be omitted, as in he seems to speak double Dutch
ser un pelota to be a ball to suck/butter up → idiom because un pelota a ball loses its original meaning
οἷον τ`ἦν hoion t’ēn of.what.sort and was.3SG it was possible
??? sprake zijn van there is some talk
być jedną nogą na tamtym świecie to be with one leg in the other world to be close to death → idiom because #jedna noga na tamtym świecie one leg in the other world loses the meaning
być do rzeczy to be to the thing to be relevant → non-VMWE because the copula can be omitted, as in dał parę argumentów całkiem do rzeczy he gave a couple of quite relevant arguments
być w trakcie (czegoś) to be in the road (of sth) to be doing sth → non-VMWE because the copula can be omitted, as in wyszedł w trakcie zebrania he went out during the meeting
ser alguém na vida to be somebody in life to be somebody → idiom because #alguém na vida loses the meaning
não ser flor que se cheire to not be a flower that one may smell to be an untrustworthy person → idiom because #flor que se cheire loses the meaning
isso é grego pra mim that's Greek to me → non-VMWE because the copula can be omitted, as in você está falando grego
a fi ușă de biserică to be door of church to be honest → idiom because #ușă de biserică loses the meaning
a fi un papă-lapte to be a eat-milk to be a piker → idiom because #un papă-lapte preserves the meaning
biti trn v peti komu to be a thorn in somebody's heel to be a big problem, obstacle → idiom because #trn v peti loses the meaning
бити неко и нешто biti neko i nešto to be someone and something to be a somebody → idiom because #неко somebody and #нешто something lose their meaning of being important or successful
бити једном ногом у гробу biti jednom nogom u grobu to be with one leg in the grave to be close to death → idiom because #једном ногом у гробу with one leg in the grave loses its meaning
бити зелен biti zelen to be green to be a greenhorn/to be inexperienced → idiom because #зелен green loses its meaning
Note that special care must be taken in languages in which copula omission is a regular or even a compulsory phenomenon (e.g. in Russian). In those cases, language-specific tests are required to distinguish a copula-based idiom from a non-verbal MWE.
Idioms typically have both a literal and an idiomatic reading, so they are closely connected to metaphor (see also the section on VMWEs versus metaphors). This often makes them semantically totally non-compositional, i.e. none of their lexicalized components retains any of its original meanings. Some authors argue, though, that partial semantic compositionality can be obtained via decomposability, e.g. to spill the beans is compositional provided that to spill is paraphrased as to reveal and the beans as a secret.
VID-specific decision tree:
In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a VID. Note, however, that this tree is to be applied only after the generic decision tree containing structural tests has referred to it.
- Apply test VID.1 - [CRAN: Candidate contains cranberry word?]
- YES ⇒ It is a VID, exit.
- NO ⇒ Apply test VID.2 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
- YES ⇒ It is a VID, exit.
- NO ⇒ Apply test VID.3 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
- YES ⇒ It is a VID, exit.
- NO ⇒ Apply test VID.4 - [MORPHSYNT: Regular morphosyntactic change ⇒ unexpected meaning shift?]
- YES ⇒ It is a VID, exit.
- NO ⇒ Apply test VID.5 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
- YES ⇒ It is a VID, exit.
- NO ⇒ It is not a VID, exit.
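The control flow of the VID-specific tree can be sketched in a few lines of Python. This is only an illustration, not part of the PARSEME tooling: the function names and the `answers` mapping are hypothetical, and the YES/NO judgements themselves still have to come from an annotator. Treating a missing answer as NO is an assumption of this sketch.

```python
from typing import Mapping, Optional

# Test identifiers in the order in which the VID-specific tree applies them.
VID_TESTS = ("VID.1", "VID.2", "VID.3", "VID.4", "VID.5")


def first_positive_vid_test(answers: Mapping[str, bool]) -> Optional[str]:
    """Return the first test answered YES, or None if every test is NO.

    `answers` maps a test identifier to the annotator's judgement
    (True = YES, False = NO). Tests absent from the mapping count as NO,
    which is a simplifying assumption of this sketch.
    """
    for test in VID_TESTS:
        if answers.get(test, False):
            return test  # a single YES is sufficient: the candidate is a VID
    return None


def is_vid(answers: Mapping[str, bool]) -> bool:
    """True when at least one VID test is answered YES."""
    return first_positive_vid_test(answers) is not None
```

For instance, a candidate judged NO on VID.1 and VID.2 but YES on VID.3 exits the tree at VID.3, and the later tests are never consulted.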
Test VID.1 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
- YES ⇒ it is a VID
- NO ⇒ further tests are required
хващам натясно catch in a tight place to coerce, to pressure → натясно is only used in MWEs
правя на бъзе и коприва to turn into elder and nettle to scold, to tell off → бъзе is an old word, very rarely used independently
вземам предвид, имам предвид to → предвид (as adverb) is only used in MWEs
стоя диван чапраз to stay upright as in Osman council to stay ready to serve → чапраз is an old word, very rarely used independentlysich um etw. scharen to gather around something → scharen is not a stand-alone wordμάλλιασε η γλώσσα μου maliase i glosa mu is-full-of-hair-3SG the-SG.NOM tongue-SG.NOM my-SG.GEN.POSS to repeat the same thing again and again → μάλλιασε is not a stand-alone wordto go astray → astray is not a stand-alone wordsin decir ni chus ni mus → chus is not a stand-alone word without to_say neither chus nor mus without saying a word
no decir ni chus ni mus → chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
hacer algo a troche y moche → troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardlytxintik ere ez esan 'txint' neither no say not even say a word →the word 'txint' is not used out of this expressionprendre la poudre d'escampette to escape → escampette is not a stand-alone wordμηδεμίαν ὤρην ἔχεινmēdemian о̄rēn ekhein no worry have.INF not to be worriedmangiare a ufo to eat without paying → a ufo is not a stand-alone word
fare lo gnorri to play dumb → gnorri is not a stand-alone word
scendere in lizza to enter the lists → lizza is not a stand-alone word一矢を報いるone.arrow.ACC repayto retaliate → 一矢 is not a stand-alone word
矢面に立つ arrow.face.LOC standto face direct attack → 矢面 is not a stand-alone wordop apegapen liggenbe at one's last gasp → apegapen is not a stand-alone wordodsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
wyjść na jaw to come-out to light to transpire, to become knownir para as cucuias to go wrong → cucuias is not a stand-alone worda nu avea habar to have no idea → habar is not a stand-alone wordbiti si kvit to pay up a debt, owe nothing to somebody → kvit is not a stand-alone wordне би било згорег ne bi bilo zgoreg it wouldn't be for the worse it wouldn't be a bad idea → згорег zgoreg is not a stand-alone word
читати (некоме) вакелу čitati (nekome) vakelu to read somebody a scolding to scold somebody → вакела vakela is not a stand-alone word
имати на претек imati na pretek to have an abundance → претек pretek is not a stand-alone word
не часити ne časiti don't jump the gun → часити časiti is not a stand-alone word
att komma ihåg to remember → ihåg is not a stand-alone word
правя на сос → правя and сос are stand-alone words
sich um etw. herum stellen to stand around something → all words are stand-alone words
to go away → go and away are stand-alone words
ir a la universidad to go to university → ir, a, la and universidad are stand-alone words
unibertsitatera joan university-to go to go to university → both words are stand-alone
andare giù to go down → andare and giù are stand-alone words
hij gaat weg he goes away → gaan and weg are stand-alone words
wyznać tajemnicę to reveal a secret → wyznać and tajemnica are stand-alone words
ir para a escola to go to school → ir, para, a and escola are stand-alone words
a nu avea idee to have no idea → all words are stand-alone words
biti si v sorodu to be related to each other → biti si and sorod are stand-alone words
бити квит biti kvit to be even → бити biti and квит kvit are stand-alone words
att komma på to figure out → komma and på are stand-alone words
Test VID.2 - [LEX] - Lexical inflexibility
Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- YES ⇒ it is a VID
- NO ⇒ further tests are required
وضعه على الرَّف put it on the shelf ـ→ وضعه على الطاولة # to put it on tableбълвам змии и гущери to spew snakes and lizards → #бълвам влечуги (to spew reptiles)
всяка жаба да си знае гьола every frog to know its own puddle → #всяка жаба да си знае локватаdie Katze aus dem Sack lassen to let the cat out of the bag → #den Hund aus dem Karton lassen #to let the dog out of the box
eine Entscheidung treffen to meet a decision to make a decision → #eine Entscheidung machen/herstellen a decision make/produce #to make/produce a decision(OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me (Urk. I 134, 1) → *mḥ nb (⸗ꞽ) ṭp ⸗f ꞽm (⸗ꞽ) (My) lord filled his head with (me).κάνω την πάπια → #κάνω τη χήναkano tin papia --> kano tin china make.1SG the duck play dumb
φέρω βαρέως → #φέρω ελαφρώς
μπαίνει το νερό στ' αυλάκι → #μπαίνει το νερό στο ποτάμιto let the cat out of the bag → #to allow the feline out of the container
to go on → *to go upon
to stand firm/fast → *to stand hard/rigid/solidmeterse en la boca del lobo to_get_into.self in the mouth from_the wolf venture into the lion's den → #meterse en el ojo del gato
tomar una decisiónto_take a decision to make a decision → #hacer/coger/producir una decisión to_make/grab/produce a decision #to make/grab/produce a decisionerabakia hartu decision take to make a decision →erabakia #sortu/jaso/egin create/receive/doπερὶ πολλοῦ ποιέομαιperi pollou poieomai above much.GEN do.1SG I hold in high esteem → o # περὶ καλοῦ ποιέομαιnon dire gatto se non ce l'hai nel sacco don't say cat if it is not in the sack don't count on something before it happens→ #non dire cane se non ce l'hai nel sacco#don't say dog if it is not in the sack
sputare il rospo spit the toad spit it out → #sputare la rana#spit the frog空気を読む atmosphere.acc read read the situation →*大気を読む
生計を立てる means.of.living.acc stand earn an income →生計を*起こすeen kat in de zak kopen to buy a pig in a poke → #een hond in de zak kopen #to buy a dog in the bag
een beslissing nemen to meet a decision to make a decision → #een beslissing produceren a decision make/produce #to make/produce a decisionwiedzieć, co w trawie piszczy to know what in grass squeals to be well informed → #wiedzieć, co w trawniku popiskuje
nie wchodzić w rachubę not to come into count to be out of question → #wchodzić w liczenie/rachunek
wodzić kogoś za nos to lead someone by the nose to cheat on someone → #wodzić za nozdrza/ucho/wargiquebrar um galho break a branch to help → #danificar um ramo to damage a stema da cu bâta în baltă to give with bat-the in pond to say sth embarrassing → *a da cu bățul în baltă to give with stick-the in pond, *a da cu bâta în lac to give with bat-the in lakeimeti mačka to have a cat to have a hangover → #imeti psa to have a dog
iti rakom žvižgat to go whistling to crabs to fail, to die → #iti jastogom pet to go singing to the lobstersзнати у ком грму лежи зец znati u kom grmu leži zec to know in which bush the rabbit lies to know what the main problem is → #знати у ком жбуну лежи кунић #znati u kom žbunu leži kunić to know in which shrub the hare lies
пустити буву pustiti buvu to let go of the fly to start a rumour/to spread news → #пустити вашку #pustiti vašku to let go of the lice
отети се контроли oteti se kontroli to break away from control to lose control → #отети се провери #oteti se proveri to break away from the examination
att plocka russinen ur kakan to pick the raisins out of the cake to choose only the best things → #att välja ut nötterna från kakan
أخذ الطائرة take the plane → أخذ الحافلة take a bus
изнасям доклад present a report → изнасям урок/лекция/презентация, etc.
den Bus nehmen to take the bus → den Zug/das Flugzeug, etc. nehmen to take the train/plane/etc.
παίρνω το λεωφορείο perno to leoforio take the bus to take the bus
to take a plane → to take a bus/car/boat, etc.
coger el autobús to_take the bus to take the bus → coger el avión/tren, etc. to take the plane/train/etc.
autobusa hartu bus take to take the bus → trena/taxia/hegazkina hartu to take a train/taxi/plane
prendere il treno to take the train → prendere il bus/aereo/etc. to take the bus/plane/etc.
jqum u joqgħod always moving about
de bus nemen to take the bus → de trein, het vliegtuig, enz. nemen to take the train, plane, etc.
sprawić kłopot to make a trouble → sprawić przykrość/trudność/niedogodność/problem/zawikłanie/nieprzyjemność to make a(n) nuisance/difficulty/inconvenience/problem/complication
quebrar um braço to break an arm → quebrar uma perna/costela/falange to break a leg/rib/phalanx
a lua o decizie to take a decision to make a decision → a lua o hotărâre to take a decree to make a decision
delati težave to make a trouble → delati preglavice/probleme/ to make a(n) nuisance/problem
изазвати проблеме izazvati probleme to cause problems → изазвати бриге/невоље izazvati brige/nevolje to cause worries/afflictions
att ta bussen to take the bus → att ta tåget/flyget, etc. to take the train/plane/etc.
Usual modifications for [LEX] include replacing content words in the candidate by synonyms, hypernyms, hyponyms, antonyms, troponyms, meronyms, and related words in general.
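The substitutions described above can be enumerated mechanically before an annotator judges them. The sketch below is only an illustration under stated assumptions: `lex_variants` and the `related` lookup are hypothetical names, the lookup is hand-made rather than a PARSEME resource, and deciding whether a variant is ungrammatical or shifts meaning unexpectedly remains a human judgement.

```python
from typing import Dict, List, Sequence


def lex_variants(tokens: Sequence[str],
                 related: Dict[str, Sequence[str]]) -> List[List[str]]:
    """Build [LEX] test variants by replacing one component at a time.

    `related` is a hypothetical, hand-made lookup from a component to
    related words (synonyms, hypernyms, hyponyms, ...).
    """
    variants = []
    for i, token in enumerate(tokens):
        for substitute in related.get(token, ()):
            # Copy the candidate and swap in the related word at position i.
            variants.append(list(tokens[:i]) + [substitute] + list(tokens[i + 1:]))
    return variants


candidate = ["let", "the", "cat", "out", "of", "the", "bag"]
related_words = {"cat": ["feline"], "bag": ["container"]}
for variant in lex_variants(candidate, related_words):
    print(" ".join(variant))
```

Replacing one component at a time keeps each variant attributable to a single substitution, which makes it easier to see which component is lexically fixed.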
Test VID.3 - [MORPH] - Morphological inflexibility
Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- YES ⇒ it is a VID
- NO ⇒ further tests are required
أخذ الثور من قرنيه to take the bull by his horns → أخذ الثور من قرنه# take the bull by one hornхвърлям око throw an eye to throw a glance → #хвърлям очи.PLURAL
хващам бика.DEF за рогата take the bull by the horns → #хващам бик.INDEF за рогата
не мога да си намеря място cannot find a place for myself to be extremely nervous → only exists in negative formins Gras beißen to bite into the grass to die → #in ein Gras beißento bite into a grass #in die Gräser beißen to bite into the grasses, in Kraft treten into force step to come into effect → #in Kräfte treten into forces step(OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me. (Urk. I 134, 1) → *mḥ nb (⸗ꞽ) ꞽb.w ⸗f ꞽm (⸗ꞽ) (My) lord filled his hearts with (me).κάνω του αλατιούkano tu alatiu do the salt → #κάνω των αλατιώνto kick the bucket → #to kick the buckets
to pretty-print → *to prettier-print
to take turns → #to take a turncoger el toro por los cuernos to_take the bull by the horns to take the bull by the horns → #coger el toro por el cuernoto_take the bull by the horn #to take the bulls by the horns to_take the bulls by the horns #to take the bulls by the horns
entrar en vigor to_enter in vigor to come into effect → #entrar en vigores to_enter in vigors #to come into effectsprendre le taureau par les cornes to_take the bull by the horns → #prendre le taureau par une corne to_take the bull by a hornπερὶ πολλοῦ ποιέομαιperi pollou poieomai above much.GEN do.1SG I hold in high esteem → #περὶ πολλῶν ποιέομαιandare a letto con le gallineto go to bed with the hens to go to bed early → #andare a letto con la gallina to_go to bed with the hen
cercare il pelo nell'uovo to look for the hair in the egg to be pedantic → #cercare i peli nell'uovoin de gaten houden keep an eye on → #in het gat houdenbudować zamki na lodzie to build castles on ice to rely on unstable foundations → #budować zamek na lodzie to build a castle on ice
mucha kogoś ugryzła a fly bit someone someone is in a bad temper→ #mucha kogoś ugryzie a fly will bite someone
wyciągnąć nogi to stretch.PERF legs to die → #wyciągać nogi to stretch.IMPERF legs (imperfective aspectual variant prohibited)
bater perna hit leg to walk around → bater a/uma/essas perna/pernas/perninha/pernona to hit the/one/these leg/legs/leg.SMALL/leg.BIG
a da colțul to give corner.the to die → *a da colţurile to give corners.the
klicati jelene to call deer to vomit → #klicati jelena to call a deer
обећавати куле и градове obećavati kule i gradove to promise towers and towns to promise somebody the moon → #обећавати кулу и град obećavati kulu i grad to promise a tower and a town
бити у свакој чорби мирођија biti u svakoj čorbi mirođija to be the dill in every broth to meddle → #бивај у свакој чорби мирођија bivaj u svakoj čorbi mirođija be the dill in every broth
дође као кец на једанаест dođe kao kec na jedanaest comes as an ace on an eleven an unfavorable outcome → #дође као кечеви на једанаест dođe kao kečevi na jedanaest comes as aces on an eleven
träda i kraft step in force to come into effect → #träda i krafter step into forces
صنع لعبة to make a toy → صنع ألعاب to make many toys
хвърлям топка to throw a ball → хвърлям топка/топката/топки/топките
einen Kuchen backen to bake a cake → viele/keine/den Kuchen backen/machen many/no/the cake bake/make
κάνω κουλούρια → κάνω νόστιμα κουλούρια
to make a cake → to make a/many/those/no cake/cakes
mover el brazo to_move the arm to move the arm → mover/agitar/levantar/estirar el brazo/la pierna/las manos/las piernas to_move/shake/raise/stretch the arm/the leg/the hands/the legs to move/shake/raise/stretch the arm/the leg/the hands/the legs
ἐπιστολὴν πέμπω epistolēn pempо̄ letter.ACC send.1SG I send a letter → ἐπιστολὰς πέμπω
fare un dolce → fare un/molti/dei/quei/nessun dolce/dolci
een taart bakken to bake a cake → veel/geen/de taarten bakken/maken many/no/the cakes bake/make
kształtować opinię to form an opinion → kształtować opinie to form opinions
bater o braço to hit the arm → bater o/os/um/esse braço/braços/bracinho hit the/the.PL/a/this arm/arms/arm.SMALL
a face o prăjitură to make a cake → a face multe/aceste prăjituri to make many/these cakes
vzeti taksi to take a cab → ne vzeti nobenega taksija/en taksi/dva taksija to take no/one/two/… cab(s)
обећавати улагање obećavati ulaganje to promise an investment → обећавати улагања obećavati ulaganja to promise investments
att baka en kaka to bake a cake → att baka flera/den där/några/ingen kaka/kakor to bake several/that/some/no cake(s)
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, tense, mood, aspect, etc. - depending on the target language's morphology.
Test VID.4 - [MORPHSYNT] - Morpho-syntactic inflexibility
Does a regular morpho-syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- YES ⇒ it is a VID
- NO ⇒ further tests are required
یده ب أخذ take with his hand to give a hand → يده في أخذ# to take in his handаз ти давам думата си I give you my word → #аз ти давам неговата дума (I give you HIS word)
аз си продавам душата I sell my soul → #аз продавам неговата душа (I sell his soul)
Ich werde mein Bestes tun I will my best do I will do my best → *Ich werde dein Bestes tun I will do your best
Ich gebe dir mein Wort I give you my word → *Ich gebe dir ihr Wort I give you her word
(OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me. (Urk. I 134, 1) → *mḥ nb (⸗ꞽ) ꞽb ⸗k ꞽm (⸗ꞽ) (My) lord filled your heart with (me). The suffix pronoun attached to ꞽb should agree in gender and number with the subject of this MWE.
Ο Γιάννης παίζει τα ρέστα του → #Ο Γιάννης παίζει τα ρέστα μας
Ο Γιάννης έριξε μαύρη πέτρα πίσω του → #Ο Γιάννης έριξε μαύρη πέτρα πίσω μαςI will do my best → *I will do your best
I give you my word for that → #I give you his word for that
he was pulling my leg → #I was pulling my leg
te doy mi palabra to_you give_I my word I give you my word → #te doy su palabra to_you give_I his/her word I give you his/her word
il vide son sac he empties his bag he reveals his secret thoughts → #il vide mon sac he empties my bag
περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem → #περὶ τοῦ πολλοῦ ποιέομαι
Io farò del mio meglio → *Io farò del tuo meglio
Io ti do la mia parola → #Io ti do la sua parola
腹を立てる stomach.acc raise get angry → *明日、腹を立てよう cf. 明日、旗を立てよう
帰らぬ人となる return.NEG person become die → *帰らない人となる, *帰らぬ人とな(らない、ろう、る?…)
Ik zal mijn best doen I will my best do I will do my best → *Ik zal jouw best doen I will do your best
Polish VMWEs do not seem to exhibit this kind of inflexibility
ele se suicidou he self.3P.SG suicided → *ele me suicidou
eu perdi meu tempo I wasted my time → eu perdi teu/seu/nosso tempo English allows this, Portuguese doesn't. We say I made you waste your time instead.
Îți dau cuvântul meu CL.DAT give.1SG word.the my I give you my word → #Îți dau cuvântul lui CL.DAT give.1SG word.the his I give you his word
Vlečeš me za nos you are pulling my nose you're pulling my leg → *Vlečeš se za nos you're pulling your nose
Pojdi se solit! to go salt oneself Get lost! → *Pojdi ga solit go salt him
дати све од себе dati sve od sebe to give one's all → #дати све од тебе dati sve od tebe to give everything from you
Jag gör mitt bästa I do my best I do my best → *Jag gör ditt bästa I do your best
копая си гроба to dig my grave → копая ти/му/й/им гроба (to dig your/his/her/their grave)
er traf seine Entscheidung he made his decision → er traf meine/ihre/unsere/eure Entscheidung he made my/her/our/your decision
he did his job → he did my/her/our/your job
Ha hecho su trabajo Has_he/she done his/her work He/She has done his/her work → Ha hecho mi/tu/nuestro trabajo Has_he/she done my/your/our work He/She has done my/your/our work
ἐπιστολὴν πέμπω epistolēn pempо̄ letter.ACC send.1SG I send a letter → τὴν ἐπιστολὴν πέμπω
ha fatto il suo lavoro → ha fatto il mio/tuo/nostro/vostro/loro lavoro
hij deed zijn werk → he did my/her/our/your job
Polish VMWEs do not seem to exhibit this kind of inflexibility
Eu fiz meu trabalho I did my job → Tu/ele/nós fizeste/fez/fizemos meu trabalho You/he/we made my job
el își face tema he his does homework.the he does his homework → el îmi/ne/le face tema he my/our/their does homework.the he does my/our/their homework
opravil je svojo nalogo he did his job → opravil je mojo/njeno/našo/tvojo nalogo he did my/her/our/your job
урадио је свој посао uradio je svoj posao he did his job → урадио је мој посао uradio je moj posao he did my job
han gör sitt jobb he does his job → han gör mitt/hennes/vårt jobb he does my/her/our job
Usual modifications for [MORPHSYNT] involve agreement or loss of agreement between some components in the candidate.
Test VID.5 - [SYNT] - Syntactic inflexibility
Does a regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- YES ⇒ it is a VID
- NO ⇒ It is not a VID, exit
на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → #продавам краставици на стар краставичар, #краставиците са продадени
бълвам змии и гущери → #бълвам гущери и змииNoun phrase (NP) or prepositional phrase (PP)(OEG) 𓎕𓏝 𓎠𓅆 𓄣 𓆑 𓇋𓅓 mḥ nb (⸗ꞽ) ꞽb ⸗f ꞽm (⸗ꞽ) (My) (⸗ꞽ) lord (nb) filled (mḥ) his (⸗f) heart (ꞽb) with (ꞽm) (me) (⸗ꞽ) My lord trusted me. (Urk. I 134, 1) → ꞽb ⸗f mḥ.w ꞽm (⸗ꞽ) (Urk. I 99, 4) 'His heart was filled with (me)', i.e. 'His trust was earned by me'.speak of the devil the person one is talking about shows up → #he was speaking of the devil
to go bananas to get crazy → #bananas are gone
to drink and drive → #drive and drink
to kick the bucket → #the bucket was kicked
coser y cantar to_sew and to_sing easy as pie, a piece of cake → #cantar y coser to sing and to sew
perder la cabeza to_lose the head to go bananas → #perder las cabezas to_lose the heads
περὶ πολλοῦ ποιέομαι peri pollou poieomai above much.GEN do.1SG I hold in high esteem → #ποιέομαι περὶ πολλοῦ
alzare la cresta to lift the crest become cocky → #la cresta è stata alzata the crest has been lifted
andare in malora go to ruin go to ruin → #nella malora è andata in ruin was gone
vivi e lascia vivere live and let live → #lascia vivere e vivi let live and live
満足がいく satisfaction.nom go be satisfied → *満足にいかせる cf. 太郎がいく→太郎にいかせる
kleine bedrijven leggen het loodje small companies lay the lead get the short end of the stick → #het loodje wordt gelegd
kogoś krew zalewa blood floods someone someone gets furious → #ktoś jest zalewany przez krew someone is flooded by blood (passive blocked)
robić bokami to do with-sides to have serious financial problems → #robić swoją robotę bokami to do one's job with sides (regular modification blocked)
dobrze komuś z oczu patrzy well someone.DAT from eyes looks someone looks like a good person → #uprzejmość dobrze komuś z oczu patrzy kindness well someone.DAT from eyes looks (subject prohibited)
nie zagrzać miejsca w pracy not to warm a place at work not to stay long in one job → #zagrzać miejsce w pracy to warm a place at work (negation is compulsory)
zdechł pies! died the dog! it is a lost cause → #pies zdechł the dog died (a regular word order variability is blocked)
wziąć w łeb to take into head to fail → #wziąć porażkę w łeb to take failure into head (direct object prohibited for the normally transitive verb wziąć to take)
pisar na bola step on the ball make a mistake → #a bola na qual ele pisou the ball on which he stepped
a da colțul to give corner.the to die → *colțul a fost dat corner.the has been given
delati se Francoza to pretend to be French to pretend to be indifferent → *delan Francoz made French
коцка је бачена kocka je bačena the die has been thrown the die has been cast → #коцка се бацила kocka se bacila (blocked passive) the die cast itself
ведрити и облачити vedriti i oblačiti to brighten and to cloud to call the shots → #облачити и ведрити oblačiti i vedriti (regular word order variability is blocked) to cloud and to brighten
не вредети пишљивог боба ne vredeti pišljivog boba to not be worth a single bean to be worthless → #вредети пишљивог боба vredeti pišljivog boba (negation is compulsory) to be worth a single bean
носити на души nositi na duši to carry something on one's soul to carry the burden of guilt → #ношење на души nošenje na duši (nominalization blocked) carrying on a souldet knallar och går it trots and walks it is OK/as usual → #det går och knallarпродавам неговата кола I sell his car → колата му беше продадена (his car was sold), неговата кола, която тя продаде (his car which she sold), т.н.jemandes Auto waschen to wash one's car → ihr Auto wurde gewaschen her car was washed, das Auto, welches sie wusch the car that she washed, Autowaschen car-washing, etcto wash one's car → her car was washed, the car that she washed, car washing, etc.pisar la arena to step on the sand → la arena que pisaste The sand on which you steppedἐπιστολὴν πέμπωepistolēn pempо̄ letter.ACC send.1SG I send a letter → πέμπω ἐπιστολὴνlavare la macchina →la sua macchina è stata lavata, la macchina che ha lavato, il lavaggio della macchina, etc.iemands auto wassen to wash one's car → haar auto werd gewassen her car was washed , de auto, die zij waste the car that she washed, autowassen car-washing, etc.kształtować opinię to form an opinion → opinia jest kształtowana the opinion is formedpisar na areia to step on the sand → a areia na qual você pisou the sand on which you stepped
jogar futebol to play football → ?futebol é jogado football is played One may argue that this is a VMWE because passive sounds strange. However, we assume that this sense of jogar does not accept passive. Since this construction is very productive, we do not annotate it as VMWE.
a spăla maşina to wash the car → maşina a fost spălată, maşina pe care a spălat-o, spălarea maşinii etc. the car was washed, the car that he/she washed, car washing
narediti film to make a movie → Film, narejen po knjigi a movie based on a book
написати књигу napisati knjigu to write a book → књига је написана knjiga je napisana the book is written
att tvätta bilen to wash one's car → min bil tvättades my car was washed, bilen som hon tvättade the car that she washed, biltvätt car-wash etc.
Section 5.4
Inherently reflexive verbs (IRV)
Reflexive clitics (RCLI) are clitic pronouns that refer to the subject of the verb, like oneself in English. They are very common in many languages and play several semantic roles depending on the context, as detailed below.
Reflexive verbs (REFLV), sometimes also called pronominal verbs, are formed by a full verb combined with a RCLI, although the clitic does not always have a reflexive meaning. REFLV can be categorized into different classes, some of which should be annotated as verbal MWEs.
Namely, we will only annotate a REFLV as an inherently reflexive verb (IRV) when (a) it never occurs without the clitic, or (b) the REFLV and non-reflexive versions have clearly different senses or subcategorization frames. Inherently reflexive verbs constitute a quasi-universal category.
IRVs are a difficult category to annotate due to various problematic cases. Note in particular that in some languages, e.g. Slavic ones, reflexive clitics inflect, so forms other than the most frequent case (the accusative) must also be considered.
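Criteria (a) and (b) above can be written down as a small predicate. The sketch below is an illustration only: the `ReflexiveVerbProfile` record and the `is_irv` function are hypothetical names, and the three boolean fields stand for judgements that an annotator (or a lexicon) has to supply.

```python
from dataclasses import dataclass


@dataclass
class ReflexiveVerbProfile:
    """Annotator judgements about one reflexive verb (hypothetical record)."""
    occurs_without_clitic: bool    # does the verb exist without the RCLI?
    different_sense: bool          # clearly different sense without the RCLI?
    different_subcat_frame: bool   # clearly different subcategorization frame?


def is_irv(profile: ReflexiveVerbProfile) -> bool:
    """Return True when the verb should be annotated as an IRV."""
    if not profile.occurs_without_clitic:
        return True  # criterion (a): the verb never occurs without the clitic
    # criterion (b): both versions exist but clearly differ
    return profile.different_sense or profile.different_subcat_frame
```

A verb that never occurs without its clitic satisfies (a) regardless of the other two fields, while a verb whose reflexive and non-reflexive versions share sense and frame is left unannotated.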
We start by listing the various categories of REFLV before providing tests to decide whether to annotate a given occurrence as IRV.
- Inherently reflexive ⇒ ANNOTATE as IRV
- The verb without the RCLI does not exist
усмихвам се to smile, страхувам се to be afraidstydět se to be ashamed, divit se to wondersich schämen to be ashamed, sich wundern to wonder(OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. (V 141, 14).suicidarse to suicide, abstenerse to abstainn.a.s'évanouir to faint, se suicider to suicidesuicidarsi to suicide, arrabbiarsi to get angryzich schamen to be ashamed, zich vergissen to be mistakendowiedzieć się to find out, bać się to be afraidqueixar-se to complain, abster-se to abstaina se teme to be afraid with obligatory ACC reflexive clitic
a își însuși to appropriate with obligatory DAT reflexive cliticsramovati se to be ashamed, bati se to be afraidстидети се stideti se to be ashamed,
бојати се bojati se to be afraidatt försova sig to sleep in
att gifta sig to get married
- The verb without the RCLI does exist, but has a very different meaning
смея ≠ смея се to dare ≠ to smile, намирам ≠ намирам се to find ≠ to be situatedsich enthalten ≠ enthalten to abstain ≠ to contain, sich (um etw.) handeln ≠ handeln to be ≠ to handle(OEG) 𓊪𓈙𓈙𓂻𓈖 𓋴 𓅐𓏏 𓎡 𓏌𓏏𓇯 𓁷𓂋 𓎡 pšš.n ś(ꞽ) mw.t ⸗k Nw.t ḥr ⸗k Your (⸗k) mother (mw.t) Nut (Nw.t) spread (pšš.n) herself (ś(ꞽ)) over (ḥr) you (⸗k). Your mother Nut protected you. (PT 638a, T) → pšš means 'spread' without a reflexive pronoun (Wb. I 560).to find oneself in a difficult situation
to help oneself to the cookies
recoger ≠ recogerse to gather ≠ to go home, empeñar ≠ empeñarse to pawn ≠ to insist
n.a.
s'apercevoir ≠ apercevoir to realize ≠ to see, s'agir ≠ agir to be ≠ to act
riferire ≠ riferirsi to report, tell ≠ to refer
zich aanstellen ≠ aanstellen to put on airs, to act ≠ to appoint, zich begeven ≠ begeven to proceed ≠ to break down, zich realiseren ≠ realiseren 'to realise (be aware) ≠ to realise (achieve)'
znajdować ≠ znajdować się to find ≠ to be, radzić ≠ radzić sobie to advise ≠ to manage
encontrar-se ≠ encontrar to be ≠ to meet, referir-se ≠ referir to concern ≠ to refer
a se îndura ≠ a îndura to have the heart ≠ to suffer
a se face ≠ a face to become ≠ to make even if it is inchoative (Dindelegan 2013: 79) a se face (=to become) is IRV (it passes Test15)
dati se it is possible (to do something) ≠ dati to give, dobiti se to meet ≠ dobiti to get
губити ≠ губити се gubiti ≠ gubiti se to lose ≠ to pass out
att känna sig ledsen/arg to feel sad/angry ≠ to touch
- Reciprocal ⇒ NOT ANNOTATED
- The RCLI has a sense of mutually:
целувам се to kiss each other, срещам се to meet each otherlíbat se to kiss each other, potkávat se to meet each othersich küssen to kiss each other, sich treffen to meet each otherbesarse to kiss each other, verse to see each othern.a.s'embrasser to kiss each other, se rencontrer to meet each otherbaciarsi to kiss each othercałować się to kiss each other, spotykać się to meet each othercumprimentar-se to greet each other, ver-se to see each othera se saluta to greet each otherpoljubljati se to kiss each other, srečati se to meet each otherпољубити се poljubiti se to kiss,
срести се sresti se to meet
- Reflexive ⇒ NOT ANNOTATED
- The RCLI marks the reflexive or reciprocal construction, that is, the clitic plays the role of self in English
мия се to wash oneself, реша се to comb oneself
mýt se to wash oneself, drbat se to scratch oneself
sich waschen to wash oneself, sich kratzen to scratch oneself
(OEG) 𓇋𓅱 𓈖𓐩𓈖 𓇓𓅱 𓃹𓈖𓇋𓋴 ꞽw nč̣.n św Wnꞽś Unas (Wnꞽś) has-protected (nč̣.n) himself (św). Unas has protected himself. (PT 290c, W)
mirarse to look at oneself, vestirse to dress oneself
n.a.
se laver to wash oneself, se parler to talk to oneself
lavarsi to wash oneself, vestirsi to dress oneself
zich wassen to wash oneself, zich scheren to shave oneself
myć się to wash oneself, drapać się po głowie to scratch oneself on the head
apressar-se to hurry oneself, vestir-se to dress oneself
a se spăla to wash oneself
umivati se to wash oneself, praskati se to scratch oneself
умивати се umivati se to wash one's face,
чешати се češati se to scratch oneself
att tvätta sig to wash oneself
- Body part, also called possessive reflexive ⇒ NOT ANNOTATED
- Specific type of reflexive use in which the direct object is a body part or, more generally, an inalienable part of the subject
мия си ръцете wash REFL.POSSESSIVE hands wash one's hands
mýt si nohy wash RCLI.DAT the feet wash one's feet
sich das Bein brechen RCLI the leg break break one's leg
(OEG) 𓂜 𓂻𓅱𓈖 𓇋𓋴 𓃹𓈖𓇋𓋴 𓆓𓋴 𓆑 nꞽ ꞽw.n ꞽś Wnꞽś č̣ś ⸗f Indeed (ꞽś), Unas (Wnꞽś), his (⸗f) body (č̣ś), cannot-come (nꞽ ꞽw.n). Indeed, Unas himself cannot come. (PT 333b, W)
rascarse el brazo scratch.RCLI the arm scratch one's arm
n.a.
se gratter la tête RCLI scratch the head scratch one's head
grattarsi la testa RCLI scratch the head scratch one's head
myć sobie nogi wash RCLI.DAT the feet wash one's feet
impossible, uses possessive instead
a-şi rupe mâna RCLI.DAT break arm break one's arm
umivati noge wash RCLI.DAT the feet wash one's feet, zlomiti roko RCLI.DAT break arm break one's arm
сломити си ногу to break RCLI the foot slomiti si nogu to break one's own leg,
умити си лице umiti si lice to wash RCLI the face to wash one's own face
- Middle with preverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
- The clitic marks a regular syntactic alternation for transitive verbs. Just like in regular passive alternation, the direct object of the transitive version appears as the subject of the REFLV version, and thus the verb agrees with the subject.
- Unlike in the inchoative (see below), the subject of the transitive version is absent in the REFLV version, but it necessarily exists, though it is underspecified
книги се пишат трудно books write.PL RCLI difficult it is difficult to write booksdie Häuser verkaufen sich gut the houses sell RCLI well the houses sell welllas casas se venden bien the houses RCLI sell well the houses sell welln.a.les pots se vendent bien the pots RCLI sell well the pots sell wellle case si affittano the houses RCLI rent the houses are renteddomy dobrze się sprzedają houses sell.PL RCLI well houses sell wellas casas se vendem bem the houses RCLI sell well the houses sell wellcasele se vând bine houses-the RCLI sell well houses sell wellhiše se dobro prodajajo the houses sell RCLI well the houses sell wellземља се добро продаје zemlja se dobro prodaje the land RCLI well sell the land's selling well
- Middle with postverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
- In some languages, middle alternation with preverbal subject sounds unnatural and middle alternation with postverbal subject is preferred. Depending on the languages, it is viewed as a postverbal subject (ES, PL, PT, RO) or as an object which agrees with the unaccusative verb form (IT). Middle alternation with postverbal subject is impossible in FR and DE.
трудно се пишат книги difficult RCLI write.PL books it is difficult to write booksse alquilan casas RCLI rent houses people rent housesn.a.si affittano case RCLI rent houses people rent housesdobrze sprzedają się te domy well sell RCLI these houses these houses sell well Polish is a relatively free word-order language and a postverbal subject is a regular (even if stylistically marked) alternation.alugam-se casas rent-RCLI houses people rent housesse vând bine apartamentele din blocurile noi RCLI sell well apartments-the from blocks-the new Apartments from new blocks sell well
se construiesc locuințe noi RCLI built houses new new houses are builtnove hiše se gradijo new houses RCLI built new houses are builtдобро се продаје ова роба dobro se prodaje ova roba well RCLI sell these goods these goods are selling well
- Impersonal ⇒ NOT ANNOTATED
- The RCLI marks an impersonal verb alternation possible for various transitivity classes, depending on the language: only transitive verbs (FR), only intransitive verbs with manner adjuncts (DE), preferably intransitive but tolerated for transitive verbs (PT), either transitive or intransitive verbs (IT, ES, RO, PL)
- There is no noun phrase before the verb (empty subject slot), the presence of the RCLI indicates a verb interpreted with a generic and underspecified subject
- The verb is in third person singular, even when the object is plural
не се вечеря късно not RCLI have dinner late it is not good to have dinner latehier tanzt es sich gut here dances it RCLI well people dance well herese busca a actores RCLI searches to actors people look for actors
se trabaja mejor aquí RCLI works better here people work better heren.a.il se dit des bêtises it RCLI says silly things people say silly thingssi lavora troppo RCLI works too much people work too much
si affitta molte case RCLI rents many houses people rent many housesza dużo się pracuje too much RCLI works people work too much
bzdury się opowiada nonsense RCLI tells people tell nonsensedorme-se muito sleeps-RCLI much people sleep a lot
conta-se histórias tells-RCLI stories people tell stories Transitive impersonal is considered wrong by traditional grammar but it is found in corpora.se lucrează până târziu RCLI works until late people work until late transitive verbs can be impersonal in RO only when they are null-object verbs (se lucrează până târziu - *este lucrat până târziu) or when their subject is realized by a clause headed by a complementizer Dindelegan 2013: 174
se suferă din cauza sărăciei RCLI suffer because of poverty one suffers because of poverty RO impersonal reflexive verbs are mostly intransitive Dindelegan 2013: 173
se aleargă dimineața RCLI run in the morning people run in the morninggovori se/govorijo se neumnosti it says/they say RCLI silly things people say silly thingsради се превише radi se previše it works RCLI too much there's too much work being done,
говоре се глупости govore se gluposti they say RCLI nonsense nonsense is being said
- Inchoative ⇒ NOT ANNOTATED
- Similar to middle, but the RCLI marks a less productive syntactic alternation:
- the direct object of the transitive version appears as subject of the REFLV
- the subject of the transitive version is not only absent, it is also semantically unclear or nonexistent
вратата се отваря the door opensdveře se otvírají the door opensdie Tür öffnet sich the door opensla puerta se abrió the door openedn.a.la porte s'est subitement ouverte the door suddenly openedla porta si apre the door opensdrzwi się otwierają the door openso vaso se quebrou the vase brokemașina s-a stricat the car broke down
ușa s-a deschis the door openedvrata se odpirajo the door opensврата се отварају vrata se otvaraju the doors are openingdörren öppnar sig the door opens
IRV-specific decision tree
- Apply test IRV.1 - [INHERENT]
  - YES ⇒ Annotate as IRV
  - NO ⇒ Apply test IRV.2 - [DIFF-SENSE]
    - YES ⇒ Annotate as IRV
    - NO ⇒ Apply test IRV.3 - [DIFF-SUBCAT]
      - YES ⇒ Annotate as IRV
      - NO ⇒
        - verb has no subject ⇒ Apply test IRV.4 - [IMPERS]
          - YES ⇒ It is not a VMWE, exit
          - NO ⇒ Annotate as IRV
        - verb has a subject ⇒ Apply test IRV.5 - [MIDDLE-INCHO]
          - YES ⇒ It is not a VMWE, exit
          - NO ⇒ Apply test IRV.6 - [REFL]
            - YES ⇒ It is not a VMWE, exit
            - NO ⇒
              - subject is SINGULAR ⇒ Apply test IRV.7 - [REFL-MUTUAL]
                - YES ⇒ It is not a VMWE, exit
                - NO ⇒ Annotate as IRV
              - subject is PLURAL ⇒ Apply test IRV.8 - [RECIPRO]
                - YES ⇒ It is not a VMWE, exit
                - NO ⇒ Annotate as IRV
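The decision tree above can be sketched as a small function, which may help when checking that a sequence of test answers leads to the expected label. This is only an illustration and not part of the PARSEME tooling: the names `IrvAnswers` and `classify_reflexive_verb` are hypothetical, and the sketch assumes that the first outcome listed under each test is the YES branch, which is how the individual test descriptions read.

```python
from dataclasses import dataclass


@dataclass
class IrvAnswers:
    """Annotator's yes/no answers to the IRV tests (hypothetical container)."""
    inherent: bool = False           # IRV.1 [INHERENT]
    diff_sense: bool = False         # IRV.2 [DIFF-SENSE]
    diff_subcat: bool = False        # IRV.3 [DIFF-SUBCAT]
    has_subject: bool = True         # selects IRV.4 vs IRV.5
    impersonal: bool = False         # IRV.4 [IMPERS]
    middle_incho: bool = False       # IRV.5 [MIDDLE-INCHO]
    reflexive: bool = False          # IRV.6 [REFL]
    subject_is_plural: bool = False  # plural or coordinated subject
    refl_mutual: bool = False        # IRV.7 [REFL-MUTUAL]
    reciprocal: bool = False         # IRV.8 [RECIPRO]


def classify_reflexive_verb(a: IrvAnswers) -> str:
    """Walk the IRV-specific decision tree and return the resulting label."""
    # IRV.1 to IRV.3: a YES on any of them means IRV.
    if a.inherent or a.diff_sense or a.diff_subcat:
        return "IRV"
    # No subject: only the impersonal test applies.
    if not a.has_subject:
        return "not a VMWE" if a.impersonal else "IRV"
    # With a subject: middle/inchoative, then reflexive.
    if a.middle_incho or a.reflexive:
        return "not a VMWE"
    # Last step depends on the number of the subject.
    if a.subject_is_plural:
        return "not a VMWE" if a.reciprocal else "IRV"
    return "not a VMWE" if a.refl_mutual else "IRV"
```

For instance, `classify_reflexive_verb(IrvAnswers(inherent=True))` corresponds to a verb such as sich schämen, which passes IRV.1 and is annotated as IRV without applying any further test.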
Test IRV.1 - [INHERENT] Inherent clitic
Does the verb only exist with the RCLI and never occur without it?
- annotate as IRV
страхувам се ⇒ *страхувам to be afraid
усмихвам се ⇒ *усмихвам to smilesich schämen ⇒ *schämen to be ashamed
sich wundern ⇒ *wundern to wonder(OEG) 𓋴𓅓𓊃𓈖 𓆑 𓇓 𓂋 𓆑 ś:ms.n ⸗f św (ꞽ)r ⸗f He (⸗f) proceeded (ś:ms.n) himself (św) to ((ꞽ)r) him (⸗f). It is to him that he proceeded. (PT 10c, N) → The verb ś:ms is only attested with a reflexive pronoun (Wb. (V 141, 14).suicidarse ⇒ *suicidar to suicide
abstenerse ⇒ *abstener to abstainn.a.s'évanouir ⇒ *évanouir to faint
se suicider ⇒ *suicider to suicidesuicidarsi ⇒ *suicidare to suicidezich schamen ⇒ *schamen to be ashamed
zich vergissen ⇒ *vergissen to be mistakendowiedzieć się ⇒ *dowiedzieć to find out
bać się ⇒ *bać to be afraid
wydarzyć się ⇒ *wydarzyć to happenqueixar-se ⇒ *queixar to complain
abster-se ⇒ *abster to abstaina se teme ⇒ *a teme to be afraid
a își însuși ⇒ *a însuși to appropriatesramovati se ⇒ *sramovati to be ashamed
čuditi se ⇒ *čuditi to wonderбавити се ⇒ *бавити baviti se ⇒ *baviti to deal with,
дивити се ⇒ *дивити diviti se ⇒ *diviti to admire
- next test
Test IRV.2 - [DIFF-SENSE] - Different sense
Given the same verb without the RCLI, are all of its meanings clearly different from the REFLV form?
- annotate as IRV
намирам се ≠ намирам to be situated ≠ to find
радвам се≠ радвам to feel happy ≠ to make happysich verstehen ≠ verstehen to get along well ≠ to understand(OEG) 𓊪𓈙𓈙𓂻𓈖 𓋴 𓅐𓏏 𓎡 𓏌𓏏𓇯 𓁷𓂋 𓎡 pšš.n ś(ꞽ) mw.t ⸗k Nw.t ḥr ⸗k Your (⸗k) mother (mw.t) Nut (Nw.t) spread (pšš.n) herself (ś(ꞽ)) over (ḥr) you (⸗k). Your mother Nut protected you. (PT 638a, T) → pšš means 'spread' without a reflexive pronoun (Wb. I 560).to find oneself in a difficult situation
to help oneself to the cookies
recogerse ≠ recoger to go home ≠ to pick up, to gather
n.a.
s'apercevoir ≠ apercevoir to realize ≠ to see
s'agir ≠ agir to be ≠ to actriferirsi ≠ riferire to refer ≠ to report, to tellzich voordoen ≠ voordoen to arise ≠ to showznajdować się ≠ znajdować to find oneself ≠ to be
sprawdzić się≠ sprawdzić to prove appropriate ≠ to check
wybrać się≠ wybrać to go ≠ to chooseencontrar-se ≠ encontrar to be ≠ to meet
referir-se ≠ referir to concern ≠ to refera se îndura ≠ a îndura to have the heart to ≠ to sufferrazumeti se ≠ razumeti to get along well ≠ to understandзнати ≠ знати се znati ≠ znati se to know ≠ to know someone,
забављати ≠ забављати се zabavljati ≠ zabavljati se to amuse someone else ≠ to amuse oneself to amuse someone ≠ to date someone
- next test
Test IRV.3 - [DIFF-SUBCAT] - Different subcategorization frame
Is the subcategorization frame of the simple verb without the RCLI different from the subcategorization frame of the REFLV, except for the addition of a direct or indirect object corresponding to the same syntactic argument as the RCLI in the REFLV version?
- annotate as IRV
X verliert sich in Y ⇔ X verliert Y X loses RCLI in Y ⇔ X loses Y
X se olvidó de Y ⇔ X olvidó Y X RCLI forgot of Y ⇔ X forgot Y
n.a.
X se confesse de Y ⇔ X confesse Y (but *X confesse de Y) X RCLI confesses of Y ⇔ X confesses Y (but not *X confesses of Y)
X se plaint de Z ⇒ *Y plaint (à) X de Z X RCLI complains of Z ⇒ *Y complains (to) X of Z → the verb without RCLI, plus direct or indirect object, does not subcategorize for the PP with preposition de
X se refuse à Vinf ⇒ *Y refuse (à) X à Vinf X RCLI refuses to Vinf ⇒ *Y refuses (to) X to VinfX si è dimenticato di Y ⇔ X ha dimenticato Y X RCLI forgot of Y ⇔ X forgot YX verwondde zich aan Y ⇔ X verwondde Y X wounded/injured RCLI to Y ⇔ X wounded/injured Y
X toonde zich ADJ ⇔ X toonde NOUN X showed RCLI ADJ ⇔ X showed NOUN ?? elle se trouve grosse, because se trouver here has the same meaning as trouver
X tłumaczy się z Y ⇔ X tłumaczy Y X explains SELF of Y ⇔ X explains Y
X dziwi się Y.dat ⇔ Y dziwi X ⇔ Z dziwi X Y.inst X surprises SELF Y.dat ⇔ Y surprises X ⇔ Z surprises X Z.instX se esqueceu de Y ⇔ X esqueceu Y X RCLI forgot of Y ⇔ X forgot YX se gândeşte la Y ⇔ X gândeşte că Y X RCLI thinks of Y ⇔ X thinks that YА се објаснио с Б ⇔ А је објаснио Б A se objasnio s B A resolved the issues with B ⇔ A explained something to B - next test
Test IRV.4 - [IMPERS] - Impersonal
When you replace the RCLI by an underspecified subject such as one or people, does the sentence keep its meaning?
- do NOT annotate as verbal MWE
не се вечеря късно ⇔ хората не вечерят късно not RCLI have dinner late it is not good to have dinner latehier tanzt es sich gut ⇔ hier tanzen die Leute gut people dance well herese duerme mucho ⇔ las personas duermen mucho people sleep a lot
se busca a actores ⇔ la gente busca a actores people look for actorsn.a.il se dit des bêtises ⇔ les personnes disent des bêtises people say silly thingssi dorme molto ⇔ le persone dormono molto people sleep a lot
si affitta molte case ⇔ le persone affittano molte case people rent many housespracuje się za dużo ⇔ ludzie pracują za dużo people work too much
opowiada się bzdury ⇔ ludzie opowiadają bzdury people tell nonsensedorme-se muito ⇔ as pessoas dormem muito people sleep a lot
conta-se histórias ⇔ as pessoas contam histórias people tell storiesse lucrează până târziu ⇔ lumea lucrează până târziu people work until late
se aleargă dimineața ⇔ lumea aleargă dimineața people run in the morninggovorijo se neumnosti ⇔ ljudje govorijo neumnosti people tell nonsenseради се превише. ⇔ људи раде превише. radi se previše. ⇔ ljudi rade previše. there's too much work being done ⇔ people are working too much. - annotate as IRV
Test IRV.5 - [MIDDLE-INCHO] - Middle or Inchoative
When you move the subject to the object position, remove the RCLI and add a generic subject (people, somebody), thus building a transitive version, does it imply the REFLV version? In other words, people/somebody V [to] X ⇒ X REFLV?
- do NOT annotate as verbal MWE
някой отваря вратата ⇒ вратата се отваря somebody opens the door ⇒ the door opensman kann die Häuser gut verkaufen ⇒ die Häuser verkaufen sich gut people can sell the houses well ⇒ the houses sell well
jemand öffnet die Tür ⇒ die Tür öffnet sich somebody opens the door ⇒ the door opensla gente cuenta historias ⇒ se cuentan historias people tell stories ⇒ stories are told
alguien abrió la puerta ⇒ la puerta se abrió somebody opened the door ⇒ the door openedn.a.on vend bien ce produit ⇒ ce produit se vend bien people sell this product well ⇒ this product sells well
quelqu'un ouvre la porte ⇒ la porte s'ouvre somebody opens the door ⇒ the door opens
qualcuno vende bene questo prodotto ⇒ questo prodotto si vende bene someone sells this product well ⇒ this product sells well
qualcuno apre la porta ⇒ la porta si apre somebody opens the door ⇒ the door opensktoś sprzedaje te domy ⇒ te domy się sprzedają somebody sells these houses ⇒ these houses sell well
ktoś otwiera drzwi ⇒ drzwi się otwierają somebody opens the door ⇒ the door opens
ktoś nasila skargi ⇒ skargi nasilają się somebody increases complaints ⇒ complaints increase
ktoś rozgrywa mecz ⇒ mecz rozgrywa się somebody plays a game ⇒ the game playsalguém conta histórias ⇒ contam-se histórias somebody tells stories ⇒ tell.PL-RCLI stories somebody tells stories ⇒ stories are told
alguém acalmou o menino ⇒ o menino se acalmou somebody calmed the boy ⇒ the boy RCLI calmed somebody calmed the boy down ⇒ the boy calmed down
o juiz casou João com Maria ⇒ João se casou com Maria the judge married João with Maria ⇒ João RCLI married with Maria the judge married João with Maria ⇒ João got married to Maria
o juiz casou Maria e João ⇒ Maria e João se casaram the judge married Maria and João ⇒ Maria and João RCLI married the judge married Maria and João ⇒ Maria and João got married
alguém lembrou João do meu aniversário ⇒ João se lembrou do meu aniversário somebody reminded João of my birthday ⇒ João RCLI reminded of my birthday somebody reminded João of my birthday ⇒ João remembered my birthdaycineva spune glume ⇒ se spun glume somebody tells jokes ⇒ jokes are told
cineva a deschis ușa ⇒ ușa s-a deschis somebody opened the door ⇒ the door openednekdo pripoveduje šale ⇒ šale se pripovedujejo somebody tells jokes ⇒ jokes are told
nekdo je odprl vrata ⇒ vrata so se odprla somebody opened the door ⇒ the door openedнеко је отварао врата ⇒ врата се отварају neko je otvarao vrata ⇒ vrata se otvaraju someone was opening the doors ⇒ the doors were being opened,
неко шири гласине ⇒ гласине се шире neko širi glasine ⇒ glasine se šire someone's spreading the rumors ⇒ the rumors are being spread
- next test
Test IRV.6 - [REFL] - Reflexive
When you replace the RCLI by oneself only or to oneself only, does it imply the REFLV version? In other words, X V [to] himself only ⇒ X REFLV?
- do NOT annotate as verbal MWE
Павел лекува себе си ⇒ Павел се лекува Pavel heals himselfPaul kratzt nur sich selbst ⇒ Paul kratzt sich Paul scratches himselfPaul washes only himself ⇒ Paul washes himselfPablo se lava a sí mismo ⇒ Pablo se lava Paul washes himselfn.a.Paul ne soigne que lui-même ⇒ Paul se soigne Paul heals himself
Paul ne parle qu'à lui-même ⇒ Paul se parle Paul talks to himselfPaolo cura solo se stesso ⇒ Paolo si cura Paul heals himself
Paolo parla solo a se stesso ⇒ Paolo si parla Paul talks to himselfPaul wast alleen zichzelf ⇒ Paul wast zich(zelf) Paul washes himselfPaweł leczy tylko siebie ⇒ Paweł leczy się Paul heals himself
Paweł bogaci tylko siebie ⇒ Paweł bogaci się Paul enriches himself Paul gets rich
Paweł myje tylko siebie ⇒ Paweł myje się Paul washes himself
Paulo só lava a si mesmo ⇒ Paulo se lava Paul washes himself
Paul se spală doar pe sine ⇒ Paul se spală. Paul washes himself
Pavel praska sam sebe ⇒ Pavel se praska Paul scratches himself
Марко лечи сам себе ⇒ Марко се лечи Marko leči sam sebe ⇒ Marko se leči Marko is treating himself ⇒ Marko is getting treated
- next test
- The subject is singular: test REFL-MUTUAL
- The subject is plural or coordinated (Bob and Alice): test RECIPRO
Test IRV.7 - [REFL-MUTUAL] - Reflexive-mutual
Is a reciprocal version possible? Namely: Is it acceptable to replace the singular subject by a plural and add each other to the REFLV form without changing the REFLV's meaning?
- do NOT annotate as verbal MWE
The test applies only if test IRV.2 [DIFF-SENSE] has failed. For example, for French X se marie 'X gets married', it is odd though possible to say 'X and Y marry each other', but this does not mean 'X gets married', because it is only possible if X and Y are marriage officiants.
Павел се мие ⇔ те се мият един друг they wash each otherPaul wäscht sich ⇔ Sie waschen sich gegenseitig / einander they wash each otherPablo se lava ⇔ ellos se lavan mutuamente / los unos a los otros they wash each othern.a.Paul se lave ⇔ ils se lavent mutuellement / les uns les autres they wash each otherPaolo si lava ⇔ essi si lavano reciprocamente / l'un l'altro they wash each otherPaul wast zich ⇔ Zij wassen elkaar they wash each otherPaweł się myje ⇔ oni myją się nawzajem they wash each otherPaulo se lava ⇔ eles se lavam mutuamente / uns aos outros they wash each otherel se spală ⇔ ei se spală unul pe altul they wash each otherPavel se umiva ⇔ umivajo drug drugega they wash each otherМарко се забавља ⇔ они један другог забављају Marko se zabavlja ⇔ oni jedan drugog zabavljaju Marko is amusing himself ⇔ they are amusing one another
- annotate as IRV
Test IRV.8 - [RECIPRO] - Reciprocal
Is it possible to remove the RCLI and replace the coordinated subject (A and B) or plural subject (A.PL) by a singular subject (A or A.PL) and a singular object, often introduced by to/with (B or A.PL), without changing the REFLV's meaning? That is:
- Coordinated subject: A and B REFLV ⇔ A V [to/with] B and B V [to/with] A?
- Plural subject: A.PL REFLV ⇔ A.PL V [to/with] A.PL?
- do NOT annotate as verbal MWE
Павел и Елена се целуват ⇔ Павел целува Елена и Елена целува Павел Pavel and Elena kissPaul und Anna umarmen sich ⇔ Paul umarmt Anna and Anna umarmt Paul Paul and Anna hug each other
die Affen kratzen sich ⇔ die Affen kratzen die Affen the monkeys scratch each otherPablo y Ana se abrazan ⇔ Pablo abraza a Ana and Ana abraza a Pablo Paul and Ann hug each other
los niños se abrazan ⇔ los niños abrazan a los niños the children hug each othern.a.Paul et Anne s'embrassent ⇔ Paul embrasse Anne and Anne embrasse Paul Paul and Ann kiss
les jours se suivent ⇔ les jours suivent les jours the days follow each otherGiovanni e Anna si baciano ⇔ Giovanni bacia Anna and Anna bacia Giovanni John and Ann kiss
i giorni si seguono ⇔ i giorni seguono i giorni i giorni seguono l'un l'altroPaweł i Elena całują się ⇔ Paweł całuje Elenę i Elena całuje Pawła, Paweł i Elena całują się nawzajem Paweł kisses Elena and Elena kisses Paweł, Paweł and Elena kissJoão e Ana se beijam ⇔ João beija Ana and Ana beija João John and Ann kiss
os presos se agridem ⇔ os presos agridem os presos the prisoners aggress each otherIon şi George se salută ⇔ Ion îl salută pe George and George îl salută pe Ion Ion and George greet each other
participanții se salută ⇔ participanții îi salută pe participanți the participants greet each otherPavel in Ana se objemata ⇔ Pavel objema Ano in Ana objema Pavla Paul and Anna hug each otherМ и Н су се пољубили ⇔ М је пољубио Н и Н је пољубила М M i N su se poljubili ⇔ M je poljubio N i N je poljubila M M and N kissed ⇔ M kissed N and N kissed M - annotate as IRV
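The coordinated-subject paraphrase of test IRV.8 can be generated mechanically, which may be convenient when preparing test sentences in bulk. The helper below is a hypothetical sketch, not part of any PARSEME tool: `reciprocal_paraphrase` and its parameters are illustrative names, and the function does no inflection, so languages with case marking (e.g. Polish Elenę, Pawła) need the forms adjusted by hand.

```python
def reciprocal_paraphrase(a: str, b: str, verb: str, link: str = "") -> str:
    """Build 'A V [link] B and B V [link] A' for a coordinated subject A and B.

    `link` is the optional marker introducing the object (the [to/with] slot);
    `verb` must already be in the right singular form, without the RCLI.
    """
    def clause(subject: str, obj: str) -> str:
        # Skip the link when it is empty so no double space is produced.
        return " ".join(w for w in (subject, verb, link, obj) if w)

    return f"{clause(a, b)} and {clause(b, a)}"


# Rebuilds two of the paraphrases listed in the examples above.
print(reciprocal_paraphrase("Paul", "Anne", "embrasse"))
print(reciprocal_paraphrase("Pablo", "Ana", "abraza", link="a"))
```

The annotator still has to judge whether the generated paraphrase keeps the meaning of the REFLV; the function only assembles the string.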
Problematic cases and remarks
Polysemy
Keep in mind that both simple and reflexive verbs can have several senses. In test IRV.2 [DIFF-SENSE], we ask that ALL senses you can think of are different from the REFLV form in the given context. For example, the French verb trouver can mean to find something, to have an opinion about something, to discover something, etc. But se trouver has a totally different and unrelated meaning of to be (located at) in the sentence L'église se trouve à Paris the church is located in Paris. It should thus be annotated as an MWE. As the REFLV is polysemous itself, it should NOT be annotated as IRV in sentences like Elle se trouve grosse she finds herself fat, where it means to have an opinion about (herself), equivalent to the non-reflexive version.
Clitic position and concatenation
In some languages the clitics are joined to the verb, sometimes using a hyphen but not always. When there is no hyphen, the REFLV will probably be tokenized as a single token in the corpus.
- In French, when the verb begins with a vowel or a mute h, orthography and pronunciation rules require the final vowel of the clitic to be replaced by an apostrophe and the clitic to be attached to the verb (elision):
- s'abstenir to abstain
- In Spanish and Italian, the clitic can appear concatenated after the verb in some verbal forms (e.g. infinitives, gerunds):
- enamorarse to fall in love
- alzarsi to get up
- In Portuguese, there are always hyphens for postponed clitics (enclisis), but in conditional tense the clitic is in the middle of the verb (mesoclisis), separating the root from the suffix:
- queixar-se-ia would complain
- In Romanian the clitic and the verb are either separate or have a hyphen between them:
- se aude un clopot RCLI hears a bell a bell is heard
- s-aude un clopot RCLI-hears a bell a bell is heard
The current annotation format allows annotating a single token as an MWE if it is a multiword token. Therefore, a REFLV written as a single token should still be annotated as an MWE.
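To make the concatenation patterns listed above concrete, here is a deliberately naive sketch that looks for an attached reflexive clitic inside a single token. Everything in it is an assumption made for illustration: the function name `split_attached_clitic`, the language codes and the string heuristics are not taken from the PARSEME tools, the returned verb part is a surface string rather than a lemma, and the suffix rules over-generate (any Spanish word ending in -se would match), so real corpora need morphological analysis instead.

```python
from typing import Optional, Tuple


def split_attached_clitic(token: str, lang: str) -> Optional[Tuple[str, str]]:
    """Return (verb_part, clitic) if a reflexive clitic looks attached, else None."""
    t = token.lower()
    # French elision: s'abstenir (only the straight apostrophe is handled here)
    if lang == "FR" and t.startswith("s'"):
        return t[2:], "s'"
    # Romanian hyphenated proclitic: s-aude
    if lang == "RO" and t.startswith("s-"):
        return t[2:], "s-"
    # Portuguese enclisis (queixar-se) and mesoclisis (queixar-se-ia)
    if lang == "PT":
        parts = t.split("-")
        if "se" in parts[1:]:
            i = parts.index("se", 1)
            return "".join(parts[:i] + parts[i + 1:]), "se"
        return None
    # Spanish and Italian enclitics on infinitives and gerunds
    if lang == "ES" and t.endswith("se") and len(t) > 2:
        return t[:-2], "se"
    if lang == "IT" and t.endswith("si") and len(t) > 2:
        return t[:-2], "si"
    return None
```

Applied to the examples above, the function separates s'abstenir, enamorarse, alzarsi, queixar-se-ia and s-aude into a verb part and a clitic, which is the information needed to recognise the token as a multiword token.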
Some idiomatic constructions include reflexive clitics. Two cases are possible:
- If a syntactically comparable literal construction is impossible or the REFLV would not be annotated in syntactically comparable literal constructions, annotate only the VID:
пилците се броят наесен chicken REFL are counted in the autumn the true results can be seen only at the end ⇒ кокошките се броят the hens REFL countedsich über etwas im Klaren sein dass S RCLI about s.th. in.the clear be to be aware of s.th./that S ⇒ *sich in N sein, dass for any noun Ndarse cuenta de to realize ⇒ *darse N de for any noun N
meterse en líos to get in trouble ⇒ REFLV not annotated in literal equivalents like meterse en una tienda to get in a storen.a.se rendre compte de to realize ⇒ *se rendre N de for any noun N
s'arracher les cheveux RCLI tear the hair to worry ⇒ REFLV not annotated in literal equivalents like s'arracher un ongle to tear off one's own nail
rendersi conto di to realize ⇒ *si rende N di for any noun N
si strappa i capelli RCLI tear the hair to worry ⇒ REFLV not annotated in literal equivalents like strapparsi un'unghia to tear off one's own nail
zich uit de voeten maken RCLI out of the feet make to get out of the way ⇒ *zich uit de N maken for any noun N
zich in de kijker spelen RCLI in the field-glass play to attract attention with one's skills ⇒ *zich in de N spelen for any noun Nzdawać sobie sprawę z to realize ⇒ *zdawać sobie N z for any noun Ndar-se mal to fail ⇒ dar-se ADV intransitive is acceptable only for antonym bem well
meter-se numa fria to get-RCLI in a cold to get in trouble ⇒ REFLV not annotated in literal equivalent like meter-se numa cabine to get into a cabin
a-și smulge părul din cap
puliti si lase tear RCLI the hair to worry ⇒ REFLV not annotated in literal equivalents like puliti si obrvi to pluck one's eyebrows
китити се туђим перјем kititi se tuđim perjem decorate RCLI someone else's feathers steal someone's thunder; take credit for someone else's accomplishments
- If the REFLV would be annotated as IRV in syntactically comparable literal constructions, annotate both the IRV and the VID as embedded MWEs (rare):
смея се през сълзи laugh REFL through tears to laugh bitterlyn.a.rozlatywać się w proch scatter itself into dust disappearvirar-se nos trinta turn-RCLI in-the thirty contains virar-se to get by ≠ virar to turn/becomea i se face rău to CL.DAT RCLI.ACC make ill to feel sick this is a case when both a non-reflexive, dative clitic and a RCLI.ACC appear in the structure; the REFLV is annotated as IRV; both the IRV and the ID are annotated as embedded MWEs; note that the non-reflexive clitic is also considered as part of a VID (6.4_R)
a se duce pe apa sâmbetei RCLI go on water-the Saturday-of to get lost; the REFLV is annotated in the literal equivalent a se duce pe apa Bistriței he goes on the river Bistriţa; there is a notable difference in meaning between the non-REFLV a duce to take and the REFLV a se duce to go
režati se kot pečen maček to laugh RCLI like a baked tomcat to laugh loudly; režati se is IRV
смејати се као луд smejati se kao lud to laugh like crazy
Overlap LVC - IRV
It is rare, although possible, to find light verb constructions in which a reflexive clitic changes the original meaning significantly, thus characterizing an IRV:
Fragen stellen to ask questions ⇒ sich Fragen stellen to doubt/hesitate
hacer preguntas to ask questions ⇒ hacerse preguntas to doubt/hesitate
n.a.
poser des questions to ask questions ⇒ se poser des questions to doubt/hesitate
[No example yet]
no examples found for RO
In this case, the whole construction, including the verb, the noun and the reflexive clitic, must be annotated as VID, since there are two syntactic arguments:
sich Fragen stellen to doubt/hesitate
hacerse preguntas to doubt/hesitate
n.a.
se poser des questions
no examples found for RO
Notice that annotating only the verb and the RCLI as IRV would be wrong, since it would have a completely different meaning without the noun, sometimes even coinciding with another IRV:
sich stellen to surrender
hacerse to get used to
n.a.
se poser to sit/lay down
Dative clitics and double clitics
In some languages, e.g. Polish, clitics inflect for case. Most cases of IRV seem to be restricted to the accusative case:
страхувам се to be afraid
bát se to be afraid
n.a.
n.a.
bać się to be afraid
a se sinchisi to RCLI.ACC care to care
a se sfii to RCLI.ACC be.shy to be shy
a se căi to RCLI.ACC repent to repent
bati se to be afraid
бојати се bojati se to be afraid
However, other cases can appear in IRV:
отивам си to go oneself.DAT to go away
poradit si to advise oneself.DAT to manage
n.a.
n.a.
radzić sobie to advise oneself.DAT to manage
a-și însuși to-RCLI.DAT appropriate to appropriate - with a Dative clitic
a-și apropria to-RCLI.DAT appropriate to appropriate - with a Dative clitic
drzniti si to dare oneself.DAT to dare
Some expressions can have double clitics. Only the first two words belong to the IRV:
надсмивам се над себе си to laugh RCLI.ACC at RCLI.DAT to laugh at myself
n.a.
n.a.
przyglądać się sobie to observe RCLI.ACC RCLI.DAT to observe each other
radzić sobie z sobą to advise RCLI.DAT with RCLI.INST to manage with oneself
n.a.
nasmehniti se sebi to smile at oneself
подсмевати се сам себи podsmevati se sam sebi to make fun of oneself
This category does not cover other types of pronouns and clitics. They are covered by regular VID tests and should be annotated as such. Examples of constructions that should be annotated as VID rather than IRV include:
es gibt it gives there is
n.a.
l'emporter to take it away to win
s'en aller to self from-it go to leave
en avoir marre to have from-it enough to be fed up
il y avoir it at-it have to exist
prender-ci to take to-it to make the right choice
prender-le to take it to be beaten
dá-lhe João! give to-him/her, João! show them what you got, João!
a-i arde to CL.DAT burn to have a desire
a o lua pe jos to take CL.ACC on foot to walk
according to the current guidelines, such examples pass the ID tests (see also 6.3_B5); both have literal correspondents that are not characterized by an obligatory non-reflexive clitic: a arde to burn and a lua to take
a-i repugna to CL.DAT loathe to loathe
a-i prii to CL.DAT to be favourable to sb.
ucvreti jo to escape her to escape something/someone by running
мрзи ме/мрзи те/мрзи га/... mrzi me/mrzi te/mrzi ga/... to bother me/to bother you/to bother him/... I cannot be bothered/you cannot be bothered/he cannot be bothered/...
Section 5.5
Idiomatic verb-particle constructions (IVPCs)
In the previous versions of the guidelines, this category was called VPC (verb-particle construction).
Idiomatic verb-particle constructions (IVPCs), sometimes called (idiomatic) phrasal verbs or phrasal-prepositional verbs, like
n.a.
um|fahren over|drive to run over, mit|kommen with|come to join, vor|bereiten before|prepare to prepare
to put off, to blow up, to do in
n.a.
n.a.
buttare giù throw down to swallow
voor|bereiden before|prepare to prepare
n.a.
n.a.
n.a.
constitute another quasi-universal category. They have the following general characteristics:
- They are formed by a lexicalized head verb v and a lexicalized particle p dependent on v.
- The meaning of the IVPC is fully or partly non-compositional.
- In fully non-compositional IVPCs (IVPC.full), the change in the meaning of v goes significantly beyond adding the meaning of p:
die Fische sind eingegangen the fish went in the fish died
to do in to kill, destroy, cheat or harm severely
rondkomen round-come to make ends meet
- In semi-non-compositional IVPCs (IVPC.semi), p adds a partly predictable but non-spatial meaning to v:
to eat up to eat completely
opeten to eat completely
IVPCs are pervasive in English, German, Swedish, Hungarian and possibly some other languages but irrelevant to or infrequent in Romance and Slavic languages or in Farsi and Greek for instance.
In some Germanic languages and also in Hungarian, verb-particle constructions can be spelled either as one (multiword) token or separated. Both types of occurrences are to be annotated:
Die Kinder sollen in der Schule aufpassen The children must pay attention at school
Herr Müller, passen Sie auf! Mr. Müller, be careful
Ongelukken komen voor Accidents happen
Ongelukken kunnen voorkomen Accidents can happen
The first challenge in identifying an IVPC is to properly distinguish the particle from a possibly homographic preposition, e.g.:
to look up the number vs to look up the chimney
or a verbal prefix:
um- in um|fahren vs umfahren
voor- in voor|komen to occur vs voorkomen to prevent
Namely, a particle, contrary to a preposition, cannot govern a complement. This can be tested depending on the verb's subcategorization frame:
- For intransitive verbs, the particle can occur without an NP. The fact that there is no NP that could be governed by the particle to form a PP shows that it is a particle rather than a preposition.
- For transitive verbs, the particle can occur either before or after the direct object. The fact that it is mobile and can go before or after the NP shows that it is a particle rather than a preposition.
intransitive: The airplane took off
transitive: The fire did in the whole block or The fire did it in
??? intransitive: Ongelukken komen voor
??? transitive: Hans is zijn moeder aan het opbellen or Hans is zijn moeder op aan het bellen
Prefixes, contrary to particles, can never be spelled separately from the verb, nor can the past participle of prefixed verbs be formed with the infix -ge-:
*er fuhr den See um
*er hat den See umgefahren, instead: er hat den See umfahren he drove around the lake but: er hat das Schild umgefahren he ran over the sign
aanbidden to worship *aangebeden
See the language-specific tests for more details on distinguishing particles from prepositions and verbal prefixes.
Note that in this shared task we do not account for compositional verb-particle combinations, i.e. those whose meaning can be deduced from the meaning of the preposition and of the verb:
er legt das Buch ab he puts down the book, er kommt ins Haus rein he comes into the house he enters the house
to lie down, You may go in now
hij legt het boek neer he puts down the book, hij komt het huis binnen he comes into the house he enters the house
Some combinations may have both compositional and non-compositional meanings depending on the context and only the latter should be annotated:
ein Schild aufstellen to put up a sign vs. einen Plan aufstellen to draw up a plan
to put up a flag vs. to put up a friend for the night
apparatuur opstellen to put up equipment vs. een rooster opstellen to draw up a roster
The following decision tree should be applied to decide whether a candidate should be annotated as an IVPC or not.
IVPC-specific decision tree:
- Apply test IVPC.1 - [PART-REDUC: Can the verb without the particle refer to the same event?]
- It is an IVPC.full.
- Apply test IVPC.2 - [PART-SPATIAL: Is the particle spatial?]
- It is not an IVPC, exit
- Apply test IVPC.3 - [PART-SPATIAL-LIT: Is the particle spatial in a literal reading?]
- It is an IVPC.semi
- It is not an IVPC, exit
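The decision tree above can be sketched as a small function. This is only an illustration: the class and field names are hypothetical, and the yes/no branch directions are not spelled out in the tree itself, so they are assumed here from the examples given for each test (for instance, to mix ideas together is rejected by IVPC.3 while to eat the cookies up is accepted as IVPC.semi).

```python
from dataclasses import dataclass


@dataclass
class IVPCAnswers:
    """Annotator's yes/no answers to the three tests (hypothetical names)."""
    verb_alone_same_event: bool             # IVPC.1 [PART-REDUC]
    particle_is_spatial: bool = False       # IVPC.2 [PART-SPATIAL]
    spatial_literal_reading: bool = False   # IVPC.3 [PART-SPATIAL-LIT]


def classify_ivpc(a: IVPCAnswers) -> str:
    # IVPC.1: if the verb alone cannot refer to the same event,
    # the combination is treated as fully non-compositional.
    if not a.verb_alone_same_event:
        return "IVPC.full"
    # IVPC.2: a particle that is spatial in context rules the candidate out.
    if a.particle_is_spatial:
        return "not an IVPC"
    # IVPC.3 (assumed branch direction): a literal counterpart with a
    # spatial particle also rules the candidate out.
    if a.spatial_literal_reading:
        return "not an IVPC"
    return "IVPC.semi"


examples = {
    "to do somebody in": IVPCAnswers(verb_alone_same_event=False),
    "to stand up": IVPCAnswers(True, particle_is_spatial=True),
    "to mix ideas together": IVPCAnswers(True, False, spatial_literal_reading=True),
    "to eat the cookies up": IVPCAnswers(True, False, False),
}
for phrase, answers in examples.items():
    print(f"{phrase}: {classify_ivpc(answers)}")
```

The answers themselves remain linguistic judgements made by the annotator; the function only fixes the order in which they are consulted.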
Test IVPC.1 - [PART-REDUC] - Verb without the particle refers to the same event/state
Can a sentence without the particle refer to the same event/state as the sentence with the particle? Special care must be taken when the same construction might or might not be a valid IVPC depending on its context.
- It is an IVPC.full.
- Go to the next test.
Der Lehrling fängt ein Praktikum an the apprentice catches an internship on the apprentice begins an internship does not imply #Der Lehrling fängt ein Praktikum the apprentice catches an internship
Die Bäuerin hat sich wieder eingefangen the farmer’s wife has herself again caught the farmer’s wife has calmed down again does not imply #Die Bäuerin hat sich wieder gefangen the farmer’s wife has caught herself again
Der Schüler legt die Prüfung ab the pupil lays the exam off the pupil takes the exam does not imply #der Schüler legt die Prüfung the pupil lays the exam
Das Schiff legt vom Hafen ab the boat lays from the harbor off the ship leaves the harbor does not imply #das Schiff legt vom Hafen the boat lays from the harbor
to do somebody in to kill sb does not imply #to do somebody
to check in upon arrival does not imply #to check upon arrival
A meccs után csak az edző nem rúgott be Only the coach did not get drunk after the match → A meccs után az edző berúgott The coach got drunk after the match does not imply #Az edző rúgott the coach kicked
Nem jött be ez a koktél nekem I didn’t like this cocktail → Bejött ez a koktél nekem I liked this cocktail does not imply #Jött ez a koktél nekem this cocktail bumped into me
De leerling legt een examen af the pupil lays the exam off the pupil takes the exam does not imply #de leerling legt een examen the pupil lays an exam
Der Bauer fängt die Hühner ein the farmer catches the chickens in the farmer catches the chickens implies der Bauer fängt die Hühner the farmer catches the chickens
Der Lehrer legt das Buch auf dem Tisch ab the teacher lays the book on the table apart the teacher puts the book away on the table implies Der Lehrer legt das Buch auf den Tisch the teacher puts the book on the table
Der Lehrer legt den Mantel ab the teacher lays the coat off the teacher takes off his coat implies Der Lehrer legt den Mantel the teacher puts the coat
to look up into the sky implies to look into the sky
to eat up the cookies implies to eat the cookies
A csatár nem rúgta be a helyzetét The forward missed its chance to score a goal → A csatár berúgta a helyzetét implies A csatár rúgott The forward kicked
Nem jött be a szobába He did not come into the room → Bejött a szobába he entered the room implies Jött a szobába he came into the room
de koekjes opeten to eat up the cookies implies de koekjes eten
Test IVPC.2 - [PART-SPATIAL] - Spatial particle
Is the particle spatial in the context of the verb, i.e. does it express direction or position?
- It is not an IVPC, exit.
- Go to the next test
n.a.to stand up
to give something back
to stay up tonight
You may go in now
to mix ingredients together
opstaan to stand up
aankijken look at
iets optillen to lift something up
slijm ophoesten cough up phlegm
to eat the cookies up
to mix ideas together
de koekjes opeten to eat up the cookies
Test IVPC.3 - [PART-SPATIAL-LIT] - Spatial particle in a literal reading
Does the IVPC candidate have a literal counterpart in which the particle is spatial, i.e. expresses direction or position?
- It is not an IVPC, exit.
- It is an IVPC.semi.
to mix ideas together
to eat the cookies up
de koekjes opeten to eat up the cookies
Section 5.6
Multi-verb constructions (MVC)
Multi-verb constructions (MVC) constitute a quasi-universal category. They are VMWEs composed of a sequence of two adjacent verbs (in a language-dependent order), a functionally governing verb V-gov (also called a vector verb) and a functionally dependent verb V-dep (also called a pole/polar verb), which have the following characteristics:
- They usually have the same subject.
- They usually denote actions that are closely connected and may be seen as part of the same event.
- They function together as a single predicate.
- They are unaccompanied by any explicit coordination, subordination, or dependency marker.
- They only have a single tense, aspect and polarity value.
- They may be idiomatic or indicate successions of events.
- The V-gov (vector) verb is semantically delexicalized and the V-dep (polar) verb contains the core meaning of the whole. Note that V-dep might be seen as the head and V-gov as the dependent, in dependency frameworks such as Universal Dependencies, where the principle of the primacy of content words is applied.
The behavior of MVCs is very heterogeneous across languages. Therefore, most tests for the detection of MVCs are language specific. The current tests were designed for Indonesian, Hindi, Japanese and Chinese. The generalization of these tests cross-lingually is planned as future work.
MVC-specific decision tree for Hindi
- Apply Test MVC.1.BASE - [MVC-STRUCT-BASE: V-dep is non finite and V-gov bears inflection?]
- It is not a VMWE, exit
- Apply Test MVC.3.KAR - [INS-REDIRECT-KAR: kar or ke appears just after V-dep?]
- Apply Test MVC.6 - [MANNER: V-gov indicates the manner/means/direction of V-dep?]
- It is a manner serial verb, not a VMWE, exit
- Apply Test MVC.7 - [REASON: V-gov indicates the reason for V-dep?]
- It is a reason serial verb, not a VMWE, exit
- Apply Test MVC.8 - [SEQ: V-gov and V-dep bound by temporal sequence?]
- It is a temporal sequence serial verb, not a VMWE, exit
- Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
- It is a serial verb expressing simultaneous actions, not a VMWE, exit
- Continue to the next test
- Apply Test MVC.10 - [LIGHT: V-gov in the closed list of light verbs?]
- Annotate as MVC
- Apply Test MVC.13 - [V-LEX: V-dep refers to the same event/state as V-gov+V-dep?]
- It is not a VMWE, exit
- Annotate as an MVC
MVC-specific decision tree for Chinese
- Apply Test MVC.2.ASPECT - [INS-DISCARD-ASP: V-gov can take an aspect marker –le or –guo?]
- It is not an MVC, exit
- Apply Test MVC.5 - [MODAL: V-gov is a modal or an auxiliary verb?]
- It is not an MVC, exit
- Apply Test MVC.6 - [MANNER: V-gov indicates the manner/means/direction of V-dep (or vice versa)?]
- It is not an MVC, exit
- Apply Test MVC.7 - [REASON: V-gov indicates the reason for V-dep (or vice versa)?]
- It is not an MVC, exit
- Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
- It is not an MVC, exit
- Apply Test MVC.4 - [SHARE-ARGS: V-gov and V-dep share arguments?]
- Annotate as an MVC
- It is not an MVC, exit
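The Chinese tree amounts to a chain of discarding tests followed by one final accepting test. A minimal sketch, assuming that a positive answer to any of the first five tests discards the candidate and that a positive answer to MVC.4 accepts it; the function name and the answer dictionary are hypothetical:

```python
from typing import Mapping

# Tests whose positive answer discards the candidate, in tree order.
CHINESE_DISCARD_TESTS = ("MVC.2.ASPECT", "MVC.5", "MVC.6", "MVC.7", "MVC.9")


def classify_chinese_mvc(answers: Mapping[str, bool]) -> str:
    """Walk the Chinese MVC tree; unanswered tests count as 'no'."""
    for test in CHINESE_DISCARD_TESTS:
        if answers.get(test, False):
            return f"not an MVC (discarded by {test})"
    # MVC.4 [SHARE-ARGS] is the last test and the only accepting one.
    if answers.get("MVC.4", False):
        return "MVC"
    return "not an MVC (no shared arguments)"


# An aspect marker can be inserted, so the candidate is discarded at once.
print(classify_chinese_mvc({"MVC.2.ASPECT": True}))
# No discarding test applies and the verbs share arguments.
print(classify_chinese_mvc({"MVC.4": True}))
```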
MVC-specific decision tree for Indonesian and Japanese
- TODO (in the meantime, follow the tests one by one)
MVC-specific decision tree for any other language
- Apply directly Test MVC.13 - [V-LEX: V-dep refers to the same event/state as V-gov+V-dep?]
- It is not a VMWE, exit
- Annotate as an MVC
Test MVC.1 - [MVC-STRUCT] MVC-like structure
Does the candidate respect the necessary structural (language-dependent) requirements for an MVC?
Hindi
Test MVC.1.BASE [MVC-STRUCT-BASE]: Is V-dep non finite and does V-gov carry the tense, aspect and agreement inflections?
- continue to the next test
- it is not an MVC
Japanese
Test MVC.1.IMORPH: Does the first verb (V-dep) contain the i-morph suffix?
- continue to the next test
- it is not an MVC
Any other language
Go to the next test
Test MVC.2 - [INS-DISCARD] Insertion which discards
Does the candidate sequence appear, or could it appear, with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that this candidate is a regular combination and should be discarded?
Chinese
Test MVC.2.ASPECT - [INS-DISCARD-ASP]: Can the aspect marker 了 -le perfective or 过 -guo be inserted between V-gov and V-dep (or the opposite)?
- it is NOT an MVC
我 wǒ I 看出来 kànchūlái figure out → 我看 wǒkàn I see 了 le aspect marker 出来 chūlái exit → The insertion of the aspect marker 了 le is grammatically sound
- continue to next test
我 wǒ I 听说 tīngshuō heard → *我听 wǒtīng I heard 了 le aspect marker 说 shuō say → The insertion of the aspect marker 了 le leads to ungrammaticality in the phrase
Indonesian
Test MVC.2.PRON - [INS-DISCARD-PRON]: Can a pronoun like dia he/she be inserted between the first [AS: between V-gov and V-dep or the opposite?] and second verb?
- it is NOT an MVC
- continue to next test
Test MVC.2.CLAUSE - [INS-DISCARD-CLAUSE]: Can a that-clause like bahwa that, or a whether-clause like apakah whether be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?], where the first verb [AS: V-gov?] is a saying verb like mengatakan say or an asking verb like menanyakan ask?
- it is NOT an MVC
- continue to next test
Test MVC.2.PURPOSE - [INS-DISCARD-PURP]: Can untuk for/to be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?]?
- it is a purpose serial verb, not an MVC
Saya I bersiap pergi get ready to go = Saya I bersiap untuk pergi get ready for the purpose of going → The insertion of untuk for/to is grammatically sound and does not change the meaning of the sentence. Although it is possible to insert untuk for/to between the first and second verb, it is usually unnecessary and omitted.
- continue to next test
Japanese
Test MVC.2.HONOR - [INS-DISCARD-HONOR]: Is the first verb [AS: V-gov or V-dep?] preceded by the honorific particle お o and is the second verb する/できる suru/dekiru?
- it is NOT an MVC, but an honorific construction.
お-話し-する o-hanasi-suru I humbly talk
- continue to next test
Any other language
Go to the next test
Test MVC.3 - [INS-REDIRECT] Insertion which redirects
Does the candidate sequence appear with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that a particular test should be applied next?
Hindi
Test MVC.3.KAR - [INS-REDIRECT-KAR]: Does conjunctive participle kar or ke appear attached to or immediately after V-dep?
- Go directly to test MVC.6 [MANNER].
- Go directly to test MVC.10 [LIGHT].
Any other language
Go to the next test
Test MVC.4 - [SHARE-ARGS] Shared arguments
Do V-gov and V-dep share arguments?
- it is an MVC
- it is not an MVC
Test MVC.5 - [MODAL] Modal or auxiliary verb
Chinese
Is V-gov a modal or an auxiliary verb?
- it is NOT an MVC
可以 kéyǐ can, 可能 kěnéng might, 会 huì will, 必须 bìxū must, 需要 xūyào need to, 要 yào want to, 能 néng able to, 应该 yīng gāi should
- continue to next test
Any other language
Go to the next test
Test MVC.6 - [MANNER] Manner verb
Chinese, Hindi, Indonesian, Japanese
Does V-gov indicate the manner or means (and possibly a direction) of the action expressed by V-dep (in Chinese: or vice versa)?
- it is a manner serial verb, not an MVC
us-ne ciikh-kar mujh-e bulaa-yaa He-erg yell-ConjPpl I-dative call-perf he called me by screaming
pulang melalui return-home pass-through go home by passing through (a place)
投げ込み nage komi throw go in throw into
なぐり殺し naguri korosi punch kill kill by punching
走进来 zǒu jìnlái walk enter walk into (a place)
- continue to next test
Any other language
Go to the next test
Test MVC.7 - [REASON] Reason verb
Hindi and Chinese
Does V-gov indicate the reason for the action expressed by V-dep (in Chinese: or vice versa)?
- it is a reason serial verb, not an MVC
vo melaa jaa-kar khush hu-aa he fair go-ConjPpl happy become-perf he got happy having gone to the fair
- continue to next test
Any other language
Go to the next test
Test MVC.8 - [SEQ] Temporal sequence
Hindi, Indonesian, Japanese
Are the verbs bound by a temporal sequence?
- it is a sequential serial verb, not an MVC
us-ne gilaas banaa-kar bec-aa he-erg glass make-ConjPpl sell-perf having made the glass, he sold it
bersiap pergi prepare go prepare in order to go (somewhere) → the first verb must happen before the second verb happens, otherwise the sentence will not make sense.
夫人が最初に fujin ga saisho ni the wife first 叩き起こさ tataki okosa hit to awaken れ re verb suffix != #夫人が最初に fujin ga saisho ni the wife first 起き叩さ tataki okosa hit to awaken れ re verb suffix → The two verbs 叩き tataki hit and 起こさ okosa awaken are bound by temporal sequence, such that if the order is switched, the sentence does not make sense.
- continue to next test
Any other language
Go to the next test
Test MVC.9 - [SIMULT] Simultaneous actions
Do the verbs indicate rapid and simultaneous actions (without resorting to a coordinating conjunction)?
- it is a serial verb expressing simultaneous actions, not an MVC
berlari menuju run head-towards run and go towards
- continue to next test
Test MVC.10 - [LIGHT] Light verb
Hindi
Does V-gov belong to a closed list of light verbs: aa come, baiTh sit, chal go, chuk finish, choR leave, Daal throw, de give, ja go, jataa declare, khaa eat, lagaa put, le take, maar hit, paa get/obtain, paRh fall, rakh keep, uTh rise?
- it is a (light) MVC
- continue to next test
Any other language
Go to the next test
Test MVC.11 - [PREP-LIKE] Preposition-like verb
Chinese
[Hongzhi Xu: this test is not very clear and is only specific to one particular MVC (it should probably be deleted in future editions)] Is the second verb in the candidate [AS: V-gov or V-dep?] a preposition-like verb like 成 chéng become?
- it is a preposition-like MVC
排列成 páiliè chéng arrange become arrange into (something)
- continue to next test
Any other language
Go to the next test
Test MVC.12 - [NOUN-LIKE] Noun-like verb
Japanese
Are any of the components [AS: V-gov or V-dep?] in the candidate noun-like arguments?
- it is a deverbalized V1/V2 MVC
(JA) 響き渡る hibiki wataru echo spread-widely reverberate → The first verb is a noun-like argument of the second verb [deverbalized V2]
聞き違え kiki chigae listen be-different mishear/misunderstand → The second verb is a noun-like argument of the first verb [deverbalized V1]
- continue to next test
Any other language
Go to the next test
Test MVC.13 - [V-LEX] Lexical inflexibility
Does a regular replacement of V-dep by a related verb taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- it is an MVC
ce mot veut dire autre chose this word wants say other thing this word means something else → #ce mot veut chuchoter/communiquer/crier autre chose this word wants whisper/communicate/scream another thing
- it is not an MVC
it will make me think → it will make me build/solve/construct
quiero leer tu tesis want.I read your thesis I want to read your thesis → quiero adquirir/descargar/imprimir tu tesis want.I acquire/download/print your thesis I want to get/download/print your thesis
je l'ai laissé finir la présentation I him have let finish the presentation I let him finish the presentation → je l'ai laissé commencer/lancer/interrompre la présentation I him have let start/launch/interrupt the presentation
ce garçon veut dire autre chose this boy wants say other thing this boy wants to say something else → ce garçon veut chuchoter/communiquer/crier autre chose this boy wants whisper/communicate/scream another thing
ik heb mijn trui laten wassen I had my sweater washed → ik heb mijn trui laten strijken/verven/maken I had my sweater ironed/dyed/repaired
dał jej pospać he let her sleep → dał jej odpocząć/poleżeć he let her rest/lie down
Section 5.7
Inherently adpositional verbs (IAVs)
Inherently adpositional verb (IAV) is a special optional and experimental category. It corresponds to the IPrepV category in the first pilot annotations and to what are sometimes called prepositional verbs in English. It consists of a verb or VMWE and an idiomatic selected preposition or postposition that is either always required or, if absent, changes the meaning of the verb or VMWE significantly. Language teams who decide to annotate IAVs should do so after annotating other categories (step 4 of the annotation process), since overlap with other categories can be quite frequent, as detailed below. Language teams are not required to use this category.
Our definition of inherently adpositional verbs is a generalization (applying to many languages) of the annotation guidelines of the English STREUSLE corpus, which define guidelines for annotating prepositional verbs.
IAVs are verb+adposition combinations in which:
- the dependents of the adposition are not lexicalized
разчитам на някого/нещо to rely on somebody/something is annotated as IAV because the object is not lexicalised,
but in the ID вземам на мушка някого/нещо take on target to criticise heavily somebody/something cannot be annotated as IAV because мушка is also lexicalized in the ID
to stand for something is annotated as IAV because the object is not lexicalized,
but in the ID to take something for granted, to take for cannot be annotated as IAV because granted is also lexicalized in the ID
entender de algo understand of something to know about something is annotated as IAV because the object is not lexicalised, whereas entender algo would not be any type of VMWE.
pristati na kaj to land on (something) to agree (with something) is annotated as IAV because the object is not lexicalized,
but in the ID ostati na trdnih tleh to remain on solid ground to remain realistic, ostati na to remain on cannot be annotated as IAV because trdnih tleh solid ground is also lexicalized in the ID
- the adposition is integral, that is, "it cannot be omitted without markedly altering the meaning of the verb"
رغب في want to he has a desire to do something → رغب can occur without the preposition في in, but it will never have a sense of رغب في
считам за to take for → *считам can never occur without the preposition за
разчитам на to rely on → разчитам can occur without the preposition, but it will never have a sense of to depend/rely on
to rely on → *to rely can never occur without the preposition on
to count on → to count can occur without the preposition, but it will never have a sense of to depend/rely on
entender de understand of something to know about something → entender to understand can occur without the preposition, but it will never have a sense of to be an expert about something
contar con count with to rely on → contar to count can occur without the preposition, but it will never have a sense of to rely on.
grenzen aan → *grenzen can never occur without the preposition aan
behoren tot → behoren can occur without the preposition, but it will never have a sense of behoren tot
temeljiti na to be based on → *temeljiti can never occur without the preposition na
biti za to be for to agree with or support (something or someone) → biti to be can occur without the preposition, but it will never have a sense of to agree with or to support
Note that idiomatic adpositional valency, in which the adposition opens a slot for a complement, should not be mistaken for idiomatic verb-particle constructions. Tests distinguishing particles from prepositions can be used to disambiguate these categories.
Particles can occur after the object: to wake somebody up, but prepositions cannot: *to come a new restaurant across
Not only single verbs but also VMWEs may be inherently adpositional. This is why IAV annotation needs to be the last step, after all other VMWEs in a sentence have been identified and categorized. In case of overlap between another category and IAV, the whole VMWE annotation needs to be repeated with the addition of the lexicalized adposition, and the whole is annotated as an IAV.
1. to put up is annotated as IVPC
2. the whole sequence to put up with is annotated as IAV
1. atenerse is annotated as IRV
2. the whole sequence atenerse a is annotated as IAV
1. ubadati se to deal RCLI is annotated as IRV, since the verb without the RCLI does not exist
2. the whole sequence ubadati se z to deal RCLI with is annotated as IAV, since the verb also does not exist without the preposition
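The two-step procedure in these examples can be sketched as follows. The `Annotation` class and `add_iav_layer` function are hypothetical, and the sketch assumes that the inner annotation is kept and the IAV is added as a second, overlapping annotation; IVPC is the current name of the category called VPC in earlier versions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    tokens: Tuple[str, ...]   # lexicalized components, in order
    category: str             # e.g. "IVPC", "IRV", "IAV"


def add_iav_layer(base: Annotation, adposition: str) -> List[Annotation]:
    """Repeat the base annotation with the lexicalized adposition, as an IAV.

    Assumption: the base annotation is retained next to the new IAV one.
    """
    iav = Annotation(base.tokens + (adposition,), "IAV")
    return [base, iav]


# Step 1: "to put up" has already been annotated; step 2 adds the IAV.
for layer in add_iav_layer(Annotation(("put", "up"), "IVPC"), "with"):
    print(layer.category, " ".join(layer.tokens))
```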
Test IAV.1 - [CIRCUM-QUEST] Circumstantial question with no adposition
This is an adaptation of STREUSLE's guideline on prepositional verbs by Nathan Schneider and Meredith Green. In response to a declarative sentence with the verb+adposition combination, is there a natural way to query the circumstances of the verbal event using the verb, but not the adposition?
- it is not an IAV
- annotate as an IAV
- Why do you care?
→ to care about is not annotated as IAV
- ¿Por qué te preocupas? why you worry.you? Why are you worried?
→ preocuparse por is not annotated as IAV
- Se lahko zaneseš, da ti bo kdo pomagal? Can you rely that someone will help you? Can you rely on that someone will help you?
→ zanesti se to rely on is not annotated as IAV
- #When did you come?
→ to come across is annotated as IAV
- #¿Desde cuándo entiende? Since when understands.she? Since when does she know?
→ entender de is annotated as IAV
- #Kaj gre? #What goes?
→ gre za is annotated as IAV
Section 6
Tests for nominal MWEs (NMWEs)
If the DIST test has allowed us to decide that the MWE candidate has a nominal distribution, the status of this candidate (as NID, PronID, NV or non-MWE) is to be checked by the decision diagram below. This diagram has a unique entry point and the tests should be applied in the defined order. Each test is clickable and explained with examples in the sections below.
The role of the first 3 tests, NMWE.1, NMWE.2 and NMWE.3 is to eliminate a candidate if it is a named entity (or a definite description).
The tests below are ordered from more specific ones to more generic ones. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions. In practice, however, it turns out that specific tests are often not applicable to some NMWE classes, and more generic tests (e.g. LEX) are required. As a consequence, generic tests, appearing towards the end of the list, may end up being used quite frequently.
Decision tree for nominal MWE candidates
- Apply test NMWE.1 - [SPECIF-REF: Candidate refers to a specific entity?]
- Apply test NMWE.2 - [NAMING-CONV: Naming convention applies to the whole class?]
- Apply test NMWE.3 - [SEM-TYPE: Person, organization, location, product or event?]
- It is a proper name or a definite description, not an MWE, exit
- It is not a proper name, continue to test NMWE.4
- It is not a proper name, continue to test NMWE.4
- It is not a proper name, continue to test NMWE.4
- Apply test NMWE.4 - [DEVERBAL: Candidate derives from a VMWE?]
- It is an NV.VID, NV.LVC.full, etc., depending on the outcome of the VMWE tests, exit.
- Apply test NMWE.5 - [PRON: Candidate on the list of MWE pronouns?]
- It is a PronID, exit.
- Apply test NMWE.6 - [CRAN: Candidate contains a cranberry word?]
- It is an NID, exit.
- Apply test NMWE.7 - [IRREG-STRUCT: Irregular syntactic structure?]
- It is an NID, exit.
- Apply test NMWE.8 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
- It is an NID, exit.
- Apply test NMWE.9 - [MODIF: Modification of a component prohibited?]
- It is an NID, exit.
- Apply test NMWE.10 - [COORD: Coordination prohibited?]
- It is an NID, exit.
- Apply test NMWE.11 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
- It is an NID, exit.
- Apply test NMWE.12 - [HEAD: Semantic head is hypernym?]
- It is an NID, exit.
- Apply test NMWE.13 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
- It is an NID, exit.
- It is not an MWE, exit
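The overall flow of the nominal tree can be sketched as below. The function name and the answer dictionary are hypothetical, and the branch directions are assumptions read off the test descriptions: a candidate is discarded as a proper name only if NMWE.1 is answered yes, NMWE.2 no and NMWE.3 yes; otherwise tests NMWE.4 to NMWE.13 are tried in order and the first positive answer decides the category.

```python
from typing import Mapping

# NMWE.6 .. NMWE.13: a positive answer to any of them yields NID.
NID_TESTS = tuple(f"NMWE.{i}" for i in range(6, 14))


def classify_nominal(answers: Mapping[str, bool], vmwe_category: str = "VID") -> str:
    """Walk the nominal MWE tree; unanswered tests count as 'no'."""

    def yes(test: str) -> bool:
        return answers.get(test, False)

    # Proper-name filter (NMWE.1 to NMWE.3).
    if yes("NMWE.1") and not yes("NMWE.2") and yes("NMWE.3"):
        return "proper name or definite description, not an MWE"
    # NMWE.4: deverbal candidates inherit the category of the source VMWE,
    # which is supplied by the caller after running the VMWE tests.
    if yes("NMWE.4"):
        return f"NV.{vmwe_category}"
    if yes("NMWE.5"):
        return "PronID"
    for test in NID_TESTS:
        if yes(test):
            return f"NID (by {test})"
    return "not an MWE"


# Hypothetical answers: specific reference, class-wide naming convention,
# and a positive LEX test at the very end of the list.
print(classify_nominal({"NMWE.1": True, "NMWE.2": True, "NMWE.13": True}))
```

Because the first positive answer wins, the specific tests placed early in the list take priority over the generic ones near the end.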
Test NMWE.1 - [SPECIF-REF] - Specific reference
In the given context, does the candidate refer to one or more specific entities, rather than being used generically?
- It might be a proper name, go to test NMWE.2
- It is not a proper name, continue to test NMWE.4
Many Johns Smiths live in London → Johns Smiths refers to several specific persons
He used the cold weapon hidden under his coat → cold weapon refers to a specific weapon
The two cold weapons were found at the place of the crime → cold weapon refers to several specific weapons
The theory of relativity was proposed by Einstein → there is only one theory of relativity, so it must be single and specific
the UN Secretary-General visited Greece → at the moment of writing there is only one UN Secretary-General (so he/she must be single and specific)
Universal Dependencies is a collection of treebanks - Universal Dependencies refers to a single specific collection of treebanks and there is only one such collection
I ate a cold lunch - cold lunch refers to a specific meal
Le (café) Descartes the (café) Descartes → le (café) Descartes refers to a specific place
Il cachait une/l' arme blanche sous le manteau He was hiding a/the cold weapon under his coat → arme blanche has a(n) (in)definite specific reference
Le Secrétaire général de l'ONU est en visite officielle en Grèce The secretary general of the UN is in visit official in Greece The UN Secretary-General is officially visiting Greece → 'Secrétaire général' de l'ONU is specific at the moment of writing
Il Segretario Generale dell’ONU ha rilasciato la dichiarazione. → Segretario Generale refers to a specific person
La teoria della relatività fu formulata da Einstein. → teoria della relatività refers to a specific theory
Dwie Maje Kowalskie mają tu konta Two Majas Kowalska have accounts here - Maje Kowalskie refers to two specific persons
Posłużył się białą bronią przyniesioną w torbie He used the white weapon brought in his bag He used the cold weapon brought in his bag → biała broń refers to a specific weapon
W pobliżu znaleziono kilka białych broni Nearby several white weapons were found Nearby several cold weapons were found → białe bronie refers to several specific weapons
paradox Banacha i Tarskiego został opisany w 1924 roku the Banach-Tarski paradox was described in 1924 → there is only one Banach-Tarski paradox (so it must be single and specific)
Sekretarz stanu w Ministerstwie Cyfryzacji the Secretary of State at the Ministry of Digitalization → at the moment of writing there is only one such secretary
Anonimowi Alkoholicy spotykają się w czwartki Anonymous Alcoholics meet on Thursdays - Anonimowi Alkoholicy Anonymous Alcoholics refers to a single specific organization
Zjadłam zimny obiad I ate a cold lunch - zimny obiad refers to a specific meal
Провалник је био наоружан хладним оружјем Provalnik je bio naoružan hladnim oružjem The burglar was armed with a cold weapon → хладно оружје hladno oružje has a(n) (in)definite specific reference
Генерални секретар УН Generalni sekretar UN The UN Secretary-General → Генерални секретар УН Generalni sekretar UN is specific at the moment of writing
Cold weapons are prohibited on a plane → cold weapons is used generically, i.e. refers to the whole class
I avoid cold lunches - cold lunches is used generically, i.e. refers to all instances of the class
The UN Secretary-General is the chief administrative officer of the United Nations → UN Secretary-General is used generically, i.e. refers to the whole class
J'évite de porter une chemise blanche I avoid wearing a white shirt → chemise blanche does not refer to a specific occurrence
Białe bronie są zabronione na pokładzie White weapons are forbidden onboard Cold weapons are forbidden onboard → białe bronie white weapons cold weapons is used generically, i.e. refers to the whole class
dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy - gospodarka rynkowa market economy is used generically
Nie lubię zimnych obiadów I don't like cold lunches - zimne obiady cold lunches refers to all instances of a class
Test NMWE.2 - [NAMING-CONV] - Concept naming convention
Does the naming convention between the candidate c and an entity e refer to all instances of a whole semantic class? In other words, can c refer to another entity e' based on the properties of e’, with no need of an extra naming convention?
- It is not a proper name, go to test NMWE.4
- It could be a proper name, continue to test NMWE.3. Note that the answer might be no in two cases:
- There is no other e' in the concept denoted by the candidate
- There could be another e' in the same class as e but the naming convention does not apply to it
The two cold weapons were found at the place of the crime → if another entity e' occurs which has the same properties as the ones in this sentence (it is a weapon that does not use explosives or fire), e' can be called cold weapon with no need of an extra naming convention
the UN Secretary-General visited Greece → at a different moment in time, there can be another person e' playing the same role, so she/he can be called UN Secretary-General with no need for an extra naming convention
I ate a cold lunch → if another entity e' occurs which has the same properties as the one in this sentence (it is a lunch which is cold), e' can be called cold lunch with no need of an extra naming convention
Le Secrétaire général de l'ONU a un mandat de 5 ans The UN Secretary-General has a five-year term → any e' may be designated by c with no extra conventions, as long as it occupies the function c
Ha pagato con una carta di credito.
W pobliżu znaleziono kilka białych broni Nearby several white weapons were found Nearby several cold weapons were found → as above
Sekretarz stanu w Ministerstwie Cyfryzacji the Secretary of State at the Ministry of Digitalization → at a different moment in time, there can be another person e' playing the same role, so she/he can be called Sekretarz stanu w Ministerstwie Cyfryzacji the Secretary of State at the Ministry of Digitalization with no need for an extra naming convention
Zjadłam zimny obiad I ate a cold lunch → if another entity e' occurs which has the same properties (it is a lunch which is cold), e' can be called zimny obiad cold lunch with no need of an extra naming convention
Председник Републике Александар Вучић је изјавио... Predsednik Republike Aleksandar Vučić je izjavio... President of the Republic Aleksandar Vučić said... → at a different moment in time, there can be another person e' playing the same role, so she/he can be called Predsednik Republike President of the Republic with no need for an extra naming convention
Universal Dependencies is a collection of treebanks - there is no other e' which could be called Universal Dependencies: the name refers to a single specific collection of treebanks and there is only one such collection
Anonimowi Alkoholicy spotykają się w czwartki Anonymous Alcoholics meet on Thursdays → there is no other e' which could be called Anonimowi Alkoholicy Anonymous Alcoholics
Many Johns Smiths live in London → as above
Dwie Maje Kowalskie mają tu konta Two Majas Kowalska have accounts here → as above
Test NMWE.3 - [SEM-TYPE] - Semantic type
Is the entity e referred to by the candidate c a PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT or EVENT?
- The candidate is a proper name or a definite description, not an MWE, exit.
- It is not a proper name, continue to test NMWE.4
Universal Dependencies → a treebank collection is a human product
Einstein's mother → definite description
Black Sea → location
l'Organisation des nations unies → an ORGANISATION
Charente-Maritime → a LOCATION
le Petit Robert → a HUMAN PRODUCT
la Nuit Blanche → an EVENT
Ξενοφῶν Ἀθηναῖος Xenophōn Athēnaios Xenophon, the Athenian Xenophon.NOM.sg.m Athenian.NOM.sg.m
Mario Rossi → person
Organizzazione delle Nazioni Unite → organisation
Dizionario Treccani → human product
Hołd pruski 1525 Prussian Tribute 1525 → event
Morze Martwe Dead Sea → location
Zygmunt III Waza Sigismund III Vasa → person
Alzheimer's disease → a disease is neither a human product nor an event
demenza senile → a disease is not a human product
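The routing between tests NMWE.2 and NMWE.3 can be summarised in a short sketch. It only encodes the order of the decisions; the two inputs are the annotator's own judgements. The function name `route_naming_tests` and its return strings are hypothetical and chosen for illustration only.

```python
from typing import Optional

# Semantic types listed in test NMWE.3.
SEMANTIC_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "HUMAN PRODUCT", "EVENT"}


def route_naming_tests(refers_to_whole_class: bool,
                       semantic_type: Optional[str] = None) -> str:
    """Return the next step after tests NMWE.2 and NMWE.3 (hypothetical helper).

    refers_to_whole_class: annotator's answer to NMWE.2.
    semantic_type: annotator's judgement for NMWE.3, or None if the entity
        has none of the listed types.
    """
    if refers_to_whole_class:
        # NMWE.2: the candidate is not a proper name.
        return "NMWE.4"
    if semantic_type in SEMANTIC_TYPES:
        # NMWE.3: proper name or definite description, not an MWE.
        return "exit"
    # NMWE.3: not a proper name after all.
    return "NMWE.4"


print(route_naming_tests(True))                    # cold weapon
print(route_naming_tests(False, "HUMAN PRODUCT"))  # Universal Dependencies
print(route_naming_tests(False))                   # Alzheimer's disease
```

For the three examples above, taken from the tests, the sketch sends cold weapon and Alzheimer's disease on to NMWE.4 and stops at Universal Dependencies.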
Test NMWE.4 - [DEVERBAL] - Deverbal NMWE
Does the candidate contain a deverbal noun and can the candidate be rephrased (in the given context) using a verbal expression which passes the VMWE tests?
- It is a deverbal nominal MWE (NV), with the corresponding VMWE subcategory, e.g. NV.VID, NV.LVC.full, etc.
- Continue to the next test
Elle est preneuse de notes pour sa camarade => Elle prend des notes pour sa camarade - prend des notes is an LVC.full, so preneuse de notes is an NV.LVC.full
La déclaration de guerre est autorisée par le Parlement The declaration of war is authorized by Parliament → déclarer la guerre à NP is a VID. déclaration de guerre (à NP) is an NV.VID, argument of the verb autoriser.
La presa in considerazione dell'evento è stato importante
była to zabawa jego kosztem => bawili się jego kosztem - bawili się jego kosztem is a VID, so zabawa jego kosztem is an NV.VID
rzut oka na tekst => rzuciłam okiem na tekst - rzuciłam okiem is a VID, so rzut oka is an NV.VID
zrobić coś za Bóg zapłać to do something for God-pay to do something for free => zrobić coś licząc, że Bóg za to zapłaci to do something counting on God to pay it back - Bóg zapłaci God will pay is not a verbal MWE, so Bóg zapłać God-pay is not an NV (but it is an NMWE)
był działaczem ruchu robotniczego he was an activist in a workers' movement => działał w ruchu robotniczym he acted in a workers' movement is not a VMWE, so działacz ruchu robotniczego activist in a workers' movement is not an NV
Test NMWE.5 - [PRON] - Pronoun
Does the candidate occur on the closed list of MWE pronouns or should the list be extended with this candidate? Such lists need to be established for each language separately. Care should be taken about distinguishing PronIDs from DetIDs.
- It is a pronominal idiom (PronID)
- Continue to the next test
I expect no one to come
we love each other
Je n'ai vu qui que ce soit I not have seen whoever it be.SUBJV.3.SG I didn't see anyone (PronID) → 'qui que ce soit' is a pronominal idiom.
zawiniłam samej sobie I.am.guilty alone myself I'm guilty myself
there is no one right way to tell the story - no one is not a pronoun here but two determiners
to samo się rozwiąże this alone itself will solve this will solve itself - to samo is not a complex pronoun but a simple pronoun to and an adjective samo
dyskusja o moralnych aspektach gospodarki rynkowej discussion about ethical aspects of the market economy - gospodarka rynkowa market economy is not a pronoun but a nominal phrase
Test NMWE.6 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
- it is a nominal idiom (NID)
- Continue to the next test
status quo → foreign words like 'status' and 'quo' are considered cranberry words
kith and kin friends and relations → 'kith' is not a standalone word
helter-skelter tall tower at a fun-fair → 'helter' and 'skelter' do not exist alone outside this expression
riff-raff ill-behaved people → 'raff' does not exist alone outside this expression
cha-cha(-cha) ballroom dance performed with small steps and swaying hip movements → 'cha' does not exist standalone
méli-mélo confused mixture → 'méli' and 'mélo' are not stand-alone words
frou-frou rustling → 'frou' does not exist outside of this compound
loup-garou werewolf → 'garou' is not a stand-alone word
pont-levis drawbridge → 'levis' is not a stand-alone word
cha-cha-cha ballroom dance performed with small steps and swaying hip movements → 'cha' is not a stand-alone word
bric-à-brac bric-à-brac → 'brac' is not a standalone word (cf. de bric et de broc (AdvID))
casus belli → foreign words like 'casus' and 'belli' are considered cranberry words
a iosa → 'iosa' does not exist outside this expression
a sbafo → 'sbafo' does not exist outside this expression
tran tran → 'tran' is not a stand-alone word
dziś wydaje się to jeszcze science fiction today it still looks like science fiction - 'science' and 'fiction' are not stand-alone words in Polish
odnośnie mass mediów concerning mass media - 'mass' is not a standalone word in Polish
Test NMWE.7 - [IRREG-STRUCT] - Irregular syntactic structure
Does the candidate have an internal syntactic structure which is irregular for its distribution, i.e. does it lack the structure of a nominal?
- It is a nominal idiom (NID)
- Continue to the next test
double-bind dilemma → Adj-V
a hold-up → V-Adv
fast day → V-N
love-hate relationship → V-V N
round about → V-Adv
(un) porte-manteau support-coat coat-rack → V-N
monte-charge raise load goods lift → V-N
(un) franc-parler frank-talk frankness → Adj-V
(un) à-coup at-strike juddering → Preposition-N
il caro prezzi
rapporto amore-odio
Test NMWE.8 - [MORPH] - Morphological inflexibility
Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- It is a nominal idiom (NID)
- Continue to the next test
(the) grass roots → #grass root
She invested in real estate → She invested in *real estates
(des) vacances d'hiver vacations of winter winter vacation → *vacance d'hiver
(une) respiration mécaniquement assistée respiration mechanically assisted mechanical ventilation → *respirations mécaniquement assistées
la tavola rotonda → *tavole rotonde
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.
Test NMWE.9 - [MODIF] - Prohibited modification
Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?
- It is a nominal idiom (NID)
- Continue to the next test
(a) state-of-the-art → #mental state-of-the-art, #state-of-the-fine-art
starting blocks → #starting to run blocks
rowing machine → *rowing slowly machine
runner bean → #slow runner bean
(un) livre d'or book of gold guestbook → *un livre de mon frère d'or, *un livre de cet or
(une) table ronde table round round-table discussion → #une table très ronde
(une) lettre recommandée letter recommended registered letter → #une lettre recommandée par mon voisin
lo stato dell'arte → *lo stato della vera arte
środki masowego przekazu means of mass transfer mass media - #służby bardzo masowego oficjalnego przekazu
Test NMWE.10 - [COORD] - Prohibited coordination
Does coordination of the candidate with another candidate of the same head lead to ungrammaticality or to an unexpected change in meaning?
- It is a nominal idiom (NID)
- Continue to the next test
foul line → *foul and side lines
a can of worms → *a can of worms and tuna
un esprit critique spirit critical critical mind → #un esprit critique et frappeur
un pot à épices jar of spices spice jar → *pot à épices et à lait
pot à eau jug at water water jug → pot à eau et à lait
porta interna → porta interna ed esterna
Test NMWE.11 - [SYNT] - Syntactic inflexibility
Does another regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- It is a nominal idiom (NID)
- Continue to the next test
a dog’s breakfast a mess → #breakfast of dog
hard shoulder emergency lane → #a shoulder that is hard
les sciences naturelles natural sciences → #les sciences qui sont naturelles
il cuoio capelluto → il cuoio che è capelluto
stan wojenny war state martial law - #stan wojny state of war
Test NMWE.12 - [HEAD] - Semantic head
Is the semantic head h of the candidate c its hypernym, which can be reformulated by "is c a type of h"? Note that sometimes the syntactic and semantic heads do not coincide.
- It is a nominal idiom (NID)
- Continue to the next test
red herring → It is not a type of sea fish, but it suggests an idea of a misleading clue
a square peg (in a round hole) someone who does not fit in → It is not a peg but a person
una testa calda → it is not a type of "testa"
osobowość prawna legal personality legal person - it is not a type of personality
manna z nieba manna from heaven miracle - it is not a type of manna
a bunch of flowers → these are flowers (here the semantic head 'flowers' is different from the syntactic head 'bunch')
un nuage de lait cloud of milk a dash of milk → It does not refer to a type of cloud but to a small quantity (of milk)
moulin à paroles mill at words blabbermouth → It does not refer to a type of mill but to a person
studente lavoratore
gospodarka rynkowa market economy - it is a type of economy
ruch oporu movement of resistance resistance movement - it is a type of movement
Test NMWE.13 - [LEX] - Lexical inflexibility
Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- It is a nominal idiom (NID)
- It is not an MWE, exit
chain reaction → #chain change(s)
deep water → #profound water
vicious circle → vicious cycle but #vicious sphere/round/ring...
vanity case → vanity box but #arrogance/narcissism/self-admiration box/case
boarding pass → boarding card but #boarding ticket/voucher/document/...
tête de lard head of lard stubborn → *tête de graisse, *chef de lard
peine perdue effort lost fruitless effort → *peine égarée
mauvaise/méchante langue bad mouth → #bonne/gentille langue
circolo vizioso
milowy krok one-mile step important event - #kilometrowy krok one-kilometer step
gospodarka rynkowa market economy - #gospodarka handlowa/komercyjna/targowa economy of trade/commerce/market
personal/professional... judgement
deep anxiety/love/conversation...
mauvaise odeur/habitude/surprise... bad smell/habit/surprise
méchant garçon/professeur/marchand... mean boy/teacher/merchant
Section 7
Tests for adjectival and adverbial MWEs (AMWEs)
If the DIST test has allowed us to decide that the MWE candidate has an adjectival or an adverbial distribution, the status of this candidate (as an AdjID, AdvID, AV or non-MWE) is to be checked by the decision diagram below. This diagram has a unique entry point and the tests should be applied in the defined order. Each test is clickable and explained with examples in the sections below.
As with nominal MWEs, the tests below are ordered from more specific ones to more generic ones. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions.
Decision tree for adjectival and adverbial MWE candidates
In this tree, a single YES to one of the tests is sufficient to decide that a candidate is an AMWE.
- Apply test AMWE.1 - [DEVERBAL: Candidate derives from a VMWE?]
- It is an AV.VID, AV.LVC.full, etc., depending on the outcome of the VMWE tests, exit.
- Apply test AMWE.2 - [CRAN: Candidate contains a cranberry word?]
- It is an AdjID or an AdvID, exit.
- Apply test AMWE.3 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
- It is an AdjID or an AdvID, exit.
- Apply test AMWE.4 - [IRREG-STRUCT: Irregular syntactic structure?]
- It is an AdjID or an AdvID, exit.
- Apply test AMWE.5 - [MODIF: Modification of a component prohibited?]
- It is an AdjID or an AdvID, exit.
- Apply test AMWE.6 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
- It is an AdjID or an AdvID, exit.
- It is not an MWE, exit
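The fixed ordering of the tree can be illustrated with a minimal sketch. It assumes that the annotator has already answered each test with YES or NO; the code does not perform any linguistic analysis. The names `AMWE_TESTS` and `classify_amwe`, and the use of a dictionary of answers, are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# Test names in the order given by the decision tree.
AMWE_TESTS: List[str] = ["DEVERBAL", "CRAN", "MORPH", "IRREG-STRUCT", "MODIF", "LEX"]


def classify_amwe(answers: Dict[str, bool], distribution: str,
                  vmwe_subcategory: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (label, deciding test) for an adjectival or adverbial candidate.

    answers: the annotator's YES (True) / NO (False) per test; missing means NO.
    distribution: "adjectival" or "adverbial", as decided by the DIST test.
    vmwe_subcategory: outcome of the VMWE tests (e.g. "VID"); required only
        when DEVERBAL is answered YES.
    """
    idiom_label = {"adjectival": "AdjID", "adverbial": "AdvID"}[distribution]
    for test in AMWE_TESTS:
        if not answers.get(test, False):
            continue  # NO: move on to the next test in the fixed order
        if test == "DEVERBAL":
            if vmwe_subcategory is None:
                raise ValueError("DEVERBAL answered YES but no VMWE subcategory given")
            return f"AV.{vmwe_subcategory}", test
        return idiom_label, test  # the first YES decides, later tests are skipped
    return "not an MWE", None


# "to and fro": adverbial distribution, YES to the cranberry-word test.
print(classify_amwe({"CRAN": True}, "adverbial"))
# "made-up stories": YES to DEVERBAL, with VPC.full from the VMWE tests.
print(classify_amwe({"DEVERBAL": True}, "adjectival", "VPC.full"))
```

Because the loop returns at the first YES, a candidate that would also pass a later test is still labelled by the earliest one, which mirrors the priority of more specific tests.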
Test AMWE.1 - [DEVERBAL] - Deverbal AMWE
Does the candidate contain a deverbal adjective or adverb and can the candidate be rephrased (in the given context) using a verbal expression which passes the VMWE tests?
- It is a deverbal adjectival or adverbial MWE (AV), with the corresponding VMWE subcategory, e.g. AV.VID, AV.VPC.full, etc.
- Further tests are required
a plan brought to fruition over a decade (AV.LVC.cause) → bring to fruition is an LVC.cause. We will bring the plan to fruition
a time-killing activity (AV.VID) → kill time is a VID. We killed time watching a movie
made-up stories (AV.VPC.full) → make up is a VPC.full. They completely made up these stories.
Ce plat est très arrache-gueule tearing.up-mouth This dish burns the mouth (AV.VID) → arracher la gueule is a VID
Un exercice casse-gueule break-face a risky exercise (AV.VID) → se casser la gueule is a VID
daleko idące uogólnienia far going generalisations far reaching generalisation => te uogólnienia idą daleko these generalisations go far - iść daleko to go far is not a verbal MWE, so daleko idący far going far reaching is not an AV (but it is an AMWE)
Test AMWE.2 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
- It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution.
- Further tests are required
The boat rocked to and fro on the waves back and forth (AdvID) → fro is not a standalone word in English
She was in fine fettle in good condition (AdjID) → fettle is not a standalone word in English
He drove off in high dudgeon angrily (AdvID) → dudgeon is not used outside this idiom
He was hale and hearty healthy and strong (AdjID) → hale is not used outside this idiom
Une famille de bon aloi of good sterling A family of sterling reputation (AdjID) → aloi is not used outside this expression
boire à tire-larigot drink to excess (AdvID) → larigot is not used standalone in French
manger à la bonne franquette eat without any fuss (AdvID) → franquette is not used standalone in French
construire un abri de bric et de broc construct a shelter from a hodgepodge of objects (AdvID) → 'broc' is not a standalone word
Test AMWE.3 - [MORPH] - Morphological inflexibility
Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
- Further tests are required
by heart learn something in such a way that you can say it from memory (AdvID) → #by hearts
by no means not at all (AdvID) → #by no mean
from time to time sometimes but not often (AdvID) → #from times to times
hot under the collar embarrassed or angry about something (AdjID) → #hot under the collars
larger than life more interesting, obvious than usual (AdjID) → #larger than lives
down to earth practical (AdjID) → #down to earths
By the way, have you decided yet? (AdvID) → *by the ways, have you decided yet?
Elle vient ici à titre exceptionnel at title exceptional She exceptionally comes here (AdvID) → #aux/*à titres exceptionnels
Il pleut. En conséquence, on ne sort pas. in consequence It rains. As a result, we'll not go out (AdvID) → #en conséquences
z powrotem with return back - #z powrotami
daleko idące uogólnienia far going generalisations far reaching generalisation - daleko idące uogólnienie far going generalisation far reaching generalisation, daleko idąca zmiana far going change far reaching change
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.
Test AMWE.4 - [IRREG-STRUCT] - Irregular syntactic structure
Does the candidate have an irregular internal syntactic structure, i.e. the language's regular grammar rules do not allow a phrase with this structure?
- It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
- Further tests are required
one of a kind unique (AdjID) → *of a kind in the sense of of a unique kind
four-in-hand knot a method of tying a necktie (AdjID) → four is an orphan
back in the day back then (AdvID) → #in the day vs. in the old days
At bottom, he is a kind person (AdvID) → #at a/the bottom (of N)
By and large, the project was a success (AdvID) → Unusual coordination (preposition and adjective)
Elle est sous pression au travail She is under pressure at work (AdjID) → pression is not determined (unusual) cf. sous une grande pression
Un costume sur mesure on measure made-to-measure (AdjID) → mesure is not determined (unusual) cf. sur la mesure de N
Un plat aigre-doux sour-sweet A sweet and sour plate (AdjID) → Unusual coordination with hyphen
co roku every year.DAT every year (AdvID) → the adposition 'co' requires an accusative for all other nouns
daleko idące uogólnienia far going generalisations far reaching generalisation - has a regular Adv-Adj structure
Test AMWE.5 - [MODIF] - Prohibited modification
Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?
- It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
- Further tests are required
Dans tous les cas, c'est foutu in all the cases In any case, it's screwed (AdvID) → #Dans tous les cas connus/possibles/qu'on connaît
na serio on seriously seriously - *na bardzo serio on very seriously
z powrotem with return back - *z ostatecznym powrotem with final return
Test AMWE.6 - [LEX] - Lexical inflexibility
Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- It is an adjectival or an adverbial idiom (AdjID or AdvID), depending on its distribution
- It is not an MWE, exit
He is on cloud nine very happy (AdjID) → #on cloud ten
ice-cold drinks (AdjID) → #snow-cold
She thinks she is over the hill no longer young (AdjID) → #over the mountain
a hot pink dress (AdjID) → #hot red, #cold pink
À la limite, on reporte la réunion at the limit if necessary (AdvID) → #à l'extrémité, #au seuil
Par-dessus le marché, il a plu over above the market On top of that, it rained (AdvID) → #sur le marché, #par-dessus le bazar/commerce/pacte
w celu manipulacji in the aim of manipulation in order to manipulate - w zamierzeniu manipulacji in the intention of, w zamyśle manipulacji in the intention of
daleko idące uogólnienia far going generalisations far reaching generalisation - #daleko maszerujące/posuwające się/jadące uogólnienia
Section 8
Tests for functional MWEs (FuncMWEs)
If the DIST test has allowed us to decide that the MWE candidate has the distribution of a function word (determiner, adposition, conjunction or interjection), the status of this candidate (as a DetID, AdpID, ConjID, IntID or non-MWE) is to be checked by the decision diagram below. This diagram has a unique entry point and the tests should be applied in the defined order. Each test is clickable and explained with examples in the sections below.
As with nominal, adjectival and adverbial MWEs, the tests below are ordered from more specific ones to more generic ones. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions.
Decision tree for functional MWE candidates
In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a FuncMWE.
- Apply test FuncMWE.1 - [CRAN: Candidate contains a cranberry word?]
- It is a DetID, AdpID, ConjID or IntID, exit.
- Apply test FuncMWE.2 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
- It is a DetID, AdpID, ConjID or IntID, exit.
- Apply test FuncMWE.3 - [IRREG-STRUCT: Irregular syntactic structure?]
- It is a DetID, AdpID, ConjID or IntID, exit.
- Apply test FuncMWE.4 - [MODIF: Modification of a component prohibited?]
- It is a DetID, AdpID, ConjID or IntID, exit.
- Apply test FuncMWE.5 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
- It is a DetID, AdpID, ConjID or IntID, exit.
- It is not an MWE, exit
Test FuncMWE.1 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
- It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- Further tests are required
by dint of repetition (AdpID) through repetition - 'dint' is not a standalone word in English
on behalf of everyone (AdpID) instead of - 'behalf' is not a standalone word in English
à l'instar de ces héros (AdpID) at the equivalent of as these heroes - 'instar' is not a standalone word in French
la plupart de ces héros (DetID) the greater.part of most of these heroes - 'plupart' is not a standalone word in French
in the end of - all components are standalone words
dans un supermarché in a supermarket - all components are standalone words
po to, by wiedzieć for it, to know in order to know - all components are standalone words
Test FuncMWE.2 - [MORPH] - Morphological inflexibility
Does the candidate contain a content word (noun, verb, adjective or adverb), and does a morphological change of this word that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- Further tests are required
Big deal! (IntID) → #Big deals!
a great deal of experience (DetID) → #deals of
du fait de la crise sanitaire (AdpID) of the fact of the crisis sanitary due to the public health crisis → #des faits de la crise sanitaire
after the meeting/meetings → compositional expressions
po to, by wiedzieć for it, to know in order to know - neither component inflects, so there can be no morphological flexibility
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.
Test FuncMWE.3 - [IRREG-STRUCT] - Irregular syntactic structure
Does the candidate have an irregular internal syntactic structure, i.e. the language's regular grammar rules do not allow a phrase with this structure?
- It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- Further tests are required
good gracious (IntID) → Adj + Adj with no N head
mercy me! (IntID) → N + Pronoun with omitted verb and agent
Ça alors! that well My! (IntID) → Pronoun followed by an adverb
peu de gens little of people few people (DetID) → Adv + Preposition
po to, by wiedzieć for it, to know in order to know - regular structure of an adverbial: adp-pron
Test FuncMWE.4 - [MODIF] - Prohibited modification
Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?
- It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- Further tests are required
spoons as well as knives (ConjID) → spoons *as well and good as knives
a little salt (DetID) → #a little but strong salt vs. a little but strong person
en sorte que cela se calme in sort that it calms so that it calms (ConjID) → *en bonne sorte que cela se calme
des tas de choses Det.ind.pl lots of things lots of things (DetID) → #des tas très hauts de choses vs. des tas énormes de blé
jak to? how this? how come? - #jak samo to?
po to, by wiedzieć for it, to know in order to know - *po samo to for only it
w czasie wojny in time of war during war - #w długim/trudnym/niebezpiecznym czasie wojny
Test FuncMWE.5 - [LEX] - Lexical inflexibility
Does the candidate contain a content word (noun, verb, adjective or adverb), and does a regular replacement of this component by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- It is not an MWE, exit
in consequence of the sentence (AdpID) → #in result of the sentence
as long as you finish your homework (ConjID) → *as short/large as you finish your homework
Give me a little money (DetID) → *Give me a small money
Repas préparé par les soins de Madame X (AdpID) by cares of Meal prepared by Mme X → *Repas préparé par l'attention/la prévenance/la sollicitude de Mme X
Il n'est pas venu sous prétexte qu' il était malade (ConjID) under pretext that He didn't come on the pretext that he was ill → *Il n'est pas venu sous excuse qu' il était malade
jak też pretensje as also reproaches and reproaches (ConjID) - *jak oraz pretensje
coś tam jeszcze something there more something more (PronID) - #coś tu jeszcze
wpół do piątej at.half to five half past four (AdpID) - #wpół po piątej
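Two points distinguish the functional tests from the adjectival and adverbial ones: the label depends on the distribution of the candidate, and tests FuncMWE.2 [MORPH] and FuncMWE.5 [LEX] only apply when the candidate contains a content word. The sketch below illustrates both points under the assumption that the annotator supplies all judgements. The names `classify_func_mwe`, `LABELS` and `NEEDS_CONTENT_WORD` are hypothetical.

```python
from typing import Dict

# Test names in the order given by the decision tree for functional MWEs.
FUNC_TESTS = ["CRAN", "MORPH", "IRREG-STRUCT", "MODIF", "LEX"]

# Tests whose question starts with "Does the candidate contain a content word".
NEEDS_CONTENT_WORD = {"MORPH", "LEX"}

# Label per distribution, as decided by the DIST test.
LABELS = {
    "determiner": "DetID",
    "adposition": "AdpID",
    "conjunction": "ConjID",
    "interjection": "IntID",
}


def classify_func_mwe(distribution: str, answers: Dict[str, bool],
                      has_content_word: bool) -> str:
    """Return the category of a functional candidate from annotator judgements."""
    label = LABELS[distribution]
    for test in FUNC_TESTS:
        if test in NEEDS_CONTENT_WORD and not has_content_word:
            continue  # the test cannot be answered YES without a content word
        if answers.get(test, False):
            return label
    return "not an MWE"


# "by dint of": adposition distribution, YES to the cranberry-word test.
print(classify_func_mwe("adposition", {"CRAN": True}, has_content_word=True))
# "a little salt": determiner distribution, YES to the modification test.
print(classify_func_mwe("determiner", {"MODIF": True}, has_content_word=True))
```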
Section 9
Language-specific tests
Language-specific tests may be necessary in one of three cases:
- a VMWE category may be universal or quasi-universal but it may require different tests in different languages,
- any category specific to a language must be associated with appropriate tests in the same language,
- universal tests can build upon more elementary language-specific tests (e.g. to distinguish a particle from a preposition).
Section 9.1
Language-specific categories (LS)
Language-specific categories can be proposed for annotation in this task provided that they are carefully defined and accompanied by linguistic tests that make it possible to distinguish them from other categories. We recommend not redefining the universal and quasi-universal categories described here, but introducing new names and abbreviations in order to answer such needs.
When a new language(-group)-specific category is introduced, we encourage the use of the LS category with a dotted extension, e.g. LS.SIM or LS.PROV (for "language-specific simile" or "language-specific proverb").
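A language team that wants to check its labels automatically could use something like the following sketch. It assumes that the dotted extension is a non-empty alphanumeric string starting with a letter; this restriction is an assumption of the example and is not stated in the guidelines. The helper names `is_ls_label` and `split_label` are hypothetical.

```python
import re
from typing import Tuple

# Assumed shape of a language-specific label: "LS." plus an alphanumeric
# extension starting with a letter (an assumption made for this sketch).
LS_LABEL = re.compile(r"LS\.[A-Za-z][A-Za-z0-9]*")


def is_ls_label(label: str) -> bool:
    """True if the label has the form LS.<extension>, e.g. LS.SIM."""
    return LS_LABEL.fullmatch(label) is not None


def split_label(label: str) -> Tuple[str, str]:
    """Split a label at its first dot into (category, extension)."""
    head, _, extension = label.partition(".")
    return head, extension


print(is_ls_label("LS.SIM"))       # label for a language-specific simile
print(is_ls_label("LS"))           # bare LS without an extension
print(split_label("NV.LVC.full"))  # deverbal nominal MWE label
```

Splitting at the first dot keeps a multi-part extension such as LVC.full intact, so the same helper can also be applied to labels like NV.LVC.full or AV.VID.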
Section 9.2
Particles versus prepositions and prefixes
The following tests make it possible to properly identify prepositional verb particles in cases where they might be homographic with prepositions in prepositional phrases (PPs) or with verbal prefixes. The word to be discriminated is referred to as a candidate word. The tests are language-specific and concern English, German and Swedish.
English-specific tests for distinguishing particles from prepositions
The following tests concern English words which can be either a preposition or a particle depending on the context, e.g. up, on, through, etc. If a candidate word passes either of the two tests it can be categorized as a particle.
Test PREP.EN.1 - [FIN-PART] - Sentence-final particle
Can the sentence be reformulated so that the candidate word w occurs at the end of a clause which is: (i) affirmative or imperative, (ii) headed by the verb governing w, and (iii) not a relative clause?
- the candidate word is a particle
- go to the next test
I took off my clothes. I took my clothes off.
She tries to take in her clients. She tries to take her clients in.
He has been off alcohol. *He has been alcohol off.
Test PREP.EN.2 - [AD-INS] - Adjunct insertion
Is an insertion of a circumstantial adjunct prohibited between the governing verb and the candidate word?
- the candidate word is a particle
- it is not a particle
I took off my clothes at once. *I took at once off my clothes.
She always tries to take in her clients. *She tries to take always in her clients.
He has been off alcohol recently. He has been recently off alcohol.
This test might be redundant with respect to test PREP.EN.1. If it turns out to be so (after a large-scale annotation), it may be deleted.
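The reformulations used in tests PREP.EN.1 and PREP.EN.2 can be produced mechanically, leaving only the grammaticality judgement to the annotator. The sketch below is a simplification: it assumes a single tokenised clause in which the candidate word directly follows its governing verb. The function names `move_to_clause_end` and `insert_adjunct` are hypothetical.

```python
from typing import List


def move_to_clause_end(tokens: List[str], word_index: int) -> List[str]:
    """PREP.EN.1 variant: move the candidate word to the end of the clause."""
    return tokens[:word_index] + tokens[word_index + 1:] + [tokens[word_index]]


def insert_adjunct(tokens: List[str], verb_index: int, adjunct: str) -> List[str]:
    """PREP.EN.2 variant: insert an adjunct right after the governing verb."""
    return tokens[:verb_index + 1] + adjunct.split() + tokens[verb_index + 1:]


clause = "I took off my clothes".split()

# Variant for the annotator to judge under PREP.EN.1.
print(" ".join(move_to_clause_end(clause, word_index=2)))
# prints: I took my clothes off

# Variant for the annotator to judge under PREP.EN.2.
print(" ".join(insert_adjunct(clause, verb_index=1, adjunct="at once")))
# prints: I took at once off my clothes
```

The helpers only build the candidate sentences; whether a variant is acceptable, and hence whether the word is a particle, remains a human decision.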
German-specific tests for distinguishing particles from prepositions and verbal prefixes
The following tests concern German words which can be both a particle and either a preposition or a verbal prefix, depending on the context, e.g. mit, um, vor, etc. If a candidate word passes any of the following tests, it can be categorized as a particle.
Test PREP.DE.1 - [FIN-PART] - Sentence-final particle
Does the candidate word occur at the end of the sentence or can the sentence be reformulated so as to put the candidate word at the end?
- it is a particle
- other tests are needed
Ich schlage vor allen zu verzeihen. I propose to forgive everyone Ich schlage es vor I propose it
Der Mülleimer wurde umgefahren. The trash bin was knocked down Er fuhr den Mülleimer um. He knocked down the trash bin
Er umfuhr den ganzen See mit dem Fahrrad. He drove around the whole lake with a bike *Er fuhr ihn um.
Test PREP.DE.2 - [SEP-PART] - Separable particle
Can the verb and the candidate word be spelled both separately and together?
- it is a particle
- other tests are needed
Er fuhr das Schild um. He drove over the sign Er sollte das Schild nicht umfahren He should not drive over the sign
Sprechen Sie mit ihm! Speak with him! *Sie sollen ihm mitsprechen.
Swedish-specific tests for distinguishing particles from prepositions and verbal prefixes
Many words are ambiguous between particles and prepositions, e.g. för, upp, … Accordingly, the following sentence may have two different senses:
The difference can only be judged by the stress/intonation pattern. In the first case, with a particle, the stress is not on the verb but on the particle. In the second case, with a prepositional object, the main stress is on the verb, with only secondary stress on the preposition.
Test PART.SV.1 - [PART-STRESS] - Stress on the particle
Is the main stress on the candidate word rather than on the verb?
- it is a particle
- it is not a particle
Section 9.3
Identifying multiword tokens
The relation between words and tokens is not always 1-to-1. If a single token contains more than one word, then it is a potential MWE. For the purpose of MWE annotation it is, therefore, important to provide a definition of a word that is as clear-cut as possible. This section contains language-specific tests for identifying multiword tokens (MWTs). Currently the tests concern Swedish.
Swedish-specific tests for identifying MWTs
Test MWT.SV.1 - [VERB-MWT] - Verbal MWT
Does the candidate token function as a verb?
- we do not have to decide if it is an MWT (for the purpose of VMWE annotation)
- go to the next test
sysselsättning task-setting employment
förklara for-clear explain
klargöra clear-make clarify
Test MWT.SV.2 - [SPLIT-MWT] - Splittable MWT
Split the candidate token into its component parts. Can it be used as an expression in the split form (possibly with slightly shifted semantics)?
- it is an MWT
- go to the next test
avbryta off-break cancel, bryta av break off break off
Test MWT.SV.3 - [CRAN-MWT] - Cranberry component in a MWT
If you split the token into its component words, is any of these words a cranberry word (i.e. it cannot be used as a standalone word, with the same part-of-speech)?
- it is not an MWT
- it is an MWT
erbjuda er-offer offer → er is possible as a pronoun but not as a particle
försvåra for-difficult make difficult → svåra is possible as an adjective but not as a verb
jämföra compare → jäm is not used as a stand-alone word
för|klara for|clear explain
klar|göra clear|make clarify
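The three Swedish tests form a cascade, which can be sketched as follows. The yes/no polarity of each branch is our reading of the tests and their examples (a non-verbal token needs no decision, a splittable token is an MWT, a cranberry component rules an MWT out). The enum and function names are hypothetical, and all three inputs are annotator judgments.

```python
from enum import Enum


class MwtDecision(Enum):
    NOT_RELEVANT = "no decision needed for VMWE annotation"
    MWT = "multiword token"
    NOT_MWT = "not a multiword token"


def classify_sv_token(is_verb: bool,
                      usable_when_split: bool,
                      has_cranberry_component: bool) -> MwtDecision:
    """Apply MWT.SV.1, MWT.SV.2 and MWT.SV.3 in order."""
    if not is_verb:                  # MWT.SV.1 [VERB-MWT]
        return MwtDecision.NOT_RELEVANT
    if usable_when_split:            # MWT.SV.2 [SPLIT-MWT]
        return MwtDecision.MWT
    if has_cranberry_component:      # MWT.SV.3 [CRAN-MWT]
        return MwtDecision.NOT_MWT
    return MwtDecision.MWT


# avbryta can be split into bryta av, so the second test already decides.
print(classify_sv_token(True, True, False))
# jämföra: jäm is not used as a stand-alone word.
print(classify_sv_token(True, False, True))
```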
Section 9.4
Language-specific inherently clitic verbs (LS.ICV)
Inherently Clitic Verbs (LS.ICV), together with the Inherently Reflexive Verbs (IRV), are pronominal verbs. An LS.ICV is formed by a full verb combined with one or more non-reflexive clitics (CLI) that represent the pronominalization of one or more complements. LS.ICV is annotated when (a) the verb never occurs without a non-reflexive clitic, e.g. entrarci to be relevant to something (colloquial form), or (b) the LS.ICV and the non-clitic versions have clearly different senses or subcategorization frames.
LS.ICVs represent a specific category for some Romance languages, and they are particularly frequent in the Italian language. It is often challenging to distinguish LS.ICV from IRV, particularly because some clitics may be ambiguous, like se/si which is a polyfunctional clitic pronoun and grammatical marker (and has many functions such as reflexive, reciprocal, impersonal, passivizing, aspectual, middle).
If the CLI has a clear reflexive meaning the VMWE might be an IRV.
We start by listing the various categories of LS.ICVs before providing tests to decide whether to annotate a given occurrence as an LS.ICV.
- Inherently clitic verbs ⇒ ANNOTATE as LS.ICV
- The verb without the CLI does not exist
infischiarsene (not worry about) vs *infischiare
- The verb without the CLI does exist, but has a very different meaning
darla (gl.: give it) (transl. fuck around) ≠ dare (give)
prenderle (gl.: take them) (transl. be beaten) ≠ prendere (take)
prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
starci (gl.: stay there) (transl. agree) ≠ stare (stay)
- The verb has more than one CLI of which the second one is an invariable object complement.
fregarsene (gl.: matter self of-it) (transl. don’t care about)
infischiarsene (transl. not worry about)
curarsene (gl.: take care self of-it) (transl. care about)
prendersela (gl.: take self it.FEM) (transl. be angry/upset)
sentirsela (gl.: feel self it.FEM) (transl. be in the mood of)
sentirselo (gl.: feel self it.MASC) (transl. feel)
vedersela (gl.: see self it.FEM) (transl. to manage something)
- The verb has two non-reflexive invariable CLIs:
farcela (gl.: make there it.FEM) (transl. succeed)
- The verb has a different meaning with respect to an intensive use of the same two non-reflexive invariable CLIs:
andarsene (gl.: go away self from-there) (transl. die) ≠ andarsene (go away)
bersela (gl.: drink self it.FEM) (transl. believe) ≠ bersela (drink)
LS.ICV-specific decision tree
- Apply test LS.ICV.1 - [CL-INHERENT]
- Annotate as LS.ICV
- Apply test LS.ICV.2 - [CL-DIFF-SENSE]
- Annotate as LS.ICV
- Apply test LS.ICV.3 - [CL-DIFF-SUBCAT]
- Annotate as LS.ICV
- Exit
Test LS.ICV.1 - [CL-INHERENT] Inherent clitic
Does the verb exist only with the CLI and never occur without it?
- annotate as LS.ICV
infischiarsi ⇒ *infischiare
infischiarsene ⇒ *infischiare
- next test
Test LS.ICV.2 - [CL-DIFF-SENSE] - Different sense
Given the same verb without the CLI/CLIs, are all of its meanings clearly different from the inherently clitic form?
- annotate as LS.ICV
smetterla (gl.: quit it) (transl. knock it off) ≠ smettere (quit)
prenderle (gl.: take them) (transl. get beaten up) ≠ prendere (take)
prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
starci (gl.: stay there) (transl. up for it) ≠ stare (stay)
curarsene (gl.: take care self of-it) (transl. care about) ≠ curare (take care)
prendersela (gl.: take self it.FEM) (transl. be angry/upset) ≠ prendere (take)
sentirsela (gl.: feel self it.FEM) (transl. be in the mood of) ≠ sentire (feel)
darla (gl.: give it.FEM) (transl. fuck around) ≠ dare (give)
- next test
Test LS.ICV.3 - [CL-DIFF-SUBCAT] - Different subcategorization frame
Is the subcategorization frame of the simple verb without the CLI different from the subcategorization frame of the LS.ICV?
- annotate as LS.ICV
X se la prende con Y ⇔ X prende Y
- Exit
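The LS.ICV-specific decision tree stops at the first test that is answered positively. A minimal sketch, assuming the annotator's answers are stored in a dictionary keyed by the bracketed test tags (the dictionary layout and the function name are hypothetical):

```python
from typing import Dict, Optional

# Test tags in the order in which the decision tree applies them.
LS_ICV_TESTS = ("CL-INHERENT", "CL-DIFF-SENSE", "CL-DIFF-SUBCAT")


def first_positive_ls_icv_test(answers: Dict[str, bool]) -> Optional[str]:
    """Return the tag of the first positive test, or None if all are negative.

    A returned tag means "annotate as LS.ICV"; None corresponds to "Exit".
    Unanswered tests are treated as negative.
    """
    for tag in LS_ICV_TESTS:
        if answers.get(tag, False):
            return tag
    return None


# infischiarsene: the verb never occurs without the clitics.
print(first_positive_ls_icv_test({"CL-INHERENT": True}))
# smetterla: smettere exists, but with a clearly different sense.
print(first_positive_ls_icv_test({"CL-INHERENT": False, "CL-DIFF-SENSE": True}))
```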
Section 9.5
Italian-specific decision tree
For Italian, a language-specific category called inherently clitic verbs (LS.ICV) has been defined. This implies a modified version of the annotation decision tree.
Steps 1-4 are still valid for Italian, but Step 3 should be carried out with the decision tree below instead of the generic decision tree.
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
- Apply the test IT.S.1 - [CLITICS-ONLY: Are all lexicalized dependents of the verb clitics?]
- Apply the LS.ICV-specific tests ⇒ LS.ICV tests positive?
- Annotate as a VMWE of category LS.ICV
- It is not a VMWE, exit
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
- Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
- Annotate as a VMWE of category IRV
- It is not a VMWE, exit
- Non-reflexive clitic ⇒ Apply LS.ICV-specific tests ⇒ LS.ICV tests positive?
- Annotate as a VMWE of category LS.ICV
- It is not a VMWE, exit
- Particle ⇒ Apply IVPC-specific tests ⇒ IVPC tests positive?
- Annotate as a VMWE of category IVPC.full or IVPC.semi
- It is not a VMWE, exit
- Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
- Annotate as a VMWE of category MVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Extended NP ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
- Annotate as a VMWE of category LVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
Test IT.S.1 - [CLITICS-ONLY] Clitics only
Are all lexicalized dependents of the verb clitics?
- apply LS.ICV tests
- next test
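The routing performed by the Italian decision tree can be sketched as a function that returns which category-specific tests to try, in order. This is only an illustration under assumptions: the yes/no polarity of the structural tests follows our reading of the flattened tree above, and the function name, argument names and category strings are hypothetical.

```python
from typing import List


def italian_step3_route(one_head: bool, one_dep: bool, clitics_only: bool,
                        lexicalized_subject: bool, dep_category: str) -> List[str]:
    """Return the ordered test families to apply.

    The first family whose tests are positive gives the category; if none is
    positive, the candidate is not a VMWE.
    """
    if not one_head:                       # S.1 [1HEAD] answered no
        return ["VID"]
    if not one_dep:                        # S.2 [1DEP] answered no
        # IT.S.1 [CLITICS-ONLY] decides between LS.ICV and VID tests.
        return ["LS.ICV"] if clitics_only else ["VID"]
    if lexicalized_subject:                # S.3 [LEX-SUBJ] answered yes
        return ["VID"]
    by_category = {                        # S.4 [CATEG]
        "reflexive clitic": ["IRV"],
        "non-reflexive clitic": ["LS.ICV"],
        "particle": ["IVPC"],
        "verb": ["MVC", "VID"],
        "extended NP": ["LVC", "VID"],
    }
    return by_category.get(dep_category, ["VID"])


# farcela: two lexicalized dependents, both clitics.
print(italian_step3_route(True, False, True, False, ""))
# prenderle: a single lexicalized dependent, a non-reflexive clitic.
print(italian_step3_route(True, True, False, False, "non-reflexive clitic"))
```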
Section 9.6
Hindi-specific decision tree
For Hindi, LVCs can be formed by a verb and a noun, or by a verb and an adjective which is morphologically identical to an eventive noun. This implies a modified version of the annotation decision tree.
Steps 1-4 are still valid for Hindi, but Step 3 should be carried out with the decision tree below instead of the generic decision tree.
- Apply test S.1 - [1HEAD: Unique verb as functional syntactic head of the whole?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.2 - [1DEP: Verb v has exactly one lexicalized dependent d?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Apply test S.4 - [CATEG: What is the morphosyntactic category of d?]
- Reflexive clitic ⇒ Apply IRV-specific tests ⇒ IRV tests positive?
- Annotate as a VMWE of category IRV
- It is not a VMWE, exit
- Particle ⇒ Apply IVPC-specific tests ⇒ IVPC tests positive?
- Annotate as a VMWE of category IVPC.full or IVPC.semi
- It is not a VMWE, exit
- Verb with no lexicalized dependent ⇒ Apply MVC-specific tests ⇒ MVC tests positive?
- Annotate as a VMWE of category MVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Extended NP or an adjective which is morphologically identical to an eventive noun ⇒ Apply LVC-specific decision tree ⇒ LVC tests positive?
- Annotate as a VMWE of category LVC
- Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
- Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
- Annotate as a VMWE of category VID
- It is not a VMWE, exit
Section 10
Annotation management
This section groups the documentation on practical aspects of the annotation campaign management. Some of these aspects are specific to this shared task, such as the editing of examples by language leaders and the use of the annotation platform FLAT. Others are more generic and concern the guidelines in general, such as the FAQ section.
Section 10.1
Frequently Asked Questions (FAQ)
Annotators often face questions and challenging examples. When several annotators ask the same question, we will update the list of frequently asked questions.
However, we suggest that language teams set up another communication platform to deal with questions that are specific to a language. This can take the form of a shared online document, a wiki, a dedicated bug tracking system or a mailing list. We also suggest keeping track of decisions taken concerning borderline examples (with a list of expressions to which each decision applies). These should be kept in a centralized document or page that all annotators can access.
Whenever you think that a question can also be interesting to other languages, please notify the organizers and we will try to update this page.
- How to define an unexpected change in meaning?
- How to annotate lexicalized words which belong to contractions, compounds, and acronyms?
- How to annotate coordinated VMWEs sharing some components?
- How to annotate elliptical occurrences of VMWEs?
- How to annotate VMWEs that seem to belong to more than one category?
- How to annotate embedded VMWEs?
- Are existential expressions with there is/are considered VMWEs?
- How to categorize VMWEs which seem LVCs but do not pass all LVC tests?
- Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?
- Does the IRV category include verbs with non-reflexive clitics?
- Should nominalizations of VMWEs be annotated?
- How to express hesitation between different VMWE categories?
- How can one decide what are the semantic arguments of a noun for borderline cases?
- How does one decide if a more or less frozen determiner is a lexicalized VMWE component?
- Should I annotate compound and serial verbs as VMWEs? Of which category?
- If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?
- In an LVC candidate, if the verb adds aspect to the predicative noun, does it imply failing Test LVC.3?
- In the LVC decision tree, should I test that the noun keeps its original meaning?
- How can I easily browse the already existing annotations in my corpus?
Check the glossary entry that defines unexpected change in meaning
In some languages adpositions (pre- or post-positions), clitics and determiners are subject to contractions (i.e. they yield multiword tokens, MWTs). If they are properly split by the tokenizer, only the lexicalized parts of each contraction should be annotated. If you use FLAT for annotating, split contractions are displayed twice: in their folded and in their unfolded version. Only the latter should be subject to annotation, e.g. Jean bénéficie du de le traitement Jean benefits from the treatment, Jean donne du de le grain à moudre à son fils Jean gives grain to grind to his son Jean gives an occasion to act to his son.
Sometimes, however, tokenizers might not handle contraction splitting properly. In this case, a lexicalized component of a VMWE can be merged with an external word:
A similar problem occurs in languages with productive compounding, where a lexicalized component of a VMWE and a free modifier can build up a multitoken word (since compound splitting might not be a standard feature of a tokenizer):
Heisshunger haben to have hot hunger to be ravenously hungry
Yet another related phenomenon concerns acronyms whose spelled-out versions may contain predicative nouns which in the abbreviated versions boil down to single letters:
the book underwent OCR (optical character recognition)
the program carries out a PCA (principal component analysis)
le patient fait un AVC (accident vasculaire cérébral)
Since the current annotation format is token-based, we prohibit correcting tokenization errors and compound splitting by the annotators for the sake of coherence. Therefore the annotation of such contractions, compounds and acronyms finds no fully satisfactory solution in our schema. We propose to annotate a whole MWT each time it contains a word which is part of a VMWE. Annotators should add a textual comment about the mixed status of this MWT:
Heisshunger → MWT containing a lexicalized VMWE Hunger and an additional modifier heiss
A component shared by two or more coordinated VMWEs should be annotated as belonging to both of them.
Such hesitation issues should normally be solved by the structural tests. For instance, consider the German expression sich eine Frage stellen SELF a question put to doubt. It may seem to belong to both IRV, since sich is required only if stellen co-occurs with Frage, and LVC, since Frage keeps its original meaning and stellen brings no additional meaning. However, test S.2 [1DEP] indicates that an expression like this should be annotated as a VID, since the verb has more than one lexicalized syntactic dependent.
Similarly, the French expression avoir peur have fear to be afraid seems to have features of a VID. Unlike most LVCs, it does not allow a determiner *avoir une peur have a fear, except when the noun is modified avoir une grande peur have a great fear. However, test S.4 [CATEG] in the generic decision tree 2, and the LVC-specific decision tree indicate that it belongs to the LVC category.
Candidate VMWEs embedded in other VMWEs should be annotated only if they have a VMWE status also outside the particular context. For instance, the VMWE to let the cat out of the bag should be annotated as a VID, and its embedded VMWE to let out as a VPC.
On the other hand, in the French expression se faire des idées SELF make DET.PL ideas to imagine things which are not true, se faire should not be annotated as an IRV, since it is not inherently reflexive as a standalone verb+clitic combination.
Hesitations about a possible LVC status can arise with respect to existential constructions with nouns introducing events or properties (see test LVC.1 [N-PRED]) as in:
Namely, the noun keeps its original sense and the existential verb to be or to have brings no additional meaning. However, a candidate LVC must also pass test LVC.4 [V-REDUC]. This requires the modification of the noun by the verb's subject, which is impossible with impersonal and empty subjects like there. Therefore, such candidates cannot be LVCs.
Note, however, that existential expressions themselves can be VMWEs of the VID type. For instance, in the French example il y a des plaintes it there has complaints there are complaints, two dependents of the verb a has are lexicalized: il it and y there , therefore it is a VID (see test S.2 [1DEP]).
If at least one of the five LVC tests (LVC.0 to LVC.4) is not passed, the candidate is not considered an LVC. For the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we adopt a definition of an LVC which might seem more restrictive than what some linguistic studies usually assume. Thus, we exclude from the LVC scope:
- expressions in which the verb's syntactic subject is not necessarily the noun's semantic subject, like to give courage or to make an impression. These candidates do not pass test LVC.4 [V-REDUC].
- expressions where the lexicalized nominal dependent of the verb is its subject, as in the problem lies in something; these candidates do not pass test LVC.4 [V-REDUC].
- expressions with aspectual verbs, as in to start, to pursue, to stop a walk. These do not pass test LVC.3 [V-LIGHT] since they add (aspectual) semantics to the noun. The only exception is when the noun itself is already aspectual, as in to come into bloom.
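The rule that a single failed test excludes a candidate from the LVC category can be sketched as follows. The dictionary layout and function names are hypothetical, and the answers are annotator judgments; unanswered tests are treated as failed, which is an assumption of this sketch.

```python
from typing import Dict, List

LVC_TESTS = ("LVC.0", "LVC.1", "LVC.2", "LVC.3", "LVC.4")


def failed_lvc_tests(answers: Dict[str, bool]) -> List[str]:
    """List the LVC tests that the candidate does not pass."""
    return [test for test in LVC_TESTS if not answers.get(test, False)]


def is_lvc(answers: Dict[str, bool]) -> bool:
    """A candidate is an LVC only if all five tests are passed."""
    return not failed_lvc_tests(answers)


all_passed = {test: True for test in LVC_TESTS}
# "to give courage" does not pass LVC.4 [V-REDUC].
give_courage = dict(all_passed, **{"LVC.4": False})
print(is_lvc(all_passed), failed_lvc_tests(give_courage))
```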
Pure operator verbs, i.e. verbs which never have any semantics per se but only carry grammatical information (tense, mood etc.), seem to contradict the intuition behind a VMWE. Namely, they usually select a whole semantic class of nouns. For instance to commit selects any negative act (a crime, a suicide, a theft) and to perform selects any activity (a task, an experiment, a miracle). In this sense, their complements resemble open slots and the whole combinations resemble collocations. However, for the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we do include verb+noun combinations with pure operator verbs, such as to commit a crime and to perform a task, in the LVC category. This is because such combinations pass all tests (LVC.0 through LVC.4). We found no other reliable tests which would distinguish such productive cases from less productive ones like to make a decision. In particular, some studies (e.g. Bonial 2014) show that there exist no truly productive light verbs. Therefore, all examples cited here are to be classified as LVCs.
No, the IRV category only includes (some) combinations of a head verb with a reflexive clitic. As indicated in the borderline cases page of the IRV category, other pronouns, whenever lexicalized, trigger the VID category. Recall that whenever more than one dependent of the verb is lexicalized (whether or not a reflexive clitic is among them), the VMWE is always categorized as a VID.
The only nominal VMWE variants within our annotation scope are those:
- headed by the gerund stemming from the head verb of the VMWE - taking of the decision, and
- in which a noun stemming from a VMWE is modified by a participle or a relative clause headed by the verb stemming from the same VMWE - the decisions taken yesterday, the decision which he took.
Other nominalizations are excluded:
puesta a punto setting to point set-up
For practical reasons (e.g. compatibility with an existing annotation, or usefulness for a particular application) they can be considered language-specific VMWEs, but then a new category should be defined for them, so as to keep the universal and the quasi-universal categories intact.
Once identified in a text, each VMWE is to be assigned to exactly one category. Note that in this version of the guidelines we no longer admit "hesitation labels" (e.g. LVC/VID) used in the pilot annotation. Hesitation can, however, be expressed in a comment and a particular value of the annotator's confidence assigned to a particular VMWE occurrence.
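The "exactly one category, hesitation in a comment" rule can be illustrated with a small record type. This is a sketch under assumptions: the class name, the field names and the 0 to 1 confidence scale are hypothetical and do not reflect the format used by FLAT or by the PARSEME corpora.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VmweAnnotation:
    token_ids: Tuple[int, ...]   # positions of the lexicalized components
    category: str                # exactly one category, e.g. "LVC"
    confidence: float = 1.0      # assumed scale: 0.0 (unsure) to 1.0 (sure)
    comment: str = ""            # free text, e.g. to record a hesitation

    def __post_init__(self) -> None:
        # Hesitation labels such as "LVC/VID" are no longer admitted.
        if "/" in self.category:
            raise ValueError("use one category and put the hesitation in a comment")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0.0 and 1.0")


annotation = VmweAnnotation((3, 5), "LVC", confidence=0.5,
                            comment="hesitating between LVC and VID")
print(annotation.category, annotation.confidence)
```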
The goal of test LVC.1 is to identify whether a noun is predicative, that is, whether it requires at least one semantic argument. For many classes of abstract nouns, however, it can be tricky to apply the test. We advise listing in a separate document those classes of nouns that pass test LVC.1 in your language. Language teams can also provide links to the documentation of semantic annotation projects such as NomBank for English, which usually include tests and descriptions that help identifying semantic arguments.
We suggest considering that the following categories pass test LVC.1:
Ο Γιάννης έχει συνάχι = ο Γιάννης είναι άρρωστος (αρρώστεια is a hypernym of συνάχι)
Relations:
Ο Γιάννης έχει σχέση με κάποιον = Ο Γιάννης σχετίζεται με κάποιον
Ο Γιάννης έχει επαφές με κάποιον = Ο Γιάννης επικοινωνεί με κάποιον (επικοινωνία is a synonym of επαφή)
Mental content (internal to a cognizer):
Ο Γιάννης έχει ανησυχία = Ο Γιάννης ανησυχεί
Ο Γιάννης έχει μια ιδέα = Ο Γιάννης σκέφτεται (σκέψη is a synonym of ιδέα)
Ο Γιάννης έχει την άποψη = Ο Γιάννης κρίνει (κρίση is a synonym of άποψη)
John has a flu = John is ill (illness is a hypernym of flu)
Relations:
John has contact with somebody = John contacts somebody
John has an affair with somebody = John is involved with somebody (involvement is a synonym of affair)
Mental content (internal to a cognizer):
John has a worry = John worries
John has an idea = John thinks (thought is a synonym of idea)
John has an opinion = John believes (belief is a synonym of opinion)
Miha je v dvomih Miha is in doubts = Miha dvomi Miha doubts
Miha je mnenja Miha is of opinion = Miha meni Miha believes
Miha ima predstavo/pojma Miha has an idea = Miha meni Miha thinks (predstava, pojem are synonyms of idea in this context)
Please notice that events and states that have no semantic arguments do not pass test LVC.1, even if they have verbal/adjectival paraphrases:
Informational content (external to a cognizer): information, news
Informational content (external to a cognizer): informacije, novice information, news
Finally, notice that not every verb + predicative noun combination forms an LVC. Additionally, the verb needs to be "light", not adding semantics to the noun. The remaining LVC tests guarantee this.
Most of the time, it is easy to test whether a determiner is lexicalized by searching alternatives in corpora (or on the web). For instance, the is lexicalized in to kick the bucket because searches for other determiners (this, a, some, three, many, etc.) either do not return any result or return only literal uses of this verb phrase.
However, borderline cases do exist, in which alternatives are rare but possible, especially for LVCs and decomposable VIDs. For instance, while the standard form of the idiom spill the beans forbids some determiners (#spill three/twenty beans), it is possible to find some variation (spill these/many/all/my/his/more/no beans).
We argue that the selection of some determiners (but not all) by a VMWE is comparable to selected prepositions for verbs. Thus, it can be seen as a regular grammatical phenomenon, suggesting that when the determiner varies, then it should not be included in the annotation scope. Possessive pronouns (my, her, their, etc.) and reflexive clitics (myself, herself, themselves, etc.) are exceptions to this rule (see also Section 1.4). Namely, when they are constrained to agree in number and person with the subject (I do my best, *I do your best), they are realized by different lexemes, i.e., strictly speaking, they are not lexicalized. We consider, however, that - with respect to lexicalization - they constitute single lexemes inflecting for number and gender.
Particular language teams may of course adopt their own criteria for annotating partly frozen determiners. Then, these decisions should be documented in language-specific guidelines.
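The corpus-search heuristic for determiners can be sketched as follows. The input is assumed to be the list of determiners observed in idiomatic (not literal) occurrences of the expression, collected by the annotator. The function name, the `POSS` placeholder and the list of possessives are assumptions made for this illustration only.

```python
from typing import Iterable

# Possessives are treated as forms of a single lexeme, as argued above.
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their"}


def determiner_is_lexicalized(idiomatic_determiners: Iterable[str]) -> bool:
    """Return True if only one determiner lexeme is attested in idiomatic uses."""
    lexemes = set()
    for determiner in idiomatic_determiners:
        form = determiner.lower()
        lexemes.add("POSS" if form in POSSESSIVES else form)
    return len(lexemes) == 1


print(determiner_is_lexicalized(["the", "the", "The"]))            # kick the bucket
print(determiner_is_lexicalized(["the", "these", "many", "all"]))  # spill the beans
print(determiner_is_lexicalized(["my", "her", "their"]))           # do one's best
```

Treating a single attested lexeme as lexicalized is a simplification: rare but possible alternatives still call for the annotator's judgment.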
It depends. In many Indo-European languages (including Germanic, Romance and Balto-Slavic families), verbal chains using auxiliary and modal verbs are used to express tense, mood, modality and aspect. This is a regular linguistic phenomenon, fully productive, that can be applied to any verb and should not be annotated at all.
On the other hand, some languages have idiomatic compound and serial verbs, that is, VMWEs whose lexicalized components are two verbs, neither of which expresses tense, mood, modality and/or aspect with respect to the other one. Therefore, we have created a new category in edition 1.1 to annotate these constructions, called multi-verb construction (MVC), covering examples such as:
to make do
vouloir dire want say to mean
voler dire want say to mean
można wytrzymać one can stand the situation is reasonably good
ouvir falar hear speak to know/remember vaguely
The guidelines determine that only lexicalized components should be annotated. Therefore, we suggest that, in such cases, if the NP is compositional, only the head of the NP is included in the scope of the LVC. This may lead to the annotation of odd LVCs that actually never occur by themselves without a modifier. This is not a problem and is already the case for other VMWEs, e.g. the ones that only occur with a determiner, but the determiner is not lexicalized. The only cases where the NP should be included as a whole is if the complement is a non-compositional MWE, so that it would not make any sense to annotate only the head.
κάνω στάση εργασίας to-make stop work.SG.GEN to go on strike, to strike → the expression στάση εργασίας is non-compositional (term)
mener une vie de débauche to have a life of pleasures
faire un faux pas make a false step to commit a faux pas → the expression faux pas is non-compositional
fazer roleta russa to make russian roulette to play russian roulette → the expression roleta russa is non-compositional
ter uma situação financeira/profissional/estável to have a financial/professional/stable situation
Notice that these suggestions also apply to LVCs whose nominal complements are introduced by prepositions (i.e. verb+PP LVCs). As usual, the preposition should be included if it is lexicalized and then the NP introduced by the preposition is analyzed exactly as described above.
If the complex dependent is an acronym, you may want to add the textual comment "PART" to indicate that only part of the full version is lexicalized (generally, the head), just like for contractions and compounds.
Depending on the language, aspect can be realised by various lexical, morphological and syntactic means.
- We consider aspect a morphological feature in the following cases:
- Perfective or continuous aspect introduced by inflection and/or analytical tenses:
- Perfective or imperfective aspect inherent to the verb (independently of its inflected form), recognisable either by a prefix or by an ending:
John was making a presentation
he called her while having a walk
Jan was een presentatie aan het maken Jan was making a presentation
pełnić rolę fulfil.IMPERF a role to play a role
wypełnić rolę fulfil.PERF a role to play a role
wypełniać rolę fulfil.IMPERF a role to play a role
Taja je postavljala vprašanja Taja was asking questions
ves čas je dajal napačne napovedi he was always giving wrong forecasts
- We consider aspect a semantic feature in the following cases:
- Starting, continuation or completion is expressed by precise verbs which usually modify other verbs:
η Μαρία άρχισε τη συζήτηση Maria started the conversation
ο Γιάννης διέκοψε την κουβέντα John interrupted the discussion
Anthony started his presentation in advance
the weather interrupted the transmission twice
we kept our show regardless of the reactions
de regen onderbrak de wedstrijd the rain interrupted the match
Tomaž je začel svoje predavanje Tomaž started his lecture
Politik je nadaljeval svojo napoved reform the politician continued his forecast about reforms
naredili bomo konec onesnaževanju we will make end to pollution we will put an end to pollution
In Test LVC.3, we verify whether the verb adds "light" semantics to the predicative noun. When aspect is expressed as a morphological feature, such as in the first item above, we consider that the verb is light and test LVC.3 passes. However, when aspect is a semantic feature rather than a morphological feature, test LVC.3 fails and we do not have an LVC.
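The effect of aspect on test LVC.3 can be summarized in a few lines. This is a minimal sketch: the enum and function names are hypothetical, the classification of the aspect source is an annotator judgment, and the `noun_already_aspectual` flag encodes the exception mentioned for nouns such as bloom in to come into bloom.

```python
from enum import Enum


class AspectSource(Enum):
    MORPHOLOGICAL = "inflection, analytical tense, or inherent to the verb form"
    SEMANTIC = "contributed by an aspectual verb such as start, continue, stop"


def passes_lvc3_for_aspect(source: AspectSource,
                           noun_already_aspectual: bool = False) -> bool:
    """Say whether the aspect carried by the verb is compatible with test LVC.3."""
    if source is AspectSource.MORPHOLOGICAL:
        return True
    # Semantic aspect fails the test unless the noun itself is already aspectual.
    return noun_already_aspectual


# "John was making a presentation": aspect comes from the analytical tense.
print(passes_lvc3_for_aspect(AspectSource.MORPHOLOGICAL))
# "Anthony started his presentation": the verb adds the starting phase.
print(passes_lvc3_for_aspect(AspectSource.SEMANTIC))
```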
The previous version (1.0) of the annotation guidelines contained Test 10 [N-SEM], which checked if the noun in an LVC candidate preserves one of its original senses. If it did not, the candidate was not an LVC.
In the current version of the guidelines we have abandoned this test because:
- it proved hard to establish the list of original senses of a noun,
- this test was superfluous with respect to Test LVC.4 [V-REDUC],
- in some verbal idioms (VIDs) the noun also keeps its original sense, so the test can be misleading for the LVC vs. VID distinction.
Grew-match is the perfect tool for this purpose. It can be used in two modes:
- As a corpus browser - here you can run Grew queries, and the MWEs matching these queries will be displayed. The 3 latest versions of your corpus are uploaded on Grew-match (select the correct language). In particular, the latest version is the one which is loaded in the development branch of your language repository (see here).
- As a consistency check tool - available from the language table in the PARSEME wiki. This tool groups all sentences containing the same MWEs (like here for Polish).
Section 10.2
Adding new examples in your language
It is often useful to have examples of a phenomenon shown in your own language. Examples in the guidelines are presented as in the template below:
Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. Languages use color codes according to their language groups. See the section on notation for more information.
In order to see the ID of all examples, make sure the ID button is toggled on the header of the current page. Now look at the template above. You should see this ID: 7.2_A_template-mwe. The 7.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
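The structure of an example ID can be illustrated with a small parsing sketch. It assumes that IDs always follow the pattern section number, underscore, position letter, underscore, name, as in the single ID shown above; the function and class names are hypothetical and are not part of the guidelines platform.

```python
import re
from typing import NamedTuple


class ExampleId(NamedTuple):
    section: str   # section number, e.g. "7.2"
    position: str  # position of the example on the page, e.g. "A"
    name: str      # human-readable identifier, e.g. "template-mwe"


# Assumed pattern, inferred from the one ID shown in the text:
# <section>_<position letter(s)>_<name>
_ID_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)_([A-Z]+)_(.+)$")


def parse_example_id(example_id: str) -> ExampleId:
    """Split an example ID into its three documented parts."""
    match = _ID_PATTERN.match(example_id)
    if match is None:
        raise ValueError(f"unrecognized example ID: {example_id!r}")
    return ExampleId(*match.groups())


print(parse_example_id("7.2_A_template-mwe"))
```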
Editing or adding examples
The shared examples edition spreadsheet used in previous versions of the guidelines is no longer used: all modifications are made online and are visible immediately. To edit or add examples to the guidelines, you need to create an account on the guidelines 2.0 examples edition platform. You also have to ask Takuya Nakamura, Agata Savary or Carlos Ramisch to grant you the editing rights for your language.
Once you are logged in, you will see some buttons close to each example.
- The 'copy' button copies the source of the example, and is useful if you want to copy the example of another language and then translate it.
- The 'source' button is always available for languages you have the right to edit, and allows you to edit the example's XML-like source code, as described below.
- The 'edit' button is only shown for examples that follow the formatting rules, and allows you to edit the example using a user-friendly interface.
Instructions to create well formatted examples (or correct the ill-formatted ones in 'source') are available in the example edition instructions.
When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language (use the 'copy' button), and then adapting it to your language. You can then paste the example in your language's 'source' mode. Remember that you should not translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation or not. Therefore, before entering an example, you should always check if it is relevant in the context.
If there is something wrong or suspicious with your example, the interface will show an error or warning message.
If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the example empty or add an n.a. comment.
Examples with tags
Let us analyse the English example below, shown in 'source' mode:
MWEs with <lex>their lexicalized components</lex> in English are indicated like this.
As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.
Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:
- He will take a shower
Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:
- This is not an MWE
Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is <u>not</u> an MWE</nmwe> yields the following:
- This is not an MWE
Latin-script transcription:
You can optionally provide a Latin-script transcription if your language does not use Latin characters.
Latin-script transcriptions must be surrounded by the tags <latin> and </latin>.
For example, the code الدرس <latin>ad-dars</latin> generates the example below. The Latin transcription should always appear after the example in the original script, and before glosses and translations.
- الدرس ad-dars
Gloss icon:
You should also provide English glosses and translations for your examples.
Glosses and translations should always be provided in English, and never in another language.
Glosses must be surrounded by the tags <gl> and </gl>.
Translations must be surrounded by <trans> and </trans>.
English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex> <gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text <latin>transcription (optional)</latin> <gl>the gloss</gl> <trans>the translation</trans>.
- défendre son bifteck defend one's beefsteak to defend one's interests
Comments:
Some examples are followed by an explanation or comment, in normal font (black color). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:
- some words → further details
Newline:
Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they can be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). The 'edit' interface uses this tag to split the source into examples that can be edited separately. For example, the code example 1 <br/> example 2 <br/> example 3 will be rendered as follows:
- example 1
example 2
example 3
Inside normal text, you may also use tags such as <i> (italics), <strong> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.
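The tag conventions described in this section can be checked mechanically before pasting an example into 'source' mode. The sketch below is only an illustration and is not the validator used by the examples edition platform: the function name, the set of paired tags and the reported messages are assumptions. It checks that paired tags are properly closed and that <latin>, <gl> and <trans> appear in the recommended order within each example separated by <br/>.

```python
import re

# Tags that come in pairs according to this section (assumed list).
PAIRED_TAGS = {"lex", "nmwe", "u", "latin", "gl", "trans", "n", "i", "strong"}
# Recommended order after the original text.
ORDERED_TAGS = ["latin", "gl", "trans"]
# Matches <tag>, </tag> and <tag/>.
TAG_RE = re.compile(r"<(/?)([a-z]+)\s*(/?)>")


def check_example_source(source: str) -> list[str]:
    """Return a list of problems found in an example's source (empty if none)."""
    problems = []
    # <br/> separates independent examples, so each segment is checked alone.
    for segment in re.split(r"<br\s*/>", source):
        stack = []
        last_rank = -1
        for closing, name, self_closing in TAG_RE.findall(segment):
            if self_closing or name not in PAIRED_TAGS:
                continue
            if closing:
                if not stack or stack[-1] != name:
                    problems.append(f"unexpected </{name}>")
                else:
                    stack.pop()
                continue
            stack.append(name)
            if name in ORDERED_TAGS:
                rank = ORDERED_TAGS.index(name)
                if rank < last_rank:
                    problems.append(f"<{name}> appears out of order")
                last_rank = rank
        problems.extend(f"unclosed <{name}>" for name in stack)
    return problems


source = ("<lex>défendre</lex> son <lex>bifteck</lex> "
          "<gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans>")
print(check_example_source(source))
```

An empty list means that no problem was detected by this sketch; the platform may still show its own errors or warnings.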
Section 10.3
Annotation platform FLAT
The annotation will be performed using the online annotation platform FLAT. The documentation of the annotation platform is provided in a separate document. Check the useful links below:
- The FLAT user manual for the PARSEME annotation guidelines version 1.2
- Link to the PARSEME shared task FLAT platform
Section 10.4
Best practices
Annotating VMWEs in text is a hard task. Many tests are semantic and require not only a strong knowledge about the language, but also knowledge of advanced notions in linguistics. As a consequence, ensuring annotation quality and, above all, intra- and inter-annotator consistency, is a challenge. We provide here a set of hints that you can use to try to optimize the annotation effort and ensure the quality of the resulting corpus.
Resources and people
This website only covers the annotation guidelines. Do not forget that many other resources are available on the PARSEME shared task 1.1 website. That website is not for system authors, but for language leaders, annotators and organizers. It contains a lot of useful data, notably the names and contacts of people that can help you, and user manuals for FLAT, for the language leaders, etc. Also, you can use the mailing lists if you need to ask questions that could be relevant for other teams as well. In short, don't be shy to ask if you would like to do something but you're not exactly sure where to start :-)
NotVMWE label
The new FLAT configurations for edition 1.1 allow you to use an optional annotation label called NotVMWE. This is not a new VMWE category, but an auxiliary label which simply means "this is not a VMWE". NotVMWE is an optional and useful label you can use to indicate that something should not be annotated, especially if it is a borderline case. Adding this annotation allows you to add a textual comment saying why you decided not to annotate this construction (e.g. after discussing it with fellow annotators and recording the decision in the list of solved cases).
While you don't need to use this label, we recommend that you use it for challenging/hard cases which, in the end, you decide not to annotate as a VMWE. This kind of annotation will be useful when performing consistency checks. Of course, NotVMWE labels will all be removed in the final released corpora, since this kind of information is irrelevant for shared task participants.
List of solved cases
In edition 1.0, some languages have ensured consistency by keeping a separate shared document (e.g. a Google spreadsheet) where hard/challenging cases were documented. We advise language leaders to implement such a list of solved cases. This allows all annotators to contribute to the discussion of hard cases, and to reach a common decision that can be later applied systematically to all occurrences of the expression and for similar expressions. From our experience, this greatly enhances the satisfaction of annotators and saves some valuable time during the consistency checks. Even for languages that have a single annotator, she/he can keep a personal list of difficult cases and their decisions, to ensure intra-annotator consistency.
Consistency checks
Once all files have been annotated, language leaders will perform the final consistency checks using semi-automatic tools. During these consistency checks, all occurrences of a single expression annotated by all annotators will be shown together. There, language leaders may change annotations performed by individual annotators if they are inconsistent with the other annotations. Therefore, do not worry too much if you are unsure about an annotation. Try to be as consistent as possible, but if you do not remember a particular annotation performed earlier, it is not necessary to search through the corpus on FLAT (this is quite time-consuming). If there is some minor inconsistency, it will probably be corrected later by the language leader. But note your decision down on the list of solved cases so that next time you come across the same expression (or a similar one) you do not spend so much time thinking about it.
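The idea behind such a consistency check can be sketched in a few lines. This is not the semi-automatic tool used by language leaders: the input format (tuples of lemmas, label and sentence ID) and the function name are assumptions made for illustration. The sketch groups all occurrences of the same expression, identified here by its set of lemmas, and keeps only those expressions that received more than one label.

```python
from collections import defaultdict


def find_inconsistent(annotations):
    """Group occurrences by expression and return those with conflicting labels.

    annotations: iterable of (lemmas, label, sentence_id) tuples, where lemmas
    is a sequence of the lemmas of the lexicalized components (assumed format).
    """
    by_expression = defaultdict(lambda: defaultdict(list))
    for lemmas, label, sentence_id in annotations:
        # Sorting makes the key independent of the word order in the sentence.
        key = tuple(sorted(lemmas))
        by_expression[key][label].append(sentence_id)
    return {
        key: dict(labels)
        for key, labels in by_expression.items()
        if len(labels) > 1
    }


annotations = [
    (("take", "shower"), "LVC.full", "s1"),
    (("shower", "take"), "NotVMWE", "s7"),
    (("make", "decision"), "LVC.full", "s3"),
]
print(find_inconsistent(annotations))
```

Expressions returned by such a check are natural candidates for the list of solved cases.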
Intuition and tradition vs. guidelines
You may sometimes (often) find that the guidelines do not reflect your intuition about a given construction, or that they contradict the linguistic tradition and literature in your language. We understand that this is frustrating, but please, remember that our main objective is achieving universal modelling of MWEs while preserving diversity. Therefore, please refrain from using undocumented criteria (a.k.a. intuition), or tests that are only known/documented in your language.
The guidelines were designed taking feedback from many language teams into account. They are also meant to continuously evolve, and we do count on you to play an active role in this process. Therefore, if you disagree with their current version, please, choose one of the two options:
- Follow the guidelines anyway to ensure the corpus-to-guidelines consistency, but express your criticism (documented with glossed and translated examples in your language), best via Gitlab issues. You may also add comments to those annotations which you would like to modify once the guidelines have been enhanced.
- Create a language-specific section for the guidelines, describing your own tests and decision trees. We will be happy to publish it online.
Inter-annotator agreement
Usually, data annotation campaigns require measuring inter-annotator agreement (e.g. kappa) to verify that the guidelines are clear and that the annotators are well trained. We encourage language teams to measure inter-annotator agreement. However, in the PARSEME shared task, the organizers do not set any hard threshold on the kappa value required to accept your annotations as part of the shared task. This is a collaborative effort, so we do not feel comfortable with making such requirements to language teams.
Furthermore, VMWE annotation is a very hard task so inter-annotator agreement is expected to be low. We recommend that language teams use complementary tools and resources to compensate for the low agreement, such as the list of solved cases and consistency checks mentioned on this page. After the annotation is completed, we may ask you to double-annotate a sample of your data so that we can calculate inter-annotator agreement, for instance, to report it on a corpus description article. But you should not worry too much about this: do your best in trying to understand the guidelines, do not hesitate to suggest improvements, and try to train annotators as much as possible, for instance, with pilot annotations and discussions. This way, you will ensure that the data released in the shared task for your language will be of high quality. And remember you will have the opportunity to improve it incrementally for the next shared task.
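For teams that want to measure agreement on a double-annotated sample, the sketch below computes Cohen's kappa for two annotators. It is a simplified illustration under stated assumptions: the text above only mentions kappa as an example and does not prescribe a particular coefficient, and the sketch assumes that both annotators labelled the same list of candidate items, which ignores disagreements on which tokens form a candidate in the first place. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


annotator_1 = ["VID", "VID", "LVC", "none"]
annotator_2 = ["VID", "LVC", "LVC", "none"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```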
TODO label
We have introduced a new label on FLAT called "{change-me} TODO". This label is a temporary mark-up used to indicate that a given VMWE must be dealt with by a human annotator. It will be used when a corpus is automatically converted and some annotations must be manually checked. For instance, the OTH category from shared task 1.0 disappeared in edition 1.1. Therefore, all VMWEs annotated as OTH in the 1.0 corpora will be automatically converted using the TODO label. This means that all TODO labels must be changed into a valid new category (e.g. VID). In the final annotated corpora, any remaining TODO label will be removed, since this is not actually a VMWE category but just an auxiliary label.
Existence questions and corpus queries
Some tests ask whether it is possible/impossible to find some attested variant of a candidate. While for many cases this is straightforward (the variant can be easily found), some borderline cases will inevitably occur in which it is hard to tell if a given variant is impossible or just very rare.
Decisions for hard cases like this should not be made based solely on introspection and intuition. In case of doubt, we recommend that annotators:
- check existing lexicons for their languages
- perform corpus queries using any available large raw monolingual corpus
- run web queries, e.g. using Sketch Engine, Linguee or plain Google
- discuss the case with other annotators, reach a decision and mark it in the list of solved cases
In all cases, the list of lexicons, monolingual corpora and/or web platforms to consult should be agreed upon in advance by all annotators.
Section 11
Glossary
Candidate VMWE
A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it is to be annotated as a true VMWE or, instead, it was a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.
Collocation
A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:
играя футбол to play football
drastically drop
el diagrama muestra the diagram shows
coger el tren to take the train
przyznać rację to admit right to admit that someone is right
uprawiać sport to practice sports
wzruszać ramionami to shrug one's shoulders
drastično zmanjšati drastically reduce
Cranberry word
A cranberry word is a token that does not have the status of a stand-alone word, has no proper distribution, and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in different contexts, as the underlined words below:
jemanden einen Besuch abstatten
no decir ni chus ni mus → chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
hacer algo a troche y moche → troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
sprawiedliwości stało się zadość justice has been done
Extended nominal phrase
An extended nominal phrase (ENP) is a notion covering, in a universal way, various types of phrases which convey similar lexical relations in morpho-syntactically different ways (prepositions, post-positions, case markers, etc.), depending on the language. Extended NPs include:
- noun phrases, i.e. phrases headed by a noun, with its possible syntactic modifiers/complements
- prepositional phrases, in which a preposition directly governs a noun, or the opposite, depending on a particular linguistic theory
- noun phrases with case markers
- noun phrases with postpositions
преди всичко before everything
dla wszystkich for everyone
z prawdziwego zdarzenia from a true event genuine
ENP is close to the UD understanding of the nominal phrase.
Particles
Particles are hard to distinguish from homographic prepositions:
ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
to get up a hill
jestem za ustawą I am for the law I am in favor of the law
The fundamental property to capture is that a preposition governs a prepositional group, while a particle functions as an adverbial. In some languages particles can also be homographic with verbal prefixes:
den See umfahren to drive around the lake
Ongelukken kunnen worden voorkomen accidents may be prevented
Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.
Reflexive clitics
Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines of IRV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which are a relatively small closed class of words:
Semantic argument
A semantic argument of a predicative lexical unit (verb, noun, etc.) is a participant of the situation described by the predicative lexical unit that (a) can be realized as a syntactic dependent of the predicative lexical unit, (b) is semantically mandatory, and (c) is specific to that predicative lexical unit.
- Semantically mandatory participants: a participant is semantically mandatory when it must be mentioned to
specify the meaning of the predicative lexical unit. In other words, the realization of the predicative lexical unit
implies the existence of its semantically mandatory participants. For instance, a visit cannot hold
if there is no visitor or no visitee, courage is a property of a being,
a presentation implies the existence of a presenter, of an audience and of a
presented topic. Some participants are not semantically mandatory, for instance the addressee is
not semantically mandatory for a whisper because one can whisper without an addressee.
We restrict semantic arguments to semantically mandatory participants because we believe that this restriction helps
delimit the semantic arguments without resorting to the difficult syntactic argument/adjunct distinction, while not being prejudicial to
LVC tests. Notice that semantically mandatory participants do not necessarily occur in a sentence containing the
predicative lexical unit, and can sometimes be omitted (e.g. due to coreference or ellipsis).
To define a заем loan one needs to mention two participants: the beneficiary and the source of the benefit. In other words, the existence of a loan implies the existence of its arguments.
To define a presentation one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a presentation implies the existence of its arguments.
To define an opinión opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinión implies the existence of its arguments.
To define a conseil advice one needs to mention two participants: the adviser and the advised person. In other words, the existence of a conseil implies the existence of its arguments.
To define a dochód profit one needs to mention two participants: the patient who benefits and the source of the benefit. In other words, the existence of a benefit implies the existence of its arguments.
To define an opinião opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinião implies the existence of its arguments.
To define a prezentare presentation one needs to mention three participants: the one who presents, the topic of the presentation and the person to whom the topic is presented. In other words, the existence of a prezentare implies the existence of its arguments.
priti v poštev to come into consideration to be considered
imeti mnenje to have an opinion to believe
- Specific participants: some semantically mandatory participants are generic and we do not consider them to be semantic arguments. For instance, the existence of a presentation implies that it occurred in a given time and place, so these are semantically mandatory participants. However, time and place are implicit to any event, and are not specific to the predicative noun presentation. Participants that denote non-specific characteristics of the predicative lexical unit and thus can be interpreted independently of the predicative lexical unit (for a large class of predicative lexical units), such as time, place and manner for most predicates, are not considered as semantic arguments.
Semantic arguments are generally mentioned in the dictionary definition of a predicative lexical unit. One useful source for determining the semantic arguments of a given lexical unit is semantic lexicons such as FrameNet and PropBank. Our definition of semantic argument is closely related to FrameNet's core frame elements. Language teams are encouraged to use available resources and/or to provide language-specific documentation to help identify semantic arguments.
Subcategorization frame
A subcategorization frame of a verb describes how syntactic arguments are realized as the verb's dependents, for a given sense of the verb. A subcategorization frame indicates morphological and syntactic features of a verb's dependents, namely the required prepositions, postpositions and case markers of the subject, direct and oblique objects. For instance, one subcategorization frame for to return meaning to give back would be:
- return: [NP]subject + [NP]direct-object + [to NP]oblique
- Example: [my sister]subject returned [the book]direct-object [to the library]oblique
Notice that the semantic characteristics of the dependents (a.k.a. selectional restrictions or preferences) are not considered as part of the subcategorization frame. For instance, the fact that the subject is animate (somebody) or inanimate (something) is irrelevant for subcategorization frames. Verbs can have many senses and each sense can have many subcategorization frames. For instance, the verb to return in the same sense can also be used with the subcategorization frames NPsubject + NPdirect-object ([my sister]subject returned [the book]direct-object) and NPsubject + NPoblique + NPdirect-object ([my sister]subject returned [me]oblique [the book]direct-object).
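The three frames of to return discussed above can be written down as a small data structure. This is only one possible encoding, given as a sketch: the class and field names are hypothetical and do not correspond to any PARSEME resource. Note that the structure records syntactic functions, categories and required markers only, and deliberately has no field for semantic properties such as animacy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    function: str     # "subject", "direct-object" or "oblique"
    category: str     # syntactic category of the dependent, e.g. "NP"
    marker: str = ""  # required preposition, postposition or case marker


@dataclass(frozen=True)
class Frame:
    verb: str
    sense: str
    slots: tuple


NP_SUBJECT = Slot("subject", "NP")
NP_OBJECT = Slot("direct-object", "NP")

# The three frames given in the text for "return" in the sense "give back".
RETURN_FRAMES = [
    Frame("return", "give back", (NP_SUBJECT, NP_OBJECT, Slot("oblique", "NP", "to"))),
    Frame("return", "give back", (NP_SUBJECT, NP_OBJECT)),
    Frame("return", "give back", (NP_SUBJECT, Slot("oblique", "NP"), NP_OBJECT)),
]


def describe(frame: Frame) -> str:
    """Render a frame in the bracket notation used in this section."""
    parts = []
    for slot in frame.slots:
        inner = f"{slot.marker} {slot.category}".strip()
        parts.append(f"[{inner}]{slot.function}")
    return " + ".join(parts)


for frame in RETURN_FRAMES:
    print(describe(frame))
```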
Syntactic and semantic heads
The syntactic head of a construction is the part of the construction which determines the morphosyntactic valence constraints of the whole construction. For instance, in The producers of tobacco use a form of asbestos in this kind of filter, the syntactic head of producers of tobacco is producers, since it determines e.g. the plural form of the verb use.
The semantic head of a construction is the part of the construction which determines the lexico-semantic selectional restrictions of the whole construction. In the sentence above producers is also the semantic head, since it determines the semantic type of the whole construction (here: human), which agrees with the constraints imposed by the verb use.
Cases in which syntactic and semantic heads differ include transparent nouns: part of the room, liter of wine, her jerk of a husband, etc. For instance in The majority of tobacco producers uses a form of asbestos in this kind of filter, the syntactic head of majority of tobacco producers is majority and the semantic head is producers.
Bibliography:
- Charles J. Fillmore, Collin F. Baker, and Hiroaki Sato. 2002. Seeing Arguments through Transparent Structures. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
- Alan Cruise. 2006. A Glossary of Semantics and Pragmatics, Edinburgh University Press.
- Adam Przepiórkowski. On heads and coordination in valence acquisition. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing (CICLing 2007), number 4394 in Lecture Notes in Computer Science, pages 50–61, Berlin, 2007. Springer-Verlag.
Syntactic argument
Typically, verbal lexical units have dependents that can be syntactic arguments or adjuncts, depending on their status (mandatory/specific or not). For instance, in John walked in the forest yesterday all three dependents (the entity walking, the time and the place) add semantics to the predicate, but time and place can be interpreted independently of the semantics of the verb, and could be omitted. Thus, John is a syntactic argument while the other dependents are syntactic adjuncts. Typically, time and place are considered as syntactic adjuncts, and never as syntactic arguments.
Beyond verbs, nouns, adjectives and adverbs can also have arguments. For example, the noun cause cannot normally appear by itself; rather, one must always talk about the cause of X, with X as the syntactic argument of the noun cause. Similarly, the noun contact has two arguments: the contact of X with Y.
Distinguishing between semantic arguments and adjuncts can be tricky, and we will not go into the details of the polemic argument/adjunct distinction. In addition to usual tests for argument-adjunct distinction described in the linguistic literature, we advise language teams to use language-specific resources (e.g. valency dictionaries) that sometimes encode the syntactic argumental structure of lexical units.
Most of the time, syntactic and semantic arguments coincide, but not always. For instance, in I translated a book., there is no syntactic argument expressing the source and target languages, which are semantic arguments of translate. Therefore, we distinguish both notions in our guidelines. Syntactic arguments describe the linguistic structure of lexical items whereas semantic arguments are related to the conceptual structure of predicates.
Syntactic operator
A syntactic operator is a verb that only bears the grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restricted than the traditional notion of a light verb. Notably, aspectual light verbs (which add aspectual semantics to the complement), as in to start a walk, to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:
Angst haben to have fear
ein Verbrechen begehen to commit a crime
to have fear
to commit a crime
tener miedo
hacer ilusión
een misdrijf plegen to commit a crime
Unexpected change in meaning
An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification which yields an unexpected acceptability or meaning shift, that is, beyond what would be expected by the initial modification. In order to judge whether a shift in acceptability or meaning is unexpected, one can try to apply the same modification to a similar compositional construction, using analogy. For example, book and word have synonyms including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:
the same does not hold for:
That is, the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test VID.2 [LEX] applies and:
is a VMWE.
Similarly, Test IVPC.1 [PART-REDUC] refers to an unexpected change in meaning of the verb stemming from the addition of the particle. We do so by checking if the situation described by the verb with the particle implies the one described without the particle:
Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
to look up into the sky implies to look into the sky (it is not an IVPC)
Ungrammaticality
Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We suppose that ungrammaticality judgement is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).
Section 12
Contact
These guidelines were written by many authors. If you have questions, comments, suggestions, you can contact the people in charge of the PARSEME corpora initiative.
You are welcome to also contribute to this initiative in other ways - see why and how.