corpora of multiword expressions - version 1.2 (2020)
shared task on semi-supervised identification of verbal multiword expressions - edition 1.2 (2020)
Candidate VMWE
A candidate VMWE is a group of tokens that seems to exhibit some idiosyncrasy of the type listed in the MWE definition. Further tests are required to decide whether it should be annotated as a true VMWE or whether it is a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.
Collocation
A collocation is a word co-occurrence whose idiosyncrasy is purely statistical. Collocations are not considered VMWEs in this task:
цените се покачват prices rise
играя футбол to play football
- eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to take a bus
the graphic shows
responder a una petición to answer a request
el diagrama muestra the diagram shows
coger el tren to take the train
zalać rynek to flood the market to dominate the market
przyznać rację to admit right to admit that someone is right
uprawiać sport to practice sports
wzruszać ramionami to shrug one's shoulders
občutno zmanjšati significantly reduce
drastično zmanjšati drastically reduce
Cranberry word
A cranberry word is a token that does not have the status of a stand-alone word: it has no proper distribution and no stand-alone meaning, although it may have a syntactic category and an inflection paradigm. It occurs only in a particular expression (or a closed list of expressions) and is never found in other contexts, as in the examples below:
- вземам на мушка някого/нещо take on target somebody/something to criticise somebody/something heavily
jemandem Angst einjagen to-someone chase-in fear to frighten someone
jemandem einen Besuch abstatten to pay someone a visit
- to go astray
sin decir ni chus ni mus without to_say neither chus nor mus without saying a word (chus is not a stand-alone word)
no decir ni chus ni mus not to_say neither chus nor mus not to say a word, to remain silent (chus is not a stand-alone word)
hacer algo a troche y moche to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly (troche is not a stand-alone word)
- se mettre martel en tête SELF put a hammer in head to worry a lot
odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
sprawiedliwości stało się zadość justice has been done
- pune pe roate (roate is a form found only in expressions in the literary language)
- biti si kvit owe nothing to somebody; each party got what it deserved/asked for
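Because a cranberry word never appears outside its expression(s), one way to gather corpus evidence for this property is to check that every occurrence of the word lies inside one of the fixed expressions. The sketch below assumes pre-tokenized, lowercased sentences and plain token matching (no lemmatization, so inflected variants would have to be listed explicitly); the function name `occurs_only_in` is hypothetical. A finite corpus can only support, never prove, that a word is a cranberry word.

```python
from typing import Iterable, Sequence


def occurs_only_in(word: str,
                   expressions: Iterable[Sequence[str]],
                   corpus: Iterable[Sequence[str]]) -> bool:
    """Return True if every occurrence of `word` in `corpus` falls inside
    one of the given fixed `expressions` (a closed list of token sequences)."""
    exprs = [tuple(e) for e in expressions]
    for sentence in corpus:
        covered = set()  # positions in this sentence covered by some expression
        for expr in exprs:
            n = len(expr)
            for i in range(len(sentence) - n + 1):
                if tuple(sentence[i:i + n]) == expr:
                    covered.update(range(i, i + n))
        # Any occurrence outside a known expression is a counterexample.
        for i, token in enumerate(sentence):
            if token == word and i not in covered:
                return False
    return True


corpus = [
    "lo hizo a troche y moche".split(),
    "hablaron a troche y moche".split(),
    "el perro duerme".split(),
]
print(occurs_only_in("troche", [("a", "troche", "y", "moche")], corpus))  # True
```

A single sentence containing troche outside a troche y moche would make the function return False, which is the kind of counterexample an annotator would then inspect by hand.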
Extended nominal phrase
An extended nominal phrase (ENP) is a notion covering, in a universal way, various types of phrases which convey similar lexical relations in morpho-syntactically different ways (prepositions, post-positions, case markers, etc.), depending on the language. Extended NPs include:
- noun phrases, i.e. phrases headed by a noun, with its possible syntactic modifiers/complements
- въпрос question, зелена светлина green light
- explanation, the dog, many old documents
- explicación, el perro, muchos documentos antiguos
- explication, le chien, quelques documents anciens
- ludzie people, najbliżsi współpracownicy closest collaborators
- razlaga, pes, številni stari dokumenti explanation, the dog, many old documents
- prepositional phrases, in which a preposition directly governs a noun, or the opposite, depending on a particular linguistic theory
за здраве for (good) health
преди всичко before everything
- on the bed, after the lesson, in front of the window
- en la cama, después de la clase, enfrente de la ventana
- sur le lit, après le cours, devant la fenêtre
ze stanowiska from a position
dla wszystkich for everyone
z prawdziwego zdarzenia from a true event genuine
- na postelji, po pouku, pred hišo, za steno on the bed, after the lesson, in front of the house, behind the wall
- noun phrases with case markers
- предавам богу дух give to god.GEN soul to die
- ludzi people.GEN, najbliższymi współpracownikami closest.INST collaborators.INST
- mačka cat (nominative), mačke cat (genitive), mački cat (dative), mačko cat (accusative), o mački cat (prepositional), z mačko cat (instrumental)
- noun phrases with postpositions
- n. a.
ENP is close to the UD understanding of the nominal phrase.
Particle
Particles are hard to distinguish from homographic prepositions:
ich schlage vor allen zu verzeihen I propose to forgive everyone
ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
to get up a petition
to get up a hill
jestem za I am for I am in favor
jestem za ustawą I am for the law I am in favor of the law
- n. a.
The fundamental property to capture is that a preposition heads a prepositional phrase (governing its nominal complement), while a particle functions as an adverbial. In some languages, particles can also be homographic with verbal prefixes:
das Schild um|fahren to drive over the sign
den See umfahren to drive around the lake
- n. a.
Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.
Reflexive clitic
Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines for the IRV category for more details. In English, the reflexive is expressed by the suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which form a relatively small closed class of words:
- се, си
- mich, dich, sich, uns, euch
- me, te, se, nos, os
- me, te, se, nous, vous
- mi, ti, si, ci, vi
- się, sobie
- me, te, se, nos, vos
- mă/m-, te, se/s-, ne, vă/v-, se/s- (for accusative); îmi/mi-/-mi, îți/ți-/-ți, își/și-/-și, ne, vă/-vă/v-, își/și-/-și (for dative)
- se, si
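Since reflexive clitics form a small closed class, a first-pass candidate detector can be a plain lookup table. The sketch below copies five of the lists above; the assignment of those lists to the ISO 639-1 codes bg, de, es, fr and pl is our assumption, since the language labels are not shown here, and the function name `reflexive_candidates` is hypothetical. Matches are only candidates: the same forms also serve as ordinary object pronouns, so the IRV tests still apply.

```python
# Closed classes of reflexive forms, copied from the lists above.
# ASSUMPTION: the mapping of each list to an ISO 639-1 code is ours.
REFLEXIVE_FORMS = {
    "bg": {"се", "си"},
    "de": {"mich", "dich", "sich", "uns", "euch"},
    "es": {"me", "te", "se", "nos", "os"},
    "fr": {"me", "te", "se", "nous", "vous"},
    "pl": {"się", "sobie"},
}


def reflexive_candidates(tokens, lang):
    """Return (position, token) pairs whose lowercased form belongs to the
    closed class for `lang`; unknown languages yield no candidates."""
    forms = REFLEXIVE_FORMS.get(lang, set())
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in forms]


print(reflexive_candidates("Ich erinnere mich an dich".split(), "de"))
# [(2, 'mich'), (4, 'dich')]
```

Here mich refers to the subject and is a genuine reflexive, whereas dich (you) is an ordinary object pronoun that the lookup also flags, which is why a lookup can only propose candidates. Elided forms such as French s' or Romanian m- would need additional tokenization rules.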
Semantic argument
A semantic argument of a predicative lexical unit (verb, noun, etc.) is a participant of the situation described by the predicative lexical unit that (a) can be realized as a syntactic dependent of the predicative lexical unit, (b) is semantically mandatory, and (c) is specific to that predicative lexical unit.
- Semantically mandatory participants: a participant is semantically mandatory when it must be mentioned to specify the meaning of the predicative lexical unit. In other words, the realization of the predicative lexical unit implies the existence of its semantically mandatory participants. For instance, a visit cannot take place if there is no visitor or no visitee, courage is a property of a being, and a presentation implies the existence of a presenter, an audience and a presented topic. Some participants are not semantically mandatory: for instance, the addressee is not semantically mandatory for a whisper, because one can whisper without an addressee.
We restrict semantic arguments to semantically mandatory participants because we believe that this restriction helps delimit semantic arguments without resorting to the difficult syntactic argument/adjunct distinction, while not compromising the LVC tests. Note that semantically mandatory participants do not necessarily occur in a sentence containing the predicative lexical unit, and can sometimes be omitted (e.g. due to coreference or ellipsis).
- To define a заем loan one needs to mention two participants: the beneficiary and the source of the benefit. In other words, the existence of a loan implies the existence of its arguments.
- To define a presentation one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a presentation implies the existence of its arguments.
- To define an opinión opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinión implies the existence of its arguments.
- To define a conseil advice one needs to mention two participants: the adviser and the advised person. In other words, the existence of a conseil implies the existence of its arguments.
- To define a dochód profit one needs to mention two participants: the patient who benefits and the source of the benefit. In other words, the existence of a dochód implies the existence of its arguments.
- To define an opinião opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinião implies the existence of its arguments.
- To define a prezentare presentation one needs to mention three participants: the one who presents, the topic of the presentation and the person to whom the topic is presented. In other words, the existence of a prezentare implies the existence of its arguments.
priti v poštev to come into consideration to be considered
imeti mnenje to have an opinion to believe
- Specific participants: some semantically mandatory participants are generic, and we do not consider them to be semantic arguments. For instance, the existence of a presentation implies that it occurred at a given time and place, so these are semantically mandatory participants. However, time and place are implicit in any event and are not specific to the predicative noun presentation. Participants that denote non-specific characteristics of the predicative lexical unit, and thus can be interpreted independently of it (for a large class of predicative lexical units), such as time, place and manner for most predicates, are not considered semantic arguments.
Semantic arguments are generally mentioned in the dictionary definition of a predicative lexical unit. Semantic lexicons such as FrameNet and PropBank are useful sources for determining the semantic arguments of a given lexical unit. Our definition of semantic argument is closely related to FrameNet's core frame elements. Language teams are encouraged to use available resources and/or to provide language-specific documentation to help identify semantic arguments.
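The three conditions (a), (b) and (c) act as a conjunctive filter over the participants of a situation. The sketch below encodes annotator judgements as booleans; the class `Participant`, the function `semantic_arguments` and the role labels are hypothetical. The code does not decide the linguistic questions, it only makes explicit that a participant counts as a semantic argument when all three conditions hold. The judgements for presentation and whisper follow the examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    role: str
    realizable_as_dependent: bool  # condition (a)
    semantically_mandatory: bool   # condition (b)
    specific_to_predicate: bool    # condition (c)


def semantic_arguments(participants):
    """Keep only the roles that satisfy conditions (a), (b) and (c)."""
    return [p.role for p in participants
            if p.realizable_as_dependent
            and p.semantically_mandatory
            and p.specific_to_predicate]


presentation = [
    Participant("presenter", True, True, True),
    Participant("audience", True, True, True),
    Participant("topic", True, True, True),
    # Time and place are implied by any event, hence not specific.
    Participant("time", True, True, False),
    Participant("place", True, True, False),
]
whisper = [
    Participant("whisperer", True, True, True),
    # One can whisper without an addressee, so it is not mandatory.
    Participant("addressee", True, False, True),
]

print(semantic_arguments(presentation))  # ['presenter', 'audience', 'topic']
print(semantic_arguments(whisper))       # ['whisperer']
```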
Subcategorization frame
A subcategorization frame of a verb describes how syntactic arguments are realized as the verb's dependents, for a given sense of the verb. A subcategorization frame indicates morphological and syntactic features of a verb's dependents, namely the required prepositions, postpositions and case markers of the subject and of the direct and oblique objects. For instance, one subcategorization frame for to return meaning to give back would be:
- return: [NP]subject + [NP]direct-object + [to NP]oblique
- Example: [my sister]subject returned [the book]direct-object [to the library]oblique
Note that the semantic characteristics of the dependents (a.k.a. selectional restrictions or preferences) are not considered part of the subcategorization frame. For instance, whether the subject is animate (somebody) or inanimate (something) is irrelevant for subcategorization frames. Verbs can have many senses, and each sense can have many subcategorization frames. For instance, the verb to return in the same sense can also be used with the subcategorization frames [NP]subject + [NP]direct-object ([my sister]subject returned [the book]direct-object) and [NP]subject + [NP]oblique + [NP]direct-object ([my sister]subject returned [me]oblique [the book]direct-object).
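A subcategorization frame can be represented as an ordered tuple of slots, each recording a grammatical function, a syntactic category and an optional required marker. The sketch below (the names `Slot`, `RETURN_GIVE_BACK` and `matching_frames` are hypothetical) encodes the three frames given for return and matches the dependents observed in a clause against them. Following the definition, the slots deliberately carry no selectional restrictions such as animacy.

```python
from typing import NamedTuple, Optional


class Slot(NamedTuple):
    function: str                 # "subject", "direct-object", "oblique", ...
    category: str                 # syntactic category, e.g. "NP"
    marker: Optional[str] = None  # required preposition/postposition/case, if any
    # No animacy or other selectional features: they are not part of the frame.


# Frames of "return" in the sense "give back", as listed in the text.
RETURN_GIVE_BACK = [
    (Slot("subject", "NP"), Slot("direct-object", "NP"), Slot("oblique", "NP", "to")),
    (Slot("subject", "NP"), Slot("direct-object", "NP")),
    (Slot("subject", "NP"), Slot("oblique", "NP"), Slot("direct-object", "NP")),
]


def matching_frames(dependents, frames):
    """Return the indices of the frames whose slots equal the observed
    dependents. Comparison is order-sensitive, a simplification here."""
    observed = tuple(dependents)
    return [i for i, frame in enumerate(frames) if frame == observed]


# [my sister] returned [the book] [to the library]
observed = [Slot("subject", "NP"), Slot("direct-object", "NP"), Slot("oblique", "NP", "to")]
print(matching_frames(observed, RETURN_GIVE_BACK))  # [0]
```

Order sensitivity is what separates the first and third frames here; a real implementation would more likely compare slots as multisets and keep word order as a separate feature.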
Syntactic argument
Typically, verbal lexical units have dependents that can be syntactic arguments or adjuncts, depending on their status (mandatory/specific or not). For instance, in John walked in the forest yesterday, all three dependents (the entity walking, the place and the time) add semantics to the predicate, but the time and place can be interpreted independently of the semantics of the verb and could be omitted. Thus, John is a syntactic argument, while the other dependents are syntactic adjuncts. Time and place are typically considered syntactic adjuncts rather than syntactic arguments.
Beyond verbs, nouns, adjectives and adverbs can also have arguments. For example, the noun cause cannot normally appear by itself; rather, one must always talk about the cause of X, with X as the syntactic argument of the noun cause. Similarly, the noun contact has two arguments: the contact of X with Y.
Distinguishing between semantic arguments and adjuncts can be tricky, and we will not go into the details of the controversial argument/adjunct distinction. In addition to the usual argument/adjunct tests described in the linguistic literature, we advise language teams to use language-specific resources (e.g. valency dictionaries), which sometimes encode the syntactic argument structure of lexical units.
Most of the time, syntactic and semantic arguments coincide, but not always. For instance, in I translated a book, there is no syntactic argument expressing the source and target languages, which are semantic arguments of translate. Therefore, we distinguish the two notions in our guidelines. Syntactic arguments describe the linguistic structure of lexical items, whereas semantic arguments are related to the conceptual structure of predicates.
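The mismatch between the two notions can be made concrete by mapping each syntactic argument of a clause to the semantic argument it expresses and taking the set difference with the full list of semantic arguments. In this sketch the role labels and the function name `unexpressed_arguments` are hypothetical; the text itself only names the source and target languages as semantic arguments of translate.

```python
# Hypothetical role labels for "translate": the text names the source and
# target languages; the translator and the translated text are added here.
TRANSLATE_SEMANTIC_ARGS = {"translator", "text", "source_language", "target_language"}


def unexpressed_arguments(semantic_args, realized):
    """Return, sorted, the semantic arguments not realized by any syntactic
    argument. `realized` maps each syntactic argument to the role it expresses."""
    return sorted(set(semantic_args) - set(realized.values()))


# "I translated a book."
realized = {"I": "translator", "a book": "text"}
print(unexpressed_arguments(TRANSLATE_SEMANTIC_ARGS, realized))
# ['source_language', 'target_language']
```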
Syntactic operator
A syntactic operator is a verb that only bears the grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restrictive than the traditional notion of a light verb. Notably, light verbs that add aspectual or causative semantics to the complement, as in to start a walk or to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:
- отдавам почит to give tribute to pay tribute
eine Entscheidung treffen to make a decision
Angst haben to have fear
ein Verbrechen begehen to commit a crime
to make a decision
to have fear
to commit a crime
tomar una decisión
- oddać hołd to give-back tribute to pay tribute
- priti v poštev to come into consideration to consider
Unexpected change in meaning
An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests based on the notion of inflexibility. Inflexibility is verified by applying a regular modification and checking whether it yields an unexpected shift in acceptability or meaning, that is, a shift beyond what the modification itself would lead one to expect. To judge whether a shift in acceptability or meaning is unexpected, one can apply the same modification to a similar compositional construction, by analogy. For example, book and word have related words including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:
- давам ти книга I give you a book → давам ти тетрадка/роман/том/учебник I give you a notebook/novel/volume/textbook
- Ich gebe dir mein Buch I give you my book → Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
- Te doy mi libro I give you my book → Te doy mi(s) publicación/tesis doctoral/capítulo/novela/edición I give you my publication/thesis/chapter/novel/edition
- I give you my book → I give you my notebook/novel/volume/publication
- daję ci książkę I give you a book → daję Ci zeszyt/powieść/tom/publikację I give you a notebook/novel/volume/publication
- îți dau cartea I give you the book → îți dau caietul/romanul/volumul/publicația I give you the notebook/novel/volume/publication
- dam ti besedo I give you a word I promise → #dam ti izraz/zlog/glagol I give you an expression/syllable/verb
the same does not hold for:
- давам ти дума I give you a word I give you my word → #давам ти слово/израз/текст I give you a word/expression/text
- Ich gebe Dir mein Wort I give you my word, i.e. I promise → #Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
- Te doy mi palabra I give you my word, i.e. I promise → #Te doy mi(s) publicación/tesis doctoral/capítulo/novela/edición I give you my publication/thesis/chapter/novel/edition
- I give you my word → #I give you my notebook/novel/volume/publication
- daję ci słowo I give you a word I give you my word → #daję Ci wyraz/sylabę/czasownik I give you a word/syllable/verb
- Îți dau cuvântul I give you my word → #Îți dau caietul/romanul/volumul/publicația I give you my notebook/novel/volume/publication
- dati komu besedo to give (someone) a word to promise someone
That is, the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test VID.2 [LEX] applies and:
- давам своята дума to give one's word to someone
- jmd. sein Wort geben to give one's word to s.o.
- to give one's word to someone
- dar a alguien tu palabra to give one's word to s.o.
- dać komuś słowo to give someone a word to give one's word to someone
- a-ți da cuvântul cuiva to give your word to someone
is a VMWE.
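The analogy-based reasoning above amounts to comparing two sets of speaker judgements: the same substitutions are applied to the candidate and to a similar compositional construction, and the candidate is flagged only when it alone yields a #. In this sketch the judgements are entered by hand (the code does not judge meaning), and the names `Judgement` and `inflexible_by_analogy` are hypothetical; the data mirror the English book/word example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    substitute: str         # word replacing the original noun
    unexpected_shift: bool  # True if a speaker would mark the result with '#'


def inflexible_by_analogy(candidate, control):
    """Return True if some substitution causes an unexpected shift in the
    candidate but not in the analogous compositional control construction.
    Substitutes judged only for the candidate are ignored."""
    control_shift = {j.substitute: j.unexpected_shift for j in control}
    return any(j.unexpected_shift and control_shift.get(j.substitute) is False
               for j in candidate)


substitutes = ["notebook", "novel", "volume", "publication"]
# "I give you my book" (compositional control): shifts are all expected.
control = [Judgement(s, False) for s in substitutes]
# "I give you my word" (candidate): every substitution is marked with '#'.
candidate = [Judgement(s, True) for s in substitutes]

print(inflexible_by_analogy(candidate, control))  # True
print(inflexible_by_analogy(control, control))    # False
```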
Similarly, Test VPC.1 [V+PART-DIFF-SENSE] refers to an unexpected change in the meaning of the verb resulting from the addition of the particle. This is checked by testing whether the situation described by the verb with the particle implies the one described without the particle:
Ich fange das Buch an I begin to read the book does not imply Ich fange das Buch I catch the book
Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
to check in upon arrival does not imply to check upon arrival (it is a VPC)
to look up into the sky implies to look into the sky (it is not a VPC)
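Test VPC.1 thus reduces to a single implication judgement per verb-particle pair. The sketch below records the four judgements given above and derives the label that the examples attach to them; the judgements themselves come from speakers, and the names `ImplicationJudgement` and `vpc1_label` are hypothetical.

```python
from typing import NamedTuple


class ImplicationJudgement(NamedTuple):
    with_particle: str
    without_particle: str
    implies: bool  # speaker judgement: does the first situation imply the second?


def vpc1_label(j):
    """Per the examples above: if the particle variant does NOT imply the
    bare-verb variant, the particle changes the verb's sense (a VPC)."""
    return "not a VPC" if j.implies else "VPC"


examples = [
    ImplicationJudgement("Ich fange das Buch an", "Ich fange das Buch", False),
    ImplicationJudgement("Ich lege das Buch auf dem Tisch ab", "Ich lege das Buch auf den Tisch", True),
    ImplicationJudgement("to check in upon arrival", "to check upon arrival", False),
    ImplicationJudgement("to look up into the sky", "to look into the sky", True),
]
for j in examples:
    print(f"{j.with_particle}: {vpc1_label(j)}")
```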
Ungrammaticality
The ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We assume that judging ungrammaticality is a basic competence of a native speaker. Ungrammatical examples are signaled with * (star).