Annotation guidelines
corpora annotated for multiword expressions
Glossary
Candidate VMWE
A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it should be annotated as a true VMWE or whether it is a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.
Collocation
A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:
играя футбол to play football
drastically drop
el diagrama muestra the diagram shows
coger el tren to take the train
przyznać rację to admit right to admit that someone is right
uprawiać sport to practice sports
wzruszać ramionami to shrug one's shoulders
drastično zmanjšati drastically reduce
Cranberry word
A cranberry word is a token that does not have the status of a stand-alone word, has no proper distribution, and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in different contexts, as the underlined words below:
jemanden einen Besuch abstatten
no decir ni chus ni mus → chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
hacer algo a troche y moche → troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
sprawiedliwości stało się zadość justice has been done
Extended nominal phrase
An extended nominal phrase (ENP) is a notion covering, in a universal way, various types of phrases which convey similar lexical relations in morpho-syntactically different ways (prepositions, post-positions, case markers, etc.), depending on the language. Extended NPs include:
- noun phrases, i.e. phrases headed by a noun, with its possible syntactic modifiers/complements
- prepositional phrases, in which a preposition directly governs a noun, or the opposite, depending on the linguistic theory adopted
- noun phrases with case markers
- noun phrases with postpositions
преди всичко before everything
dla wszystkich for everyone
z prawdziwego zdarzenia from a true event genuine
ENP is close to the UD understanding of the nominal phrase.
Particles
Particles are hard to distinguish from homographic prepositions:
ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
to get up a hill
jestem za ustawą I am for the law I am in favor of the law
The fundamental property to capture is that a preposition governs a prepositional group, while a particle functions as an adverbial. In some languages particles can also be homographic with verbal prefixes:
den See umfahren to drive around the lake
Ongelukken kunnen worden voorkomen accidents may be prevented
Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.
Reflexive clitics
Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines for the IRV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which form a relatively small closed class of words.
Semantic argument
A semantic argument of a predicative lexical unit (verb, noun, etc.) is a participant of the situation described by the predicative lexical unit that (a) can be realized as a syntactic dependent of the predicative lexical unit, (b) is semantically mandatory, and (c) is specific to that predicative lexical unit.
- Semantically mandatory participants: a participant is semantically mandatory when it must be mentioned to
specify the meaning of the predicative lexical unit. In other words, the realization of the predicative lexical unit
implies the existence of its semantically mandatory participants. For instance, a visit cannot hold
if there is no visitor or no visitee, courage is a property of a being,
a presentation implies the existence of a presenter, of an audience and of a
presented topic. Some participants are not semantically mandatory, for instance the addressee is
not semantically mandatory for a whisper because one can whisper without an addressee.
We restrict semantic arguments to semantically mandatory participants because we believe that this restriction helps
delimit the semantic arguments without resorting to the difficult syntactic argument/adjunct distinction, while not being prejudicial to
LVC tests. Notice that semantically mandatory participants do not necessarily occur in a sentence containing the
predicative lexical unit, and can sometimes be omitted (e.g. due to coreference or ellipsis).
To define a заем loan one needs to mention two participants: the beneficiary and the source of the benefit. In other words, the existence of a loan implies the existence of its arguments.
To define a presentation one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a presentation implies the existence of its arguments.
To define an opinión opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinión implies the existence of its arguments.
To define a conseil advice one needs to mention two participants: the adviser and the advised person. In other words, the existence of a conseil implies the existence of its arguments.
To define a dochód profit one needs to mention two participants: the patient who benefits and the source of the benefit. In other words, the existence of a dochód implies the existence of its arguments.
To define an opinião opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinião implies the existence of its arguments.
To define a prezentare presentation one needs to mention three participants: the one who presents, the topic of the presentation and the person to whom the topic is presented. In other words, the existence of a prezentare implies the existence of its arguments.
priti v poštev to come into consideration to be considered
imeti mnenje to have an opinion to believe
- Specific participants: some semantically mandatory participants are generic and we do not consider them to be semantic arguments. For instance, the existence of a presentation implies that it occurred at a given time and place, so these are semantically mandatory participants. However, time and place are implicit in any event, and are not specific to the predicative noun presentation. Participants that denote non-specific characteristics of the predicative lexical unit and thus can be interpreted independently of the predicative lexical unit (for a large class of predicative lexical units), such as time, place and manner for most predicates, are not considered semantic arguments.
Semantic arguments are generally mentioned in the dictionary definition of a predicative lexical unit. Useful sources for determining the semantic arguments of a given lexical unit are semantic lexicons such as FrameNet and PropBank. Our definition of semantic argument is closely related to FrameNet's core frame elements. Language teams are encouraged to use available resources and/or to provide language-specific documentation to help identify semantic arguments.
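The three criteria (a), (b) and (c) can be read as a filter over the participants of a situation. The Python sketch below is only an illustration and not part of the guidelines: the class Participant, its field names and the function semantic_arguments are hypothetical, and the boolean values are annotator judgements entered by hand for the noun presentation, following the discussion above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Participant:
    """A participant of the situation described by a predicative lexical unit."""
    role: str
    realizable_as_dependent: bool  # criterion (a)
    semantically_mandatory: bool   # criterion (b)
    specific_to_predicate: bool    # criterion (c)


def semantic_arguments(participants: List[Participant]) -> List[str]:
    """Keep only the participants that satisfy criteria (a), (b) and (c)."""
    return [
        p.role
        for p in participants
        if p.realizable_as_dependent
        and p.semantically_mandatory
        and p.specific_to_predicate
    ]


# Hand-entered judgements for the noun "presentation" (illustrative only).
PRESENTATION = [
    Participant("presenter", True, True, True),
    Participant("audience", True, True, True),
    Participant("topic", True, True, True),
    # Time and place are implied by any event, so they fail criterion (c).
    Participant("time", True, True, False),
    Participant("place", True, True, False),
]

print(semantic_arguments(PRESENTATION))  # ['presenter', 'audience', 'topic']
```

The code does not decide anything by itself: it only records the outcome of the judgements, which remain the annotator's responsibility.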
Subcategorization frame
A subcategorization frame of a verb describes how syntactic arguments are realized as the verb's dependents, for a given sense of the verb. A subcategorization frame indicates morphological and syntactic features of a verb's dependents, namely the required prepositions, postpositions and case markers of the subject, direct and oblique objects. For instance, one subcategorization frame for to return meaning to give back would be:
- return: [NP]subject + [NP]direct-object + [to NP]oblique
- Example: [my sister]subject returned [the book]direct-object [to the library]oblique
Notice that the semantic characteristics of the dependents (a.k.a. selectional restrictions or preferences) are not considered part of the subcategorization frame. For instance, the fact that the subject is animate (somebody) or inanimate (something) is irrelevant for subcategorization frames. Verbs can have many senses and each sense can have many subcategorization frames. For instance, the verb to return in the same sense can also be used with the subcategorization frames [NP]subject + [NP]direct-object ([my sister]subject returned [the book]direct-object) and [NP]subject + [NP]oblique + [NP]direct-object ([my sister]subject returned [me]oblique [the book]direct-object).
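As an illustration only, the three frames of to return given above can be written down as a small data structure. The following Python sketch is not part of the guidelines; the class names Slot and Frame, their fields and the constant RETURN_FRAMES are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    """One dependent of the verb: phrase type, grammatical function, marker."""
    phrase: str                   # e.g. "NP"
    function: str                 # "subject", "direct-object" or "oblique"
    marker: Optional[str] = None  # required preposition, postposition or case marker

    def __str__(self) -> str:
        inner = f"{self.marker} {self.phrase}" if self.marker else self.phrase
        return f"[{inner}]{self.function}"


@dataclass(frozen=True)
class Frame:
    """A subcategorization frame for one sense of a verb."""
    verb: str
    sense: str
    slots: Tuple[Slot, ...]

    def __str__(self) -> str:
        return f"{self.verb}: " + " + ".join(str(s) for s in self.slots)


SUBJ = Slot("NP", "subject")
DOBJ = Slot("NP", "direct-object")

# The three frames mentioned in the text for "to return" meaning "to give back".
RETURN_FRAMES = [
    Frame("return", "give back", (SUBJ, DOBJ, Slot("NP", "oblique", marker="to"))),
    Frame("return", "give back", (SUBJ, DOBJ)),
    Frame("return", "give back", (SUBJ, Slot("NP", "oblique"), DOBJ)),
]

for frame in RETURN_FRAMES:
    print(frame)
```

Slot deliberately has no field for animacy or other selectional restrictions, mirroring the remark that they are not part of a subcategorization frame.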
Syntactic argument
Typically, verbal lexical units have dependents that can be syntactic arguments or adjuncts, depending on their status (mandatory/specific or not). For instance, in John walked in the forest yesterday all three dependents (the entity walking, the time and the place) add semantics to the predicate, but time and place can be interpreted independently of the semantics of the verb, and could be omitted. Thus, John is a syntactic argument while the other dependents are syntactic adjuncts. Typically, time and place are considered as syntactic adjuncts, and never as syntactic arguments.
Beyond verbs, nouns, adjectives and adverbs can also have arguments. For example, the noun cause cannot normally appear by itself; rather, one must always talk about the cause of X, with X as the syntactic argument of the noun cause. Similarly, the noun contact has two arguments: the contact of X with Y.
Distinguishing between semantic arguments and adjuncts can be tricky, and we will not go into the details of the contentious argument/adjunct distinction. In addition to the usual tests for the argument/adjunct distinction described in the linguistic literature, we advise language teams to use language-specific resources (e.g. valency dictionaries) that sometimes encode the syntactic argument structure of lexical units.
Most of the time, syntactic and semantic arguments coincide, but not always. For instance, in I translated a book, there is no syntactic argument expressing the source and target languages, which are semantic arguments of translate. Therefore, we distinguish the two notions in our guidelines. Syntactic arguments describe the linguistic structure of lexical items whereas semantic arguments are related to the conceptual structure of predicates.
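The mismatch in the translate example can be made concrete with a simple set comparison. This is a minimal sketch, assuming hand-assigned role labels: the names SEMANTIC_ARGS and REALIZED and the labels "translator" and "text" are illustrative assumptions, while the source and target languages are the semantic arguments named in the text.

```python
# Semantic arguments assumed for "translate"; role labels are illustrative.
SEMANTIC_ARGS = {"translator", "text", "source language", "target language"}

# Roles realized as syntactic arguments in "I translated a book".
REALIZED = {"translator": "I", "text": "a book"}

# Semantic arguments that have no syntactic realization in this sentence.
unrealized = sorted(SEMANTIC_ARGS - set(REALIZED))
print(unrealized)  # ['source language', 'target language']
```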
Syntactic operator
A syntactic operator is a verb that only bears the grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restricted than the traditional notion of a light verb. Notably, aspectual light verbs (which add aspectual semantics to the complement), as in to start a walk, to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:
Angst haben to have fear
ein Verbrechen begehen to commit a crime
to have fear
to commit a crime
tener miedo
hacer ilusión
een misdrijf plegen to commit a crime
Unexpected change in meaning
An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification which yields an unexpected acceptability or meaning shift, that is, beyond what would be expected from the modification itself. In order to judge whether a shift in acceptability or meaning is unexpected, one can try to apply the same modification to a similar compositional construction, using analogy. For example, book and word have synonyms including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:
the same does not hold for:
That is, the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test VID.2 [LEX] applies and:
is a VMWE.
Similarly, Test VPC.1 [V+PART-DIFF-SENSE] refers to an unexpected change in meaning of the verb stemming from the addition of the particle. We do so by checking if the situation described by the verb with the particle implies the one described without the particle:
Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
to look up into the sky implies to look into the sky (it is not a VPC)
Ungrammaticality
Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We assume that judging ungrammaticality is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).