Annotation guidelines (version 2.0; UNDER CONSTRUCTION)
Used by the
corpora annotated for multiword expressions
Tests for functional MWEs (FuncMWEs)
If the DIST test shows that the MWE candidate has the distribution of a function word (determiner, adposition, conjunction or interjection), the status of the candidate (DetID, AdpID, ConjID, IntID or non-MWE) is checked with the decision diagram below. The diagram has a single entry point, and the tests must be applied in the given order. Each test is explained with examples in the sections below.
As with nominal, adjectival and adverbial MWEs, the tests are ordered from more specific to more generic. Specific tests are those that can be formulated and answered more clearly, so they take priority over later tests that rely on less formalised notions.
Decision tree for functional MWE candidates
In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a FuncMWE.
- Apply test FuncMWE.1 - [CRAN: Candidate contains a cranberry word?]
- YES: It is a DetID, AdpID, ConjID or IntID, exit.
- NO: Apply test FuncMWE.2 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
- YES: It is a DetID, AdpID, ConjID or IntID, exit.
- NO: Apply test FuncMWE.3 - [IRREG-STRUCT: Irregular syntactic structure?]
- YES: It is a DetID, AdpID, ConjID or IntID, exit.
- NO: Apply test FuncMWE.4 - [MODIF: Modification of a component prohibited?]
- YES: It is a DetID, AdpID, ConjID or IntID, exit.
- NO: Apply test FuncMWE.5 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
- YES: It is a DetID, AdpID, ConjID or IntID, exit.
- NO: It is not an MWE, exit.
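The cascade above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the guidelines: the function name `classify_functional_candidate`, the dictionary layout and the boolean encoding of the annotator's YES/NO answers are assumptions made for the example. The tests themselves remain human judgements; the code only encodes the order in which they are consulted.

```python
from typing import Dict, List, Optional

# Test names in the order prescribed by the decision tree.
TEST_ORDER: List[str] = ["CRAN", "MORPH", "IRREG-STRUCT", "MODIF", "LEX"]

# Label given to a functional MWE, chosen by the distribution
# established beforehand by the DIST test.
LABEL_BY_DISTRIBUTION: Dict[str, str] = {
    "determiner": "DetID",
    "adposition": "AdpID",
    "conjunction": "ConjID",
    "interjection": "IntID",
}


def classify_functional_candidate(
    distribution: str, answers: Dict[str, bool]
) -> Optional[str]:
    """Return the FuncMWE label, or None if the candidate is not an MWE.

    `answers` maps a test name to the annotator's judgement
    (True for YES, False for NO). Hypothetical helper for illustration.
    """
    label = LABEL_BY_DISTRIBUTION[distribution]
    for test in TEST_ORDER:
        # A single YES is sufficient; later tests are not consulted.
        if answers.get(test, False):
            return label
    # All five tests answered NO.
    return None
```

For instance, `classify_functional_candidate("adposition", {"CRAN": True})` yields `"AdpID"`, while a candidate for which every test is answered NO yields `None`.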
Test FuncMWE.1 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
- YES: It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- NO: Further tests are required.
by dint of repetition (AdpID) 'through repetition' - 'dint' is not a standalone word in English
on behalf of everyone (AdpID) 'instead of' - 'behalf' is not a standalone word in English
à l'instar de ces héros (AdpID) lit. 'at the equivalent of', 'as these heroes' - 'instar' is not a standalone word in French
la plupart de ces héros (DetID) lit. 'the greater.part of', 'most of these heroes' - 'plupart' is not a standalone word in French
in the end of - all components are standalone words
dans un supermarché 'in a supermarket' - all components are standalone words
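As a rough aid for the CRAN test, an annotator could list the components that are not attested as standalone words in some reference lexicon. The sketch below is an assumption-laden illustration: the function `cranberry_words` and the toy lexicon are hypothetical, and a real check would need a proper lexicon for the target language plus the annotator's judgement.

```python
from typing import Iterable, List, Set


def cranberry_words(components: Iterable[str], standalone_lexicon: Set[str]) -> List[str]:
    """Return the components that are absent from the lexicon of standalone words.

    Hypothetical helper: a non-empty result only flags candidates for a
    YES answer to CRAN; the final decision stays with the annotator.
    """
    return [word for word in components if word.lower() not in standalone_lexicon]


# Toy lexicon for illustration only; it is not a real resource.
TOY_ENGLISH_LEXICON: Set[str] = {"by", "of", "in", "the", "end", "on", "everyone"}

flagged = cranberry_words(["by", "dint", "of"], TOY_ENGLISH_LEXICON)
not_flagged = cranberry_words(["in", "the", "end", "of"], TOY_ENGLISH_LEXICON)
```

With this toy lexicon, `flagged` contains only `dint`, and `not_flagged` is empty because every component of "in the end of" is a standalone word.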
Test FuncMWE.2 - [MORPH] - Morphological inflexibility
Does the candidate contain a content word (noun, verb, adjective or adverb), and does a morphological change of this word that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- YES: It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- NO: Further tests are required.
Big deal! (IntID) → #Big deals!
a great deal of experience (DetID) → #deals of
du fait de la crise sanitaire (AdpID) lit. 'of the fact of the crisis sanitary', 'due to the public health crisis' → #des faits de la crise sanitaire
after the meeting/meetings → compositional expressions
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc., depending on the target language's morphology.
Test FuncMWE.3 - [IRREG-STRUCT] - Irregular syntactic structure
Does the candidate have an irregular internal syntactic structure, i.e. one that the language's regular grammar rules do not allow for a phrase?
- YES: It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- NO: Further tests are required.
good gracious (IntID) → Adj + Adj with no N head
mercy me! (IntID) → N + Pronoun with omitted verb and agent
Ça alors! (IntID) lit. 'that well', 'My!' → Pronoun followed by an adverb
peu de gens (DetID) lit. 'little of people', 'few people' → Adv + Preposition
Test FuncMWE.4 - [MODIF] - Prohibited modification
Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?
- YES: It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- NO: Further tests are required.
spoons as well as knives (ConjID) → spoons *as well and good as knives
a little salt (DetID) → #a little but strong salt vs. a little but strong person
en sorte que cela se calme (ConjID) lit. 'in sort that it calms', 'so that it calms' → *en bonne sorte que cela se calme
des tas de choses (DetID) lit. 'Det.ind.pl lots of things', 'lots of things' → #des tas très hauts de choses vs. des tas énormes de blé
jak to? lit. 'how this?', 'how come?' → #jak samo to?
Test FuncMWE.5 - [LEX] - Lexical inflexibility
Does the candidate contain a content word (noun, verb, adjective or adverb), and does a regular replacement of this component by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- YES: It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
- NO: It is not an MWE, exit.
in consequence of the sentence (AdpID) → #in result of the sentence
as long as you finish your homework (ConjID) → *as short/large as you finish your homework
Give me a little money (DetID) → *Give me a small money
Repas préparé par les soins de Madame X (AdpID) lit. 'by cares of', 'Meal prepared by Mme X' → *Repas préparé par l'attention/la prévenance/la sollicitude de Mme X
Il n'est pas venu sous prétexte qu'il était malade (ConjID) lit. 'under pretext that', 'He didn't come on the pretext that he was ill' → *Il n'est pas venu sous excuse qu'il était malade
jak też pretensje (ConjID) lit. 'as also reproaches', 'and reproaches' → *jak oraz pretensje
coś tam jeszcze (PronID) lit. 'something there more', 'something more' → #coś tu jeszcze
wpół do piątej (AdpID) lit. 'at.half to five', 'half past four' → #wpół po piątej
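To apply the LEX test systematically, it can help to generate the substituted variants first and then judge each one. The following is a minimal sketch under stated assumptions: the function `substitution_variants` is hypothetical, the list of related words has to be supplied by the annotator, and the code makes no acceptability judgement of its own.

```python
from typing import List, Sequence


def substitution_variants(
    components: Sequence[str], index: int, related_words: Sequence[str]
) -> List[str]:
    """Build one variant per related word by replacing the component at `index`.

    Hypothetical helper: the annotator still decides whether each variant is
    ungrammatical (*) or shows an unexpected change in meaning (#).
    """
    variants = []
    for word in related_words:
        replaced = list(components)  # copy so the original candidate is untouched
        replaced[index] = word
        variants.append(" ".join(replaced))
    return variants


variants = substitution_variants(["as", "long", "as"], 1, ["short", "large"])
```

Here `variants` holds "as short as" and "as large as", the two substitutions judged unacceptable in the ConjID example above.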