Annotation guidelines (version 2.0; UNDER CONSTRUCTION)
Used by the corpora annotated for multiword expressions
Tests for nominal MWEs (NMWEs)
If the DIST test has established that the MWE candidate has a nominal distribution, the status of the candidate (NID, PronID, NV or non-MWE) is determined with the decision tree below. The tree has a single entry point, and the tests should be applied in the order given. Each test is explained with examples in the sections below.
The role of the first three tests, NMWE.1, NMWE.2 and NMWE.3, is to eliminate a candidate if it is a named entity (or a definite description).
The tests below are ordered from more specific ones to more generic ones. Specific tests are those that can be more clearly formulated and answered. Hence, they have priority over subsequent tests that rely on less formalised notions. In practice, however, it turns out that specific tests are often not applicable to some NMWE classes, and more generic tests (e.g. LEX) are required. As a consequence, generic tests, appearing towards the end of the list, may end up being used quite frequently.
Decision tree for nominal MWE candidates
- Apply test NMWE.1 - [SPECIF-REF: Candidate refers to a specific entity?]
  - YES: Apply test NMWE.2 - [NAMING-CONV: Naming convention applies to the whole class?]
    - NO: Apply test NMWE.3 - [SEM-TYPE: Person, organization, location, product or event?]
      - YES: It is a proper name or a definite description, not an MWE, exit.
      - NO: It is not a proper name, continue to test NMWE.4.
    - YES: It is not a proper name, continue to test NMWE.4.
  - NO: It is not a proper name, continue to test NMWE.4.
- Apply test NMWE.4 - [DEVERBAL: Candidate derives from a VMWE?]
  - YES: It is an NV.VID, NV.LVC.full, etc., depending on the outcome of the VMWE tests, exit.
  - NO: Continue to test NMWE.5.
- Apply test NMWE.5 - [PRON: Candidate on the list of MWE pronouns?]
  - YES: It is a PronID, exit.
  - NO: Continue to test NMWE.6.
- Apply test NMWE.6 - [CRAN: Candidate contains a cranberry word?]
  - YES: It is an NID, exit.
  - NO: Continue to test NMWE.7.
- Apply test NMWE.7 - [IRREG-STRUCT: Irregular syntactic structure?]
  - YES: It is an NID, exit.
  - NO: Continue to test NMWE.8.
- Apply test NMWE.8 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
  - YES: It is an NID, exit.
  - NO: Continue to test NMWE.9.
- Apply test NMWE.9 - [MODIF: Modification of a component prohibited?]
  - YES: It is an NID, exit.
  - NO: Continue to test NMWE.10.
- Apply test NMWE.10 - [COORD: Coordination prohibited?]
  - YES: It is an NID, exit.
  - NO: Continue to test NMWE.11.
- Apply test NMWE.11 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
  - YES: It is an NID, exit.
  - NO: Continue to test NMWE.12.
- Apply test NMWE.12 - [HEAD: Semantic head is hypernym?]
  - NO: It is an NID, exit.
  - YES: Continue to test NMWE.13.
- Apply test NMWE.13 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
  - YES: It is an NID, exit.
  - NO: It is not an MWE, exit.
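The decision tree above can be sketched as a small function. This is only an illustration under stated assumptions, not part of the guidelines: the function name, the answer keys and the `REGULAR` baseline are hypothetical, and the mapping from yes/no answers to outcomes follows our reading of the test descriptions and examples in the sections below (in particular, HEAD is taken to signal an NID when the answer is "no").

```python
from typing import Dict

# Tests NMWE.6 to NMWE.13 in tree order, each paired with the answer that
# yields an NID. Assumption: HEAD is the only test where "no" (the semantic
# head is not a hypernym) marks an idiom, as in the "red herring" example.
NID_TESTS = [
    ("CRAN", True), ("IRREG-STRUCT", True), ("MORPH", True), ("MODIF", True),
    ("COORD", True), ("SYNT", True), ("HEAD", False), ("LEX", True),
]


def classify_nominal_candidate(answers: Dict[str, bool], vmwe_category: str = "VID") -> str:
    """Return the category reached by walking the tree with yes/no answers.

    `answers` maps a test name to True (yes) or False (no); only the tests
    actually reached are looked up. `vmwe_category` stands for the outcome
    of the VMWE tests and is used only when DEVERBAL is answered yes.
    """
    # NMWE.1-3: a specific referent, no class-wide naming convention and a
    # named-entity semantic type together indicate a proper name.
    if answers["SPECIF-REF"] and not answers["NAMING-CONV"] and answers["SEM-TYPE"]:
        return "not an MWE (proper name or definite description)"
    if answers["DEVERBAL"]:  # NMWE.4
        return f"NV.{vmwe_category}"
    if answers["PRON"]:  # NMWE.5
        return "PronID"
    for test, nid_answer in NID_TESTS:  # NMWE.6-13
        if answers[test] == nid_answer:
            return "NID"
    return "not an MWE"


# A fully regular candidate: "no" everywhere, except that its head is a hypernym.
REGULAR = {name: False for name in
           ["SPECIF-REF", "NAMING-CONV", "SEM-TYPE", "DEVERBAL", "PRON",
            "CRAN", "IRREG-STRUCT", "MORPH", "MODIF", "COORD", "SYNT", "LEX"]}
REGULAR["HEAD"] = True

print(classify_nominal_candidate(REGULAR))                     # not an MWE
print(classify_nominal_candidate({**REGULAR, "HEAD": False}))  # NID
```

Because the answers are looked up only when a test is reached, an annotator who exits early (for example at NMWE.3) does not need to answer the later, more generic tests.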
Test NMWE.1 - [SPECIF-REF] - Specific reference
In the given context, does the candidate refer to one or more specific entities, rather than being used generically?
- YES: It might be a proper name, go to test NMWE.2
- NO: It is not a proper name, continue to test NMWE.4
Many John Smiths live in London → John Smiths refers to several specific persons
He used the cold weapon hidden under his coat → cold weapon refers to a specific weapon
The two cold weapons were found at the place of the crime → cold weapon refers to several specific weapons
The theory of relativity was proposed by Einstein → there is only one theory of relativity, so it must be single and specific
The UN Secretary-General visited Greece → at the moment of writing there is only one UN Secretary-General (so he/she must be single and specific)
Universal Dependencies is a collection of treebanks → Universal Dependencies refers to a single specific collection of treebanks and there is only one such collection
I ate a cold lunch → cold lunch refers to a specific meal
Le (café) Descartes 'the (café) Descartes' → le (café) Descartes refers to a specific place
Il cachait une/l'arme blanche sous le manteau 'He was hiding a/the cold weapon under his coat' → arme blanche has a(n) (in)definite specific reference
Le Secrétaire général de l'ONU est en visite officielle en Grèce (lit. 'The secretary general of the UN is in visit official in Greece') 'The UN Secretary-General is officially visiting Greece' → Secrétaire général de l'ONU is specific at the moment of writing
Dwie Maje Kowalskie mają tu konta 'Two Majas Kowalska have accounts here' → Maje Kowalskie refers to two specific persons
Posłużył się białą bronią przyniesioną w torbie (lit. 'He used the white weapon brought in his bag') 'He used the cold weapon brought in his bag' → biała broń refers to a specific weapon
W pobliżu znaleziono kilka białych broni (lit. 'Nearby several white weapons were found') 'Nearby several cold weapons were found' → białe bronie refers to several specific weapons
Paradoks Banacha i Tarskiego został opisany w 1924 roku 'The Banach-Tarski paradox was described in 1924' → there is only one Banach-Tarski paradox (so it must be single and specific)
Sekretarz stanu w Ministerstwie Cyfryzacji 'the Secretary of State at the Ministry of Digitalization' → at the moment of writing there is only one such secretary
Anonimowi Alkoholicy spotykają się w czwartki 'Anonymous Alcoholics meet on Thursdays' → Anonimowi Alkoholicy 'Anonymous Alcoholics' refers to a single specific organization
Zjadłam zimny obiad 'I ate a cold lunch' → zimny obiad refers to a specific meal
Cold weapons are prohibited on a plane → cold weapons is used generically, i.e. refers to the whole class
I avoid cold lunches → cold lunches is used generically, i.e. refers to all instances of the class
The UN Secretary-General is the chief administrative officer of the United Nations → UN Secretary-General is used generically, i.e. refers to the whole class
J'évite de porter une chemise blanche 'I avoid wearing a white shirt' → chemise blanche does not refer to a specific occurrence
Białe bronie są zabronione na pokładzie (lit. 'White weapons are forbidden onboard') 'Cold weapons are forbidden onboard' → białe bronie 'cold weapons' is used generically, i.e. refers to the whole class
Nie lubię zimnych obiadów 'I don't like cold lunches' → zimne obiady 'cold lunches' refers to all instances of a class
Test NMWE.2 - [NAMING-CONV] - Concept naming convention
Does the naming convention between the candidate c and an entity e apply to all instances of a whole semantic class? In other words, can c refer to another entity e' based on the properties of e', with no need of an extra naming convention?
- YES: It is not a proper name, go to test NMWE.4
- NO: It could be a proper name, continue to test NMWE.3. Note that the answer might be no in two cases:
  - There is no other e' in the concept denoted by the candidate
  - There could be another e' in the same class as e but the naming convention does not apply to it
The two cold weapons were found at the place of the crime → if another entity e' occurs which has the same properties as the ones in this sentence (it is a weapon that does not use explosives or fire), e' can be called cold weapon with no need of an extra naming convention
the UN Secretary-General visited Greece → at a different moment in time, there can be another person e' playing the same role, so she/he can be called UN Secretary-General with no need for an extra naming convention
I ate a cold lunch → if another entity e' occurs which has the same properties as the one in this sentence (it is a lunch which is cold), e' can be called cold lunch with no need of an extra naming convention
Le Secrétaire général de l'ONU a un mandat de 5 ans 'The UN Secretary-General has a five-year term' → any e' may be designated by c with no extra conventions, as long as it holds the function denoted by c
W pobliżu znaleziono kilka białych broni (lit. 'Nearby several white weapons were found') 'Nearby several cold weapons were found' → as above
Sekretarz stanu w Ministerstwie Cyfryzacji 'the Secretary of State at the Ministry of Digitalization' → at a different moment in time, there can be another person e' playing the same role, so she/he can be called Sekretarz stanu w Ministerstwie Cyfryzacji 'the Secretary of State at the Ministry of Digitalization' with no need for an extra naming convention
Zjadłam zimny obiad 'I ate a cold lunch' → if another entity e' occurs which has the same properties (it is a lunch which is cold), e' can be called zimny obiad 'cold lunch' with no need of an extra naming convention
Universal Dependencies is a collection of treebanks → there is no other e' which could be called Universal Dependencies
Anonimowi Alkoholicy spotykają się w czwartki 'Anonymous Alcoholics meet on Thursdays' → there is no other e' which could be called Anonimowi Alkoholicy 'Anonymous Alcoholics'
Many John Smiths live in London → as above
Dwie Maje Kowalskie mają tu konta 'Two Majas Kowalska have accounts here' → as above
Test NMWE.3 - [SEM-TYPE] - Semantic type
Is the entity e referred to by the candidate c a PERSON, ORGANIZATION, LOCATION, HUMAN PRODUCT or EVENT?
- YES: The candidate is a proper name or a definite description, not an MWE, exit.
- NO: It is not a proper name, continue to test NMWE.4
Universal Dependencies → a treebank collection is a human product
Einstein's mother → definite description
Black Sea → location
l'Organisation des nations unies → an ORGANIZATION
Charente-Maritime → a LOCATION
le Petit Robert → a HUMAN PRODUCT
la Nuit Blanche → an EVENT
Ξενοφῶν Ἀθηναῖος Xenophōn Athēnaios (lit. 'Xenophon.NOM.sg.m Athenian.NOM.sg.m') 'Xenophon, the Athenian'
Hołd pruski 1525 'Prussian Tribute 1525' → event
Morze Martwe 'Dead Sea' → location
Zygmunt III Waza 'Sigismund III Vasa' → person
Alzheimer's disease → a disease is neither a human product nor an event
Test NMWE.4 - [DEVERBAL] - Deverbal NMWE
Does the candidate contain a deverbal noun and can the candidate be rephrased (in the given context) using a verbal expression which passes the VMWE tests?
- YES: It is a deverbal nominal MWE (NV), with the corresponding VMWE subcategory, e.g. NV.VID, NV.LVC.full, etc.
- NO: Continue to the next test
Elle est preneuse de notes pour sa camarade => Elle prend des notes pour sa camarade → prend des notes is an LVC.full, so preneuse de notes is an NV.LVC.full
La déclaration de guerre est autorisée par le Parlement 'The declaration of war is authorized by Parliament' → déclarer la guerre à NP is a VID, so déclaration de guerre (à NP) is an NV.VID; here it is an argument of the verb autoriser
była to zabawa jego kosztem => bawili się jego kosztem → bawili się jego kosztem is a VID, so zabawa jego kosztem is an NV.VID
rzut oka na tekst => rzuciłam okiem na tekst → rzuciłam okiem is a VID, so rzut oka is an NV.VID
był działaczem ruchu robotniczego 'he was an activist in a workers' movement' => działał w ruchu robotniczym 'he acted in a workers' movement' → this is not a VMWE, so działacz ruchu robotniczego 'activist in a workers' movement' is not an NV
Test NMWE.5 - [PRON] - Pronoun
Does the candidate occur on the closed list of MWE pronouns or should the list be extended with this candidate? Such lists need to be established for each language separately. Care should be taken to distinguish PronIDs from DetIDs.
- YES: It is a pronominal idiom (PronID)
- NO: Continue to the next test
I expect no one to come
we love each other
Je n'ai vu qui que ce soit (lit. 'I not have seen whoever it be.SUBJV.3.SG') 'I didn't see anyone' → qui que ce soit is a pronominal idiom (PronID)
there is no one right way to tell the story → no one is not a pronoun here but two determiners
to samo się rozwiąże (lit. 'this alone itself will solve') 'this will solve itself' → to samo is not a complex pronoun but a simple pronoun to and an adjective samo
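A per-language closed list of this kind could be stored and queried roughly as follows. This is a minimal sketch: the dictionary `MWE_PRONOUNS` and the function `is_listed_pronoun` are hypothetical names, and the lists are seeded only with the examples given above, not with any complete inventory.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical per-language lists, seeded only with the examples above.
MWE_PRONOUNS: Dict[str, FrozenSet[Tuple[str, ...]]] = {
    "en": frozenset({("no", "one"), ("each", "other")}),
    "fr": frozenset({("qui", "que", "ce", "soit")}),
}


def is_listed_pronoun(tokens: Sequence[str], lang: str) -> bool:
    """Check whether the token sequence is on the list for the given language."""
    key = tuple(token.lower() for token in tokens)
    return key in MWE_PRONOUNS.get(lang, frozenset())


print(is_listed_pronoun(["each", "other"], "en"))  # True
print(is_listed_pronoun(["to", "samo"], "pl"))     # False
```

A match on the list only flags a candidate. As the "no one right way" example shows, the annotator still has to check that the sequence is used as a pronoun in the given context.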
Test NMWE.6 - [CRAN] - Cranberry word
Does the candidate expression contain a cranberry word?
- YES: It is a nominal idiom (NID)
- NO: Continue to the next test
status quo → foreign words like 'status' and 'quo' are considered cranberry words
kith and kin 'friends and relations' → 'kith' is not a stand-alone word
helter-skelter 'tall tower at a fun-fair' → 'helter' and 'skelter' do not exist alone outside this expression
riff-raff 'ill-behaved people' → 'raff' does not exist alone outside this expression
cha-cha(-cha) 'ballroom dance performed with small steps and swaying hip movements' → 'cha' is not a stand-alone word
méli-mélo 'confused mixture' → 'méli' and 'mélo' are not stand-alone words
frou-frou 'rustling' → 'frou' does not exist outside of this compound
loup-garou 'werewolf' → 'garou' is not a stand-alone word
pont-levis 'drawbridge' → 'levis' is not a stand-alone word
cha-cha-cha 'ballroom dance performed with small steps and swaying hip movements' → 'cha' is not a stand-alone word
bric-à-brac 'bric-à-brac' → 'brac' is not a stand-alone word (cf. de bric et de broc (AdvID))
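As a rough aid for this test, components can be checked against a lexicon of words attested on their own. The sketch below is only illustrative: the function `cranberry_components` and the toy lexicon are assumptions, and a real check would need a full lexicon for the language as well as a decision on how to treat foreign words such as those in 'status quo'.

```python
from typing import Iterable, List, Set


def cranberry_components(components: Iterable[str], standalone_words: Set[str]) -> List[str]:
    """Return the components that are not attested as stand-alone words."""
    return [c for c in components if c.lower() not in standalone_words]


# Hypothetical toy lexicon of words that occur outside fixed expressions.
TOY_LEXICON = {"and", "kin", "runner", "bean"}

print(cranberry_components(["kith", "and", "kin"], TOY_LEXICON))  # ['kith']
print(cranberry_components(["runner", "bean"], TOY_LEXICON))      # []
```

A non-empty result suggests a cranberry word and hence an NID; an empty result means the candidate moves on to the next test.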
Test NMWE.7 - [IRREG-STRUCT] - Irregular syntactic structure
Does the candidate have an internal syntactic structure which is irregular for its distribution, i.e. a structure that is not that of a nominal?
- YES: It is a nominal idiom (NID)
- NO: Continue to the next test
double-bind 'dilemma' → Adj-V
a hold-up → V-Adv
fast day → V-N
love-hate relationship → V-V N
round about → V-Adv
(un) porte-manteau (lit. 'support-coat') 'coat-rack' → V-N
monte-charge (lit. 'raise load') 'goods lift' → V-N
(un) franc-parler (lit. 'frank-talk') 'frankness' → Adj-V
(un) à-coup (lit. 'at-strike') 'juddering' → Preposition-N
Test NMWE.8 - [MORPH] - Morphological inflexibility
Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- YES: It is a nominal idiom (NID)
- NO: Continue to the next test
(the) grass roots → #grass root
She invested in real estate → She invested in *real estates
(des) vacances d'hiver (lit. 'vacations of winter') 'winter vacation' → *vacance d'hiver
(une) respiration mécaniquement assistée (lit. 'respiration mechanically assisted') 'mechanical ventilation' → *respirations mécaniquement assistées
Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc., depending on the morphology of the target language.
Test NMWE.9 - [MODIF] - Prohibited modification
Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?
- YES: It is a nominal idiom (NID)
- NO: Continue to the next test
(a) state-of-the-art → #mental state-of-the-art, #state-of-the-fine-art
starting blocks → #starting to run blocks
rowing machine → *rowing slowly machine
runner bean → #slow runner bean
(un) livre d'or (lit. 'book of gold') 'guestbook' → *un livre de mon frère d'or, *un livre de cet or
(une) table ronde (lit. 'table round') 'round-table discussion' → #une table très ronde
(une) lettre recommandée (lit. 'letter recommended') 'registered letter' → #une lettre recommandée par mon voisin
Test NMWE.10 - [COORD] - Prohibited coordination
Does coordination of the candidate with another candidate with the same head lead to ungrammaticality or to an unexpected change in meaning?
- YES: It is a nominal idiom (NID)
- NO: Continue to the next test
foul line → *foul and side lines
a can of worms → *a can of worms and tuna
un esprit critique (lit. 'spirit critical') 'critical mind' → #un esprit critique et frappeur
un pot à épices (lit. 'jar of spices') 'spice jar' → *pot à épices et à lait
pot à eau (lit. 'jug at water') 'water jug' → pot à eau et à lait (the coordination is acceptable)
Test NMWE.11 - [SYNT] - Syntactic inflexibility
Does another regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?
- YES: It is a nominal idiom (NID)
- NO: Continue to the next test
a dog's breakfast 'a mess' → #breakfast of dog
hard shoulder 'emergency lane' → #a shoulder that is hard
les sciences naturelles 'natural sciences' → #les sciences qui sont naturelles
Test NMWE.12 - [HEAD] - Semantic head
Is the semantic head h of the candidate c its hypernym, i.e. can one say that "c is a type of h"? Note that sometimes the syntactic and semantic heads do not coincide.
- NO: It is a nominal idiom (NID)
- YES: Continue to the next test
red herring → It is not a type of sea fish, but it suggests an idea of a misleading clue
a square peg (in a round hole) 'someone who does not fit in' → It is not a peg but a person
a bunch of flowers → these are flowers (here the semantic head 'flowers' is different from the syntactic head 'bunch')
un nuage de lait (lit. 'cloud of milk') 'a dash of milk' → It does not refer to a type of cloud but to a small quantity (of milk)
moulin à paroles (lit. 'mill at words') 'blabbermouth' → It does not refer to a type of mill but to a person
Test NMWE.13 - [LEX] - Lexical inflexibility
Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?
- YES: It is a nominal idiom (NID)
- NO: It is not an MWE, exit
chain reaction → #chain change(s)
deep water → #profound water
vicious circle → vicious cycle but #vicious sphere/round/ring...
vanity case → vanity box but #arrogance/narcissism/self-admiration box/case
boarding pass → boarding card but #boarding ticket/voucher/document/...
tête de lard (lit. 'head of lard') 'stubborn' → *tête de graisse, *chef de lard
peine perdue (lit. 'effort lost') 'fruitless effort' → *peine égarée
mauvaise/méchante langue 'bad mouth' → #bonne/gentille langue
personal/professional... judgement → the component can be replaced freely, so this is not an MWE
deep anxiety/love/conversation... → as above
mauvaise odeur/habitude/surprise... 'bad smell/habit/surprise' → as above
méchant garçon/professeur/marchand... 'mean boy/teacher/merchant' → as above