Annotation guidelines (version 2.0; UNDER CONSTRUCTION)
Used by the PARSEME corpora annotated for multiword expressions


Tests for functional MWEs (FuncMWEs)

If the DIST test has established that the MWE candidate has the distribution of a function word (determiner, adposition, conjunction or interjection), the status of this candidate (DetID, AdpID, ConjID, IntID or non-MWE) is to be checked with the decision diagram below. This diagram has a unique entry point, and the tests should be applied in the defined order. Each test is explained with examples in the sections below.

As with nominal, adjectival and adverbial MWEs, the tests below are ordered from the more specific to the more generic. Specific tests are those that can be formulated and answered more clearly; hence, they take priority over subsequent tests that rely on less formalised notions.

Decision tree for functional MWE candidates

In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a FuncMWE.
  • Apply test FuncMWE.1 - [CRAN: Candidate contains a cranberry word?]
    • YES ⇒ It is a DetID, AdpID, ConjID or IntID, exit.
    • NO ⇒ Apply test FuncMWE.2 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
      • YES ⇒ It is a DetID, AdpID, ConjID or IntID, exit.
      • NO ⇒ Apply test FuncMWE.3 - [IRREG-STRUCT: Irregular syntactic structure?]
        • YES ⇒ It is a DetID, AdpID, ConjID or IntID, exit.
        • NO ⇒ Apply test FuncMWE.4 - [MODIF: Modification of a component prohibited?]
          • YES ⇒ It is a DetID, AdpID, ConjID or IntID, exit.
          • NO ⇒ Apply test FuncMWE.5 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
            • YES ⇒ It is a DetID, AdpID, ConjID or IntID, exit.
            • NO ⇒ It is not an MWE, exit.
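
The cascade above can be sketched in code. This is only an illustrative sketch, not part of the PARSEME tooling: the function name `classify_func_mwe`, the dictionary layout and the distribution labels are assumptions made for the example, and the YES/NO answers still have to come from an annotator applying the tests described below.

```python
from typing import Dict, Optional, Tuple

# Test identifiers, in the order in which the decision tree applies them.
TEST_ORDER = ["CRAN", "MORPH", "IRREG-STRUCT", "MODIF", "LEX"]

# Category assigned on a YES, chosen by the distribution found with the DIST test.
# (The string keys are hypothetical labels used only in this sketch.)
CATEGORY_BY_DISTRIBUTION = {
    "determiner": "DetID",
    "adposition": "AdpID",
    "conjunction": "ConjID",
    "interjection": "IntID",
}


def classify_func_mwe(
    distribution: str, answers: Dict[str, bool]
) -> Tuple[Optional[str], Optional[str]]:
    """Walk the tests in order and stop at the first YES.

    `answers` maps a test identifier to the annotator's judgement.
    Only the tests up to and including the first YES are looked up.
    Returns (category, deciding test), or (None, None) for a non-MWE.
    """
    category = CATEGORY_BY_DISTRIBUTION[distribution]
    for test in TEST_ORDER:
        if answers[test]:
            return category, test
    return None, None


# 'in lieu of': YES at CRAN, so no further test is consulted.
print(classify_func_mwe("adposition", {"CRAN": True}))
# ('AdpID', 'CRAN')

# 'in place of': YES at MORPH; the NO at CRAN is assumed for this example.
print(classify_func_mwe("adposition", {"CRAN": False, "MORPH": True}))
# ('AdpID', 'MORPH')

# A hypothetical candidate that fails every test is not an MWE.
print(classify_func_mwe("adposition", {test: False for test in TEST_ORDER}))
# (None, None)
```

Returning the deciding test alongside the category makes it easy to record which criterion justified an annotation.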

Test FuncMWE.1 - [CRAN] - Cranberry word

Does the candidate expression contain a cranberry word?

  • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
    • in lieu of vacation (AdpID) instead of vacation - 'lieu' is not a standalone word in English
      by dint of repetition (AdpID) through repetition - 'dint' is not a standalone word in English
      on behalf of everyone (AdpID) instead of - 'behalf' is not a standalone word in English
    • parce que (ConjID) because - 'parce' is not a standalone word in French
      à l'instar de ces héros (AdpID) at the equivalent of as these heroes - 'instar' is not a standalone word in French
      la plupart de ces héros (DetID) the greater.part of most of these heroes - 'plupart' is not a standalone word in French
    • ととも(共)に COM.together.DAT with (AdpID) → 'とも(共)' is not a free word
  • Further tests are required
    • in front of - all components are standalone words
      in the end of - all components are standalone words
    • au lieu de in place of instead of - all components are standalone words
      dans un supermarché in a supermarket - all components are standalone words

Test FuncMWE.2 - [MORPH] - Morphological inflexibility

Does the candidate contain a content word (noun, verb, adjective or adverb), and does a morphological change of this word that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

  • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
    • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth (Śtẖ) from Seth (PT 65b, N) → The noun ꜥw (arm) is only used in the singular in this compound preposition.
    • in place of vacation (AdpID) #in places of vacation
      Big deal! (IntID) #Big deals!
      a great deal of experience (DetID) #deals of
    • à la place de Luc (AdpID) at the place of Luc instead of Luc #aux places de Luc
      du fait de la crise sanitaire (AdpID) of the fact of the crisis sanitary due to the public health crisis #des faits de la crise sanitaire
    • καθ’ ὅτι kath’ hoti in that according.to that
    • w imię przyjaźni in name of friendship in the name of a friendship - #w imiona przyjaźni in the names of friendships
  • Further tests are required
    • on the ground that (ConjID) for reasons based on the fact that → ground may be plural: on the grounds that
      after the meeting/meetings → compositional expressions
    • au côté de Luc (AdpID) at.the side of Luc on/at the side of Luc → côté may be plural: combattre aux côtés des Alliés fight alongside the allies
    • ὁ δέ ho de this the.NOM.sg.m PRT

Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, etc. - depending on the target language's morphology.

Test FuncMWE.3 - [IRREG-STRUCT] - Irregular syntactic structure

Does the candidate have an irregular internal syntactic structure, i.e. the language's regular grammar rules do not allow a phrase with this structure?

  • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
    • (OEG) 𓈖 𓈖𓏏𓏏 n-n.tt for (n) (the fact) that (n.tt) because (PT 716e, T) → A preposition, such as n, followed by the conjunction n.tt is an idiosyncratic feature in Egyptian.
    • in that (ConjID) that is a conjunction preceded by a preposition
      good gracious (IntID) → Adj + Adj with no N head
      mercy me! (IntID) → N + Pronoun with omitted verb and agent
    • bien que well that although (ConjID) que is a conjunction preceded by an adverb
      Ça alors! that well My! (IntID) → Pronoun followed by an adverb
      peu de gens little of people few people (DetID) → Adv + Preposition
    • εἰ δὲ μή ei de mē if not if PRT not
  • Further tests are required

Test FuncMWE.4 - [MODIF] - Prohibited modification

Does one of the lexicalized components of the candidate prohibit a modification (by adjectives, relative clauses, adverbs, determiners, PPs, etc.) which would be considered grammatical in a regular construction of the same syntactic structure? In other words, can you think of such a modification which would normally be allowed but which here leads to ungrammaticality or to an unexpected change in meaning?

  • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
    • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth (Śtẖ) from Seth (PT 65b, N) → No modification is attested in m-ꜥw, e.g. * m-ꜥw pn "in this arm".
    • in addition to (AdpID) #in great addition
      spoons as well as knives (ConjID) → spoons *as well and good as knives
      a little salt (DetID) #a little but strong salt vs. a little but strong person
    • en guise de conclusion in guise of conclusion as a/in conclusion (AdpID) *en juste guise de conclusion
      en sorte que cela se calme in sort that it calms so that it calms (ConjID) *en bonne sorte que cela se calme
      des tas de choses Det.ind.pl lots of things lots of things (DetID) #des tas très hauts de choses vs. des tas énormes de blé
    • οὐ μὲν γάρ ou men gar for not
    • w imię przyjaźni in name of friendship in the name of our friendship - #w pierwsze/piękne/ważne imię przyjaźni
      jak to? how this? how come? - #jak samo to?
  • Further tests are required

Test FuncMWE.5 - [LEX] - Lexical inflexibility

Does the candidate contain a content word (noun, verb, adjective or adverb), and does a regular replacement of this component by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

  • It is a determiner, adposition, conjunction or interjection idiom (DetID, AdpID, ConjID or IntID), depending on its distribution.
    • (OEG) 𓅓 𓂝 𓋴𓏏𓈙 m-ꜥw Śtẖ in (m) the arm (ꜥw) of Seth (Śtẖ) from Seth (PT 65b, N) → If ꜥw “arm” is replaced with ꜥb “association”, the result is m-ꜥb “in uniting”, i.e. “together with” or “in the company of”, e.g. m-ꜥb nčr(.w) (PT 736c, T) “in uniting the gods”, i.e. “in the company of the gods”.
    • in the view of the evidence (AdpID) #in the perspective/perception of the evidence
      in consequence of the sentence (AdpID) #in result of the sentence
      as long as you finish your homework (ConjID) *as short/large as you finish your homework
      Give me a little money (DetID) → *Give me a small money
    • Je viens de la part de votre voisin (AdpID) of the part of I come on behalf of your neighbor → *Je viens du nom/de la direction de votre voisin
      Repas préparé par les soins de Madame X (AdpID) by cares of Meal prepared by Mme X → *Repas préparé par l'attention/la prévenance/la sollicitude de Mme X
      Il n'est pas venu sous prétexte qu' il était malade (ConjID) under pretext that He didn't come on the pretext that he was ill → *Il n'est pas venu sous excuse qu' il était malade
    • καὶ δὴ καὶ kai dē kai as well as and PRT and
    • mimo że nie wiedziałam although that I didn't know although I didn't know (ConjID) - *wbrew że nie wiedziałam
      jak też pretensje as also reproaches and reproaches (ConjID) - *jak oraz pretensje
      coś tam jeszcze something there more something more (PronID) - #coś tu jeszcze
      wpół do piątej at.half to five half past four (AdpID) - #wpół po piątej
  • It is not a MWE, exit
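
When applying [LEX], it can help to list the substituted variants systematically before judging them. The helper below is a minimal sketch under that assumption: the name `lex_variants` is hypothetical, the replacement words must be supplied by the annotator, and the code only builds the strings. Deciding whether a variant is ungrammatical (*) or shifts meaning unexpectedly (#) remains a human judgement.

```python
from typing import List


def lex_variants(tokens: List[str], index: int, replacements: List[str]) -> List[str]:
    """Return the candidate with tokens[index] replaced by each word in turn."""
    return [
        " ".join(tokens[:index] + [word] + tokens[index + 1:])
        for word in replacements
    ]


# Variants of 'as long as' with the content word 'long' replaced,
# following the English example given for this test.
for variant in lex_variants(["as", "long", "as"], 1, ["short", "large"]):
    print(variant)
# as short as
# as large as
```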
