Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.1 (2018)


Inherently reflexive verbs (IRV)

Reflexive clitics (RCLI) are clitic pronouns that refer to the subject of the verb, like oneself in English. They are very common in many languages and play several semantic roles depending on the context, as detailed below.

Reflexive verbs (REFLV), sometimes also called pronominal verbs, are formed by a full verb combined with a RCLI, although the clitic does not always have a reflexive meaning. REFLV can be categorized into different classes, some of which should be annotated as verbal MWEs.

Namely, we will only annotate a REFLV as an inherently reflexive verb (IRV) when (a) it never occurs without the clitic, or (b) the REFLV and non-reflexive versions have clearly different senses or subcategorization frames. Inherently reflexive verbs constitute a quasi-universal category.

IRVs are a difficult category to annotate due to various problematic cases. Note in particular that in some languages, e.g. Slavic ones, the reflexive clitics inflect for case and should be considered not only in their most frequent case, i.e. the accusative.

We start by listing the various categories of REFLV before providing tests to decide whether to annotate a given occurrence as IRV.

  • Inherently reflexive ⇒ ANNOTATE as IRV
    • The verb without the RCLI does not exist
      • усмихвам се to smile, страхувам се to be afraid
      • stydět se to be ashamed, divit se to wonder
      • sich schämen to be ashamed, sich wundern to wonder
      • suicidarse to suicide, abstenerse to abstain
      • n.a.
      • s'évanouir to faint, se suicider to suicide
      • suicidarsi to suicide, arrabbiarsi to get angry
      • dowiedzieć się to find out, bać się to be afraid
      • queixar-se to complain, abster-se to abstain
      • a se teme to be afraid with obligatory ACC reflexive clitic
        a își însuși to appropriate with obligatory DAT reflexive clitic
      • sramovati se to be ashamed, bati se to be afraid
      • att försova sig to sleep in
        att gifta sig to get married
    • The verb without the RCLI does exist, but has a very different meaning
      • смея ≠ смея се to dare ≠ to laugh, намирам ≠ намирам се to find ≠ to be situated
      • sich enthalten ≠ enthalten to abstain ≠ to contain, sich (um etw.) handeln ≠ handeln to be ≠ to handle
      • to find oneself in a difficult situation
        to help oneself to the cookies
      • recoger ≠ recogerse to gather ≠ to go home, empeñar ≠ empeñarse to pawn ≠ to insist
      • n.a.
      • s'apercevoir ≠ apercevoir to realize ≠ to see, s'agir ≠ agir to be ≠ to act
      • riferire ≠ riferirsi to report, tell ≠ to refer
      • znajdować ≠ znajdować się to find ≠ to be, radzić ≠ radzić sobie to advise ≠ to manage
      • encontrar-se ≠ encontrar to be ≠ to meet, referir-se ≠ referir to concern ≠ to refer
      • a se îndura ≠ a îndura to have the heart ≠ to suffer
        a se face ≠ a face to become ≠ to make; even if it is inchoative (Dindelegan 2013: 79), a se face (= to become) is IRV (it passes test IRV.2, prev. 15)
      • dati se it is possible (to do something) ≠ dati to give, dobiti se to meet ≠ dobiti to get
      • att känna sig ledsen/arg to feel sad/angry ≠ to touch
  • Reciprocal ⇒ NOT ANNOTATED
    • The RCLI has a sense of mutually:
      • целувам се to kiss each other, срещам се to meet each other
      • líbat se to kiss each other, potkávat se to meet each other
      • sich küssen to kiss each other, sich treffen to meet each other
      • besarse to kiss each other, verse to see each other
      • n.a.
      • s'embrasser to kiss each other, se rencontrer to meet each other
      • baciarsi to kiss each other
      • całować się to kiss each other, spotykać się to meet each other
      • cumprimentar-se to greet each other, ver-se to see each other
      • a se saluta to greet each other
      • poljubljati se to kiss each other, srečati se to meet each other
  • Reflexive ⇒ NOT ANNOTATED
    • The RCLI marks the reflexive or reciprocal construction, that is, the clitic plays the role of self in English
      • мия се to wash oneself, реша се to comb oneself
      • mýt se to wash oneself, drbat se to scratch oneself
      • sich waschen to wash oneself, sich kratzen to scratch oneself
      • mirarse to look at oneself, vestirse to dress oneself
      • n.a.
      • se laver to wash oneself, se parler to talk to oneself
      • lavarsi to wash oneself, vestirsi to dress oneself
      • myć się to wash oneself, drapać się po głowie to scratch oneself on the head
      • apressar-se to hurry oneself, vestir-se to dress oneself
      • a se spăla to wash oneself
      • umivati se to wash oneself, praskati se to scratch oneself
      • att tvätta sig to wash oneself
  • Body part, also called possessive reflexive ⇒ NOT ANNOTATED
    • Specific type of reflexive use in which the direct object is a body part or, more generally, an inalienable part of the subject
      • мия си ръцете wash REFL.POSSESSIVE hands wash one's hands
      • mýt si nohy wash RCLI.DAT the feet wash one's feet
      • sich das Bein brechen RCLI the leg break break one's leg
      • rascarse el brazo scratch.RCLI the arm scratch one's arm
      • n.a.
      • se gratter la tête RCLI scratch the head scratch one's head
      • grattarsi la testa RCLI scratch the head scratch one's head
      • myć sobie nogi wash RCLI.DAT the feet wash one's feet
      • impossible, uses possessive instead
      • a-şi rupe mâna RCLI.DAT break arm break one's arm
      • umivati noge wash RCLI.DAT the feet wash one's feet, zlomiti roko RCLI.DAT break arm break one's arm
  • Middle with preverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
    • The clitic marks a regular syntactic alternation for transitive verbs. Just like in regular passive alternation, the direct object of the transitive version appears as the subject of the REFLV version, and thus the verb agrees with the subject.
    • Differently from inchoative (see below), the subject of the transitive version is absent in the REFLV version but it exists necessarily, though it is underspecified
      • книги се пишат трудно books write.PL RCLI difficult it is difficult to write books
      • die Häuser verkaufen sich gut the houses sell RCLI well the houses sell well
      • las casas se venden bien the houses RCLI sell well the houses sell well
      • n.a.
      • les pots se vendent bien the pots RCLI sell well the pots sell well
      • le case si affittano the houses RCLI rent the houses are rented
      • domy dobrze się sprzedają houses sell.PL RCLI well houses sell well
      • as casas se vendem bem the houses RCLI sell well the houses sell well
      • casele se vând bine houses-the RCLI sell well houses sell well
      • hiše se dobro prodajajo the houses sell RCLI well the houses sell well
  • Middle with postverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
    • In some languages, middle alternation with preverbal subject sounds unnatural and middle alternation with postverbal subject is preferred. Depending on the language, the postverbal noun phrase is viewed as a subject (ES, PL, PT, RO) or as an object that agrees with the unaccusative verb form (IT). Middle alternation with postverbal subject is impossible in FR and DE.
      • трудно се пишат книги difficult RCLI write.PL books it is difficult to write books
      • se alquilan casas RCLI rent houses people rent houses
      • n.a.
      • si affittano case RCLI rent houses people rent houses
      • dobrze sprzedają się te domy well sell RCLI these houses these houses sell well Polish is a relatively free word-order language and a postverbal subject is a regular (even if stylistically marked) alternation.
      • alugam-se casas rent-RCLI houses people rent houses
      • se vând bine apartamentele din blocurile noi RCLI sell well apartments-the from blocks-the new Apartments from new blocks sell well
        se construiesc locuințe noi RCLI build houses new new houses are built
      • nove hiše se gradijo new houses RCLI build new houses are built
  • Impersonal ⇒ NOT ANNOTATED
    • The RCLI marks an impersonal verb alternation possible for various transitivity classes, depending on the language: only transitive verbs (FR), only intransitive verbs with manner adjuncts (DE), preferably intransitive but tolerated for transitive verbs (PT), either transitive or intransitive verbs (IT, ES, RO, PL)
    • There is no noun phrase before the verb (empty subject slot), the presence of the RCLI indicates a verb interpreted with a generic and underspecified subject
    • The verb is in third person singular, even when the object is plural
      • не се вечеря късно not RCLI have dinner late it is not good to have dinner late
      • hier tanzt es sich gut here dances it RCLI well people dance well here
      • se busca a actores RCLI searches to actors people look for actors
        se trabaja mejor aquí RCLI works better here people work better here
      • n.a.
      • il se dit des bêtises it RCLI says silly things people say silly things
      • si lavora troppo RCLI works too much people work too much
        si affitta molte case RCLI rents many houses people rent many houses
      • za dużo się pracuje too much RCLI works people work too much
        bzdury się opowiada nonsense RCLI tells people tell nonsense
      • dorme-se muito sleeps-RCLI much people sleep a lot
        conta-se histórias tells-RCLI stories people tell stories Transitive impersonal is considered wrong by traditional grammar but it is found in corpora.
      • se lucrează până târziu RCLI works until late people work until late transitive verbs can be impersonal in RO only when they are null-object verbs (se lucrează până târziu - *este lucrat până târziu) or when their subject is realized by a clause headed by a complementizer Dindelegan 2013: 174
        se suferă din cauza sărăciei RCLI suffer because of poverty one suffers because of poverty RO impersonal reflexive verbs are mostly intransitive Dindelegan 2013: 173
        se aleargă dimineața RCLI run in the morning people run in the morning
      • govori se/govorijo se neumnosti it says/they say RCLI silly things people say silly things
  • Inchoative ⇒ NOT ANNOTATED
    • Similar to middle, but the RCLI marks a less productive syntactic alternation:
      • the direct object of the transitive version appears as subject of the REFLV
      • the subject of the transitive version is not only absent, it is also semantically unclear or nonexistent
        • вратата се отваря the door opens
        • dveře se otvírají the door opens
        • die Tür öffnet sich the door opens
        • la puerta se abrió the door opened
        • n.a.
        • la porte s'est subitement ouverte the door suddenly opened
        • la porta si apre the door opens
        • drzwi się otwierają the door opens
        • o vaso se quebrou the vase broke
        • mașina s-a stricat the car broke down
          ușa s-a deschis the door opened
        • vrata se odpirajo the door opens
        • dörren öppnar sig the door opens

IRV-specific decision tree

  • Apply test IRV.1 - [INHERENT]
    • YES ⇒ Annotate as IRV
    • NO ⇒ Apply test IRV.2 - [DIFF-SENSE]
      • YES ⇒ Annotate as IRV
      • NO ⇒ Apply test IRV.3 - [DIFF-SUBCAT]
        • YES ⇒ Annotate as IRV
        • NO ⇒ check whether the verb has a subject
          • verb has no subject ⇒ Apply test IRV.4 - [IMPERS]
            • YES ⇒ It is not a VMWE, exit
            • NO ⇒ Annotate as IRV
          • verb has a subject ⇒ Apply test IRV.5 - [MIDDLE-INCHO]
            • YES ⇒ It is not a VMWE, exit
            • NO ⇒ Apply test IRV.6 - [REFL]
              • YES ⇒ It is not a VMWE, exit
              • NO ⇒ check the number of the subject
                • subject is SINGULAR ⇒ Apply test IRV.7 - [REFL-MUTUAL]
                  • YES ⇒ It is not a VMWE, exit
                  • NO ⇒ Annotate as IRV
                • subject is PLURAL ⇒ Apply test IRV.8 - [RECIPRO]
                  • YES ⇒ It is not a VMWE, exit
                  • NO ⇒ Annotate as IRV
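
The decision tree can also be read as a short procedure. The following is only an illustrative sketch, not part of the official guidelines: the class `Answers`, its field names and the function `decide_irv` are hypothetical, and each field stands for the annotator's own yes/no judgement on the corresponding test. It assumes that a "yes" answer leads to the first outcome listed under each test and a "no" answer to the second.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Hypothetical container for an annotator's yes/no answers to the IRV tests."""
    inherent: bool = False              # IRV.1: verb never occurs without the RCLI
    diff_sense: bool = False            # IRV.2: all senses of the simple verb differ
    diff_subcat: bool = False           # IRV.3: different subcategorization frame
    has_subject: bool = True            # selects IRV.4 (no subject) or IRV.5 (subject)
    impersonal: bool = False            # IRV.4: paraphrase with 'one/people' keeps meaning
    middle_or_inchoative: bool = False  # IRV.5: transitive version implies the REFLV
    reflexive: bool = False             # IRV.6: 'X V himself only' implies the REFLV
    subject_is_plural: bool = False     # selects IRV.7 (singular) or IRV.8 (plural)
    refl_mutual: bool = False           # IRV.7: a reciprocal version is possible
    reciprocal: bool = False            # IRV.8: 'A V B and B V A' keeps the meaning


def decide_irv(a: Answers) -> str:
    """Walk the IRV-specific decision tree and return 'IRV' or 'not a VMWE'."""
    # Tests IRV.1 to IRV.3: a single 'yes' is enough to annotate as IRV.
    if a.inherent or a.diff_sense or a.diff_subcat:
        return "IRV"
    # No subject: only the impersonal test (IRV.4) remains.
    if not a.has_subject:
        return "not a VMWE" if a.impersonal else "IRV"
    # With a subject: middle/inchoative (IRV.5), then reflexive (IRV.6).
    if a.middle_or_inchoative or a.reflexive:
        return "not a VMWE"
    # The last test depends on the number of the subject.
    if a.subject_is_plural:
        return "not a VMWE" if a.reciprocal else "IRV"
    return "not a VMWE" if a.refl_mutual else "IRV"
```

For instance, `decide_irv(Answers(inherent=True))` yields `"IRV"`, while `decide_irv(Answers(has_subject=False, impersonal=True))` yields `"not a VMWE"`.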

Test IRV.1 (prev. 14) - [INHERENT] Inherent clitic

Does the verb only exist with the RCLI and never occurs without it?

  • YES ⇒ annotate as IRV
    • страхувам се ⇒ *страхувам to be afraid
      усмихвам се ⇒ *усмихвам to smile
    • sich schämen ⇒ *schämen to be ashamed
      sich wundern ⇒ *wundern to wonder
    • suicidarse ⇒ *suicidar to suicide
      abstenerse ⇒ *abstener to abstain
    • n.a.
    • s'évanouir ⇒ *évanouir to faint
      se suicider ⇒ *suicider to suicide
    • suicidarsi ⇒ *suicidare to suicide
    • dowiedzieć się ⇒ *dowiedzieć to find out
      bać się ⇒ *bać to be afraid
      wydarzyć się ⇒ *wydarzyć to happen
    • queixar-se ⇒ *queixar to complain
      abster-se ⇒ *abster to abstain
    • a se teme ⇒ *a teme to be afraid
      a își însuși ⇒ *a însuși to appropriate
    • sramovati se ⇒ *sramovati to be ashamed
      čuditi se ⇒ *čuditi to wonder
  • NO ⇒ next test

Test IRV.2 (prev. 15) - [DIFF-SENSE] - Different sense

Given the same verb without the RCLI, are all of its meanings clearly different from the REFLV form?

  • YES ⇒ annotate as IRV
    • намирам се ≠ намирам to be situated ≠ to find
      радвам се ≠ радвам to feel happy ≠ to make happy
    • sich verstehen ≠ verstehen to get along well ≠ to understand
    • to find oneself in a difficult situation
      to help oneself to the cookies
    • recogerse ≠ recoger to go home ≠ to pick up, to gather
    • n.a.
    • s'apercevoir ≠ apercevoir to realize ≠ to see
      s'agir ≠ agir to be ≠ to act
    • riferirsi ≠ riferire to refer ≠ to report, to tell
    • znajdować się ≠ znajdować to be ≠ to find
      sprawdzić się ≠ sprawdzić to prove appropriate ≠ to check
      wybrać się ≠ wybrać to go ≠ to choose
    • encontrar-se ≠ encontrar to be ≠ to meet
      referir-se ≠ referir to concern ≠ to refer
    • a se îndura ≠ a îndura to have the heart to ≠ to suffer
    • razumeti se ≠ razumeti to get along well ≠ to understand
  • NO ⇒ next test

Test IRV.3 (prev. 16) - [DIFF-SUBCAT] - Different subcategorization frame

Is the subcategorization frame of the simple verb without the RCLI different from the subcategorization frame of the REFLV, except for the addition of a direct or indirect object corresponding to the same syntactic argument as the RCLI in the REFLV version?

  • YES ⇒ annotate as IRV
    • X verliert sich in Y ⇔ X verliert Y X loses RCLI in Y ⇔ X loses Y
    • X se olvidó de Y ⇔ X olvidó Y X RCLI forgot of Y ⇔ X forgot Y
    • n.a.
    • X se confesse de Y ⇔ X confesse Y (but *X confesse de Y) X RCLI confesses of Y ⇔ X confesses Y (but not *X confesses of Y)
      X se plaint de Z ⇒ *Y plaint (à) X de Z X RCLI complains of Z ⇒ *Y complains (to) X of Z → the verb without RCLI, plus direct or indirect object, does not subcategorize for the PP with preposition de
      X se refuse à Vinf ⇒ *Y refuse (à) X à Vinf X RCLI refuses to Vinf ⇒ *Y refuses (to) X to Vinf
    • X si è dimenticato di Y ⇔ X ha dimenticato Y X RCLI forgot of Y ⇔ X forgot Y
    • X tłumaczy się z Y ⇔ X tłumaczy Y X explains SELF of Y ⇔ X explains Y
      X dziwi się Y.dat ⇔ Y dziwi X ⇔ Z dziwi X Y.inst X surprises SELF Y.dat ⇔ Y surprises X ⇔ Z surprises X Z.inst
    • X se esqueceu de Y ⇔ X esqueceu Y X RCLI forgot of Y ⇔ X forgot Y
    • X se gândeşte la Y ⇔ X gândeşte că Y X RCLI thinks of Y ⇔ X thinks that Y
  • NO ⇒ next test

Test IRV.4 (prev. 17) - [IMPERS] - Impersonal

When you replace the RCLI by an underspecified subject such as one or people, does the sentence keep its meaning?

  • YES ⇒ do NOT annotate as verbal MWE
    • не се вечеря късно ⇔ хората не вечерят късно not RCLI have dinner late it is not good to have dinner late
    • hier tanzt es sich gut ⇔ hier tanzen die Leute gut people dance well here
    • se duerme mucho ⇔ las personas duermen mucho people sleep a lot
      se busca a actores ⇔ la gente busca a actores people look for actors
    • n.a.
    • il se dit des bêtises ⇔ les personnes disent des bêtises people say silly things
    • si dorme molto ⇔ le persone dormono molto people sleep a lot
      si affitta molte case ⇔ le persone affittano molte case people rent many houses
    • pracuje się za dużo ⇔ ludzie pracują za dużo people work too much
      opowiada się bzdury ⇔ ludzie opowiadają bzdury people tell nonsense
    • dorme-se muito ⇔ as pessoas dormem muito people sleep a lot
      conta-se histórias ⇔ as pessoas contam histórias people tell stories
    • se lucrează până târziu ⇔ lumea lucrează până târziu people work until late
      se aleargă dimineața ⇔ lumea aleargă dimineața people run in the morning
    • govorijo se neumnosti ⇔ ljudje govorijo neumnosti people tell nonsense
  • NO ⇒ annotate as IRV

Test IRV.5 (prev. 18) - [MIDDLE-INCHO] - Middle or Inchoative

When you move the subject to the object position, remove the RCLI and add a generic subject (people, somebody), thus building a transitive version, does it imply the REFLV version? In other words, people/somebody V [to] X ⇒ X REFLV?

  • YES ⇒ do NOT annotate as verbal MWE
    • някой отваря вратата ⇒ вратата се отваря somebody opens the door ⇒ the door opens
    • man kann die Häuser gut verkaufen ⇒ die Häuser verkaufen sich gut people can sell the houses well ⇒ the houses sell well
      jemand öffnet die Tür ⇒ die Tür öffnet sich somebody opens the door ⇒ the door opens
    • la gente cuenta historias ⇒ se cuentan historias people tell stories ⇒ stories are told
      alguien abrió la puerta ⇒ la puerta se abrió somebody opened the door ⇒ the door opened
    • n.a.
    • on vend bien ce produit ⇒ ce produit se vend bien people sell this product well ⇒ this product sells well
      quelqu'un ouvre la porte ⇒ la porte s'ouvre somebody opens the door ⇒ the door opens
    • qualcuno vende bene questo prodotto ⇒ questo prodotto si vende bene someone sells this product well ⇒ this product sells well
      qualcuno apre la porta ⇒ la porta si apre somebody opens the door ⇒ the door opens
    • ktoś sprzedaje te domy ⇒ te domy się sprzedają somebody sells these houses ⇒ these houses sell well
      ktoś otwiera drzwi ⇒ drzwi się otwierają somebody opens the door ⇒ the door opens
      ktoś nasila skargi ⇒ skargi nasilają się somebody increases complaints ⇒ complaints increase
      ktoś rozgrywa mecz ⇒ mecz rozgrywa się somebody plays a game ⇒ the game plays
    • alguém conta histórias ⇒ contam-se histórias somebody tells stories ⇒ tell.PL-RCLI stories somebody tells stories ⇒ stories are told
      alguém acalmou o menino ⇒ o menino se acalmou somebody calmed the boy ⇒ the boy RCLI calmed somebody calmed the boy down ⇒ the boy calmed down
      o juiz casou João com Maria ⇒ João se casou com Maria the judge married João with Maria ⇒ João RCLI married with Maria the judge married João with Maria ⇒ João got married to Maria
      o juiz casou Maria e João ⇒ Maria e João se casaram the judge married Maria and João ⇒ Maria and João RCLI married the judge married Maria and João ⇒ Maria and João got married
      alguém lembrou João do meu aniversário ⇒ João se lembrou do meu aniversário somebody reminded João of my birthday ⇒ João RCLI reminded of my birthday somebody reminded João of my birthday ⇒ João remembered my birthday
    • cineva spune glume ⇒ se spun glume somebody tells jokes ⇒ jokes are told
      cineva a deschis ușa ⇒ ușa s-a deschis somebody opened the door ⇒ the door opened
    • nekdo pripoveduje šale ⇒ šale se pripovedujejo somebody tells jokes ⇒ jokes are told
      nekdo je odprl vrata ⇒ vrata so se odprla somebody opened the door ⇒ the door opened
  • NO ⇒ next test

Test IRV.6 (prev. 19) - [REFL] - Reflexive

When you replace the RCLI by oneself only or to oneself only, does it imply the REFLV version? In other words, X V [to] himself only ⇒ X REFLV?

  • YES ⇒ do NOT annotate as verbal MWE
    • Павел лекува себе си ⇒ Павел се лекува Pavel heals himself
    • Paul kratzt nur sich selbst ⇒ Paul kratzt sich Paul scratches himself
    • Paul washes only himself ⇒ Paul washes himself
    • Pablo se lava a sí mismo ⇒ Pablo se lava Paul washes himself
    • n.a.
    • Paul ne soigne que lui-même ⇒ Paul se soigne Paul heals himself
      Paul ne parle qu'à lui-même ⇒ Paul se parle Paul talks to himself
    • Paolo cura solo se stesso ⇒ Paolo si cura Paul heals himself
      Paolo parla solo a se stesso ⇒ Paolo si parla Paul talks to himself
    • Paweł leczy tylko siebie ⇒ Paweł leczy się Paul heals himself
      Paweł bogaci tylko siebie ⇒ Paweł bogaci się Paul enriches himself Paul gets rich
    • Paulo só lava a si mesmo ⇒ Paulo se lava Paul washes himself
    • Paul se spală doar pe sine ⇒ Paul se spală. Paul washes himself
    • Pavel praska sam sebe ⇒ Pavel se praska Paul scratches himself
  • NO ⇒ next test

Test IRV.7 (prev. 20) - [REFL-MUTUAL] - Reflexive-mutual

Is a reciprocal version possible? Namely: Is it acceptable to replace the singular subject by a plural and add each other to the REFLV form without changing the REFLV's meaning?

  • YES ⇒ do NOT annotate as verbal MWE. The test applies only if test IRV.2 (prev. 15) has failed. For example, for "X se marie" 'X gets married' in French, it is odd though possible to say 'X and Y marry each other', but this does not mean 'X gets married', because it is only possible if X and Y are marriage officiants.
    • Павел се мие ⇔ те се мият един друг they wash each other
    • Paul wäscht sich ⇔ Sie waschen sich gegenseitig / einander they wash each other
    • Pablo se lava ⇔ ellos se lavan mutuamente / los unos a los otros they wash each other
    • n.a.
    • Paul se lave ⇔ ils se lavent mutuellement / les uns les autres they wash each other
    • Paolo si lava ⇔ essi si lavano reciprocamente / l'un l'altro they wash each other
    • Paweł się myje ⇔ oni myją się nawzajem they wash each other
    • Paulo se lava ⇔ eles se lavam mutuamente / uns aos outros they wash each other
    • el se spală ⇔ ei se spală unul pe altul they wash each other
    • Pavel se umiva ⇔ umivajo drug drugega they wash each other
  • NO ⇒ annotate as IRV

Test IRV.8 (prev. 21) - [RECIPRO] - Reciprocal

Is it possible to remove the RCLI and replace the coordinated subject (A and B) or plural subject (A.PL) by a singular subject (A or A.PL) and a singular object, often introduced by to/with (B or A.PL), without changing the REFLV's meaning? That is:

  • Coordinated subject: A and B PronV ⇔ A V [to/with] B and B V [to/with] A?
  • Plural subject: A.PL PronV ⇔ A.PL V [to/with] A.PL?
  • YES ⇒ do NOT annotate as verbal MWE
    • Павел и Елена се целуват ⇔ Павел целува Елена и Елена целува Павел Pavel and Elena kiss
    • Paul und Anna umarmen sich ⇔ Paul umarmt Anna and Anna umarmt Paul Paul and Anna hug each other
      die Affen kratzen sich ⇔ die Affen kratzen die Affen the monkeys scratch each other
    • Pablo y Ana se abrazan ⇔ Pablo abraza a Ana and Ana abraza a Pablo Paul and Ann hug each other
      los niños se abrazan ⇔ los niños abrazan a los niños the children hug each other
    • n.a.
    • Paul et Anne s'embrassent ⇔ Paul embrasse Anne and Anne embrasse Paul Paul and Ann kiss
      les jours se suivent ⇔ les jours suivent les jours the days follow each other
    • Giovanni e Anna si baciano ⇔ Giovanni bacia Anna and Anna bacia Giovanni John and Ann kiss
      i giorni si seguono ⇔ i giorni seguono i giorni the days follow each other
    • Paweł i Elena się całują ⇔ Paweł całuje Elenę and Elena całuje Pawła Paweł and Elena kiss
    • João e Ana se beijam ⇔ João beija Ana and Ana beija João John and Ann kiss
      os presos se agridem ⇔ os presos agridem os presos the prisoners aggress each other
    • Ion şi George se salută ⇔ Ion îl salută pe George and George îl salută pe Ion Ion and George greet each other
      participanții se salută ⇔ participanții îi salută pe participanți the participants greet each other
    • Pavel in Ana se objemata ⇔ Pavel objema Ano in Ana objema Pavla Paul and Anna hug each other
  • NO ⇒ annotate as IRV
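
The paraphrase pattern used in this test, A and B PronV ⇔ A V [to/with] B and B V [to/with] A, can be sketched mechanically for the coordinated-subject case. This is only an illustration under simplifying assumptions: the function name `reciprocal_paraphrase` is hypothetical, and the sketch ignores case marking, agreement and word order, which the annotator must adapt by hand for each language.

```python
def reciprocal_paraphrase(a: str, b: str, verb: str, prep: str = "") -> str:
    """Build 'A V [prep] B and B V [prep] A' for a coordinated subject 'A and B'.

    `verb` is the simple verb (without the RCLI), already inflected for a
    singular subject; `prep` is the optional preposition (to/with).
    """
    link = f" {prep} " if prep else " "
    return f"{a} {verb}{link}{b} and {b} {verb}{link}{a}"


print(reciprocal_paraphrase("Paul", "Ann", "hugs"))
print(reciprocal_paraphrase("Paul", "Ann", "talks", prep="to"))
```

If the generated paraphrase has the same meaning as the REFLV sentence, the clitic is reciprocal and the verb is not annotated.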

Problematic cases and remarks

Polysemy

Keep in mind that both simple and reflexive verbs can have several senses. In test IRV.2 (prev. 15), we ask that ALL senses of the simple verb that you can think of differ from the sense of the REFLV form in the given context. For example, the French verb trouver can mean to find something, to have an opinion about something, to discover something, etc. But se trouver has a totally different and unrelated meaning of to be (located at) in the sentence L'église se trouve à Paris the church is located in Paris. It should thus be annotated as a MWE. As the REFLV is polysemous itself, it should NOT be annotated as IRV in sentences like Elle se trouve grosse she finds herself fat, where it means to have an opinion about (herself), equivalent to the non-reflexive version.
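
The "ALL senses" requirement of the [DIFF-SENSE] test can be made explicit with a small sketch. It is purely illustrative: the function `passes_diff_sense` and the sense labels are hypothetical, and in practice the sense inventory comes from the annotator's intuition or a dictionary, not from a fixed list.

```python
from typing import Iterable


def passes_diff_sense(simple_verb_senses: Iterable[str], reflv_sense: str) -> bool:
    """Return True if the REFLV sense in context matches none of the simple verb's senses."""
    return all(sense != reflv_sense for sense in simple_verb_senses)


# Hypothetical sense labels for French 'trouver' (simple verb).
trouver = ["find", "have an opinion about", "discover"]

# L'église se trouve à Paris: 'be located' is not a sense of 'trouver'.
print(passes_diff_sense(trouver, "be located"))
# Elle se trouve grosse: same sense as the non-reflexive verb.
print(passes_diff_sense(trouver, "have an opinion about"))
```

The first call passes the test (annotate as IRV); the second does not, so the occurrence goes on to the following tests.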

Clitics position and concatenation

In some languages the clitics are joined to the verb, sometimes using a hyphen but not always. When there is no hyphen, the REFLV will probably be tokenized as a single token in the corpus.

  • In French, orthography and pronunciation rules require the clitic to be attached to a verb that starts with a vowel sound, with the clitic's final vowel replaced by an apostrophe (elision):
    • s'abstenir to abstain
  • In Spanish and Italian, the clitic can appear concatenated after the verb in some verbal forms (e.g. infinitives, gerunds):
    • enamorarse to fall in love
    • alzarsi to get up
  • In Portuguese, there are always hyphens for postponed clitics (enclisis), but in conditional tense the clitic is in the middle of the verb (mesoclisis), separating the root from the suffix:
    • queixar-se-ia would complain
  • In Romanian the clitic and the verb are either separate or have a hyphen between them:
    • se aude un clopot RCLI hears a bell a bell is heard
      s-aude un clopot RCLI-hears a bell a bell is heard

The current annotation format allows annotating a single token as a MWE if it is a multiword token. Therefore, an IRV written as a single token should still be annotated as an MWE.

Overlap VID - IRV

Some idiomatic constructions include reflexive clitics. Two cases are possible:

  • If a syntactically comparable literal construction is impossible or the REFLV would not be annotated in syntactically comparable literal constructions, annotate only the VID:
    • пилците се броят наесен chicken REFL are counted in the autumn the true results can be seen only at the end; кокошките се броят the hens REFL counted
    • sich über etwas im Klaren sein dass S RCLI about s.th. in.the clear be to be aware of s.th./that S ⇒ *sich in N sein, dass for any noun N
    • darse cuenta de to realize ⇒ *darse N de for any noun N
      meterse en líos to get in trouble; REFLV not annotated in literal equivalents like meterse en una tienda to get in a store
    • n.a.
    • se rendre compte de to realize ⇒ *se rendre N de for any noun N
      s'arracher les cheveux RCLI tear the hair to worry; REFLV not annotated in literal equivalents like s'arracher un ongle to tear one's nail
    • rendersi conto di to realize ⇒ *si rende N di for any noun N
      si strappa i capelli RCLI tear the hair to worry; REFLV not annotated in literal equivalents like strapparsi un'unghia to tear one's nail
    • zdawać sobie sprawę z to realize ⇒ *zdawać sobie N z for any noun N
    • dar-se mal to fail; dar-se ADV intransitive is acceptable only for antonym bem well
      meter-se numa fria to get-RCLI in a cold to get in trouble; REFLV not annotated in literal equivalent like meter-se numa cabine to get into a cabin
    • a-și smulge părul din cap
    • puliti si lase tear RCLI the hair to worry; REFLV not annotated in literal equivalents like puliti si obrvi to pluck one's eyebrows
  • If the REFLV would be annotated as IRV in syntactically comparable literal constructions, annotate both the IRV and the VID as embedded MWEs (rare):
    • смея се през сълзи laugh REFL through tears to laugh bitterly
    • n.a.
    • rozlatywać się w proch scatter itself into dust disappear
    • virar-se nos trinta turn-RCLI in-the thirty contains virar-se to get by ≠ virar to turn/become
    • a i se face rău to CL.DAT RCLI.ACC make ill to feel sick; this is a case when both a non-reflexive, dative clitic and a RCLI.ACC appear in the structure; the REFLV is annotated as IRV; both the IRV and the VID are annotated as embedded MWEs; note that the non-reflexive clitic is also considered as part of a VID (6.4_R)
      a se duce pe apa sâmbetei RCLI go on water-the Saturday-of to get lost; the REFLV is annotated in the literal equivalent a se duce pe apa Bistriței he goes on the river Bistriţa; there is a notable difference in meaning between the non-REFLV a duce to take and the REFLV a se duce to go
    • režati se kot pečen maček to laugh RCLI like a baked tomcat to laugh loudly režati se is IRV
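
The second case, where an IRV is embedded in a VID, amounts to two annotations over the same sentence, one covering a subset of the other's tokens. The sketch below shows one possible way to represent this; it is not the shared task's file format, and the class `MweAnnotation` and the function `is_embedded` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MweAnnotation:
    """One annotated VMWE: its category and the 1-based positions of its tokens."""
    category: str
    token_ids: Tuple[int, ...]


def is_embedded(inner: MweAnnotation, outer: MweAnnotation) -> bool:
    """Return True if every token of `inner` also belongs to `outer`, which is larger."""
    return set(inner.token_ids) < set(outer.token_ids)


# смея се през сълзи 'to laugh bitterly': IRV (tokens 1-2) inside a VID (tokens 1-4).
tokens = ["смея", "се", "през", "сълзи"]
irv = MweAnnotation("IRV", (1, 2))
vid = MweAnnotation("VID", (1, 2, 3, 4))

print(is_embedded(irv, vid))
```

In the first case (annotate only the VID), the sentence would simply carry the single `VID` annotation and no `IRV` one.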

Overlap LVC - IRV

It is rare, although possible, to find light verb constructions in which a reflexive clitic changes the original meaning significantly, thus characterizing an IRV:

  • Fragen stellen to ask questions, sich Fragen stellen to doubt/hesitate
  • hacer preguntas to ask questions, hacerse preguntas to doubt/hesitate
  • n.a.
  • poser des questions to ask questions, se poser des questions to doubt/hesitate
  • no examples found for RO

In this case, the whole construction, including the verb, the noun and the reflexive clitic, must be annotated as VID, since there are two syntactic arguments:

  • sich Fragen stellen to doubt/hesitate
  • hacerse preguntas to doubt/hesitate
  • n.a.
  • se poser des questions
  • no examples found for RO

Notice that annotating only the verb and the RCLI as IRV would be wrong, since it will have a completely different meaning without the noun, sometimes even coinciding with another IRV:

  • sich stellen to surrender
  • hacerse get used to
  • n.a.
  • se poser to sit/lay down

Dative clitics and double clitics

In some languages, e.g. Polish, clitics inflect for case. Most cases of IRV seem to be restricted to the accusative case:

  • страхувам се to be afraid
  • bát se to be afraid
  • n.a.
  • bać się to be afraid
  • a se sinchisi to RCLI.ACC care to care
    a se sfii to RCLI.ACC be.shy to be shy
    a se căi to RCLI.ACC repent to repent
  • bati se to be afraid

However, other cases can appear in IRV:

  • отивам си to go oneself.DAT to go away
  • poradit si to advise oneself.DAT to manage
  • n.a.
  • radzić sobie to advise oneself.DAT to manage
  • a-și însuși to-RCLI.DAT appropriate to appropriate - with a Dative clitic
    a-și apropria to-RCLI.DAT appropriate to appropriate - with a Dative clitic
  • drzniti si to dare oneself.DAT to dare

Some expressions can have double clitics. Only the first two words belong to the IRV:

  • надсмивам се над себе си to laugh RCLI.ACC at RCLI.DAT to laugh at oneself
  • n.a.
  • przyglądać się sobie to observe RCLI.ACC RCLI.DAT to observe each other
    radzić sobie z sobą to advise RCLI.DAT with RCLI.INST to manage with oneself
  • n.a.
  • nasmehniti se sebi to smile at oneself

Non-reflexive clitics

This category does not cover other types of pronouns and clitics. They are covered by regular VID tests and should be annotated as such. Examples of constructions that should be annotated as VID rather than IRV include:

  • es gibt it gives there is
  • n.a.
  • l'emporter to take it away to win
    s'en aller to self from-it go to leave
    en avoir marre to have from-it enough to be fed up
    il y avoir it at-it have to exist
  • prender-ci to take to-it to make the right choice
    prender-le to take it to be beaten
  • dá-lhe João! give to-him/her, João! show them what you got, João!
  • a-i arde to CL.DAT burn to have a desire
    a o lua pe jos to take CL.ACC on foot to walk; according to the current guidelines, such examples pass the ID tests (see also 6.3_B5); both have literal correspondents that are not characterized by an obligatory non-reflexive clitic: a arde to burn and a lua to take
    a-i repugna to CL.DAT loathe to loathe
    a-i prii to CL.DAT to be favourable to sb.
  • ucvreti jo to escape her to escape something/someone by running