Annotation guidelines
corpora annotated for multiword expressions
Adding new examples in your language
It is often useful to have examples of a phenomenon shown in your own language. Examples in the guidelines are presented as in the template below:
Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. Languages use color codes according to their language groups. See the section on notation for more information.
To see the IDs of all examples, make sure the ID button is toggled on in the header of the current page. Now look at the template above. You should see this ID: 7.2_A_template-mwe. The 7.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
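As an illustration of the ID structure described above, the following minimal Python sketch splits an ID into its three parts. The pattern is an assumption inferred from the single example 7.2_A_template-mwe (dotted section number, uppercase position letter, free-form name); the function name parse_example_id is hypothetical and not part of the platform.

```python
import re

# Assumed ID shape, inferred from the one example "7.2_A_template-mwe":
# <section number>_<position letter(s)>_<human-readable name>
EXAMPLE_ID = re.compile(
    r"^(?P<section>\d+(?:\.\d+)*)_(?P<position>[A-Z]+)_(?P<name>.+)$"
)

def parse_example_id(example_id):
    """Split an example ID into (section, position, name)."""
    match = EXAMPLE_ID.match(example_id)
    if match is None:
        raise ValueError(f"unrecognized example ID: {example_id!r}")
    return match.group("section"), match.group("position"), match.group("name")

print(parse_example_id("7.2_A_template-mwe"))
# ('7.2', 'A', 'template-mwe')
```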
Editing or adding examples
The shared examples edition spreadsheet used in previous versions of the guidelines is no longer used; all modifications are made online and are visible immediately. To edit or add examples to the guidelines, you need to create an account on the guidelines 1.3 examples edition platform. You also have to ask Carlos Ramisch or Agata Savary to grant you editing rights for your language.
Once you are logged in, you will see some buttons close to each example.
- The 'copy' button copies the source of the example, and is useful if you want to copy the example of another language and then translate it.
- The 'source' button is always available for languages you have the right to edit, and allows you to edit the example's XML-like source code, as described below.
- The 'edit' button is only shown for examples that follow the formatting rules, and allows you to edit the example using a user-friendly interface.
Instructions to create well formatted examples (or correct the ill-formatted ones in 'source') are available in the example edition instructions.
When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language (use the 'copy' button), and then adapting it to your language. You can then paste the example in your language's 'source' mode. Remember that you should not simply translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation or not. Therefore, before entering an example, you should always check that it is relevant in the context.
If there is something wrong or suspicious with your example, the interface will show an error or warning message.
If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the example empty or add an n.a. comment.
Examples with tags
Let us analyse the English example below, shown in 'source' mode:
MWEs with <lex>their lexicalized components</lex> in English are indicated like this.
As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.
Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:
- He will take a shower
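To make the role of the <lex> tags concrete, here is a minimal Python sketch that lists the lexicalized components of an example's source. It is only an illustration: the helper name lexicalized_components is hypothetical, and the sketch assumes <lex> spans are not nested.

```python
import re

def lexicalized_components(source):
    """Return the text of every <lex>...</lex> span, in order of appearance."""
    # Non-greedy match so that each <lex> pairs with the nearest </lex>.
    return re.findall(r"<lex>(.*?)</lex>", source)

print(lexicalized_components("He will <lex>take</lex> a <lex>shower</lex>"))
# ['take', 'shower']
```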
Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:
- This is not an MWE
Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is <u>not</u> an MWE</nmwe> yields the following:
- This is not an MWE
Latin-script transcription:
You can optionally provide a Latin-script transcription if your language does not use Latin characters. Latin-script transcriptions must be surrounded by the tags <latin> and </latin>. For example, the code الدرس <latin>ad-dars</latin> generates the example below. The Latin transcription should always appear after the example in the original script, and before glosses and translations.
- الدرس ad-dars
Gloss icon:
You should also provide English glosses and translations for your examples. Glosses and translations should always be provided in English, and never in another language. Glosses must be surrounded by the tags <gl> and </gl>. Translations must be surrounded by <trans> and </trans>. English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex> <gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text <latin>transcription (optional)</latin> <gl>the gloss</gl> <trans>the translation</trans>.
- défendre son bifteck defend one's beefsteak to defend one's interests
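The recommended order (original text, then <latin>, then <gl>, then <trans>) can be checked mechanically. The Python sketch below is a hypothetical helper, not the platform's own validation; it assumes the source holds a single example (no <br/> separators) and only looks at the order of the opening tags.

```python
import re

# Order recommended in the guidelines: transcription, gloss, translation.
EXPECTED_ORDER = ["latin", "gl", "trans"]

def follows_recommended_order(source):
    """True if the <latin>, <gl>, <trans> opening tags appear in that order."""
    found = re.findall(r"<(latin|gl|trans)>", source)
    ranks = [EXPECTED_ORDER.index(tag) for tag in found]
    # The tags are in order exactly when their ranks are already sorted.
    return ranks == sorted(ranks)

good = ("<lex>défendre</lex> son <lex>bifteck</lex> "
        "<gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans>")
bad = "some words <trans>the translation</trans> <gl>the gloss</gl>"

print(follows_recommended_order(good))  # True
print(follows_recommended_order(bad))   # False
```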
Comments:
Some examples are followed by an explanation or comment, in normal font (black color). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:
- some words → further details
Newline:
Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they can be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). The 'edit' interface uses this tag to split the source into examples that can be edited separately. For example, the code example 1 <br/> example 2 <br/> example 3 will be rendered as follows:
- example 1
example 2
example 3
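The splitting behaviour can be pictured with a minimal Python sketch. This is only an illustration of the idea, not the code used by the 'edit' interface; the function name split_examples is hypothetical, and tolerating whitespace before the slash is an assumption.

```python
import re

def split_examples(source):
    """Split an example's source on <br/> and trim surrounding whitespace."""
    # Accept "<br/>" as well as "<br />" (assumed to be equivalent here).
    return [part.strip() for part in re.split(r"<br\s*/>", source)]

print(split_examples("example 1 <br/> example 2 <br/> example 3"))
# ['example 1', 'example 2', 'example 3']
```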
Inside normal text, you may also use tags such as <i> (italics), <strong> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.