PARSEME Shared Task 1.3 - Annotation guidelines

Annotation guidelines
corpora annotated for multiword expressions

Adding new examples in your language

It is often useful to have examples of a phenomenon shown in your own language. Examples in the guidelines are presented as in the template below:

MWEs with their lexicalized components in Arabic are indicated like this.

MWEs with their lexicalized components in Bulgarian are indicated like this.

MWEs with their lexicalized components in Czech are indicated like this.

MWEs with their lexicalized components in German are indicated like this.

MWEs with their lexicalized components in Greek are indicated like this.

MWEs with their lexicalized components in English are indicated like this.

MWEs with their lexicalized components in Spanish are indicated like this.

MWEs with their lexicalized components in Basque are indicated like this.

MWEs with their lexicalized components in Farsi are indicated like this.

MWEs with their lexicalized components in French are indicated like this.

MWEs with their lexicalized components in Irish are indicated like this.

MWEs with their lexicalized components in Hebrew are indicated like this.

MWEs with their lexicalized components in Hindi are indicated like this.

MWEs with their lexicalized components in Croatian are indicated like this.

MWEs with their lexicalized components in Hungarian are indicated like this.

MWEs with their lexicalized components in Indonesian are indicated like this.

MWEs with their lexicalized components in Italian are indicated like this.

MWEs with their lexicalized components in Japanese are indicated like this.

MWEs with their lexicalized components in Lithuanian are indicated like this.

MWEs with their lexicalized components in Maltese are indicated like this.

MWEs with their lexicalized components in Polish are indicated like this.

MWEs with their lexicalized components in Portuguese are indicated like this.

MWEs with their lexicalized components in Romanian are indicated like this.

MWEs with their lexicalized components in Slovene are indicated like this.

MWEs with their lexicalized components in Swedish are indicated like this.

MWEs with their lexicalized components in Turkish are indicated like this.

MWEs with their lexicalized components in Chinese are indicated like this.

Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. Languages use color codes according to their language groups. See the section on notation for more information.

To see the IDs of all examples, make sure the ID button in the header of the current page is toggled on. Now look at the template above. You should see this ID: 7.2_A_template-mwe. The 7.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
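As an illustration of the ID structure described above, here is a minimal Python sketch that splits an ID into its three parts. The regular expression and the names `ExampleId` and `parse_example_id` are assumptions made for this illustration; they are not part of the guidelines platform.

```python
import re
from typing import NamedTuple


class ExampleId(NamedTuple):
    section: str   # section number, e.g. "7.2"
    position: str  # position letter inside the page, e.g. "A"
    name: str      # human-readable identifier, e.g. "template-mwe"


# Assumed shape: <section>_<position letter(s)>_<name>
_ID_RE = re.compile(r"^(\d+(?:\.\d+)*)_([A-Z]+)_(.+)$")


def parse_example_id(example_id: str) -> ExampleId:
    """Split an example ID into section, position, and name."""
    match = _ID_RE.match(example_id)
    if match is None:
        raise ValueError(f"unrecognized example ID: {example_id!r}")
    return ExampleId(*match.groups())


print(parse_example_id("7.2_A_template-mwe"))
# ExampleId(section='7.2', position='A', name='template-mwe')
```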

Editing or adding examples

The shared examples edition spreadsheet used in previous versions of the guidelines is no longer used: all modifications are made online and are visible immediately. To edit or add examples to the guidelines, you need to create an account on the guidelines 1.3 examples edition platform. You also have to ask Carlos Ramisch or Agata Savary to grant you the edition rights for your language.

Once you are logged in, you will see some buttons close to each example.

The 'copy' button copies the source of the example, and is useful if you want to copy the example of another language and then translate it.
The 'source' button is always available for languages you have the right to edit, and allows you to edit the example's XML-like source code, as described below.
The 'edit' button is only shown for examples that follow the formatting rules, and allows you to edit the example using a user-friendly interface.

Instructions for creating well-formatted examples (or correcting ill-formatted ones in 'source' mode) are available in the example edition instructions.

When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language (use the 'copy' button), and then adapting it to your language. You can then paste the example in your language's 'source' mode. Remember that you should not simply translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation or not. Therefore, before entering an example, you should always check that it is relevant in the context.

If there is something wrong or suspicious with your example, the interface will show an error or warning message.

If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the example empty or add an n.a. comment.

Examples with tags

Let us analyse the English example below, shown in 'source' mode:

MWEs with <lex>their lexicalized components</lex> in English are indicated like this.

As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.

Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:

He will take a shower
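To make the role of the <lex> tags concrete, the following Python sketch extracts the lexicalized components from an example's source and checks that opening and closing tags are balanced. It is only an illustration: the function name `lexicalized_components` is hypothetical, and the platform's own validation may work differently.

```python
import re

# Non-greedy match of whatever sits between <lex> and </lex>.
LEX_RE = re.compile(r"<lex>(.*?)</lex>", re.DOTALL)


def lexicalized_components(source: str) -> list[str]:
    """Return the text of every <lex>...</lex> span, in order."""
    if source.count("<lex>") != source.count("</lex>"):
        raise ValueError("unbalanced <lex> tags")
    return LEX_RE.findall(source)


print(lexicalized_components("He will <lex>take</lex> a <lex>shower</lex>"))
# ['take', 'shower']
```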

Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:

This is not an MWE

Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:

This is not an MWE

Latin-script transcription: You can optionally provide a Latin-script transcription if your language does not use Latin characters. Latin-script transcriptions must be surrounded by the tags <latin> and </latin>. For example, the code الدرس <latin>ad-dars</latin> generates the example below. The Latin transcription should always appear after the example in the original script, and before glosses and translations.

الدرس ad-dars

Gloss icon: You should also provide English glosses and translations for your examples. Glosses and translations should always be provided in English, and never in another language. Glosses must be surrounded by the tags <gl> and </gl>. Translations must be surrounded by <trans> and </trans>. English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex> <gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text <latin>transcription (optional)</latin> <gl>the gloss</gl> <trans>the translation</trans>.

défendre son bifteck defend one's beefsteak to defend one's interests
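The ordering convention above (transcription, then gloss, then translation) can be checked mechanically. The sketch below is a hypothetical helper for a single example's source, not the check performed by the platform; it only compares the positions of the first occurrence of each opening tag.

```python
# Expected order of the optional annotations after the original text.
EXPECTED_ORDER = ["latin", "gl", "trans"]


def tags_in_expected_order(source: str) -> bool:
    """Return True if the tags that are present appear as latin < gl < trans."""
    positions = [source.find(f"<{tag}>") for tag in EXPECTED_ORDER]
    present = [pos for pos in positions if pos != -1]  # skip absent tags
    return present == sorted(present)


good = (
    "<lex>défendre</lex> son <lex>bifteck</lex> "
    "<gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans>"
)
bad = "some words <trans>a translation</trans> <gl>a gloss</gl>"

print(tags_in_expected_order(good))  # True
print(tags_in_expected_order(bad))   # False
```

Because only the first occurrence of each tag is considered, a source holding several examples should be split into individual examples before applying this check.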

Comments: Some examples are presented followed by an explanation or comment, in normal font (black color). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:

some words → further details

Newline: Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they can be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). The 'edit' interface uses this tag to split the source into examples that can be edited separately. For example, the code example 1<br/>example 2<br/>example 3 will be rendered as follows:

example 1
example 2
example 3
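The splitting behaviour described above can be sketched in Python as follows. This assumes that the newline tag is written `<br/>` (optionally with a space before the slash); the function name `split_examples` is hypothetical and does not reflect how the 'edit' interface is actually implemented.

```python
import re

# Assumption: the newline tag is the empty element <br/> (or <br />).
BR_RE = re.compile(r"\s*<br\s*/>\s*")


def split_examples(source: str) -> list[str]:
    """Split an example source into separately editable examples."""
    return [part for part in BR_RE.split(source.strip()) if part]


print(split_examples("example 1<br/>example 2<br/>example 3"))
# ['example 1', 'example 2', 'example 3']
```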

Inside normal text, you may also use tags such as <i> (italics), <b> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.
