Project outcomes

The goal of our project is to develop linguistic resources (lexicons, corpora, annotation guidelines) and software (parsers, MWE identifiers and linkers). Some of them are currently under development and will be published here when they are ready.

Software

MWE identification software

Tools which annotate multiword expressions automatically in running text, developed within the project or in close collaboration with PARSEME-FR project members.

ATILF-LLF system ("Transition" in the 2017 PARSEME shared task)
VarIDE
Veyn
LGTagger
mwetoolkit
Seen2Seen and Seen2Unseen (cf. PARSEME shared task 1.2)

Some of these tools can be tested online on the PARSEME-FR demonstrator.

Other software

Demonstrator of MWE identifiers and a corpus-lexicon browser
Online PARSEME-FR corpus browser on Grew-match
PARSEME shared task utilities

Language resources and datasets

Verbal MWE-annotated corpora of the PARSEME shared tasks

The datasets of the PARSEME shared tasks cover 18 to 20 languages, including French, and can be downloaded from:

Corpus edition 1.0 (2017) on LINDAT/CLARIN. The French dataset was used in the PARSEME Shared Task on identification of verbal multiword expressions (edition 1.0, 2017). You can also download the French data only from the ORTOLANG platform.
Corpus edition 1.1 (2018) on LINDAT/CLARIN. The French dataset was used in the PARSEME Shared Task on identification of verbal multiword expressions (edition 1.1, 2018).
Corpus edition 1.2 (2020) on GitLab (temporary). The French dataset was used in the PARSEME Shared Task on identification of verbal multiword expressions (edition 1.2, 2020).
Annotation guidelines

Full-MWE annotated Sequoia treebank

Released on LINDAT/CLARIN as part of the Deep sequoia corpus
Annotation guidelines

MWE and coreference corpus

French corpus annotated for both MWEs and coreference: data, annotation tools, manual validation results, statistics and reports.

Manually annotated web sample

4618 sentences containing positive and negative examples of 90 selected French verbal MWEs. The sentences come from Wikipedia and web crawling and were taken from the CoNLL shared task corpus.
Paper describing the construction of this dataset

Multilingual corpus of literal occurrences of multiword expressions

Corpus in Basque, German, Greek, Polish and Portuguese
Paper describing the construction and study of the corpus

French metagrammar with verbal MWEs

FrenchTAG metagrammar with encoded verbal MWEs
Paper describing the encoding process and the evaluation

Project-internal resources

For project members: PARSEME-FR GitLab
