Work Package 1: MWE representation and annotation


In the framework of the PARSEME Shared Task on the identification of verbal MWEs, Agata Savary, Carlos Ramisch and Marie Candito contributed to writing the annotation guidelines (Savary et al. MWE 2017). Marie Candito, Mathieu Constant, Carlos Ramisch, Agata Savary, Yannick Parmentier, Caroline Pasquer and Jean-Yves Antoine produced the French dataset (Candito et al. TALN 2017). This dataset, composed of the Sequoia corpus and the French UD treebank (about 19,000 sentences), includes 5,000 annotated verbal MWEs.

Work in progress

The annotation of the Sequoia corpus is now being extended to all MWEs, using annotation guidelines that are still under development. The release of the data is planned for the end of 2017.