Agence Nationale de la Recherche

wp2

Differences

This shows you the differences between two versions of the page.

wp2 [2016/02/17 19:14] agata.savary
wp2 [2017/09/18 17:50] (current) matthieu.constant
Line 1: Line 1:
 __Work Package 2__: **MWE Lexicon**
-  * **Partners in charge**: LI (Agata Savary) and LIGM (Mathieu Constant)
+  * **Partners in charge**: LI (Agata Savary) and ATILF (Mathieu Constant)
-  * **Partners involved**: LI, LIF, LIGM
+  * **Partners involved**: LI, LIF, ATILF, LIGM
   * **Objectives**: Build a unified and enriched MWE lexicon, including morphological, distributional, syntactic and semantic information. Multiword NEs will get special treatment, as they will be associated with pragmatic information (i.e. linking with the LOD). The encoded features will be either symbolic or numeric.
   * **Final products**:
Line 13: Line 13:
     * **WP 2.5**: Converting the lexicon to a standard export format
     * **WP 2.6**: Projection on treebanks
 +
 +-----
 +
 +**Results**
 +
 +Before the actual construction of the unified MWE lexicon, some preliminary studies were carried out:
 +  * a state-of-the-art survey of the different MWE lexicon formats, by Agata Savary, in the framework of the PARSEME COST Action;
 +  * experiments on extracting linguistic information from various existing MWE lexicons (internship at LIGM in 2016 by Manolo Iborra, supervised by Mathieu Constant);
 +  * an inventory and documentation of the properties in the lexicon-grammar tables of frozen expressions, as well as a selection of lexical entries based on WP1 criteria (internship at LIGM by Fabrice Beltran, supervised by Eric Laporte).
 +
 +
 +Preparatory work for the next WP2 tasks has also been undertaken:
 +  * Carlos Ramisch and colleagues developed methods based on word embeddings for the discovery and semantic processing of MWEs (Cordeiro et al. ACL 2016, Ramisch et al. ACL 2016, Ramisch et al. LREC 2016, Ramisch et al. MWE 2017, Vargas et al. MWE 2017).
 +  * Waszczuk and Savary (BSNLP 2017) designed an algorithm to project heterogeneous MWE lexicons onto a constituency treebank. Cordeiro et al. (SemEval 2016) developed a symbolic method to identify MWEs in a text from a lexicon.
 +
 +**Work in progress**
 +
 +A group of researchers (Mathieu Constant, Agata Savary, Jean-Yves Antoine, Caroline Pasquer, Takuya Nakamura, Carlos Ramisch) is currently working on an internal lexicon format to encode fine-grained properties of MWEs.
 +
 +In the meantime, Agata Savary and colleagues are exploring the XMG platform to obtain an object-oriented encoding of MWEs (cf. Lichte et al., to appear).
 +
wp2.1455732894.txt.gz · Last modified: 2016/02/17 19:14 by agata.savary