This shows you the differences between two versions of the page.
2018-lifat-m2-1 [2018/09/21 14:11] agata.savary |
2018-lifat-m2-1 [2018/09/21 14:42] agata.savary |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
* **Domain:** Natural Language Processing | * **Domain:** Natural Language Processing | ||
Line 27: | Line 27: | ||
The objectives of this internship are to exploit word embeddings for discovery of new MWEs based on their semantic proximity to the previously seen MWEs, contained in a lexicon or in an annotated corpus (resources of both types belong to the outcomes of the PARSEME-FR project). The discovery should lead to (semi-)automatic enrichment of these initial resources. Two stages are to be considered: | The objectives of this internship are to exploit word embeddings for discovery of new MWEs based on their semantic proximity to the previously seen MWEs, contained in a lexicon or in an annotated corpus (resources of both types belong to the outcomes of the PARSEME-FR project). The discovery should lead to (semi-)automatic enrichment of these initial resources. Two stages are to be considered: | ||
- | * (i) candidates for new MWEs are generated by replacing individual components of known MWEs by their semantically close words, established notably via word embeddings; | + | * candidates for new MWEs are generated by replacing individual components of known MWEs by their semantically close words, established notably via word embeddings; |
- | * (ii) the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, adjectives // | + | * the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, adjectives // |
Possible extensions of the objectives: | Possible extensions of the objectives: | ||
- | * (iii) integrating MWE discovery with MWE identification in // | + | * integrating MWE discovery with MWE identification in // |
- | * (iv) | + | * coupling word embedding-based lexical replacement with semantic resources such as WordNet. |
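The two stages described above (candidate generation by lexical replacement, then corpus-based filtering) could be sketched roughly as follows. This is only an illustrative sketch, not the project's method: the toy vectors, the frequency table, the function names and the thresholds `min_sim` and `min_freq` are all assumptions made for the example. In practice the vectors would come from a trained embedding model and the counts from a real corpus.

```python
import math

# Hypothetical toy embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "take": [0.9, 0.1, 0.0],
    "make": [0.8, 0.2, 0.1],
    "grab": [0.85, 0.15, 0.05],
    "decision": [0.1, 0.9, 0.2],
    "choice": [0.15, 0.85, 0.25],
    "banana": [0.0, 0.1, 0.95],
}

# Hypothetical corpus frequencies of candidate expressions.
CORPUS_COUNTS = {
    ("take", "decision"): 12,
    ("make", "choice"): 7,
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbours(word, embeddings, min_sim):
    """Words whose similarity to `word` is at least `min_sim`, closest first."""
    if word not in embeddings:
        return []
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    return [other for sim, other in sorted(scored, reverse=True) if sim >= min_sim]


def generate_candidates(mwe, embeddings, min_sim=0.9):
    """Stage 1: replace one component at a time by a semantically close word."""
    candidates = []
    for position, component in enumerate(mwe):
        for substitute in neighbours(component, embeddings, min_sim):
            candidate = mwe[:position] + (substitute,) + mwe[position + 1:]
            if candidate != mwe and candidate not in candidates:
                candidates.append(candidate)
    return candidates


def filter_candidates(candidates, counts, min_freq=5):
    """Stage 2: keep candidates attested at least `min_freq` times, most frequent first."""
    kept = [c for c in candidates if counts.get(c, 0) >= min_freq]
    return sorted(kept, key=lambda c: counts[c], reverse=True)


known_mwe = ("make", "decision")
candidates = generate_candidates(known_mwe, EMBEDDINGS)
kept = filter_candidates(candidates, CORPUS_COUNTS)
print(candidates)
print(kept)
```

A similarity threshold is used here rather than a fixed number of neighbours, so that a component with only one close word does not pull in unrelated substitutes. Filtering on contexts of occurrence, which the description also mentions, is not covered by this sketch.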
Line 56: | Line 56: | ||
===== References ===== | ===== References ===== | ||
- | + | * Baldwin, T. and Kim, S. N. (2010) [[https:// | |
- | Marie Candito, Mathieu Constant, Carlos Ramisch, Agata Savary, Yannick Parmentier, Caroline Pasquer, and Jean-Yves Antoine. Annotation d’expressions polylexicales verbales en français. In Jean-Yves Antoine Iris Eshkol, editor, 24e conférence sur le Traitement Automatique des Langues Naturelles (TALN), Actes de TALN, volume 2 : articles courts, pages 1–9, Orléans, France, 06 2017. | + | * Farahmand, M., Henderson, J., [[http:// |
- | + | * Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http:// | |
- | Maurice Gross. Lexicon-grammar and the syntactic analysis of French. In Proc. of COLING-ACL 1984, pages 275–282, Stanford, CA, 1984. Association for Computational Linguistics. | + | * Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, |
- | + | * Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http:// | |
- | Agata Savary, Carlos | + | |
- | + | * Savary, A., Jacquemin, Ch. (2003): [[https:// | |
------------------------------ | ------------------------------ | ||