Agence Nationale de la Recherche

2018-lifat-m2-1

Differences

This shows you the differences between two versions of the page.

2018-lifat-m2-1 [2018/09/21 14:17]
agata.savary [References]
2018-lifat-m2-1 [2018/09/21 14:42]
agata.savary
Line 1: Line 1:
-====== Lexicon-to-corpus multiword expression browser  ======+====== Verbal Multiword Expression Discovery in French Based on Seen Data and Distributional Semantics  ======
  
   * **Domain:** Natural Language Processing
Line 27: Line 27:
  
 The objective of this internship is to exploit word embeddings for the discovery of new MWEs, based on their semantic proximity to previously seen MWEs contained in a lexicon or in an annotated corpus (both types of resources are outcomes of the PARSEME-FR project). The discovery should lead to (semi-)automatic enrichment of these initial resources. Two stages are to be considered:
-  * (i) candidates for new MWEs are generated by replacing individual components of known MWEs with semantically close words, established notably via word embeddings;
-  * (ii) the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, the adjectives //chaud/froid// ‘hot/cold’ tend to co-occur more frequently with //**prendre** un **bain**/une **douche**// ‘to take a bath/shower’ than with //**prendre** une **baignoire**// (spacieuse/solide...) ‘to take a (spacious/solid) bathtub’.
+  * candidates for new MWEs are generated by replacing individual components of known MWEs with semantically close words, established notably via word embeddings;
+  * the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, the adjectives //chaud/froid// ‘hot/cold’ tend to co-occur more frequently with //**prendre** un **bain**/une **douche**// ‘to take a bath/shower’ than with //**prendre** une **baignoire**// (spacieuse/solide...) ‘to take a (spacious/solid) bathtub’.
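
The two stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the 3-dimensional vectors stand in for real word embeddings, the corpus counts are invented, the function names are hypothetical, and the filter uses a plain frequency threshold rather than contexts of occurrence. It is not the method of //varIDE// or of any PARSEME-FR tool.

```python
import math
from collections import Counter

# Toy vectors standing in for real word embeddings (invented values).
EMBEDDINGS = {
    "bain": (0.9, 0.1, 0.0),
    "douche": (0.8, 0.2, 0.1),
    "baignoire": (0.7, 0.1, 0.5),
    "mesure": (0.0, 0.9, 0.1),
    "prendre": (0.1, 0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbours(word, k=2):
    """Return the k words whose vectors are closest to that of `word`."""
    target = EMBEDDINGS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in EMBEDDINGS.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


def generate_candidates(mwe, k=2):
    """Stage 1: replace each component of a known MWE by its k nearest neighbours."""
    candidates = set()
    for i, component in enumerate(mwe):
        if component not in EMBEDDINGS:
            continue  # no vector available, so no replacement for this component
        for substitute in neighbours(component, k):
            candidates.add(mwe[:i] + (substitute,) + mwe[i + 1:])
    candidates.discard(mwe)
    return candidates


def filter_candidates(candidates, corpus_counts, min_freq=2):
    """Stage 2: keep candidates seen at least `min_freq` times in the corpus."""
    return {c for c in candidates if corpus_counts[c] >= min_freq}


# Invented lemma-pair counts standing in for a parsed corpus.
corpus_counts = Counter({("prendre", "douche"): 5, ("prendre", "baignoire"): 1})

known_mwe = ("prendre", "bain")
candidates = generate_candidates(known_mwe)
kept = filter_candidates(candidates, corpus_counts)
print(sorted(kept))
```

With these toy values, //prendre une douche// survives the frequency filter while //prendre une baignoire// is discarded. A realistic setting would replace the dictionary with pretrained French embeddings and the counter with counts extracted from a large parsed corpus.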
  
 Possible extensions of the objectives:
  
-  * (iii) integrating MWE discovery with MWE identification in //varIDE//
-  * (iv) coupling word embedding-based lexical replacement with semantic resources such as WordNet.
+  * integrating MWE discovery with MWE identification in //varIDE//
+  * coupling word embedding-based lexical replacement with semantic resources such as WordNet.
  
  
Line 56: Line 56:
  
 ===== References =====
-
-Baldwin, T. and Kim, S. N. (2010) [[https://people.eng.unimelb.edu.au/tbaldwin/pubs/handbook2009.pdf|Multiword Expressions]], in Nitin Indurkhya and Fred J. Damerau (eds.) Handbook of Natural Language Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267-292.
-
-Farahmand, M., Henderson, J. [[http://www.aclweb.org/anthology/W16-1809|Modeling the non-substitutability of multiword expressions with distributional semantics and a loglinear model]], Proceedings of the ACL 2016 Workshop on MWEs, Berlin, pp. 61-66, 2016.
-
-Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http://www.aclweb.org/anthology/J09-1005|Unsupervised type and token identification of idiomatic expressions]]. Computational Linguistics 35(1):61–103.
-
-Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, pp. 675-682.
-
-Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http://aclweb.org/anthology/C18-1219|If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions]], in the Proceedings of the 27th International Conference on Computational Linguistics (COLING-18), Santa Fe, USA.
-
-Ramisch, C., Cordeiro, S., Savary, A., Vincze, V. et al. (2018) [[http://aclweb.org/anthology/W18-4925|Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions]]. Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, United States, Aug 2018, pp. 222-240.
-
-Savary, A., Jacquemin, Ch. (2003): [[https://link.springer.com/content/pdf/10.1007%2F978-3-540-45115-0_6.pdf|Reducing Information Variation in Text]], in Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access, Proceedings of TESTIA 2000, 8th ELSNET European Summer School on Language and Speech Communication, Lecture Notes in Artificial Intelligence 2705, Springer Verlag, pp. 145-181.
-
+  * Baldwin, T. and Kim, S. N. (2010) [[https://people.eng.unimelb.edu.au/tbaldwin/pubs/handbook2009.pdf|Multiword Expressions]], in Nitin Indurkhya and Fred J. Damerau (eds.) Handbook of Natural Language Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267-292.
+  * Farahmand, M., Henderson, J. [[http://www.aclweb.org/anthology/W16-1809|Modeling the non-substitutability of multiword expressions with distributional semantics and a loglinear model]], Proceedings of the ACL 2016 Workshop on MWEs, Berlin, pp. 61-66, 2016.
+  * Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http://www.aclweb.org/anthology/J09-1005|Unsupervised type and token identification of idiomatic expressions]]. Computational Linguistics 35(1):61–103.
+  * Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, pp. 675-682.
+  * Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http://aclweb.org/anthology/C18-1219|If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions]], in the Proceedings of the 27th International Conference on Computational Linguistics (COLING-18), Santa Fe, USA.
+  * Ramisch, C., Cordeiro, S., Savary, A., Vincze, V. et al. (2018) [[http://aclweb.org/anthology/W18-4925|Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions]]. Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, United States, Aug 2018, pp. 222-240.
+  * Savary, A., Jacquemin, Ch. (2003): [[https://link.springer.com/content/pdf/10.1007%2F978-3-540-45115-0_6.pdf|Reducing Information Variation in Text]], in Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access, Proceedings of TESTIA 2000, 8th ELSNET European Summer School on Language and Speech Communication, Lecture Notes in Artificial Intelligence 2705, Springer Verlag, pp. 145-181.
 ------------------------------
  
2018-lifat-m2-1.txt · Last modified: 2018/09/21 14:42 by agata.savary