Differences

This shows you the differences between two versions of the page.

--- 2018-lifat-m2-1 [2018/09/21 14:17]
agata.savary [References]
+++ 2018-lifat-m2-1 [2018/09/21 14:42]
agata.savary
@@ Line 1: / Line 1: @@
-====== Lexicon-to-corpus multiword expression browser  ======
+====== Verbal Multiword Expression Discovery in French Based on Seen Data and Distributional Semantics  ======
   * **Domain:** Natural Language Processing
@@ Line 27: / Line 27: @@
 The objectives of this internship are to exploit word embeddings for discovery of new MWEs based on their semantic proximity to the previously seen MWEs, contained in a lexicon or in an annotated corpus (resources of both types belong to the outcomes of the PARSEME-FR project). The discovery should lead to (semi-)automatic enrichment of these initial resources. Two stages are to be considered:
-  * (i) candidates for new MWEs are generated by replacing individual components of known MWEs by their semantically close words, established notably via word embeddings;
+  * candidates for new MWEs are generated by replacing individual components of known MWEs by their semantically close words, established notably via word embeddings;
-  * (ii) the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, the adjectives //chaud/froid// ‘hot/cold’ tend to co-occur more frequently with //**prendre** un **bain**/une **douche**// ‘to take a bath/shower’ than with //**prendre** une **baignoire**// (spacieuse/solide...) ‘to take a (spacious/solid) bathtub’.
+  * the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, the adjectives //chaud/froid// ‘hot/cold’ tend to co-occur more frequently with //**prendre** un **bain**/une **douche**// ‘to take a bath/shower’ than with //**prendre** une **baignoire**// (spacieuse/solide...) ‘to take a (spacious/solid) bathtub’.
 Possible extensions of the objectives:
-  * (iii) integrating MWE discovery with MWE identification in //varIDE//
+  * integrating MWE discovery with MWE identification in //varIDE//
-  * (iv)  coupling word embedding-based lexical replacement with semantic resources such as WordNet.
+  * coupling word embedding-based lexical replacement with semantic resources such as WordNet.
@@ Line 56: / Line 56: @@
 ===== References =====
+  * Baldwin, T. and Kim, S. N. (2010) [[https://people.eng.unimelb.edu.au/tbaldwin/pubs/handbook2009.pdf|Multiword Expressions]], in Nitin Indurkhya and Fred J. Damerau (eds.)  Handbook of Natural Language Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267-292.
-Baldwin, T. and Kim, S. N. (2010) [[https://people.eng.unimelb.edu.au/tbaldwin/pubs/handbook2009.pdf|Multiword Expressions]], in Nitin Indurkhya and Fred J. Damerau (eds.)  Handbook of Natural Language Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267-292.
+  * Farahmand, M., Henderson, J., [[http://www.aclweb.org/anthology/W16-1809|Modeling the non-substitutability of multiword expressions with distributional semantics and a loglinear model]], Proceedings of the ACL 2016 Workshop on MWEs. Berlin, pp.61-66, 2016.
+  * Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http://www.aclweb.org/anthology/J09-1005|Unsupervised type and token identification of idiomatic expressions]]. Computational Linguistics 35(1):61–103
-Farahmand, M., Henderson, J., [[http://www.aclweb.org/anthology/W16-1809|Modeling the non-substitutability of multiword expressions with distributional semantics and a loglinear model]], Proceedings of the ACL 2016 Workshop on MWEs. Berlin, pp.61-66, 2016.
+  * Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, 675-682
+  * Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http://aclweb.org/anthology/C18-1219|If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions]], in the Proceedings of the 27th International Conference on Computational Linguistics (COLING-18), Santa Fe, USA.
-Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http://www.aclweb.org/anthology/J09-1005|Unsupervised type and token identification of idiomatic expressions]]. Computational Linguistics 35(1):61–103
+  * Ramisch C., Cordeiro, S., Savary, A., Vincze, V. et al. (2018) [[http://aclweb.org/anthology/W18-4925|Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions]]. the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Aug 2018, Santa Fe, United States. Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pp.222 - 240.
+  * Savary, A., Jacquemin, Ch. (2003): [[https://link.springer.com/content/pdf/10.1007%2F978-3-540-45115-0_6.pdf|Reducing Information Variation in Text]], in Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access, Proceedings of TESTIA 2000, 8th ELSNET European Summer School on Language and Speech Communication, Lecture Notes in Artificial Intelligence 2705, Springer Verlag, pp. 145-181.
-Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, 675-682
-Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http://aclweb.org/anthology/C18-1219|If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions]], in the Proceedings of the 27th International Conference on Computational Linguistics (COLING-18), Santa Fe, USA.
-Ramisch C., Cordeiro, S., Savary, A., Vincze, V. et al. (2018) [[http://aclweb.org/anthology/W18-4925|Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions]]. the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Aug 2018, Santa Fe, United States. Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pp.222 - 240.
-Savary, A., Jacquemin, Ch. (2003): [[https://link.springer.com/content/pdf/10.1007%2F978-3-540-45115-0_6.pdf|Reducing Information Variation in Text]], in Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access, Proceedings of TESTIA 2000, 8th ELSNET European Summer School on Language and Speech Communication, Lecture Notes in Artificial Intelligence 2705, Springer Verlag, pp. 145-181.
 ------------------------------
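
The two stages named in the internship objectives (candidate generation by lexical replacement, then filtering by corpus frequency) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy embedding vectors, the corpus counts, the thresholds, and the function names `neighbours`, `generate_candidates` and `filter_candidates` are hypothetical and do not come from the PARSEME-FR resources or from //varIDE//. MWEs are simplified to tuples of lemmas, and only the frequency-based filter is shown, not the context-based one.

```python
from math import sqrt

# Toy embeddings: hypothetical values chosen for illustration only.
EMBEDDINGS = {
    "bain": [0.9, 0.1, 0.0],
    "douche": [0.85, 0.2, 0.05],
    "baignoire": [0.7, 0.1, 0.6],
    "décision": [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbours(word, k=2, min_sim=0.5):
    """Return up to k words whose similarity to `word` is at least min_sim."""
    if word not in EMBEDDINGS:
        return []
    scored = [
        (cosine(EMBEDDINGS[word], vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    return [w for sim, w in sorted(scored, reverse=True)[:k] if sim >= min_sim]


def generate_candidates(mwe, k=2):
    """Stage 1: replace one component at a time by a semantically close word."""
    candidates = set()
    for i, component in enumerate(mwe):
        for substitute in neighbours(component, k):
            candidates.add(mwe[:i] + (substitute,) + mwe[i + 1:])
    return candidates


def filter_candidates(candidates, corpus_counts, min_count=5):
    """Stage 2: keep candidates seen at least min_count times in the corpus."""
    return {c for c in candidates if corpus_counts.get(c, 0) >= min_count}


# Hypothetical corpus counts for the generated candidates.
counts = {("prendre", "douche"): 42, ("prendre", "baignoire"): 1}
candidates = generate_candidates(("prendre", "bain"))
print(filter_candidates(candidates, counts))
```

With these toy values, the seen MWE ("prendre", "bain") yields the candidates ("prendre", "douche") and ("prendre", "baignoire"), and only the first survives the frequency threshold. A real system would presumably load pretrained French embeddings and count occurrences in a parsed corpus instead of using hard-coded dictionaries.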

Syntactic Parsing and Multiword Expressions in French
