Agence Nationale de la Recherche

2018-lifat-m2-1

Differences

This shows you the differences between two versions of the page.

2018-lifat-m2-1 [2018/09/21 14:08]
agata.savary
2018-lifat-m2-1 [2018/09/21 14:42] (current)
agata.savary
Line 1: Line 1:
-====== Lexicon-to-corpus multiword expression browser  ======
+====== Verbal Multiword Expression Discovery in French Based on Seen Data and Distributional Semantics  ======
  
   * **Domain:** Natural Language Processing
Line 27: Line 27:
  
 The objectives of this internship are to exploit word embeddings for discovery of new MWEs based on their semantic proximity to the previously seen MWEs, contained in a lexicon or in an annotated corpus (resources of both types belong to the outcomes of the PARSEME-FR project). The discovery should lead to (semi-)automatic enrichment of these initial resources. Two stages are to be considered:
-  * (i) candidates for new MWEs are generated by replacing individual components of known MWEs by their semantically close words, established notably via word embeddings;
-  * (ii) the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, adjectives //chaud/froid// ‘hot/cold’ tend to co-occur more frequently with //**prendre** un **bain**/une **douche**// ‘to take a bath/shower’ than with //**prendre** une **baignoire**// (spacieuse/solide...) ‘take a (spacious/solid) bathtub’.
+  * candidates for new MWEs are generated by replacing individual components of known MWEs by their semantically close words, established notably via word embeddings;
+  * the candidates generated in this way are filtered based on their corpus frequency or contexts of occurrence; for instance, adjectives //chaud/froid// ‘hot/cold’ tend to co-occur more frequently with //**prendre** un **bain**/une **douche**// ‘to take a bath/shower’ than with //**prendre** une **baignoire**// (spacieuse/solide...) ‘take a (spacious/solid) bathtub’.
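
The two stages listed above can be illustrated with a minimal Python sketch. It is not the project's implementation: the three-dimensional vectors, the toy corpus, the 0.5 threshold and all function names (`neighbours`, `generate_candidates`, `context_overlap`, `filter_candidates`) are assumptions made up for illustration. A real system would presumably load pretrained French embeddings and read contexts from a parsed corpus.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for real French word embeddings.
EMBEDDINGS = {
    "bain": [0.9, 0.1, 0.0],
    "douche": [0.85, 0.2, 0.05],
    "baignoire": [0.7, 0.1, 0.6],
    "décision": [0.0, 0.9, 0.1],
}

# Hypothetical corpus observations: (verb lemma, noun lemma, modifiers of the noun).
CORPUS = [
    ("prendre", "bain", {"chaud"}),
    ("prendre", "bain", {"froid"}),
    ("prendre", "douche", {"chaud"}),
    ("prendre", "douche", {"froid"}),
    ("prendre", "baignoire", {"spacieux"}),
    ("prendre", "baignoire", {"solide"}),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neighbours(word, k=2):
    """Return the k words closest to `word`, most similar first."""
    scored = [
        (cosine(EMBEDDINGS[word], vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


def generate_candidates(mwe, slot, k=2):
    """Stage 1: replace the component at index `slot` by each of its neighbours."""
    return [mwe[:slot] + (n,) + mwe[slot + 1:] for n in neighbours(mwe[slot], k)]


def context_profile(verb, noun):
    """Count the modifiers observed with a given verb-noun pair."""
    counts = Counter()
    for v, n, modifiers in CORPUS:
        if (v, n) == (verb, noun):
            counts.update(modifiers)
    return counts


def context_overlap(seed, candidate):
    """Share of the candidate's modifier occurrences that also occur with the seed."""
    seed_ctx = context_profile(*seed)
    cand_ctx = context_profile(*candidate)
    total = sum(cand_ctx.values())
    if total == 0:
        return 0.0
    shared = sum(count for word, count in cand_ctx.items() if word in seed_ctx)
    return shared / total


def filter_candidates(seed, candidates, threshold=0.5):
    """Stage 2: keep candidates whose contexts resemble those of the seed MWE."""
    return [c for c in candidates if context_overlap(seed, c) >= threshold]


seed = ("prendre", "bain")
candidates = generate_candidates(seed, slot=1)
kept = filter_candidates(seed, candidates)
```

With this toy data, //prendre une douche// shares the modifiers //chaud/froid// with the seed and is kept, whereas //prendre une baignoire// is generated as a candidate but filtered out because its modifiers do not overlap with those of the seed.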
  
 Possible extensions of the objectives:
  
-  * (iii) integrating MWE discovery with MWE identification in //varIDE//
-  * (iv) coupling word embedding-based lexical replacement with semantic resources such as WordNet.
+  * integrating MWE discovery with MWE identification in //varIDE//
+  * coupling word embedding-based lexical replacement with semantic resources such as WordNet.
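
One possible reading of the last extension is to accept an embedding-based replacement only when a lexical resource also relates the two words. The Python sketch below assumes a hand-written hypernym table (`HYPERNYMS`) in place of a real WordNet-like resource; the table contents and the function names are hypothetical.

```python
# Hypothetical hypernym sets standing in for a WordNet-like resource.
HYPERNYMS = {
    "bain": {"lavage"},
    "douche": {"lavage"},
    "baignoire": {"récipient"},
}


def share_hypernym(word, other):
    """True if the two words have at least one hypernym in common."""
    return bool(HYPERNYMS.get(word, set()) & HYPERNYMS.get(other, set()))


def restrict_replacements(word, embedding_neighbours):
    """Keep only the embedding neighbours that the lexical resource also relates to `word`."""
    return [n for n in embedding_neighbours if share_hypernym(word, n)]


replacements = restrict_replacements("bain", ["douche", "baignoire"])
```

Here the embedding neighbours are passed in as a plain list, so the check can be combined with any embedding model. Words missing from the table are rejected, which is a deliberately conservative choice.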
  
  
-===== Profile ====
-  * Master 1 or Master 2 in computational linguistics or computer science
-  * Good knowledge of French,
-  * Interests in linguistics and familiarity with language technology,
-  * Programming skills (python, web programming).
+===== Candidate's profile ====
+  * 2nd-year master student in computational linguistics, computer science or alike
+  * Interests in linguistics and familiarity with language technology
+  * Good knowledge of French
+  * Good programming skills, preferably in Python
  
 ===== Important dates ====
-  * Application deadline: 15 January 2018 (or until filled)
-  * Notification: 25 January 2018
-  * Position starts: around March 2018
-  * Position ends: July-August 2018
+  * Application deadline: 15 December 2018 (or until filled)
+  * Notification: 15 January 2018
+  * Position starts: around February-March 2018
+  * Position ends: around July-August 2018
+
+===== How to apply =====
+Send your CV and a cover letter to:
+  * Caroline Pasquer: first.last@etu.univ-tours.fr
+  * Agata Savary, Jean-Yves Antoine: first.last@univ-tours.fr
+  * Carlos Ramisch: first.last@lis-lab.fr
  
  
 ===== References ====
-Marie Candito, Mathieu Constant, Carlos Ramisch, Agata Savary, Yannick Parmentier, Caroline Pasquer, and Jean-Yves Antoine. Annotation d’expressions polylexicales verbales en français. In Jean-Yves Antoine Iris Eshkol, editor, 24e conférence sur le Traitement Automatique des Langues Naturelles (TALN), Actes de TALN, volume 2 articles courts, pages 1–9, Orléans, France, 06 2017.
-Maurice Gross. Lexicon-grammar and the syntactic analysis of French. In Proc. of COLING-ACL 1964, pages 275–282, Stanford, CA, 1984. Association for Computational Linguistics.
-Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, and Antoine Doucet. The PARSEME shared task on automatic identification of verbal multiword expressions. In Proc. of EACL 2017 Workshop on MWEs, pages 31–47, Valencia, April 2017
+  * Baldwin, T. and Kim, S. N. (2010) [[https://people.eng.unimelb.edu.au/tbaldwin/pubs/handbook2009.pdf|Multiword Expressions]], in Nitin Indurkhya and Fred J. Damerau (eds.) Handbook of Natural Language Processing, Second Edition, CRC Press, Boca Raton, USA, pp. 267-292.
+  * Farahmand, M., Henderson, J., [[http://www.aclweb.org/anthology/W16-1809|Modeling the non-substitutability of multiword expressions with distributional semantics and a loglinear model]], Proceedings of the ACL 2016 Workshop on MWEs. Berlin, pp. 61-66, 2016.
+  * Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http://www.aclweb.org/anthology/J09-1005|Unsupervised type and token identification of idiomatic expressions]]. Computational Linguistics 35(1):61–103
+  * Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, 675-682
+  * Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http://aclweb.org/anthology/C18-1219|If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions]], in the Proceedings of the 27th International Conference on Computational Linguistics (COLING-18), Santa Fe, USA.
+  * Ramisch, C., Cordeiro, S., Savary, A., Vincze, V. et al. (2018) [[http://aclweb.org/anthology/W18-4925|Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions]]. Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Aug 2018, Santa Fe, United States, pp. 222-240.
+  * Savary, A., Jacquemin, Ch. (2003): [[https://link.springer.com/content/pdf/10.1007%2F978-3-540-45115-0_6.pdf|Reducing Information Variation in Text]], in Renals, S., Grefenstette, G. (eds.) Text- and Speech-Triggered Information Access, Proceedings of TESTIA 2000, 8th ELSNET European Summer School on Language and Speech Communication, Lecture Notes in Artificial Intelligence 2705, Springer Verlag, pp. 145-181.
 ------------------------------
-===== How to apply ===== 
  
-Applications should be sent to Mathieu.Constant@univ-lorraine.fr. They should include a CV, a cover letter, and possibly support letters by teacher. 
2018-lifat-m2-1.1537531693.txt.gz · Last modified: 2018/09/21 14:08 by agata.savary