This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
2018-lifat-m2-1 [2018/09/21 13:56] agata.savary created |
2018-lifat-m2-1 [2018/09/21 14:42] (current) agata.savary |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
* **Domain:** Natural Language Processing | * **Domain:** Natural Language Processing | ||
Line 11: | Line 11: | ||
===== Motivation and context ===== | ===== Motivation and context ===== | ||
- | The internship will take place in the framework of the PARSEME-FR project, which involves several NLP teams in France. The aim is to boost applications in Natural Language Processing (NLP), by focusing on one of their major challenges: multiword expressions (MWEs). | + | The internship will take place in the framework of the [[http:// |
- | MWEs are groups of words which exhibit unpredicted properties (Baldwin & Kim, 2010). Most prominently, | + | MWEs are groups of words which exhibit unpredicted properties (Baldwin & Kim, 2010). Most prominently, |
- | One of the main aims of MWE-oriented NLP research is to model such expressions so as to optimize their automatic processing (for instance, to avoid their literal translation in machine translation systems). Two major MWE-related NLP tasks include MWE discovery and MWE identification. In the former, the input consists of large quantities of raw text and the output is a list of potential MWEs. In the latter, an identifier takes a text as input and automatically annotates (points at) the occurrences of MWEs in it. MWE identification is a prerequisite for downstream applications such as machine translation (which may want to treat MWEs with dedicated procedures). | + | One of the main aims of MWE-oriented NLP research is to model such expressions so as to optimize their automatic processing (for instance, to avoid their literal translation in machine translation systems). Two major MWE-related NLP tasks include |
- | Automatic identification of MWEs in 19 languages was addressed by the PARSEME shared task (Ramisch et al., 2018), in which the BdTln team participated with the VarIDE system (Pasquer et al., 2018a). The results of the shared task show that identifying unseen MWEs (i.e. those MWEs which do not occur in the training data) is particularly challenging. Thus, identification should, ideally, exploit not only annotated corpora but also MWE lexicons and MWE discovery methods. | + | Automatic identification of MWEs in 19 languages was addressed by the PARSEME shared task (Ramisch et al., 2018), in which the BdTln team participated with the VarIDE system (Pasquer et al., 2018a). The results of the shared task show that identifying |
+ | ===== Topics ===== | ||
+ | This internship is dedicated to discovering how MWE discovery could benefit from the previously seen data, rather than be performed from scratch. The hypothesis to be tested is that new (unseen) MWEs of certain types can be discovered due to their semantic similarity with known (previously seen) MWEs. For instance, knowing that //**haute température**// | ||
+ | |||
+ | To perform lexical substitution, | ||
===== Objectives ===== | ===== Objectives ===== | ||
- | The main objective | + | The objectives |
- | The internship can be divided into the following tasks: | + | * candidates for new MWEs are generated by replacing individual components |
- | * Study the linguistic properties | + | * the candidates generated |
- | * Develop a tool to automatically link annotated verbal MWEs and their corresponding entries | + | |
- | * Develop | + | |
- | * Optionally, extend the work to all multiword expressions, | + | |
- | ===== Profile ==== | + | Possible extensions |
- | * Master 1 or Master 2 in computational linguistics or computer science, | + | |
- | * Good knowledge | + | |
- | * Interests in linguistics and familiarity with language technology, | + | |
- | * Programming skills (python, web programming). | + | |
- | ===== Important dates ==== | + | |
- | * Application deadline: 15 January 2018 (or until filled) | + | * coupling word embedding-based lexical replacement with semantic resources such as WordNet. |
- | * Notification: | + | |
- | | + | |
- | * Position ends: July-August 2018 | + | |
- | ===== References | + | ===== Candidate' |
+ | * 2nd-year master student in computational linguistics, | ||
+ | * Interests in linguistics and familiarity with language technology | ||
+ | * Good knowledge of French | ||
+ | * Good programming skills, preferably in Python | ||
- | Marie Candito, Mathieu Constant, Carlos Ramisch, Agata Savary, Yannick Parmentier, Caroline Pasquer, and Jean-Yves Antoine. Annotation d’expressions polylexicales verbales en français. In Jean-Yves Antoine Iris Eshkol, editor, 24e conférence sur le Traitement Automatique des Langues Naturelles | + | ===== Important dates ===== |
+ | * Application deadline: 15 December 2018 (or until filled) | ||
+ | * Notification: 15 January 2018 | ||
+ | * Position starts: around February-March 2018 | ||
+ | * Position ends: around July-August 2018 | ||
- | Maurice Gross. Lexicon-grammar | + | ===== How to apply ===== |
- | + | Send your CV and a cover letter to: | |
- | Agata Savary, | + | * Caroline Pasquer: first.last@etu.univ-tours.fr |
+ | | ||
+ | * Carlos Ramisch: first.last@lis-lab.fr | ||
+ | ===== References ===== | ||
+ | * Baldwin, T. and Kim, S. N. (2010) [[https:// | ||
+ | * Farahmand, M., Henderson, J., [[http:// | ||
+ | * Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. [[http:// | ||
+ | * Peng, J., Aharodnik, K., Feldman, A. (2018). A Distributional Semantics Model for Idiom Detection - The Case of English and Russian. Special Session on Natural Language Processing in Artificial Intelligence, | ||
+ | * Pasquer, C., Savary, A., Antoine, J.-Y., Ramisch, C. (2018b) [[http:// | ||
+ | * Ramisch C., Cordeiro, S., Savary, A., Vincze, V. et al. (2018) [[http:// | ||
+ | * Savary, A., Jacquemin, Ch. (2003): [[https:// | ||
------------------------------ | ------------------------------ | ||
- | ===== How to apply ===== | ||
- | Applications should be sent to Mathieu.Constant@univ-lorraine.fr. They should include a CV, a cover letter, and, optionally, letters of support from teachers. |
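
The candidate-generation step mentioned under Objectives (new MWE candidates obtained by replacing individual components of known MWEs with semantically similar words) could be sketched roughly as follows. This is only an illustration, not the method used in the project: the tiny embedding table, its vector values, and the function names `nearest_neighbours` and `generate_candidates` are all assumptions made for the example. Real work would presumably use pre-trained word embeddings and a separate step to filter or rank the candidates.

```python
import math

# Toy 3-dimensional embeddings. The words and values are hypothetical,
# chosen only to make the example self-contained.
EMBEDDINGS = {
    "haute": [0.9, 0.1, 0.0],
    "basse": [0.8, 0.2, 0.1],
    "forte": [0.7, 0.3, 0.0],
    "température": [0.1, 0.9, 0.1],
    "pression": [0.2, 0.8, 0.2],
    "chat": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, k=2):
    """Return the k words most similar to `word` in the toy embedding table."""
    target = EMBEDDINGS[word]
    scored = [
        (cosine(target, vector), other)
        for other, vector in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def generate_candidates(mwe, k=1):
    """Replace one component at a time with each of its k nearest neighbours."""
    candidates = []
    for i, component in enumerate(mwe):
        if component not in EMBEDDINGS:
            continue  # no vector available, so no substitution for this slot
        for substitute in nearest_neighbours(component, k):
            candidates.append(mwe[:i] + (substitute,) + mwe[i + 1:])
    return candidates


if __name__ == "__main__":
    for candidate in generate_candidates(("haute", "température"), k=1):
        print(" ".join(candidate))
```

Substituting one component at a time keeps each candidate close to a known expression; the generated strings are only hypotheses and would still need to be checked, for example against corpus evidence.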