Agence Nationale de la Recherche

job-2019-lis-postdoc

Differences

This shows you the differences between two versions of the page.

job-2019-lis-postdoc [2018/12/26 10:07]
carlos.ramisch created
job-2019-lis-postdoc [2018/12/26 10:33]
carlos.ramisch
Line 22:
 Despite this progress, the performance of MWE identification systems is still not on par with the performance of other text analysis tools. For instance, the best MWE identification system at the [[http://multiword.sourceforge.net/sharedtaskresults2018 | PARSEME shared task 2018]] obtained an F-measure of 54 points, whereas the best parser at the [[http://universaldependencies.org/conll18/results.html | CoNLL 2018 shared task]] obtained an LAS of 75.84.
  
-Part of these figures can be explained by the challenging nature of MWEs, and by the sparse amount of training data. However, the models being employed are also not fully compatible with the nature of the task. Indeed, supervised learning is based on generalisations made from observations. MWEs are by definition idiosyncratic, and there is little to generalise from one MWE to another. As a consequence, most systems are able to cope with variants of observed expressions, but fail in detecting new ones, those never observed in the training corpus. It is unclear whether sophisticated deep learning architectures can be beaten by much simpler memorisation baselines ([[http://aclweb.org/anthology/S16-1140 | Cordeiro et al 2016]]).
+Part of these figures can be explained by the challenging nature of MWEs, and by the sparse amount of training data. However, the models being employed are also not fully compatible with the nature of the phenomenon. Indeed, supervised learning is based on generalisations made from observations. MWEs are by definition idiosyncratic, and there is little to generalise from one MWE to another. As a consequence, most systems are able to cope with variants of observed expressions, but fail in detecting new ones, those never observed in the training corpus. It is unclear whether sophisticated deep learning architectures can be beaten by much simpler memorisation baselines ([[http://aclweb.org/anthology/S16-1140 | Cordeiro et al 2016]]).
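To make the seen/unseen distinction concrete, here is a minimal sketch of what a memorisation baseline could look like. It is an illustration only, not the system described in the cited paper: the function names ''train_lexicon'' and ''identify'', the lemma-based representation, and the restriction to contiguous expressions are all assumptions made for this example.

```python
def train_lexicon(annotated_sentences):
    """Memorise every annotated MWE as a tuple of lemmas.

    `annotated_sentences` is a hypothetical format: a list of
    (lemmas, mwes) pairs, where `mwes` lists the token positions
    of each annotated expression.
    """
    lexicon = set()
    for lemmas, mwes in annotated_sentences:
        for positions in mwes:
            lexicon.add(tuple(lemmas[i] for i in positions))
    return lexicon


def identify(lemmas, lexicon):
    """Return positions of contiguous lemma sequences found in the lexicon.

    Simplification: only contiguous occurrences are matched, so
    discontinuous variants of a known expression are missed.
    """
    found = []
    for entry in lexicon:
        n = len(entry)
        for start in range(len(lemmas) - n + 1):
            if tuple(lemmas[start:start + n]) == entry:
                found.append(tuple(range(start, start + n)))
    return sorted(found)


# Toy training data: one sentence with one annotated MWE.
training = [(["he", "kick", "the", "bucket"], [(1, 2, 3)])]
lexicon = train_lexicon(training)

# A seen expression is found because its lemmas match a lexicon entry.
print(identify(["she", "kick", "the", "bucket", "yesterday"], lexicon))

# An unseen expression is missed: nothing was memorised for it.
print(identify(["he", "spill", "the", "bean"], lexicon))
```

Matching on lemmas lets such a baseline cope with inflected variants of observed expressions, but by construction it cannot detect an expression that never occurred in the training corpus.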
  
-**The goal of this post-doc is to improve current MWE identification systems by trying to increase their performance on unseen MWEs**. Therefore, the recruited researcher will study, implement and evaluate original methods to enrich supervised MWE identification models with information automatically extracted from large unannotated corpora. Methods to discover MWEs in corpora abound ([[http://aclweb.org/anthology/J90-1003 | Church and Hanks 1990]], [[http://aclweb.org/anthology/J93-1007 | Smadja 1993]], [[http://www.aclweb.org/anthology/C10-3015 | Ramisch et al 2010]], [[http://aclweb.org/anthology/D15-1290 | Riedl and Biemann 2015]], [[http://aclweb.org/anthology/D15-1201 | Yazdani et al 2015]]), but they have rarely been combined in large-scale MWE identification pipelines. This post-doc represents an opportunity to explore this promising research direction.
+**The goal of this postdoc is to improve current MWE identification systems by trying to increase their performance on unseen MWEs**. Therefore, the recruited researcher will study, implement and evaluate original methods to enrich supervised MWE identification models with information automatically extracted from large unannotated corpora. Methods to discover new MWEs in raw corpora abound ([[http://aclweb.org/anthology/J90-1003 | Church and Hanks 1990]], [[http://aclweb.org/anthology/J93-1007 | Smadja 1993]], [[http://www.aclweb.org/anthology/C10-3015 | Ramisch et al 2010]], [[http://aclweb.org/anthology/D15-1290 | Riedl and Biemann 2015]], [[http://aclweb.org/anthology/D15-1201 | Yazdani et al 2015]]), but they have rarely been combined in large-scale MWE identification pipelines. This postdoc represents an opportunity to explore this promising research direction.
  
-The TALEP team has experience with supervised MWE identification using recurrent neural networks ([[http://aclweb.org/anthology/W18-4933 | Zampieri et al 2018]]), statistical MWE discovery tools ([[http://www.lrec-conf.org/proceedings/lrec2016/pdf/271_Paper.pdf | Cordeiro et al 2018]]) and automatic compositionality prediction using word embeddings ([[http://aclweb.org/anthology/P16-1187 | Cordeiro et al 2016]]). These will serve as starting points for the exploratory work of this post-doc. Familiarity with (one of) these tools and/or technologies is a plus.
+The TALEP team has experience with supervised MWE identification using recurrent neural networks ([[http://aclweb.org/anthology/W18-4933 | Zampieri et al 2018]]), statistical MWE discovery tools ([[http://www.lrec-conf.org/proceedings/lrec2016/pdf/271_Paper.pdf | Cordeiro et al 2018]]) and automatic compositionality prediction using word embeddings ([[http://aclweb.org/anthology/P16-1187 | Cordeiro et al 2016]]). These will serve as starting points for the exploratory work of this postdoc. Familiarity with (one of) these tools and/or technologies is a plus.
  
 ----------------------
 ==== Environment ====
  
-This position is funded by the [[http://parsemefr.lis-lab.fr | ANR PARSEME-FR project]], a French spin-off of [[http://www.parseme.eu | PARSEME]]. The PARSEME community gathers partners from 30+ countries interested in the automatic processing of MWEs. Its main event is the [[http://multiword.sf.net/sharedtask2018 | PARSEME shared task]]. The goal of the PARSEME-FR project is to tackle the challenges posed by MWEs in NLP specifically for French. The post-doc will participate in the national project meetings, co-author articles, and interact with other PARSEME-FR and PARSEME members.
+This position is funded by the [[http://parsemefr.lis-lab.fr | ANR PARSEME-FR project]], a French spin-off of [[http://www.parseme.eu | PARSEME]]. The PARSEME community gathers partners from 30+ countries interested in the automatic processing of MWEs. Its main event is the [[http://multiword.sf.net/sharedtask2018 | PARSEME shared task]]. The goal of the PARSEME-FR project is to tackle the challenges posed by MWEs in NLP specifically for French. The recruited person will participate in the national project meetings, co-author articles, and interact with other PARSEME-FR and PARSEME members.
  
-The post-doc will be supervised by [[http://pageperso.lis-lab.fr/~carlos.ramisch/ | Carlos Ramisch]] and [[http://pageperso.lis-lab.fr/~alexis.nasr/ | Alexis Nasr]]. The recruited person will become a member of the [[http://www.lis-lab.fr/talep/ | TALEP team]], specialised in computational linguistics. TALEP is a dynamic and international team of [[http://www.lis-lab.fr/ | LIS]], a computer science lab affiliated to CNRS and Aix Marseille University, located on the [[https://sciences.univ-amu.fr/sites-geographiques/site-luminy | Luminy campus]] in Marseille.
+The postdoc will be supervised by [[http://pageperso.lis-lab.fr/~carlos.ramisch/ | Carlos Ramisch]] and [[http://pageperso.lis-lab.fr/~alexis.nasr/ | Alexis Nasr]]. The recruited person will become a member of the [[http://www.lis-lab.fr/talep/ | TALEP team]], specialised in computational linguistics. TALEP is a dynamic and international team of [[http://www.lis-lab.fr/ | LIS]], a computer science lab affiliated to CNRS and Aix Marseille University, located on the [[https://sciences.univ-amu.fr/sites-geographiques/site-luminy | Luminy campus]] in Marseille.
  
 [[https://www.univ-amu.fr/ | Aix Marseille University]] is one of the largest universities in France, providing a lively and diverse research environment which attracts scientists carrying out research in many areas, including computational linguistics, in collaboration with leading international organizations.
job-2019-lis-postdoc.txt · Last modified: 2018/12/26 10:33 by carlos.ramisch