Agence Nationale de la Recherche

job-2019-lis-postdoc

Differences

This shows you the differences between two versions of the page.

job-2019-lis-postdoc [2018/12/26 10:26] carlos.ramisch
job-2019-lis-postdoc [2018/12/26 10:31] carlos.ramisch
Line 22: Line 22:
 Despite this progress, the performance of MWE identification systems is still not on par with that of other text analysis tools. For instance, the best MWE identification system at the [[http://multiword.sourceforge.net/sharedtaskresults2018 | PARSEME shared task 2018]] obtained an F-measure of 54 points, whereas the best parser at the [[http://universaldependencies.org/conll18/results.html | CoNLL 2018 shared task]] obtained an LAS of 75.84.
  
-Part of these figures can be explained by the challenging nature of MWEs, and by the sparse amount of training data. However, the models being employed are also not fully compatible with the nature of the task. Indeed, supervised learning is based on generalisations made from observations. MWEs are by definition idiosyncratic, and there is little to generalise from one MWE to another. As a consequence, most systems are able to cope with variants of observed expressions, but fail in detecting new ones, those never observed in the training corpus. It is unclear whether sophisticated deep learning architectures can be beaten by much simpler memorisation baselines ([[http://aclweb.org/anthology/S16-1140 | Cordeiro et al 2016]]).
+Part of these figures can be explained by the challenging nature of MWEs, and by the sparse amount of training data. However, the models being employed are also not fully compatible with the nature of the phenomenon. Indeed, supervised learning is based on generalisations made from observations. MWEs are by definition idiosyncratic, and there is little to generalise from one MWE to another. As a consequence, most systems are able to cope with variants of observed expressions, but fail in detecting new ones, those never observed in the training corpus. It is unclear whether sophisticated deep learning architectures can be beaten by much simpler memorisation baselines ([[http://aclweb.org/anthology/S16-1140 | Cordeiro et al 2016]]).
  
 **The goal of this postdoc is to improve current MWE identification systems by trying to increase their performance on unseen MWEs**. Therefore, the recruited researcher will study, implement and evaluate original methods to enrich supervised MWE identification models with information automatically extracted from large unannotated corpora. Methods to discover MWEs in corpora abound ([[http://aclweb.org/anthology/J90-1003 | Church and Hanks 1990]], [[http://aclweb.org/anthology/J93-1007 | Smadja 1993]], [[http://www.aclweb.org/anthology/C10-3015 | Ramisch et al 2010]], [[http://aclweb.org/anthology/D15-1290 | Riedl and Biemann 2015]], [[http://aclweb.org/anthology/D15-1201 | Yazdani et al 2015]]), but they have rarely been combined in large-scale MWE identification pipelines. This postdoc represents an opportunity to explore this promising research direction.
job-2019-lis-postdoc.txt · Last modified: 2018/12/26 10:33 by carlos.ramisch