Agence Nationale de la Recherche

job-2019-lis-postdoc

Differences

This shows you the differences between two versions of the page.

job-2019-lis-postdoc [2018/12/26 10:31]
carlos.ramisch
job-2019-lis-postdoc [2018/12/26 10:33] (current)
carlos.ramisch
Line 24: Line 24:
 Part of these figures can be explained by the challenging nature of MWEs, and by the sparse amount of training data. However, the models being employed are also not fully compatible with the nature of the phenomenon. Indeed, supervised learning is based on generalisations made from observations. MWEs are by definition idiosyncratic, and there is little to generalise from one MWE to another. As a consequence, most systems are able to cope with variants of observed expressions, but fail in detecting new ones, those never observed in the training corpus. It is unclear whether sophisticated deep learning architectures can be beaten by much simpler memorisation baselines ([[http://aclweb.org/anthology/S16-1140 | Cordeiro et al 2016]]).
  
-**The goal of this postdoc is to improve current MWE identification systems by trying to increase their performance on unseen MWEs**. Therefore, the recruited researcher will study, implement and evaluate original methods to enrich supervised MWE identification models with information automatically extracted from large unannotated corpora. Methods to discover MWEs in corpora abound ([[http://aclweb.org/anthology/J90-1003 | Church and Hanks 1990]], [[http://aclweb.org/anthology/J93-1007 | Smadja 1993]], [[http://www.aclweb.org/anthology/C10-3015 | Ramisch et al 2010]], [[http://aclweb.org/anthology/D15-1290 | Riedl and Biemann 2015]], [[http://aclweb.org/anthology/D15-1201 | Yazdani et al 2015]]), but they have rarely been combined in large-scale MWE identification pipelines. This postdoc represents an opportunity to explore this promising research direction.
+**The goal of this postdoc is to improve current MWE identification systems by trying to increase their performance on unseen MWEs**. Therefore, the recruited researcher will study, implement and evaluate original methods to enrich supervised MWE identification models with information automatically extracted from large unannotated corpora. Methods to discover new MWEs in raw corpora abound ([[http://aclweb.org/anthology/J90-1003 | Church and Hanks 1990]], [[http://aclweb.org/anthology/J93-1007 | Smadja 1993]], [[http://www.aclweb.org/anthology/C10-3015 | Ramisch et al 2010]], [[http://aclweb.org/anthology/D15-1290 | Riedl and Biemann 2015]], [[http://aclweb.org/anthology/D15-1201 | Yazdani et al 2015]]), but they have rarely been combined in large-scale MWE identification pipelines. This postdoc represents an opportunity to explore this promising research direction.
  
 The TALEP team has experience with supervised MWE identification using recurrent neural networks ([[http://aclweb.org/anthology/W18-4933 | Zampieri et al 2018]]), statistical MWE discovery tools ([[http://www.lrec-conf.org/proceedings/lrec2016/pdf/271_Paper.pdf | Cordeiro et al 2018]]) and automatic compositionality prediction using word embeddings ([[http://aclweb.org/anthology/P16-1187 | Cordeiro et al 2016]]). These will serve as starting points for the exploratory work of this postdoc. Familiarity with (one of) these tools and/or technologies is a plus.
job-2019-lis-postdoc.txt · Last modified: 2018/12/26 10:33 by carlos.ramisch