job-2017-lif-postdoc

Differences

This shows you the differences between two versions of the page.

job-2017-lif-postdoc [2017/09/10 14:43]
carlos.ramisch
job-2017-lif-postdoc [2017/09/15 10:31] (current)
carlos.ramisch
Line 1: Line 1:
-Aix Marseille University offers a 2-years post-doc position in computational linguistics. 
- 
 ===== Identification of multiword expressions off the beaten path : towards MWE-aware semantic parsing ===== ===== Identification of multiword expressions off the beaten path : towards MWE-aware semantic parsing =====
  
Line 11: Line 9:
   * **Supervisors**: [[http://pageperso.lif.univ-mrs.fr/~carlos.ramisch|Carlos Ramisch]] and [[http://pageperso.lif.univ-mrs.fr/~alexis.nasr|Alexis Nasr]]   * **Supervisors**: [[http://pageperso.lif.univ-mrs.fr/~carlos.ramisch|Carlos Ramisch]] and [[http://pageperso.lif.univ-mrs.fr/~alexis.nasr|Alexis Nasr]]
   * **Duration:** 2 years, starting in January 2018 (or earlier)   * **Duration:** 2 years, starting in January 2018 (or earlier)
-  * **Remuneration:** around 2000-2500€/month (depending on experience)
+  * **Remuneration:** around 2000-2400€/month net (depending on experience)
   * **Funding:** [[http://parsemefr.lif.univ-mrs.fr/|ANR PARSEME-FR project]]   * **Funding:** [[http://parsemefr.lif.univ-mrs.fr/|ANR PARSEME-FR project]]
   * **Keywords**: multiword expressions, MWE identification, syntactic parsing, word embeddings, recurrent neural networks, compositionality prediction, semantic parsing   * **Keywords**: multiword expressions, MWE identification, syntactic parsing, word embeddings, recurrent neural networks, compositionality prediction, semantic parsing
Line 18: Line 16:
 ==== Context ==== ==== Context ====
  
-We are hiring a post-doc researcher in the domain of natural language processing. This position is funded by the [[http://parsemefr.lif.univ-mrs.fr | ANR PARSEME-FR project ]], a French spin-off of the [[http://www.parseme.eu | PARSEME action]]. The PARSEME community gathers partners from 31 countries around scientific challenges in automatic processing of multiword expressions. The goal of the PARSEME-FR project is to tackle the challenges posed by multiword expressions in natural language processing systems specifically for French. The post-doc will participate in the PARSEME-FR project meetings and interact with other project members in France and with members of the international PARSEME community.
+We are hiring a post-doc researcher in the domain of natural language processing. This position is funded by the [[http://parsemefr.lif.univ-mrs.fr | ANR PARSEME-FR project ]], a French spin-off of the [[http://www.parseme.eu | PARSEME action]]. The PARSEME community gathers partners from 31 countries around scientific challenges in automatic processing of multiword expressions. The goal of the PARSEME-FR project is to tackle the challenges posed by multiword expressions in natural language processing systems specifically for French. The post-doc will participate in the national project meetings, co-author articles submitted to top-tier conferences and journals, and interact with other project members in France and with members of the international research community.
 + 
 +The post-doc will be supervised by Carlos Ramisch and Alexis Nasr (Aix Marseille University) and will become a member of the TALEP team of LIF, specialised in computational linguistics.  The TALEP team is part of LIF, a computer science lab affiliated to CNRS and Aix Marseille University, located on the Luminy campus in Marseilles. 
 + 
 +The metropolitan area of Aix-Marseilles is the second largest in France. It offers a vibrant environment conveniently situated on the south coast of France, in the Provence-Alpes-Côte d'Azur region. Aix-Marseilles is a cosmopolitan and well-connected urban area with a Mediterranean climate, surrounded by stunning landscapes such as the Calanques natural park and the Provence region.
 + 
 +Aix Marseille University is the largest university in France and in the francophone world by number of students and staff. It provides a lively and diverse research environment which attracts scientists carrying out research in many areas, including computational linguistics, in collaboration with leading international organizations.
  
-The post-doc will be supervised by Carlos Ramisch and Alexis Nasr (Aix-Marseille University) and will become a member of the TALEP team, specialised in computational linguistics. The TALEP team is part of LIF, a computer science lab affiliated to CNRS and Aix Marseille University, located on the Luminy campus in Marseilles, the second-largest city in France. Marseilles is conveniently situated on the south coast of France, in the Provence-Alpes-Côte d'Azur region. It is a cosmopolite and well connected city with Mediterranean climate and surrounded by stunning landscapes such as the Calanques natural park, where the Luminy campus is located. 
  
 -------------------------- --------------------------
Line 27: Line 30:
 One of the main goals of natural language processing (NLP) systems is to automatically find the underlying structure in running text. Many tools and techniques for **text analysis** have been developed to transform sequences of characters into increasingly abstract representations. This process is usually based on a pipeline of modules which subsequently perform operations such as text segmentation, tokenization, part-of-speech tagging, morphological analysis, lemmatization, syntactic and semantic parsing. One of the main goals of natural language processing (NLP) systems is to automatically find the underlying structure in running text. Many tools and techniques for **text analysis** have been developed to transform sequences of characters into increasingly abstract representations. This process is usually based on a pipeline of modules which subsequently perform operations such as text segmentation, tokenization, part-of-speech tagging, morphological analysis, lemmatization, syntactic and semantic parsing.
  
-Multiword expressions (MWEs) such as //French fries, take a break, do one's best// and //spill the beans// often pose problems for text analysis chains because of their idiosyncratic nature (Baldwin and Kim 2009, Sag et al. 2001). Among their notably challenging characteristics, they can include words that only occur in a given expression (e.g. //astray// in //go astray//), they can present irregular syntactic structure (e.g. //by and large//, an adverbial formed by the coordination of a preposition with an adjective), they can be discontinuous (e.g. //to **take** this relevant remark **into account**//), they can be ambiguous (e.g. //a piece of cake// can be something very easy or something you can eat) and they can present some degree of semantic non-compositionality (e.g. a //hot dog// is not literally a //dog//). The task of automatic **MWE identification** consists in finding such irregularities in text, that is, identifying which words are part of a multiword expression, and how they are related to each other.
+Multiword expressions (MWEs) such as //French fries, take a break, do one's best// and //spill the beans// often pose problems for text analysis chains because of their idiosyncratic nature (Baldwin and Kim 2009, Sag et al. 2001). Among their notably challenging characteristics, they can include words that only occur in a given expression (e.g. //astray// in //go astray//), they can present irregular syntactic structure (e.g. //by and large//, an adverbial formed by the coordination of a preposition with an adjective), they can be discontinuous (e.g. //to **take** this relevant remark **into account**//), they can be ambiguous (e.g. //a piece of cake// can be something very easy or something you can eat) and they can present some degree of semantic non-compositionality (e.g. a //hot dog// is not literally a //dog//).
+The task of automatic **MWE identification** consists in finding such irregularities in text, that is, determining which words are part of a multiword expression, and how they are related to each other.
  
 Considerable progress has been made in recent years to understand and model the interactions between MWE identification and syntactic parsing. It has been shown that the automatic discovery of new MWEs can greatly benefit from parsed data (Seretan, 2008). The use of sequence models, such as CRFs and structured perceptrons, has also been explored as a means to tag -- mainly contiguous -- expressions in context, prior to syntactic parsing (Riedl and Biemann 2016, Schneider et al. 2014, Constant and Sigogne 2011). The use of subtrees, such as special multiword constituents (Green et al. 2013) and dependencies (Nasr et al. 2015, Vincze et al. 2013), to learn parsing models has also been investigated to deal with syntactically irregular, ambiguous and discontinuous MWEs. These challenges have also been addressed by joint models such as a synchronous transition-based dependency parser and MWE segmenter (Constant and Nivre 2016). Considerable progress has been made in recent years to understand and model the interactions between MWE identification and syntactic parsing. It has been shown that the automatic discovery of new MWEs can greatly benefit from parsed data (Seretan, 2008). The use of sequence models, such as CRFs and structured perceptrons, has also been explored as a means to tag -- mainly contiguous -- expressions in context, prior to syntactic parsing (Riedl and Biemann 2016, Schneider et al. 2014, Constant and Sigogne 2011). The use of subtrees, such as special multiword constituents (Green et al. 2013) and dependencies (Nasr et al. 2015, Vincze et al. 2013), to learn parsing models has also been investigated to deal with syntactically irregular, ambiguous and discontinuous MWEs. These challenges have also been addressed by joint models such as a synchronous transition-based dependency parser and MWE segmenter (Constant and Nivre 2016).
Line 37: Line 40:
 The first phase will consist in adapting existing sequence models to tag syntactic trees instead of flat word sequences. The first experiments will focus on verbal MWEs using the corpora of the [[http://multiword.sourceforge.net/sharedtask2017/ | PARSEME shared task on the automatic identification of verbal MWEs]]. Therefore, it is crucial to abstract away intervening elements and inflection, working on trees as an input and tagging those nodes and dependencies that are part of a MWE. The main assumption here is that verbal expressions are syntactically regular, thus we can place MWE identification //after// syntactic parsing and before semantic analysis.  The first phase will consist in adapting existing sequence models to tag syntactic trees instead of flat word sequences. The first experiments will focus on verbal MWEs using the corpora of the [[http://multiword.sourceforge.net/sharedtask2017/ | PARSEME shared task on the automatic identification of verbal MWEs]]. Therefore, it is crucial to abstract away intervening elements and inflection, working on trees as an input and tagging those nodes and dependencies that are part of a MWE. The main assumption here is that verbal expressions are syntactically regular, thus we can place MWE identification //after// syntactic parsing and before semantic analysis. 
  
-The second phase will focus on the use of word embeddings and deep learning models to perform the classification task. Methods based on word embeddings are potentially interesting for two main reasons. First, they could find MWEs that are similar to known ones by using vector similarity. Therefore, they could perform better generalizations based on little training data, increasing the coverage of the system. Second, methods based on word embeddings can identify idiomatic MWEs by identifying word combinations whose overall vector is distant from the vectors of its component words (Cordeiro et al. 2016). The system developed in the first and second phases will be evaluated on French corpora, currently under annotation, covering all MWE categories, not only verbal ones.
+The second phase will focus on the use of word embeddings and deep learning models to perform the classification of tree nodes as belonging to a given expression. Methods based on word embeddings are potentially interesting for two main reasons. First, they could find MWEs that are similar to known ones by using vector similarity. Therefore, they could perform better generalizations based on little training data, increasing the coverage of the system. Second, methods based on word embeddings can identify idiomatic MWEs by identifying word combinations whose overall vector is distant from the vectors of its component words (Cordeiro et al. 2016). The system developed in the first and second phases will be evaluated on French corpora, currently under annotation, covering all MWE categories, not only verbal ones.
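
As a rough illustration of the second point, the distance between an expression's own vector and a vector composed from its parts can be measured with cosine similarity. The sketch below is only an assumption about how such a score could be computed: it uses additive composition, the function names ''cosine'' and ''compositionality'' are hypothetical, and the 3-dimensional vectors (including the contrast pair //olive oil//) are invented toy values, not trained embeddings or results from the cited work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compositionality(expr_vec, component_vecs):
    """Similarity between an expression vector and the sum of its parts.

    Additive composition is an illustrative assumption; other
    composition functions could be plugged in instead.
    """
    composed = [sum(dims) for dims in zip(*component_vecs)]
    return cosine(expr_vec, composed)


# Toy vectors invented for this example only.
toy = {
    "hot": [0.9, 0.1, 0.0],
    "dog": [0.0, 0.2, 0.9],
    "hot_dog": [0.1, 0.9, 0.1],
    "olive": [0.8, 0.1, 0.2],
    "oil": [0.1, 0.1, 0.9],
    "olive_oil": [0.5, 0.1, 0.6],
}

idiomatic = compositionality(toy["hot_dog"], [toy["hot"], toy["dog"]])
literal = compositionality(toy["olive_oil"], [toy["olive"], toy["oil"]])
```

With these toy values, ''idiomatic'' comes out much lower than ''literal'', which is the kind of signal a classifier could use as a feature. Real embeddings would need the expression to be treated as a single token at training time, which is a further assumption.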
  
 Once the MWE identification system is both precise and robust enough, the third phase consists in applying tree transformation operations on the resulting tagged trees so that they become closer to semantic predicate-argument structures. Verbal MWEs are particularly relevant here, since phenomena such as light-verb constructions, verbal idioms and inherently reflexive verbs should be modelled as atomic multiword predicates. The final output will be a tree (or graph) of identified predicates and their arguments, thus allowing further processing by downstream applications that require the extraction of semantic structures from text. Once the MWE identification system is both precise and robust enough, the third phase consists in applying tree transformation operations on the resulting tagged trees so that they become closer to semantic predicate-argument structures. Verbal MWEs are particularly relevant here, since phenomena such as light-verb constructions, verbal idioms and inherently reflexive verbs should be modelled as atomic multiword predicates. The final output will be a tree (or graph) of identified predicates and their arguments, thus allowing further processing by downstream applications that require the extraction of semantic structures from text.
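
The third phase can be pictured with a small sketch of one possible tree transformation: nodes tagged as belonging to the same MWE are collapsed into a single multiword predicate, and their dependents are reattached to it. Everything here is hypothetical: the ''Node'' class, the ''merge_mwes'' function, the integer MWE tags and the hand-built tree for //take this relevant remark into account// are assumptions made for illustration, with the tags supplied by hand rather than by an identification model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    idx: int                    # position of the word in the sentence
    lemma: str
    head: int                   # idx of the syntactic head, 0 for the root
    mwe: Optional[int] = None   # identifier of the MWE this node is part of


def merge_mwes(nodes):
    """Collapse nodes sharing an MWE identifier into one multiword node."""
    groups = {}
    for node in nodes:
        if node.mwe is not None:
            groups.setdefault(node.mwe, []).append(node)

    # Map every member of an MWE to its top node, i.e. the member
    # whose head lies outside the expression.
    top_of = {}
    for members in groups.values():
        inside = {m.idx for m in members}
        top = next(m for m in members if m.head not in inside)
        for m in members:
            top_of[m.idx] = top.idx

    merged = []
    for node in nodes:
        if top_of.get(node.idx, node.idx) != node.idx:
            continue  # absorbed into the multiword predicate
        members = groups[node.mwe] if node.mwe is not None else [node]
        lemma = "_".join(m.lemma for m in members)
        # Dependents of absorbed nodes are reattached to the top node.
        merged.append(Node(node.idx, lemma, top_of.get(node.head, node.head)))
    return merged


# "take this relevant remark into account", tagged by hand as MWE 1.
sentence = [
    Node(1, "take", 0, mwe=1),
    Node(2, "this", 4),
    Node(3, "relevant", 4),
    Node(4, "remark", 1),
    Node(5, "into", 6, mwe=1),
    Node(6, "account", 1, mwe=1),
]
result = merge_mwes(sentence)
```

In the resulting tree, the discontinuous expression becomes one predicate node with //remark// as its dependent, which is closer to a predicate-argument structure than the original tree. Dependency labels and semantic roles are left out to keep the sketch short.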
Line 48: Line 51:
   * Interest in linguistics and familiarity with language technology   * Interest in linguistics and familiarity with language technology
   * Capacity to work independently and as part of a team   * Capacity to work independently and as part of a team
 +
  
 ------------------------- -------------------------
Line 53: Line 57:
  
   * **Application deadline: October 15, 2017 (or until filled)**   * **Application deadline: October 15, 2017 (or until filled)**
-  * Position starts: January 2018 (or earlier if possible)
+  * Position starts: December 2017 or January 2018
   * Duration: 2 years, that is, 1 year renewable once   * Duration: 2 years, that is, 1 year renewable once
 +
  
 --------------------- ---------------------
 ==== Application ==== ==== Application ====
  
-Candidates should send the following documents as a single attached document named **LASTNAME-Firstname.pdf**, in French **OR** in English, to Carlos Ramisch and Alexis Nasr (FirstName.LastName@lif.univ-mrs.fr):
+**Applications should be sent before October 15, 2017**. Candidates should send the following documents as a single attached document named **LASTNAME-Firstname.pdf**, in French OR in English, to Nuria Gala, Carlos Ramisch and Alexis Nasr (FirstName.LastName@univ-amu.fr), indicating "Application post-doc AMU" in the subject line:
   * a CV, including a list of publications   * a CV, including a list of publications
-  * a cover letter explaining how this position matches your research interests and experience,
-  * and the names and emails of 2 referees to be contacted
+  * a cover letter explaining how the offer matches your interests and experience
+  * a copy of your PhD degree or a document indicating the expected defense date
+  * and the names and emails of 1 or 2 referees to be contacted
 + 
 +Candidates applying to both positions should indicate and motivate this in their cover letter, but send a single application file. 
  
 -------------------- --------------------
Line 77: Line 86:
   * [[ http://aclweb.org/anthology/W16-1816 | Riedl M. and Biemann C. (2016) Impact of MWE resources in Multiword Recognition ]]   * [[ http://aclweb.org/anthology/W16-1816 | Riedl M. and Biemann C. (2016) Impact of MWE resources in Multiword Recognition ]]
   * [[ http://aclweb.org/anthology/I13-1024 | Vincze V. et al. (2013) Dependency Parsing for Identifying Hungarian Light Verb Constructions ]]   * [[ http://aclweb.org/anthology/I13-1024 | Vincze V. et al. (2013) Dependency Parsing for Identifying Hungarian Light Verb Constructions ]]
-  
-  
  
job-2017-lif-postdoc.1505047409.txt.gz · Last modified: 2017/09/10 14:43 by carlos.ramisch