The PARSEME-FR project offers a 1-year postdoctoral position (renewable) in computational linguistics, starting in July or September 2018. Candidates should send their application before May 15, 2018 (see contact information below).
(in addition to the salary, the contract includes health benefits)
The proposed postdoctoral internship concerns data-driven modeling of French reflexive verbal forms. Depending on the candidate's skills and interests, the proposed research can be given either a pure NLP orientation or a computational linguistics orientation.
Reflexive verbal forms are a pervasive phenomenon across languages. In French they bear a reflexive clitic (me / te / se / nous / vous) that agrees with the verb's subject. This formal unity hides a range of diverse situations concerning the relation between the forms with and without the reflexive clitic (forms "seV" versus forms "V").
A key aspect of this range of situations is that some seV forms differ from the corresponding V form in an unexpected way, whereas others show a regular relation with the V form. Several typologies have been proposed for French or more generally for Romance languages (e.g. Boons et al., 1976; Creissels, 2007; Dobrovie-Sorin, 2016), although these typologies are often presented as continuums.
We propose to investigate the typology of reflexive forms on an empirical basis, both (i) with an NLP objective, namely automatically detecting seV forms that cannot be interpreted regularly, and (ii) from a linguistic perspective, with the objective of modeling the relation between seV and V forms. Scientific questions that can be investigated during this postdoc include, for instance:
- how does a typology inferred from the observed valency of seV / V forms in large corpora compare to hand-crafted typologies?
- is it possible to detect syntactic and semantic characteristics correlated with the existence of a certain type of seV form?
Possible techniques include, for instance, distributional models acquired from large corpora, or clustering for the induction of seV classes.
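As a purely illustrative sketch of the distributional idea (not the project's actual method), one could compare the co-occurrence profile of a V form with that of its seV counterpart: a low similarity might hint at an irregular relation. The sketch below assumes a tokenized corpus in which reflexive uses have already been merged into single tokens with a hypothetical "se_" prefix (e.g. "se_lave"); the function names and this token convention are assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile carries no distributional evidence
    return dot / (norm_a * norm_b)


def context_vector(sentences, target):
    """Count the tokens co-occurring with `target` within the same sentence."""
    counts = Counter()
    for tokens in sentences:
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


def regularity_score(sentences, verb):
    """Similarity between the contexts of V and seV.

    Assumes (hypothetically) that reflexive uses are tokenized as "se_" + verb.
    """
    v_profile = context_vector(sentences, verb)
    sev_profile = context_vector(sentences, "se_" + verb)
    return cosine(v_profile, sev_profile)


if __name__ == "__main__":
    # Invented toy sentences, for illustration only.
    toy_corpus = [
        ["elle", "lave", "la", "voiture"],
        ["il", "se_lave", "le", "matin"],
        ["elle", "se_lave", "la", "figure"],
    ]
    print(regularity_score(toy_corpus, "lave"))
```

Sentence-level co-occurrence is the simplest possible context definition; syntactic dependents or observed valency frames could be substituted as features without changing the comparison step.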
For evaluation and for bootstrapping automatic classification, two datasets will be available:
- the French part of the annotated corpora (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2282) used for the PARSEME shared task on automatic identification of verbal MWEs (https://typo.uni-konstanz.de/parseme/index.php/2-general/142-parseme-shared-task-on-automatic-detection-of-verbal-mwes)
- a corpus of seV forms manually tagged with fine-grained classes (ongoing annotation by L. Barque, M. Candito and R. Huyghe)
Depending on the native language of the hired postdoctoral fellow, reflexives in other languages can also be studied.
The PARSEME-FR project aims at improving the linguistic representativeness, precision, robustness and computational efficiency of Natural Language Processing (NLP) applications, notably parsing of French. The project focuses on a major bottleneck of these applications: MultiWord Expressions (MWEs), that is, groups of words that must be treated as units at some level of linguistic processing, such as hot dog, hard disk, kick the bucket, United Nations and pay attention. It is a spin-off of PARSEME, a European COST action (IC1207) on the same topic.
The postdoctoral fellow will work in particular with Marie Candito and Lucie Barque, at the Laboratoire de linguistique formelle, Paris Diderot University, and with Richard Huyghe (Fribourg University) and other members of the project.