Syntactic Parsing and Multiword Expressions in French

Postdoctoral fellowship in computational linguistics

The PARSEME-FR project offers a 1-year post-doc position (renewable) in computational linguistics, starting in July or September 2018. Candidates should send their application before May 15, 2018 (see contact information below).

Duration: 12 months, starting in July or Sept. 2018 (open until filled, with possibility of renewal for another year)
Location: Paris, LLF
Employer: Paris Diderot University
Contract: fixed term position
Remuneration: approx. 2,300€ per month net income

(in addition to the salary, the contract includes health benefits)

Topic: Data-driven modeling of reflexive verbal forms

The proposed postdoctoral position concerns data-driven modeling of French reflexive verbal forms. Depending on the candidate's skills and interests, the proposed research can be given either a pure NLP or a computational linguistics orientation.

Reflexive verbal forms are a pervasive phenomenon across languages. In French they bear a reflexive clitic (me / te / se / nous / vous) that agrees with the verb's subject. This formal unity hides a range of diverse situations concerning the relation between the forms with or without the reflexive clitic (forms "seV" versus forms "V"):

the V form may not exist (se suicider (SE suicide 'to commit suicide'))

the V form may have a different subcategorization frame and meaning (s'apercevoir (SE see, 'to realize'))

true reflexive (Anna se voit (Anna SE sees 'Anna sees herself'))

reciprocal (Elles se sont parlé (They SE are spoken 'They spoke to each other'))

mediopassive (Ce problème se rencontre souvent (this problem SE encounters often, 'one often encounters this problem'))

marked anticausative (La branche se fendit (the branch SE split, 'the branch split'))

etc.

A key aspect of this range of situations is that some seV forms differ from the V form in an unexpected way, whereas others show a regular relation with the V form. Several typologies have been proposed for French or more generally for Romance languages (e.g. Boons et al., 1976; Creissels, 2007; Dobrovie-Sorin, 2016), although these typologies are often presented as continuums.

We propose to investigate the typology of reflexive forms on an empirical basis, both (i) with the NLP objective of automatically detecting seV forms that cannot be interpreted regularly, and (ii) from a linguistic perspective, with the objective of modeling the relation between seV and V forms. Scientific questions that can be investigated during this postdoc include, for instance:
- how does a typology inferred from the observed valency of seV / V forms in large corpora compare to hand-crafted typologies?
- is it possible to detect syntactic and semantic characteristics correlated with the existence of a certain type of seV form?

Possible techniques include distributional models acquired from large corpora, or clustering for the induction of seV classes.
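
As a purely illustrative sketch of the distributional option (not the project's method), one could compare the context profile of each seV form with that of its V counterpart and flag pairs with low similarity as candidates for an irregular interpretation. The Python below assumes simple bag-of-words context counts. The counts, the `toy_contexts` table and the function names are invented for illustration and are not corpus measurements.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical context-word counts for the V form and the seV form of two
# lemmas. These numbers are made up; real profiles would be extracted from
# a large parsed corpus.
toy_contexts = {
    "voir": {
        "V": Counter({"miroir": 4, "image": 5, "film": 6}),
        "seV": Counter({"miroir": 6, "image": 4, "photo": 2}),
    },
    "apercevoir": {
        "V": Counter({"silhouette": 5, "lumière": 4, "loin": 3}),
        "seV": Counter({"erreur": 6, "tard": 3, "loin": 1}),
    },
}


def rank_by_similarity(contexts):
    """Return (lemma, similarity) pairs, least similar seV / V pair first."""
    scores = {
        lemma: cosine(forms["V"], forms["seV"])
        for lemma, forms in contexts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


ranking = rank_by_similarity(toy_contexts)
for lemma, score in ranking:
    print(f"{lemma}: {score:.2f}")
```

With these toy counts, the apercevoir / s'apercevoir pair is ranked as less similar than voir / se voir. Low similarity is only a crude signal here; the induced scores or clusters would still have to be evaluated against annotated data.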

For evaluation and for bootstrapping automatic classification, two datasets will be available:
- the French part of the annotated corpora (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2282) used for the PARSEME shared task on automatic identification of verbal MWEs (https://typo.uni-konstanz.de/parseme/index.php/2-general/142-parseme-shared-task-on-automatic-detection-of-verbal-mwes)
- a corpus of seV forms manually tagged with fine-grained classes (ongoing annotation by L. Barque, M. Candito and R. Huyghe)

Depending on the native language of the hired postdoctoral fellow, reflexives in other languages can also be studied.

The PARSEME-FR project

The PARSEME-FR project aims at improving the linguistic representativeness, precision, robustness and computational efficiency of Natural Language Processing (NLP) applications, notably parsing of French. The project focuses on a major bottleneck of these applications: MultiWord Expressions (MWEs), that is, groups of words that must be treated as units at some level of linguistic processing, such as hot dog, hard disk, kick the bucket, United Nations and pay attention. It is a spin-off of PARSEME, a European COST action (IC1207) on the same topic.

The postdoctoral fellow will work in particular with Marie Candito and Lucie Barque, at the Laboratoire de linguistique formelle, at Paris Diderot University, and with Richard Huyghe (Fribourg University) and other members of the project.

Profile

PhD in computational linguistics
Good knowledge of French and English
Excellent record of international publications
Capacity to work both independently and as part of a team
Computer scientist with strong interest in linguistic scientific questions, or linguist with good knowledge of machine learning techniques and statistical modeling

Contact information and applications

Enquiries and / or applications should be sent to Marie Candito (marie.candito at gmail.com), either in English or French.
Applications should contain an extended CV (mentioning the names and contact information of 2 to 3 references) and a cover letter.
The recruitment process may include interviews (possibly by video link).
