Postdoc Massive collection and curation of monolingual and bilingual data


Faculty of Arts


University of Groningen

The Faculty of Arts is built on a long-standing tradition of four centuries. Our mission is to be a top-ranking faculty with both excellent education and world-quality research, with a strong international orientation, firmly rooted in the North of the Netherlands. We build and share knowledge that benefits society. We work at a modern, broad and international institution, educating over 5,000 Dutch and international students to become forward-looking, articulate and independent academics. We are a hardworking and diverse team of 700 staff members.

The candidate will work in the Computational Linguistics research group (@GroNlp), which is part of the Center for Language and Cognition Groningen (CLCG), the institutional home for all the linguistic research carried out within the Faculty of Arts.

Job description

Project: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages (MaCoCu)

MaCoCu is a European project funded under the Connecting Europe Facility (CEF). The project will collect monolingual and parallel data from the Internet and curate it for use in downstream tasks. It focuses in particular on under-resourced languages.

You will work in the local team in Groningen, in close collaboration with the teams of the other three partners of the project: University of Alicante, Prompsit Language Engineering and Jožef Stefan Institute.

Main tasks:

• build multilingual classifiers to identify subsets of the crawled data that are relevant to the Digital Service Infrastructures: e-Health, e-Justice, etc.
• automatically identify whether text in parallel crawled data is original or translated
• conduct automatic extrinsic evaluations of the crawled data, by building machine translation systems and language models
• coordinate the human evaluation of the crawled data, including setting up evaluation tasks, liaising with evaluators and analyzing the results
• attend the meetings of the project
• participate in dissemination and outreach activities of the project.

Qualifications

• a completed PhD in Machine Translation, Natural Language Processing, or a related discipline
• experience building Natural Language Processing systems based on deep learning approaches, including classifiers and sequence-to-sequence models
• experience conducting human evaluations for Natural Language Processing systems
• an outstanding CV and list of publications.

Conditions of employment

Contract length: 21 months.

In accordance with the Collective Labour Agreement for Dutch Universities, we offer you:

- a salary, depending on qualifications and work experience, with a minimum of € 2,790 to a maximum of € 4,402 (salary scale 10) gross per month for a full-time position
- an 8% holiday allowance, an 8.3% end-of-year bonus and participation in a pension scheme for employees. Favorable tax agreements may apply to non-Dutch applicants. We offer 232 holiday hours per calendar year for full-time employment
- an appointment on a temporary basis for 21 months, i.e. until 30 May 2023.

The new appointee will start on 1 September 2021.

Job Application

To apply, fill in the application form and upload 3 PDF files (all in English):

1. a letter of motivation
2. your CV, including your education and work experience, evidence of your involvement in the international research community, list of publications and names and contact details of two referees
3. one publication of which you are the main author. You can upload this under “extra document” on the website.

You may apply for this position until 16 June 2021, 23:59 (Central European Summer Time, UTC+2), i.e. before 17 June 2021, by means of the application form (click on "Apply" below the advertisement on the university website).

Only complete applications submitted by the deadline will be taken into consideration.

The selection interviews will take place in June.

We are an equal opportunity employer that values diversity. We have adopted an active policy to increase the number of female scientists across all disciplines of the university. Therefore, women are encouraged to apply. Our selection procedure follows the guidelines of the Recruitment Code (NVP) and the European Commission's European Code of Conduct for the Recruitment of Researchers.

Unsolicited marketing is not appreciated.

Additional information

For additional information, please contact:

Dr Antonio Toral

In your application, please always include the job opening ID 221360.

Digital application form