Machine Learning Research Engineer in Natural Language Processing and Media Mining

EPFL

12 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English, French, German

Experience level

Junior

Compensation

CHF 95K

Job location

Tech stack

3D Models

API

Artificial Intelligence

Amazon Web Services (AWS)

Computer Vision

Unix

Cloud Storage

Databases

GitHub

Information Extraction

Python

Machine Learning

Natural Language Processing

NoSQL

SQL Databases

Text Mining

Web Applications

Deep Learning

Kubernetes

Information Technology

Hugging Face

Job description

The Impresso - Media Monitoring of the Past II project at the EPFL Digital Humanities Laboratory (DHLAB) is seeking a machine learning research engineer for the final phase of the project. The successful candidate will integrate into an active, collaborative development effort and contribute to the application and consolidation of large-scale text mining pipelines for multilingual historical newspaper and radio archives, bridging research, engineering, and digital humanities.

About the project

Impresso is an interdisciplinary research project that brings together computational linguists, computer scientists, digital humanists, historians, and designers from EPFL, the University of Zurich, the University of Lausanne, and the C²DH (Luxembourg), along with over 20 European partners. Funded by the Swiss National Science Foundation and the Luxembourg National Research Fund (2023-2027), the project pioneers new methods for exploring digitized newspaper and radio archives across languages, media, and borders through semantic enrichments and shared multilingual vector spaces.

Mission

You will conduct applied research and engineering in natural language processing and text mining on large-scale, noisy, and multilingual historical texts. Working within an established and actively maintained codebase, you will help advance and consolidate the project's processing pipelines, bridging research and engineering in close collaboration with an interdisciplinary team.

Main duties and responsibilities

Apply and adapt existing NLP and computer vision models to large-scale, multilingual historical text and image data.
Fine-tune or design models for additional text mining tasks, in particular media section classification.
Support the creation of ground truth data by adapting the setup of web-based annotation tools, and assist in the management of annotation campaigns and data releases.
Contribute to the maintenance and adaptation of web-serving setups for annotation models (TorchServe).
Support the consolidation, validation, and documentation of existing data, pipeline components, and code modules.

Additional activities (optional / depending on profile)

Collaborate on the design of the Impresso WebApp, Datalab, and API
Participate in the development and adoption of standards for the representation and exchange of historical data (raw material and annotations)
Contribute to scientific publications and project workshops on media mining, semantic indexing, and sustainability

Requirements

Experience: 1-3 years as a machine learning engineer or NLP researcher/programmer

Education: MSc or PhD in NLP, Computer Science, Data Science, or a related field, or equivalent professional experience in machine learning/NLP
Technical skills:

Solid expertise in machine learning, with practical experience in deep learning architectures (transformers, language models) and information extraction tasks
Proficiency in Python, Unix-based systems, databases (SQL/NoSQL), cloud storage and computing (S3, Kubernetes, Run:AI), and scripting/automation
Familiarity with collaborative development and code/model management platforms (GitHub, Hugging Face, and related tools)

Mindset: Curious, creative, rigorous, and attentive to detail; motivated by scientific research and cultural heritage applications, with a proactive and problem-solving attitude
Strong sense of teamwork, communication, accountability, and production readiness
Very good command of written and spoken English

Desirable skills

Prior experience in an academic or research context
Experience with historical or digitized documents and interdisciplinary collaboration
Experience with image processing alongside text and language data is a plus
Interest in student supervision and academic publication
Knowledge of French or German