Senior Research Data Engineer - Foundation Models

DeepL
Charing Cross, United Kingdom
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Training Data
Airflow
Amazon Web Services (AWS)
Data analysis
Big Data
C++
Cloud Computing
Data Centers
Information Engineering
ETL
Data Systems
Software Debugging
Python
Natural Language Processing
Open Source Technology
Software Engineering
Unstructured Data
Reinforcement Learning
Data Processing
Large Language Models
Build Management
Kubernetes
Dask
Free and Open-Source Software
Celery
Document Classification
Data Pipelines

Job description

  • Work on ambitious frontier research projects as part of a team consisting of research scientists and research data engineers.
  • Architect, design and build data pipelines that can handle petabytes of multi-modal unstructured data.
  • Build a modern data engineering stack grounded in state-of-the-art technology for orchestration and parallel computation, and make extensive use of actively developed open-source solutions.
  • Find performance bottlenecks, debug issues, and create pipelines with a focus on stability, from the lowest-level components to the bird's-eye view of a system.
  • Leverage our large on-prem data centers and AWS cloud infrastructure for high-speed data processing.
  • Go beyond "Big Data" and ETL: engineer and operate complex Python data solutions for real-world unstructured data, including text, code, image and audio modalities.
  • Collaborate with stakeholders, research scientists, other research data engineers and data tooling and platform teams.
  • Raise the standard for excellence and act as owner and champion for the quality and availability of our foundation model training data.
  • Ensure mission-critical reliability of data pipeline jobs, and maintain high-quality code.

Play to your strengths and contribute with creativity, thoroughness, pragmatism, foresight, ingenuity, persistence, and every part of you that elevates the team.

Requirements

  • Professional experience in data, platform or software engineering, ideally with a focus on large-scale unstructured data.
  • Python: Extensive professional experience in Python software engineering. Ideally, experience in maintaining proprietary or open-source software products.
  • Data: Experience with exploratory data analysis, cleaning, validation and quality control beyond business intelligence and analytics scale.
  • Pipelines: Experience with building reproducible pipelines for storing and processing petabytes of data.
  • Operations: Proficiency in containerization and automated deployment. Ideally, experience with container orchestration using Kubernetes and with cloud infrastructure.
  • Scaling: Experience with highly scalable, parallel compute workloads (e.g., Dask, Ray, Celery).
  • Performance: Experience with writing and optimizing highly performant code.
  • Cross-functional Affinity: Ability to work directly with our researchers and engineering stakeholders to translate their needs into data products with the desired user experience and performance.
  • Soft Skills: Excellent problem-solving abilities, strong communication skills, and a collaborative mindset.

Ideally, you also have domain-specific experience in:

  • LLM or VLM training data preparation.
  • NLP, text classification, reinforcement learning, model-based/GPU workflows.
  • Dynamic workflow orchestration frameworks like Argo Workflows, Airflow, Dagster or Flyte.
  • Linguistics expertise or speaking multiple languages.
  • Experience in a high-performance programming language like C++, Go or Rust.

About the company

Helping people overcome communication barriers is the heart of what we do. Founded in Germany in 2017 by a team of engineers and researchers, DeepL has developed the world’s most accurate AI translation technology—enabling real-time, human-sounding translation.

Accessible via a web translator, browser extensions, desktop and mobile apps, and an API, DeepL supports a best-in-class translation experience in 34 languages and counting. Our 550-person team operates across four European hubs in Germany, the Netherlands, the UK, and Poland.
