Data Engineer

DeepL
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote

Tech stack

Airflow
Big Data
Information Engineering
Data Integrity
ETL
Data Warehousing
Python
Performance Tuning
SQL Databases
Data Streaming
Data Processing
Scripting (Bash/Python/Go/Ruby)
Snowflake
Spark
Backend
Data Lake
Information Technology
Real Time Data
Data Pipelines
Amazon Web Services (AWS)

Job description

This role is responsible for building and maintaining the data infrastructure, ensuring that data from various sources is efficiently collected, processed, and stored. The engineer takes ownership of smaller projects and collaborates with senior engineers on larger initiatives.

  • Design, build, and maintain efficient data pipelines (ETL/ELT processes) using Apache Spark on AWS EMR to integrate data from various source systems into our data warehouse and data lake.

  • Build and optimize batch and near real-time data processing jobs on Spark (including performance tuning, partitioning, and cost control on EMR) to support analytics and reporting needs.
  • Write and refine complex SQL queries and use scripting (e.g., Python) to transform and aggregate large datasets.
  • Implement data quality measures (such as validation checks and cleansing routines) to ensure data integrity and reliability.
  • Develop and optimize data warehouse schemas and tables, and define contracts between Spark/EMR pipelines, Iceberg tables, Snowflake models, and dbt transformations.
  • Collaborate with data analysts, data scientists, and other engineers to understand data requirements and deliver appropriate solutions.
  • Document pipeline designs, data flows, and data definitions for transparency and future reference, adhering to team standards.
  • Handle multiple tasks or projects simultaneously, prioritizing work and communicating progress to stakeholders to meet deadlines.
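To give a concrete flavour of the data-quality work described above (validation checks and cleansing routines), here is a minimal Python sketch. All names, fields, and rules are invented for illustration and are not part of DeepL's actual pipelines.

```python
# Illustrative sketch of cleansing routines plus validation checks,
# the kind of data-integrity work a pipeline step might perform.

def clean_record(record):
    """Normalize a raw record: coerce types, trim and upper-case codes."""
    return {
        "user_id": int(record["user_id"]),
        "country": record["country"].strip().upper(),
        "chars_translated": max(0, int(record["chars_translated"])),
    }

def validate_record(record):
    """Return True if the cleaned record passes basic integrity checks."""
    return (
        record["user_id"] > 0
        and len(record["country"]) == 2          # ISO-style 2-letter code
        and record["chars_translated"] >= 0
    )

def run_pipeline(raw_records):
    """Clean every record, keep only valid ones, count the rejects."""
    cleaned = [clean_record(r) for r in raw_records]
    valid = [r for r in cleaned if validate_record(r)]
    return valid, len(cleaned) - len(valid)

raw = [
    {"user_id": "1", "country": " de ", "chars_translated": "1200"},
    {"user_id": "2", "country": "NLX", "chars_translated": "300"},  # bad country code
]
valid, rejected = run_pipeline(raw)
```

In a real deployment this logic would typically live in a Spark job or a dbt test rather than plain Python, but the shape (clean, validate, count rejects for monitoring) is the same.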

Requirements

  • Bachelor's or Master's degree in a relevant field (e.g., Computer Science, Mathematics, Physics).
  • At least 3 years of experience in a data engineering or similar backend data development role.
  • Strong SQL skills and experience with data modeling and building data warehouse solutions.
  • Proficiency in at least one programming language (e.g., Python) for data processing and pipeline automation.
  • Familiarity with ETL tools and workflow orchestration frameworks (e.g., Apache Airflow or similar).
  • Experience implementing data quality checks and working with large-scale datasets.
  • Good problem-solving abilities, plus strong communication and teamwork skills to work with cross-functional stakeholders.
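The "strong SQL plus scripting" requirement can be pictured with a small self-contained example using Python's built-in sqlite3 module; the table and column names are invented for illustration, not taken from the role.

```python
import sqlite3

# Illustrative only: drive a SQL aggregation from Python, the kind of
# transform-and-aggregate work the requirements describe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, chars INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("DE", 1200), ("DE", 800), ("NL", 300)],
)

# Total characters per country, largest first.
rows = conn.execute(
    "SELECT country, SUM(chars) AS total "
    "FROM events GROUP BY country ORDER BY total DESC"
).fetchall()
```

At production scale the same query shape would run against Snowflake or a Spark SQL engine instead of SQLite, with an orchestrator such as Airflow scheduling it.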

Benefits & conditions

Our people deserve fair and competitive pay that meets them where they are. With scalable benefits, rewards, and perks, our total rewards programs reflect our commitment to inclusivity and access for all.

About the company

Helping people overcome communication barriers is the heart of what we do. Founded in Germany in 2017 by a team of engineers and researchers, DeepL has developed the world’s most accurate AI translation technology—enabling real-time, human-sounding translation.

Accessible via a web translator, browser extensions, desktop and mobile apps, and an API, DeepL supports a best-in-class translation experience in 34 languages and counting. Our 550-person team operates across four European hubs in Germany, the Netherlands, the UK, and Poland.

Apply for this position