Data Engineer

DeepL
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote

Tech stack

Airflow
Big Data
Information Engineering
Data Integrity
ETL
Data Warehousing
Python
Performance Tuning
SQL Databases
Data Streaming
Data Processing
Scripting (Bash/Python/Go/Ruby)
Snowflake
Spark
Backend
Data Lake
Information Technology
Real Time Data
Data Pipelines
Amazon Web Services (AWS)

Job description

This role is responsible for building and maintaining the data infrastructure, ensuring that data from various sources is efficiently collected, processed, and stored. The engineer takes ownership of smaller projects and collaborates with senior engineers on larger initiatives.

  • Design, build, and maintain efficient data pipelines (ETL/ELT processes) using Apache Spark on AWS EMR to integrate data from various source systems into our data warehouse and data lake.

  • Build and optimize batch and near real-time data processing jobs on Spark (including performance tuning, partitioning, and cost control on EMR) to support analytics and reporting needs.
  • Write and refine complex SQL queries and use scripting (e.g., Python) to transform and aggregate large datasets.
  • Implement data quality measures (such as validation checks and cleansing routines) to ensure data integrity and reliability.
  • Develop and optimize data warehouse schemas and tables, and define contracts between Spark/EMR pipelines, Iceberg tables, Snowflake models, and dbt transformations.
  • Collaborate with data analysts, data scientists, and other engineers to understand data requirements and deliver appropriate solutions.
  • Document pipeline designs, data flows, and data definitions for transparency and future reference, adhering to team standards.
  • Handle multiple tasks or projects simultaneously, prioritizing work and communicating progress to stakeholders to meet deadlines.
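To give a concrete flavour of the data-quality work described above (validation checks and cleansing routines), here is a minimal Python sketch. All names, fields, and rules are invented for illustration and are not part of DeepL's actual pipelines.

```python
# Illustrative sketch of cleansing routines plus validation checks,
# the kind of data-integrity work a pipeline step might perform.

def clean_record(record):
    """Normalize a raw record: coerce types, trim and upper-case codes."""
    return {
        "user_id": int(record["user_id"]),
        "country": record["country"].strip().upper(),
        "chars_translated": max(0, int(record["chars_translated"])),
    }

def validate_record(record):
    """Return True if the cleaned record passes basic integrity checks."""
    return (
        record["user_id"] > 0
        and len(record["country"]) == 2          # ISO-style 2-letter code
        and record["chars_translated"] >= 0
    )

def run_pipeline(raw_records):
    """Clean every record, keep only valid ones, count the rejects."""
    cleaned = [clean_record(r) for r in raw_records]
    valid = [r for r in cleaned if validate_record(r)]
    return valid, len(cleaned) - len(valid)

raw = [
    {"user_id": "1", "country": " de ", "chars_translated": "1200"},
    {"user_id": "2", "country": "NLX", "chars_translated": "300"},  # bad country code
]
valid, rejected = run_pipeline(raw)
```

In a real deployment this logic would typically live in a Spark job or a dbt test rather than plain Python, but the shape (clean, validate, count rejects for monitoring) is the same.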

Requirements

  • Bachelor's or Master's degree in a relevant field (e.g., Computer Science, Mathematics, Physics).
  • At least 3 years of experience in a data engineering or similar backend data development role.
  • Strong SQL skills and experience with data modeling and building data warehouse solutions.
  • Proficiency in at least one programming language (e.g., Python) for data processing and pipeline automation.
  • Familiarity with ETL tools and workflow orchestration frameworks (e.g., Apache Airflow or similar).
  • Experience implementing data quality checks and working with large-scale datasets.
  • Good problem-solving abilities, plus strong communication and teamwork skills to work with cross-functional stakeholders.
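The "strong SQL plus scripting" requirement can be pictured with a small self-contained example using Python's built-in sqlite3 module; the table and column names are invented for illustration, not taken from the role.

```python
import sqlite3

# Illustrative only: drive a SQL aggregation from Python, the kind of
# transform-and-aggregate work the requirements describe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, chars INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("DE", 1200), ("DE", 800), ("NL", 300)],
)

# Total characters per country, largest first.
rows = conn.execute(
    "SELECT country, SUM(chars) AS total "
    "FROM events GROUP BY country ORDER BY total DESC"
).fetchall()
```

At production scale the same query shape would run against Snowflake or a Spark SQL engine instead of SQLite, with an orchestrator such as Airflow scheduling it.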

Benefits & conditions

Our people deserve fair and competitive pay that meets them where they are. With scalable benefits, rewards, and perks, our total rewards programs reflect our commitment to inclusivity and access for all.

About the company

Helping people overcome communication barriers is the heart of what we do. Founded in Germany in 2017 by a team of engineers and researchers, DeepL has developed the world’s most accurate AI translation technology—enabling real-time, human-sounding translation.

Accessible via a web translator, browser extensions, desktop and mobile apps, and an API, DeepL supports a best-in-class translation experience in 34 languages and counting. Our 550-person team operates across four European hubs in Germany, the Netherlands, the UK, and Poland.

Apply for this position