Data Engineer Python

KLEEVER
Paris, France
31 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
€ 125K

Job location

Paris, France

Tech stack

Artificial Intelligence
Airflow
Test Automation
Batch Processing
Big Data
Code Coverage
Code Review
Databases
Continuous Integration
Data as a Service
Information Engineering
ETL
Data Retrieval
Data Stores
Relational Databases
Fault Tolerance
Python
PostgreSQL
Modular Design
NoSQL
Prometheus
Software Engineering
SQL Databases
Systems Integration
Data Storage Technologies
Azure
Git
Pytest
Information Technology
Cassandra
Amazon Web Services (AWS)
Software Version Control
Data Pipelines

Job description

As a Senior Data Engineer, you will design, build, and maintain scalable data pipelines and workflows to support our growing data ecosystem. You will focus on creating production-ready ETL processes using Apache Airflow, integrating with diverse data stores, and ensuring all code meets rigorous development standards, including peer review, scalability, and comprehensive test coverage.

Responsibilities:

Develop and optimize ETL pipelines using Apache Airflow to ingest, transform, and load data from various sources into target systems.
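For a concrete flavor of this orchestration work, here is a minimal sketch of such a pipeline using Airflow's TaskFlow API; the DAG name, schedule, and sample records are illustrative assumptions rather than details from the role.

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def orders_etl():
    @task
    def extract() -> list[dict]:
        # Pull raw records from a source system (placeholder data here).
        return [{"order_id": 1, "amount": "19.90"}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Normalize types so the load step is deterministic.
        return [{**r, "amount": float(r["amount"])} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # A real pipeline would write to the target store here.
        print(f"loaded {len(rows)} rows")

    # TaskFlow infers the extract -> transform -> load dependency chain.
    load(transform(extract()))


orders_etl()
```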

Implement production-ready code for data workflows, ensuring scalability, fault tolerance, and adherence to best practices such as modular design, error handling, and automated testing (unit, integration, and end-to-end).
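As one example of the error-handling side of this, a small, unit-testable retry helper might look like the following sketch; the function and its limits are hypothetical, not part of the posting.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Run fn, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: let the real error surface
            log.warning("attempt %d/%d failed; retrying", attempt, attempts)
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```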

Collaborate with data scientists, analysts, and engineering teams to build and maintain RAG pipelines that enhance AI/ML applications with accurate, context-aware data retrieval.
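The retrieval step of a RAG pipeline can be sketched in a few lines. Here `embed` is a hypothetical stand-in for whichever embedding model the team adopts; with unit-norm vectors, a dot product gives cosine similarity for ranking.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # Placeholder: a real pipeline would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank stored chunks by cosine similarity to the query embedding.
    q = embed(query)
    scores = [float(q @ embed(c)) for c in chunks]  # unit vectors, so dot = cosine
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]
```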

Participate in code reviews to enforce high coding standards, promote clean, readable code, and integrate CI/CD practices for automated testing and deployment.

Monitor and troubleshoot data pipelines for performance, reliability, and data quality, implementing observability tools to detect and resolve issues proactively.
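Proactive observability of this kind often starts with exported metrics; the sketch below shows one way to instrument a worker with the Prometheus Python client. The metric names and port are assumptions for illustration.

```python
from prometheus_client import Counter, Histogram, start_http_server

ROWS_LOADED = Counter("etl_rows_loaded_total", "Rows loaded into the target store")
BATCH_SECONDS = Histogram("etl_batch_duration_seconds", "Wall time per batch")


def process_batch(rows: list[dict]) -> None:
    with BATCH_SECONDS.time():  # records the batch duration on exit
        # ... transform and load the batch ...
        ROWS_LOADED.inc(len(rows))


if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    process_batch([{"id": 1}, {"id": 2}])
```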

Design and optimize data storage solutions, integrating with relational and NoSQL databases to support real-time and batch processing needs.
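On the relational side, batch loads typically rely on idempotent upserts; a minimal PostgreSQL sketch using psycopg2 might look like this, with the DSN, table, and columns as illustrative assumptions.

```python
import psycopg2
from psycopg2.extras import execute_values


def upsert_orders(dsn: str, rows: list[tuple[int, float]]) -> None:
    # Batch upsert: re-running the same load leaves the table unchanged.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(
            cur,
            """
            INSERT INTO orders (order_id, amount)
            VALUES %s
            ON CONFLICT (order_id) DO UPDATE SET amount = EXCLUDED.amount
            """,
            rows,
        )
```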

Requirements

The ideal candidate is a proficient developer who treats data engineering as software engineering, with hands-on experience in RAG (Retrieval-Augmented Generation) pipelines and a track record of delivering reliable, maintainable systems.

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

5+ years of hands-on experience as a Data Engineer or in a similar role, with a proven background as a strong developer (e.g., proficiency in Python, SQL, and related languages).

Excellent proficiency with Apache Airflow for orchestrating complex ETL workflows, including DAG creation, scheduling, and dependency management.

Demonstrated experience building scalable ETL pipelines that handle large datasets, with a focus on production-ready implementation including comprehensive test coverage (e.g., using pytest or similar frameworks).
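Test coverage of this kind usually targets pure transform functions first; a short pytest sketch (with a hypothetical function under test) illustrates the expectation.

```python
import pytest


def normalize_amount(row: dict) -> dict:
    # Hypothetical transform under test: cast string amounts to floats.
    return {**row, "amount": float(row["amount"])}


def test_normalize_amount_casts_strings() -> None:
    out = normalize_amount({"order_id": 1, "amount": "19.90"})
    assert out["amount"] == pytest.approx(19.9)


def test_normalize_amount_rejects_garbage() -> None:
    with pytest.raises(ValueError):
        normalize_amount({"order_id": 2, "amount": "not-a-number"})
```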

Strong emphasis on software engineering practices: experience with peer code reviews, version control (e.g., Git), and ensuring code is modular, documented, and scalable to prevent common pitfalls like brittle or unmaintainable pipelines.

Familiarity with data modeling, transformation, and integration in distributed environments.

Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.

Preferred Qualifications:

Experience with RAG pipelines, including vector databases and embedding techniques for AI-driven applications.

Hands-on experience with databases such as PostgreSQL (for relational data), StarRocks (for analytical workloads), Cassandra or ScyllaDB (for high-throughput NoSQL), and Qdrant (for vector search).
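For the vector-search piece, a minimal Qdrant sketch might look like the following; the collection name, vector size, and payload are illustrative assumptions (the in-memory mode is handy for local tests).

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-memory instance for experimentation

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "hello"})],
)
hits = client.search(collection_name="docs", query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)
print(hits[0].payload["text"])
```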

Knowledge of cloud data services (e.g., AWS Glue, Azure Data Factory) and orchestration tools beyond Airflow.

Familiarity with monitoring and observability tools like Prometheus or OpenSearch for data pipeline health.

Apply for this position