AI Data Engineer (All Genders)
Job description
At Dailymotion, we're building AI-powered products that rely on high-quality, well-governed data, from advertising optimization and measurement to trust & safety and internal decision-making tools. We are looking for an AI Data Engineer to design and operate the data pipelines, datasets, and evaluation workflows that make our AI systems reliable, reproducible, and measurable in production.
In this role, you'll work at the intersection of data engineering, ML/LLM product needs, and platform reliability, partnering closely with AI/ML engineers, product engineers, and analytics stakeholders. You'll help ensure that the data powering training, retrieval (RAG), evaluation, and monitoring is curated, traceable, and secure, so teams can ship AI features with confidence.
We offer challenging work with direct business impact, opportunities to learn and grow, and a collaborative culture that encourages every team member to bring their point of view to the table.
Responsibilities
- Build and maintain reliable data pipelines (batch and/or streaming) that power AI features, evaluation, and analytics use cases.
- Develop curated datasets and feature tables for training and evaluation; implement validation checks, lineage, and clear ownership.
- Support knowledge ingestion for RAG: document processing, chunking, metadata enrichment, indexing/backfills, and freshness monitoring.
- Implement and operate evaluation data workflows: golden sets, labeling support, drift checks, regression reporting, and dataset versioning.
- Collaborate with AI/ML and product engineers to translate requirements into scalable data models and pipelines.
- Improve pipeline performance and cost efficiency through incremental processing, partitioning, and resource tuning on GCP.
- Contribute to operational excellence: monitoring, alerting, and troubleshooting.
Requirements
Qualifications - Required Skills
- Professional English proficiency is required (written and spoken).
- Solid experience building production data pipelines and data models (ETL/ELT concepts, orchestration, reliability, and data quality).
- Strong SQL skills and practical knowledge of warehouse/lake patterns and debugging data issues.
- Proficiency in Golang and/or Python for data processing and pipeline development.
- Understanding of AI/data needs: reproducibility, dataset versioning, evaluation data management, and secure handling of sensitive data.
- Ability to deliver well-tested, well-documented work (CI for data jobs, unit/integration tests for transforms).
Preferred Skills
- Experience with GCP-native data tooling (e.g., BigQuery, Dataflow, Pub/Sub, GCS).
- Familiarity with embedding pipelines and indexing operations used for retrieval (backfills, monitoring, freshness SLAs).
- Experience with orchestration tools and data quality frameworks, and setting up meaningful SLAs/alerts for pipelines.
- Good operational habits: on-call readiness, cost awareness, and pragmatic automation.
- Experience using AI-powered developer tools (e.g., ChatGPT, Claude, GitHub Copilot) for coding assistance, debugging, and test generation.