Generative AI Engineer
DeepRec
Role details
Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English, Spanish
Experience level: Intermediate
Job location: Remote
Tech stack
Artificial Intelligence
Amazon Web Services (AWS)
Test Automation
Cloud Computing
Python
Machine Learning
Open Source Technology
Software Reliability Testing
Software Version Management
PyTorch
Large Language Models
Multi-Agent Systems
Generative AI
Git
Information Technology
HuggingFace
Docker
Data Generation
Job description
Founded in 2019, our client has grown into one of Europe's most recognized deep-tech scale-ups, backed by major global strategic investors and EU innovation funds.
Their quantum and AI technologies have already transformed how enterprise clients build and deploy intelligent systems, achieving up to 95% model compression and 50-80% inference cost reduction.
The company has been recognized by CB Insights (2023 & 2025) as one of the Top 100 most promising AI companies globally and is often described as a "quantum-AI unicorn in the making."
Role Highlights
The AI Evaluation Data Scientist will be responsible for:
- Designing and leading evaluation strategies for Agentic AI and RAG systems, translating complex workflows into measurable performance metrics.
- Developing multi-step task-based evaluations to capture reasoning quality, factual accuracy, and end-user success in real-world scenarios.
- Building reproducible evaluation pipelines with automated test suites, dataset tracking, and performance versioning.
- Curating and generating synthetic and adversarial datasets to strengthen system robustness.
- Implementing LLM-as-a-judge frameworks aligned with human feedback (a minimal sketch of this pattern follows this list).
- Conducting error analysis and ablations to identify reasoning gaps, hallucinations, and tool-use failures.
- Collaborating with ML engineers to create a continuous data flywheel linking evaluation outcomes to product improvements.
- Defining and monitoring operational metrics such as latency, reliability, and cost to meet production standards.
- Maintaining high standards in engineering, documentation, and reproducibility.
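To make the evaluation responsibilities above more concrete, here is a minimal, illustrative sketch of an LLM-as-a-judge scoring loop in Python. It is a sketch under stated assumptions, not the client's actual framework: the rubric wording, the JSON score format, and the `call_judge_model` callable (any function that sends a prompt to a judge LLM and returns its text reply) are hypothetical stand-ins.

```python
"""Illustrative LLM-as-a-judge scoring loop (assumptions noted in comments)."""
import json
import re
from typing import Callable

# Hypothetical grading rubric; a real framework would calibrate this against human ratings.
JUDGE_RUBRIC = """You are grading an AI assistant's answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Score factual accuracy from 1 (wrong) to 5 (fully correct).
Reply with JSON: {{"score": <int>, "rationale": "<one sentence>"}}"""


def judge_answer(
    question: str,
    reference: str,
    candidate: str,
    call_judge_model: Callable[[str], str],  # assumed: prompt in, model text out
) -> dict:
    """Ask a judge LLM to score one candidate answer against a reference."""
    prompt = JUDGE_RUBRIC.format(
        question=question, reference=reference, candidate=candidate
    )
    raw = call_judge_model(prompt)
    # Tolerate judges that wrap the JSON verdict in extra prose.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    return json.loads(match.group(0)) if match else {"score": None, "rationale": raw}


def evaluate_dataset(
    examples: list[dict],
    call_judge_model: Callable[[str], str],
) -> float:
    """Average judge score over an evaluation set; ignores unparseable verdicts."""
    scores = [
        judge_answer(
            ex["question"], ex["reference"], ex["candidate"], call_judge_model
        )["score"]
        for ex in examples
    ]
    valid = [s for s in scores if isinstance(s, int)]
    return sum(valid) / len(valid) if valid else float("nan")
```

In a production pipeline of the kind described above, a loop like this would be wired into automated test suites and dataset versioning, and the judge prompts themselves would be periodically validated against human feedback.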
Requirements
- Master's or Ph.D. in Computer Science, Machine Learning, Physics, Engineering, or related field.
- 3+ years (mid-level) or 5+ years (senior) of experience in Data Science, ML Engineering, or Research roles in applied AI/ML projects.
- Proven experience designing and implementing evaluation methodologies for machine learning or Generative AI systems.
- Hands-on experience with LLMs, RAG pipelines, and agentic architectures.
- Proficiency in Python, Git, Docker, and major ML frameworks (PyTorch, HuggingFace, LangGraph, LlamaIndex).
- Familiarity with cloud environments (AWS preferred).
- Excellent communication skills and fluency in English.
Preferred
- Ph.D. in a relevant technical discipline.
- Experience with synthetic data generation, adversarial testing, and multi-agent evaluation frameworks.
- Strong background in LLM error analysis and reliability testing.
- Open-source contributions or publications related to AI evaluation.
- Fluency in Spanish.