Senior Machine Learning Engineer
Job description
Do you enjoy building robust APIs and scalable pipelines to operationalize model evaluation? Do you want to help product teams get fast, reliable feedback on their AI outputs through automation?
About our Team
Elsevier's AI Evaluation team designs, builds, and operates NLP/LLM evaluation solutions used across multiple product lines. We partner with Product, Technology, Domain SMEs, and Governance to ensure our AI features are safe, effective, and continuously improving.
As a Senior Machine Learning Engineer, you will build and maintain the infrastructure and APIs that power automated evaluation of AI products. You'll ensure evaluations are scalable, reliable, and integrated into product development workflows, enabling product teams to quickly assess model outputs and iterate on their features.
· API & platform development - Build and maintain evaluation APIs and backend services to run automated assessments.
· Pipeline orchestration - Develop scalable Python/SQL pipelines, integrate with CI/CD, and implement monitoring/logging for evaluation jobs.
· Infrastructure & reliability - Ensure reproducibility, version control, observability, and error handling across evaluation workflows.
· Collaboration - Work closely with fellow Data Scientists, SMEs, Product, and Engineering teams to operationalize metrics and evaluation processes.
· Automation & tooling - Support auto-assessments as first-pass evaluation and integration with downstream SME-evals.
Requirements
· Education/Experience: Master's + 3 years, or Bachelor's + 5 years, in CS, Data Engineering, Software Engineering, or related field; experience building production ML pipelines.
· Technical: Strong Python (FastAPI/Flask), SQL, cloud platforms (AWS/Azure/Databricks); orchestration frameworks (Airflow, Prefect, Dagster); containerization (Docker/K8s); CI/CD pipelines; logging and monitoring.
· Practices: Git, reproducibility, documentation; collaborative coding and design review.
· Communication: Ability to explain technical choices and results to non-technical stakeholders.
· Mindset: Ownership, bias-for-action, curiosity, and collaborative problem-solving.
Nice to have
· Experience with LLM/NLP evaluation pipelines or agentic systems.
· Familiarity with auto-assessment frameworks and multi-product evaluation scaling.
· Exposure to healthcare or regulated content doma