Senior Machine Learning Engineer

Elsevier

Amsterdam, Netherlands

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Amsterdam, Netherlands

Tech stack

API

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Azure

Continuous Integration

Information Engineering

Python

Machine Learning

Software Engineering

SQL Databases

Data Logging

Flask

Large Language Models

Git

FastAPI

Containerization

Kubernetes

Software Version Control

Docker

Databricks

Microservices

Job description

Do you enjoy building robust APIs and scalable pipelines to operationalize model evaluation? Do you want to help product teams get fast, reliable feedback on their AI outputs through automation?

About our Team

Elsevier's AI Evaluation team designs, builds, and operates NLP/LLM evaluation solutions used across multiple product lines. We partner with Product, Technology, Domain SMEs, and Governance to ensure our AI features are safe, effective, and continuously improving.

As a Senior Machine Learning Engineer, you will build and maintain the infrastructure and APIs that power automated evaluation of AI products. You'll ensure evaluations are scalable, reliable, and integrated into product development workflows, enabling product teams to quickly assess model outputs and iterate on their features.

· API & platform development - Build and maintain evaluation APIs and backend services to run automated assessments.

· Pipeline orchestration - Develop scalable Python/SQL pipelines, integrate with CI/CD, and implement monitoring/logging for evaluation jobs.

· Infrastructure & reliability - Ensure reproducibility, version control, observability, and error handling across evaluation workflows.

· Collaboration - Work closely with fellow Data Scientists, SMEs, Product, and Engineering teams to operationalize metrics and evaluation processes.

· Automation & tooling - Support auto-assessments as first-pass evaluation and integration with downstream SME-evals.

Requirements

· Education/Experience: Master's + 3 years, or Bachelor's + 5 years, in CS, Data Engineering, Software Engineering, or related field; experience building production ML pipelines.

· Technical: Strong Python (FastAPI/Flask), SQL, cloud platforms (AWS/Azure/Databricks); orchestration frameworks (Airflow, Prefect, Dagster); containerization (Docker/K8s); CI/CD pipelines; logging and monitoring.

· Practices: Git, reproducibility, documentation; collaborative coding and design review.

· Communication: Ability to explain technical choices and results to non-technical stakeholders.

· Mindset: Ownership, bias-for-action, curiosity, and collaborative problem-solving.

Nice to have

· Experience with LLM/NLP evaluation pipelines or agentic systems.

· Familiarity with auto-assessment frameworks and multi-product evaluation scaling.

· Exposure to healthcare or regulated content doma