Senior Site Reliability Engineer, GPU & ML Infrastructure (M/F)

Criteo SA

Paris, France

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Paris, France

Tech stack

C Sharp (Programming Language)

Cloud Engineering

Computer Clusters

Distributed Systems

Python

Machine Learning

Reliability Engineering

TensorFlow

Azure

Software Engineering

Data Processing

Graphics Processing Unit (GPU)

Deep Learning

Kubernetes

Low Latency

Machine Learning Operations

TensorRT

Job description

At Criteo, the Platform Core group builds the foundational infrastructure powering our global advertising platform. We design and operate large-scale, resilient systems supporting real-time decision-making and data processing across thousands of services.

As we expand our distributed computing and ML infrastructure capabilities, we are building a new team focused on GPU platforms and high-performance model serving technologies.

As a Site Reliability Engineer in the GPU team, you will help design, operate, and scale the infrastructure powering machine learning training and inference workloads.

You will work on technologies such as:

Ray on Kubernetes

Build and operate scalable Ray clusters running on Kubernetes.
Develop reliable self-service distributed computing platforms for ML workloads.
Improve provisioning, observability, reliability, and operational efficiency of Ray-as-a-Service environments.

NVIDIA Triton Inference Server

Operate and optimize large-scale inference platforms using Triton.
Improve latency, throughput, scalability, and GPU utilization for deep learning inference workloads.

You will collaborate closely with ML engineers, data scientists, and infrastructure teams to deliver reliable, production-grade ML platforms accelerating innovation across Criteo.

Requirements

5+ years of experience in backend engineering, Site Reliability Engineering, or platform engineering roles focused on distributed systems.

Strong experience with Kubernetes, including workload scheduling, dynamic provisioning, and custom controllers/operators.
Hands-on experience running or optimizing GPU-based workloads in production, ideally for ML training or inference systems.
Strong software engineering skills in C#, Python, Go, or similar languages, with a focus on building reliable distributed systems.
Experience building or operating production-grade infrastructure with strong requirements around performance, scalability, and reliability.
Strong interest in automation, observability, and designing systems that scale efficiently under high load.

Bonus Points

Experience with distributed ML frameworks such as Ray or similar systems.
Familiarity with inference serving stacks such as NVIDIA Triton or TensorRT.
Experience with GPU scheduling, resource management, or multi-tenant GPU platforms.
Exposure to cloud-native GPU orchestration (GKE, EKS, or on-prem Kubernetes GPU clusters).

About the company

We're Criteo, the Commerce Intelligence Platform. Criteo helps businesses turn shopper signals into commerce outcomes while delivering more relevant experiences for shoppers. We use proprietary commerce intelligence and AI decisioning to drive relevance for shoppers and performance for businesses.

At Criteo, our culture is as unique as it is diverse. From our offices across the globe or from the comfort of home, our 3,600 Criteos collaborate to build an open, impactful, and forward-thinking environment. We foster a workplace where everyone is valued, and employment decisions are based solely on skills, qualifications, and business needs, never on non-job-related factors or legally protected characteristics.

What We Offer

Ways of working - Our hybrid model blends home with in-office experiences, making space for both.
Grow with us - Learning, mentorship & career development programs.
Your wellbeing matters - Health benefits, wellness perks & mental health support.
A team that cares - Diverse, inclusive, and globally connected.
Fair pay & perks - Attractive salary, with performance-based rewards and family-friendly policies, plus the potential for equity depending on role and level.

Additional benefits may vary depending on the country where you work and the nature of your employment with Criteo.
