Senior Site Reliability Engineer (Core GPU)

Criteo SA
Paris, France

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Paris, France

Tech stack

C#
Distributed Computing Environment
Distributed Systems
Python
Machine Learning
Reliability Engineering
Data Processing
Deep Learning
Kubernetes
Information Technology
TensorRT
Go

Job description

At Criteo, our Platform Core group builds the foundational services that power our global advertising platform. We design and operate scalable, resilient systems that support real-time decision-making and data processing at massive scale.

As we expand our capabilities in high-performance inference and distributed computing, we're forming a new team focused on GPU-powered services and cutting-edge ML serving technologies.

What You'll Do

As a Site Reliability Engineer in this new team, you'll be at the forefront of building and operating GPU-powered services for machine learning workloads.

Your mission will be to ensure the reliability, scalability, and performance of our systems that leverage:

  • Ray: You'll manage on-demand provisioning of Ray clusters on Kubernetes, enabling scalable distributed computing as a service for ML training and inference. You'll design, maintain, and monitor these Ray-as-a-service systems and deliver them as robust, self-service platform offerings.

  • Nvidia Triton Inference Server: You'll optimize and operate high-performance inference services using Triton, ensuring low-latency and high-throughput serving of deep learning models.

You'll work closely with ML engineers, data scientists, and other infrastructure teams to deliver production-grade services that accelerate innovation across Criteo.

Requirements

  • Master's or PhD in Computer Science (or equivalent experience).

  • 5+ years in backend engineering, SRE or DevOps.

  • Strong experience with Kubernetes, especially dynamic provisioning and custom operators.

  • Hands-on experience with GPU workloads, ideally in ML training or inference contexts.

  • Solid programming skills in C#, Python, Go, or similar languages.

  • Passion for automation, observability, and building reliable services.

Bonus Points

  • Familiarity with Ray or other distributed computing frameworks.

  • Knowledge of Nvidia Triton, TensorRT, or similar inference serving technologies.

  • Familiarity with cloud-native GPU orchestration (e.g., GKE, EKS, or on-prem equivalents).

Benefits & conditions

  • Ways of working: Our hybrid model blends home with in-office experiences, making space for both.

  • Grow with us: Learning, mentorship, and career development programs.

  • Your wellbeing matters: Health benefits, wellness perks, and mental health support.

  • A team that cares: Diverse, inclusive, and globally connected.

  • Fair pay & perks: Attractive salary, with performance-based rewards and family-friendly policies, plus the potential for equity depending on role and level.

About the company

We're Criteo, the Commerce Intelligence Platform. Criteo helps businesses turn shopper signals into commerce outcomes while delivering more relevant experiences for shoppers. We use proprietary commerce intelligence and AI decisioning to drive relevance for shoppers and performance for businesses. At Criteo, our culture is as unique as it is diverse. From our offices across the globe or from the comfort of home, our 3,600 Criteos collaborate to build an open, impactful, and forward-thinking environment.
