Site Reliability Engineer (SRE), AI Infrastructure (Early Career)

Nebius
Amsterdam, Netherlands
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Amsterdam, Netherlands

Tech stack

Artificial Intelligence
Computing Platforms
Jira
C++
Cloud Computing
Configuration Management
Computer Programming
Computer Networks
Ethernet
Internet Protocol
Python
Linux kernel
Machine Learning
Routing
Reliability Engineering
AI Infrastructure
Cloud Platform System
Git
Kubernetes
Terraform

Job description

Launch your career in site reliability engineering with Nebius through this 3-month Early Talent Program in Amsterdam. This opportunity is designed for current university students, recent graduates, and early-career professionals who want to gain hands-on experience supporting AI infrastructure and cloud systems in a real production environment.

As a Site Reliability Engineer (SRE) in the AI Infrastructure team, you will contribute to day-to-day operational work while learning how large-scale infrastructure is maintained, improved, and automated. You will assist with routine SRE tasks, help deploy tested and approved changes by following clear instructions, and support the reliability and performance of systems that power modern AI workloads.

In addition to operational support, you will work on small, well-defined projects and backlog tasks that improve infrastructure and internal processes. You will create tests for your changes when applicable, write technical documentation, and track your progress in Jira. This role offers practical exposure to both engineering operations and project-based work, helping you develop a strong understanding of how reliability engineering functions in a fast-moving technical environment.

A key part of the program is learning and development. You will deepen your understanding of networking and systems fundamentals, including Ethernet, IP networks, routing, the Linux kernel, eBPF, traffic control, IPVS, and DPDK. You will also gain exposure to container and platform technologies such as Kubernetes, Helm, and Terraform, while learning internal tools, workflows, configuration management, and automation practices used by experienced infrastructure teams.

Requirements

This role is ideal for candidates who have a strong interest in cloud providers, operations, networking, hardware, and software. You should have programming experience in Python, Go, or C++, a basic understanding of networking concepts such as Ethernet, IP, TCP/UDP, and routing, and familiarity with Git. A basic understanding of containers, Kubernetes, and Infrastructure as Code is also valuable for success in this role.

During the program, you will receive mentorship from experienced professionals in AI, machine learning, and cloud infrastructure. You will gain hands-on experience with real customer workloads and production systems, while building practical skills in reliability engineering, automation, infrastructure operations, and modern platform technologies.

In addition to compensation, Nebius offers a collaborative and supportive work environment that values initiative, innovation, and continuous learning. High-performing participants may also be considered for a full-time role after completing the Early Talent Program.
