Site Reliability Engineer (SRE), AI Infrastructure (Early Career)

Nebius

Amsterdam, Netherlands

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Amsterdam, Netherlands

Tech stack

Artificial Intelligence

Computing Platforms

Jira

C++

Cloud Computing

Configuration Management

Computer Programming

Computer Networks

Ethernet

Internet Protocol

Python

Linux kernel

Machine Learning

Routing

Reliability Engineering

AI Infrastructure

Cloud Platform System

Git

Kubernetes

Terraform

Job description

Launch your career in site reliability engineering with Nebius through this 3-month Early Talent Program in Amsterdam. This opportunity is designed for current university students, recent graduates, and early career professionals who want to gain hands-on experience supporting AI infrastructure and cloud systems in a real production environment.

As a Site Reliability Engineer (SRE) in the AI Infrastructure team, you will contribute to day-to-day operational work while learning how large-scale infrastructure is maintained, improved, and automated. You will assist with routine SRE tasks, help deploy tested and approved changes by following clear instructions, and support the reliability and performance of systems that power modern AI workloads.

In addition to operational support, you will work on small, well-defined projects and backlog tasks that improve infrastructure and internal processes. You will create tests for your changes when applicable, write technical documentation, and track your progress in Jira. This role offers practical exposure to both engineering operations and project-based work, helping you understand how reliability engineering functions in a fast-moving technical environment.

A key part of the program is learning and development. You will deepen your understanding of networking and systems fundamentals, including Ethernet, IP networks, routing, the Linux kernel, eBPF, traffic control, IPVS, and DPDK. You will also gain exposure to container and platform technologies such as Kubernetes, Helm, and Terraform, while learning internal tools, workflows, configuration management, and automation practices used by experienced infrastructure teams.

Requirements

This role is ideal for candidates who have a strong interest in cloud providers, operations, networking, hardware, and software. You should have programming experience in Python, Go, or C++, a basic understanding of networking concepts such as Ethernet, IP, TCP/UDP, and routing, and familiarity with Git. A basic understanding of containers, Kubernetes, and Infrastructure as Code is also valuable for success in this role.

During the program, you will receive mentorship from experienced professionals in AI, machine learning, and cloud infrastructure. You will gain hands-on experience with real customer workloads and production systems, while building practical skills in reliability engineering, automation, infrastructure operations, and modern platform technologies.

The program is paid, and Nebius offers a collaborative and supportive work environment that values initiative, innovation, and continuous learning. High-performing participants may also be considered for a full-time role after completing the Early Talent Program.