STACKIT CLOUD INFRASTRUCTURE SITE RELIABILITY ENGINEER (SRE)

Barcelona, Spain

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Barcelona, Spain

Tech stack

Proxmox

Artificial Intelligence

Cloud Computing

Computer Programming

Data Centers

Linux

Document Management Systems

Infrastructure as a Service (IaaS)

Python

Open Source Technology

OpenStack

Release Management

Reliability Engineering

Prometheus

Virtualization Technology

Data Logging

Scripting (Bash/Python/Go/Ruby)

Cloud Platform System

System Availability

Grafana

Kubernetes

ELK

Job description

Make an amazing climb in your career in an international team of experts. Our company provides technology services for the entire Schwarz Group, which operates in more than 30 countries across Europe and the US. Our vision is to be the leading ecosystem for a better life. We built STACKIT, the European sovereign cloud. With XM Cyber we set new standards in deterring cybercrime. We run AI better than anyone. With us you will find a variety of opportunities to grow and do your best at your calling: IT. We exist to improve life with our products and services, for today's generation and for future generations. We act future-proof!

The impact you will create:

You will be a part of the Infrastructure-as-a-Service Site Reliability Engineering (IaaS SRE) team, helping to build a high-performance cloud platform based on OpenStack, with a focus on scaling and expansion across data centers and national borders
You will continuously operate and optimize technical processes through efficient automation and further development using Golang and/or Python
You will own and optimize the provisioning of bare-metal resources from various manufacturers using OpenStack Ironic together with internal and/or open-source tools
You will operate and manage the surrounding Linux-based system landscapes (e.g., Kubernetes, Proxmox) and ensure the high availability of our cloud infrastructure
You will create and maintain documentation and be responsible for the implementation and maintenance of monitoring and logging (e.g., Prometheus, Grafana, ELK Stack) for stable platform operation
You will be part of a motivated team that constantly strives for improvement and continuously develops itself and its products

Requirements

You are passionate about new technologies and topics related to Linux, automation, virtualization, and networking
You are proactive in driving improvements in availability and scaling and are eager to automate processes
You are capable of analyzing and solving technical problems and have experience in conducting root cause analyses
You have several years of experience in implementing and managing Kubernetes environments, including deployment and scaling
You have experience in programming and scripting in Python and/or Golang
You have experience in build and release management