STACKIT CLOUD INFRASTRUCTURE SITE RELIABILITY ENGINEER (SRE)

IT
Barcelona, Spain

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Barcelona, Spain

Tech stack

Proxmox
Artificial Intelligence
Cloud Computing
Computer Programming
Data Centers
Linux
Document Management Systems
Infrastructure as a Service (IaaS)
Python
Open Source Technology
OpenStack
Release Management
Reliability Engineering
Prometheus
Virtualization Technology
Data Logging
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
System Availability
Grafana
Kubernetes
ELK
Go

Job description

Take a major step forward in your career in an international team of experts. Our company provides technology services for the entire Schwarz Group, which operates in more than 30 countries in Europe and the US. Our vision is to be the leading ecosystem for a better life. We built STACKIT, the European sovereign cloud. With XM Cyber, we set new standards in deterring cybercrime. We run AI better than anyone. With us, you will find a variety of opportunities to grow and do your best at your calling: IT. We exist to improve life with our products and services, for today's generation and for future generations. We act future-proof!

The impact you will create:

  • You will be a part of the Infrastructure-as-a-Service Site Reliability Engineering (IaaS SRE) team, helping to build a high-performance cloud platform based on OpenStack, with a focus on scaling and expansion across data centers and national borders
  • You will continuously operate and optimize technical processes through efficient automation and further development using Golang and/or Python
  • You will own and optimize the provisioning of bare-metal resources from various manufacturers using OpenStack Ironic and internal and/or open-source tools
  • You will operate and manage the surrounding Linux-based system landscapes (e.g., Kubernetes, Proxmox) and ensure the high availability of our cloud infrastructure
  • You will create and maintain documentation, and you will implement and maintain monitoring and logging (e.g., Prometheus, Grafana, ELK Stack) for stable platform operation
  • You will be part of a motivated team that constantly strives for improvement and continuously develops itself and its products

Requirements

  • You are passionate about new technologies and topics related to Linux, automation, virtualization, and networking
  • You are proactive in driving improvements in availability and scaling and are eager to automate processes
  • You are capable of analyzing and solving technical problems and have experience in conducting root cause analyses
  • You have several years of experience in implementing and managing Kubernetes environments, including deployment and scaling
  • You have experience in programming and scripting in Python and/or Golang
  • You have experience in build and release management
