Site Reliability Engineer (SRE)

Software AG

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

€ 44K

Job location

Remote

Tech stack

Java

Amazon Web Services (AWS)

Azure

Bash

Cloud Computing

Cloud Computing Security

Configuration Management

Databases

Continuous Delivery

Continuous Integration

DevOps

Distributed Systems

DNS

Identity and Access Management

Python

Network Security

Linux System Administration

Open Source Technology

PCI Data Security Standards

Reliability Engineering

Ansible

Prometheus

Software Engineering

Google Cloud Platform

Load Balancing

Grafana

Reliability of Systems

Firewalls (Computer Science)

Infrastructure as Code (IaC)

Kubernetes

Puppet

Docker

Job description

We are seeking an experienced Site Reliability Engineer (SRE) (m/f/d) to ensure the reliability, scalability, and performance of our production systems through automation, observability, and operational excellence.

You will work closely with our product development team from the early stages of design all the way through to resolving production incidents, influencing the team with SRE principles and best practices. If you take pride in complete ownership, have a passion for solving complex technical challenges in distributed systems, and have the demeanor to work and communicate effectively across team boundaries, this is the role for you!

Design, implement and maintain scalable and reliable infrastructure.
Collaborate with engineering and product teams to integrate observability, reliability, and security considerations into the entire software development lifecycle.
Develop and implement automation tools for monitoring, deployment, and incident response to ensure efficient and reliable operations.
Lead and participate in post-incident reviews to learn from operational surprises and drive actionable improvements to system reliability.
Proactively identify and resolve performance bottlenecks and system issues.
Conduct regular security assessments and audits to mitigate risks.
Champion and embed a culture of reliability across the organization. You will act as a force multiplier, scaling your technical expertise by creating clear documentation, developing best-practice guides, and building tooling to roll out reliability enhancements automatically.
Implement and manage Infrastructure as Code (IaC) using Ansible and other industry-standard tools.
Implement and enforce cloud security best practices, including identity and access management (IAM), encryption, and network security.
Develop dashboards and alerts to ensure real-time visibility into system operations.
Stay updated with emerging cloud technologies and recommend improvements to existing systems.

Requirements

3+ years of experience as a Site Reliability Engineer (SRE), Systems Engineer, DevOps Engineer, or in a similar role supporting business-critical services.
English Level: B2 minimum
Expertise with Linux system administration and networking technologies (DNS, firewalls, load-balancing).
Good knowledge of creating, managing and troubleshooting containers, container engines (Docker, Podman) and related cloud-native ecosystem tools.
Knowledge of database operations and concepts.
Knowledgeable about a wide range of web, internet and cloud technologies.
Understanding of distributed systems, their common failure modes and edge cases.
Proficient in at least one programming language (Bash, Python, Java, Go, etc.).
Hands-on experience with Configuration Management (Ansible, Puppet, etc.), Infrastructure as Code (IaC) and Continuous Integration / Continuous Delivery (CI/CD).
Familiarity with open source observability and telemetry tooling for logs, metrics, and traces, including Grafana, Prometheus and OpenTelemetry.
Excellent problem-solving and analytical skills. You can calmly navigate complex production issues, identify root causes, and implement effective, lasting solutions.
Possess a growth mindset. You are relentlessly curious, committed to continuous improvement, and passionate about scaling your expertise.
Excellent communication & collaboration skills and a proven ability to build relationships with and educate engineering partners.

Bonus Skills

Familiarity with industry-standard compliance requirements (ISO/IEC 27001, PCI-DSS, NIST CSF, etc.).
Experience with container orchestration systems like Kubernetes.
Experience with cloud platforms (Azure, AWS, GCP, etc.).
C1+ English level

Benefits & conditions

We offer competitive compensation plus these benefits:

Flexible working hours (Flexitime) and the freedom to regularly work from home or our great office
Free snacks and beverages in the office
Regular team events
A modern office environment which fits our innovative mindset and enables us to collaborate with our diverse team
Mental health counselling and tips from NiloHealth
Home office setup budget
25 vacation days annually, plus local holidays
Additional day off for your birthday

Following the guidelines of the Austrian collective bargaining agreement for IT, we offer a minimum monthly gross salary of € 3.175,- in Austria, depending on your experience and credentials.

About the company

Software AG (Frankfurt MDAX: SOW) reimagines integration, sparks business transformation and enables fast innovation on the Internet of Things so you can pioneer differentiating business models. We give you the freedom to connect and integrate any technology from app to edge. We help you free data from silos so it’s shareable, usable and powerful - enabling you to make the best decisions and unlock entirely new possibilities for growth. Software AG has nearly 5,000 employees and is active in 70 countries.