Site Reliability Engineer (SRE)
Job description
We are seeking an experienced Site Reliability Engineer (SRE) (m/f/d) to ensure the reliability, scalability, and performance of our production systems through automation, observability, and operational excellence.
You will work closely with our product development team, from the early design stage all the way through to resolving production incidents, and influence their work with SRE principles and best practices. If you take pride in complete ownership, have a passion for solving complex technical challenges in distributed systems, and have the demeanor to work and communicate effectively across team boundaries, this is the role for you!
- Design, implement, and maintain scalable and reliable infrastructure.
- Collaborate with engineering and product teams to integrate observability, reliability, and security considerations into the entire software development lifecycle.
- Develop and implement automation tools for monitoring, deployment, and incident response to ensure efficient and reliable operations.
- Lead and participate in post-incident reviews to learn from operational surprises and drive actionable improvements to system reliability.
- Proactively identify and resolve performance bottlenecks and system issues.
- Conduct regular security assessments and audits to mitigate risks.
- Champion and embed a culture of reliability across the organization. You will act as a force multiplier, scaling your technical expertise by creating clear documentation, developing best-practice guides, and building tooling to roll out reliability enhancements automatically.
- Implement and manage Infrastructure as Code (IaC) using Ansible and other industry-standard tools.
- Implement and enforce cloud security best practices, including identity and access management (IAM), encryption, and network security.
- Develop dashboards and alerts to ensure real-time visibility into system operations.
- Stay updated with emerging cloud technologies and recommend improvements to existing systems.
Requirements
- 3+ years of experience as a Site Reliability Engineer (SRE), Systems Engineer, DevOps Engineer, or in a similar role supporting business-critical services.
- English Level: B2 minimum
- Expertise with Linux system administration and networking technologies (DNS, firewalls, load-balancing).
- Good knowledge of creating, managing, and troubleshooting containers, container engines (Docker, Podman), and related cloud-native ecosystem tools.
- Knowledge of database operations and concepts.
- Knowledgeable about a wide range of web, internet and cloud technologies.
- Understanding of distributed systems, their common failure modes, and edge cases.
- Proficient in at least one programming language (Bash, Python, Java, Go, etc.).
- Hands-on experience with Configuration Management (Ansible, Puppet, etc.), Infrastructure as Code (IaC), and Continuous Integration / Continuous Delivery (CI/CD).
- Familiarity with open source observability and telemetry tooling for logs, metrics, and traces, including Grafana, Prometheus and OpenTelemetry.
- Excellent problem-solving and analytical skills. You can calmly navigate complex production issues, identify root causes, and implement effective, lasting solutions.
- Possess a growth mindset. You are relentlessly curious, committed to continuous improvement, and passionate about scaling your expertise.
- Excellent communication & collaboration skills and a proven ability to build relationships with and educate engineering partners.
Bonus Skills
- Familiarity with industry-standard compliance requirements (ISO/IEC 27001, PCI-DSS, NIST CSF, etc.).
- Experience with container orchestration systems like Kubernetes.
- Experience with cloud platforms (Azure, AWS, GCP, etc.).
- C1+ English level
Benefits & conditions
We offer you competitive compensation plus these benefits:
- Flexible working hours (Flexitime) and the freedom to regularly work from home or from our great office
- Free snacks and beverages in the office
- Regular team events
- A modern office environment which fits our innovative mindset and enables us to collaborate with our diverse team
- Mental health counselling and tips from NiloHealth
- Home office setup budget
- 25 vacation days annually, plus local holidays
- Additional day off for your birthday
Following the guidelines of the Austrian collective bargaining agreement for IT, we offer a minimum monthly gross salary of € 3.175,- in Austria. The actual salary depends on your experience and credentials.
About the company
Software AG (Frankfurt MDAX: SOW) reimagines integration, sparks business transformation and enables fast innovation on the Internet of Things so you can pioneer differentiating business models. We give you the freedom to connect and integrate any technology from app to edge. We help you free data from silos so it’s shareable, usable and powerful, enabling you to make the best decisions and unlock entirely new possibilities for growth. Software AG has nearly 5,000 employees and is active in 70 countries.