Senior Site Reliability Engineer
Axiom Software Solutions
Eindhoven, Netherlands
4 days ago
Role details
Contract type: Contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Eindhoven, Netherlands
Tech stack
Microsoft Windows
Audit Trail
Azure
Bash
Cloud Computing
Computer Programming
Continuous Integration
Software Debugging
GitHub
Python
Uptime
Octopus Deploy
Performance Tuning
Release Management
Reliability Engineering
Ruby
Data Logging
Scripting (Bash/Python/Go/Ruby)
System Availability
Reliability of Systems
Infrastructure as Code (IaC)
Kubernetes
Build Tools
Terraform
Splunk
Docker
Jenkins
Job description
- Experience level: JG9
- System Reliability & Uptime
  - Design and implement strategies to ensure high availability, reliability, and performance of systems and services.
  - Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
- Incident Management & Troubleshooting
  - Respond to system outages and incidents, lead post-mortem investigations, and implement preventive measures.
  - Create runbooks and automate recovery processes to reduce manual intervention.
  - Share the on-call rotation and be an escalation contact for incidents.
- Infrastructure as Code (IaC)
  - Build and maintain infrastructure using tools like Terraform.
  - Ensure infrastructure is reproducible, version-controlled, and auditable.
- Monitoring & Observability
  - Implement and manage monitoring tools (preferably Splunk).
  - Set up alerts and dashboards to track the health and performance of services.
- Automation & Tooling
  - Automate operational tasks such as deployments, scaling, backups, and failovers.
  - Develop internal tools to support deployment pipelines and team workflows.
- Collaboration with Development & Operations
  - Work closely with developers to design systems that are scalable and supportable.
  - Advocate for and implement best practices around CI/CD, testing, and release management.
Requirements
- Programming & Scripting
  - Proficiency in languages such as Python, Bash, or Ruby.
  - Ability to build tools, automate tasks, and debug production issues.
- Cloud Platforms
  - Strong experience with cloud providers (GCP, Azure).
  - Knowledge of cloud-native services, networking, and security.
- Linux/Unix and Windows Systems
  - Deep understanding of system internals, performance tuning, and debugging.
- Containers & Orchestration
  - Experience with Docker and Kubernetes (or other orchestration platforms).
- CI/CD & Automation Tools
  - Familiarity with Jenkins, GitHub Actions, ArgoCD, or similar.
  - Experience setting up and managing deployment pipelines.
- Monitoring & Logging
  - Knowledge of observability stacks (Splunk preferred).
- Security & Compliance Awareness
  - Understanding of how to secure systems and manage access control, secrets, and audit logging.
- Soft Skills
  - Strong communication and collaboration skills.
  - Enjoyment of coaching more junior team members.
  - Ability to work under pressure during incidents and lead blameless post-mortems.
  - Analytical mindset and proactive problem-solving approach.