Site Reliability Engineer, EMEA
Job description
- Design, write, and maintain software, primarily in Python, to automate the provisioning, deployment, and configuration management of our infrastructure.
- Contribute to the adoption and maturation of Terraform, establishing and maintaining best practices for state management, modularization, and version control.
- Use Ansible and/or SaltStack to ensure consistency, repeatability, and standardization across all environments.
- Develop robust CI/CD pipelines for both infrastructure and application deployments, replacing manual processes.
- Implement and mature monitoring, logging, and alerting systems to proactively improve system reliability.
- Participate in a "follow the sun" on-call rotation, focusing on sustainable incident response, blameless postmortems, and driving continuous improvement.
- Champion SRE principles, automation, and coding best practices within the team and across the organization.
Requirements
- 3+ years of hands-on experience managing production environments in AWS and/or GCP.
- Strong proficiency in Python, with a demonstrated ability to write clean, maintainable, and testable code to solve infrastructure problems.
- Experience with Terraform, including best practices for state management and modular design in complex environments.
- Strong knowledge of Linux internals and high competency in Bash scripting and command-line operations.
- Proficiency with Ansible and/or SaltStack as configuration management tools.
- Expert-level understanding of Git and collaborative workflows, such as branching strategies and code review best practices.
Highly Desirable Skills:
- Proven track record of transitioning legacy/manual operations environments to automated, IaC-driven approaches.
- Experience with containerization using Docker or Kubernetes, and an understanding of how container orchestration is used in modern systems.
- Experience building and managing CI/CD pipelines for infrastructure automation.
- Familiarity with monitoring and observability tools such as Zabbix, Prometheus, and Grafana.
- Experience operating and querying OpenSearch/Elasticsearch.
...and these other things:
- A strong desire to solve complex problems, the resilience to work through significant technical debt, and enthusiasm for driving cultural and technical change.
- A desire to work in enterprise- and government-focused computing environments with robust security and reliability requirements.
- MS/BS in Computer Science, Computer Engineering, or a related field of study (or equivalent experience).
Benefits & conditions
We take good care of our people. Our benefits include:
- Fully remote company
- Comprehensive health, vision, and dental coverage
- Flexible time off
- Company computer hardware of your choice
- Work from home setup reimbursement
- Health & wellness perks
- Virtual events, happy hours, trivia, and fun
- Monthly internet & phone reimbursement
- Opportunities to learn and grow