Site Reliability Developer (Python/Java) / SRE

WatchGuard Technologies
Chiva, Spain
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Chiva, Spain

Tech stack

Java
Amazon Web Services (AWS)
JIRA
Automation of Tests
Azure
Cloud Computing
Code Review
Computer Programming
Software Debugging
DevOps
Elasticsearch
GitHub
Python
Object-Oriented Software Development
Reliability Engineering
Software Engineering
Spark
CloudFormation
Kubernetes
Apache Flink
Functional Programming
Software Coding
Terraform
New Relic (SaaS)
Software Version Control
Serverless Computing
Docker
Jenkins
Artifactory

Job description

The WatchGuard SRE team owns the reliability and security of our production cloud environments alongside our application development teams to ensure we deliver the best possible experience to our customers. As you learn more about our systems, you will be:

  • Ensuring smooth production operations with development teams and leading large-scale event response.
  • Defining operational and security policies, standards, and processes for our development teams to follow.
  • Guiding our development teams through the process of establishing, monitoring, and achieving their service level agreements through the definition of service level indicators and objectives.

A Typical Day in the Life of a Site Reliability Developer, SRE Team at WatchGuard

As an SRE at WatchGuard, a "typical" day may have you:

  • Working side-by-side with our application teams in production AWS, Azure, and hybrid cloud environments to ensure proper monitoring, security, reliability, automation, and support are in place.
  • Driving an operational excellence culture throughout WatchGuard with the simplification, automation, analysis, and evolution of our activities and processes.
  • Championing security and operational best practices and becoming known as a cloud expert among our development teams located across the globe.
  • Striving to provide the best possible customer experience even when things go wrong by participating in our on-call rotation and then coordinating and leading the production troubleshooting efforts.
  • Using your programming skills to develop automation or assist with debugging and fixing complex production issues.
  • Being curious, learning new things, and then sharing your knowledge through documentation, presentations, and guidance to other teams.

Requirements

Do you have experience with Terraform? You are a customer-focused, data-driven developer with a passion for delivering the best possible customer experience. You enjoy the thrill of coordinating and troubleshooting production issues and want to find and fix problems proactively. You understand cloud technologies, automation, everything-as-code, networking, microservice architectures, object-oriented design, and SRE and DevOps cultures; you are proficient in Python, Java, or Go and want to learn other languages. You bring proven knowledge of software engineering best practices across the full software development lifecycle, including coding standards, code reviews, security, source control management, build processes, automated testing, deployment, monitoring, chaos engineering, and automated self-healing operations, as well as knowledge of tools and technologies such as CloudFormation, Terraform, New Relic, Lambda, Serverless, Elasticsearch, Docker, Kubernetes, Spark, Flink, Jenkins, GitHub, Artifactory, and Jira. You are able to lead production incident response and postmortems, drawing on strong analytical and problem-solving abilities and on verbal and written communication skills.
