Site Reliability Developer (Python/Java) / SRE

WatchGuard Technologies

Chiva, Spain

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Chiva, Spain

Tech stack

Java

Amazon Web Services (AWS)

JIRA

Test Automation

Azure

Cloud Computing

Code Review

Computer Programming

Software Debugging

DevOps

Elasticsearch

GitHub

Python

Object-Oriented Software Development

Reliability Engineering

Software Engineering

Spark

CloudFormation

Kubernetes

Apache Flink

Functional Programming

Software Coding

Terraform

New Relic (SaaS)

Software Version Control

Serverless Computing

Docker

Jenkins

Artifactory

Job description

The WatchGuard SRE team shares ownership of the reliability and security of our production cloud environments with our application development teams, so that together we deliver the best possible experience to our customers. As you learn more about our systems, you will be:

Ensuring smooth production operations together with development teams and leading the response to large-scale events.
Defining operational and security policies, standards, and processes for our development teams to follow.
Guiding our development teams through the process of establishing, monitoring, and achieving their service level agreements (SLAs) by defining service level indicators (SLIs) and service level objectives (SLOs).

A Typical Day in the Life of a Site Reliability Developer, SRE Team at WatchGuard

As an SRE at WatchGuard, a "typical" day may have you:

Working side-by-side with our application teams in production AWS, Azure, and hybrid cloud environments to ensure proper monitoring, security, reliability, automation, and support are in place.
Driving a culture of operational excellence throughout WatchGuard by simplifying, automating, analyzing, and evolving our activities and processes.
Championing security and operational best practices and becoming known as a cloud expert among our development teams across the globe.
Striving to provide the best possible customer experience even when things go wrong by participating in our on-call rotation and then coordinating and leading the production troubleshooting efforts.
Using your programming skills to develop automation or assist with debugging and fixing complex production issues.
Being curious, learning new things, and then sharing your knowledge through documentation, presentations, and guidance to other teams.

Requirements

Do you have experience with Terraform?

You are a customer-focused, data-driven developer with a passion for delivering the best possible customer experience. You enjoy the thrill of coordinating and troubleshooting production issues, and you want to find and fix problems proactively. You understand cloud technologies, automation, everything-as-code, networking, microservice architectures, object-oriented design, and SRE and DevOps cultures; you are proficient in Python, Java, or Go and eager to learn other languages.

You bring proven knowledge of software engineering best practices across the full software development lifecycle, including coding standards, code reviews, security, source control management, build processes, automated testing, deployment, monitoring, chaos engineering, and automated self-healing operations. You also know tools and technologies such as CloudFormation, Terraform, New Relic, Lambda, Serverless, Elasticsearch, Docker, Kubernetes, Spark, Flink, Jenkins, GitHub, Artifactory, and Jira.

Your strong analytical and problem-solving abilities, together with your verbal and written communication skills, enable you to lead production incident response and postmortems.