Site Reliability Engineer (SRE)
Role details
Job description
We are seeking a Site Reliability Engineer (SRE) who will work hands-on across the stack to improve platform and application observability, drive reliability improvements, and deliver measurable gains in operational efficiency. This role will work closely with core teams to execute platform modernization, harden production systems, and evolve support tooling. This position is critical to maintaining execution velocity, reducing operational risk, and ensuring reliability and performance objectives are met.
- Collaborate with engineers and architects to design, develop, test, and implement secure, robust, and scalable solutions for applications and platforms.
- Design and implement deployment approaches using automated continuous integration and continuous delivery pipelines.
- Take responsibility for all aspects of reliability, collaborating with technical experts to resolve complex problems.
- Use SRE practices, including service level indicators (SLIs) and service level objectives (SLOs), to detect and resolve issues proactively.
- Gather, analyze, and develop visualizations from large, diverse data sets to support continuous platform improvement.
- Identify opportunities to eliminate toil and automate the triage of issues to improve operational stability.
- Collaborate with a global team to identify, analyze, and resolve platform vulnerabilities.
- Promote the adoption of site reliability engineering best practices within the team and organization.
Requirements
Experience: A minimum of 5 years of combined experience in SRE, software development, or infrastructure engineering.
Technical Skills:
- Experience in implementing, monitoring, and maintaining highly scalable and resilient application services and platforms.
- Experience with monitoring tools such as OpenTelemetry (OTel), ELK (Elasticsearch, Logstash, Kibana), Splunk, and Dynatrace.
- Knowledge of Python, Shell, or Perl scripting.
- Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins.
- Advanced knowledge of networking, including firewalls, DNS, load balancing, and proxies.
- Advanced understanding of the Linux operating system, including shell scripting and core commands for automation.
- Experience with Ansible for writing playbooks and using core modules.
Professional Skills:
- Excellent interpersonal, organizational, and communication skills are required.
- Must be self-motivated and results-oriented with analytical and problem-solving skills.
Preferred Qualifications
- UI/UX experience to provide oversight on best practices for tooling.
- Hands-on experience with Terraform for Infrastructure as Code (IaC).
- Background in a large enterprise environment.
- Ability to analyze and resolve complex infrastructure issues.
- Capacity to work in a fast-paced environment and meet deadlines.
Benefits & conditions
The pay rate for this position is $73.68 per hour. Contract employees are eligible for benefits, including medical, dental, and vision insurance options.