Site Reliability Engineer
Role details
Job location
Tech stack
Job description
This role serves as a hands-on technical leader responsible for designing, implementing, and advancing reliability across complex, distributed systems in a highly regulated financial services environment.
* Design and implement reliability strategies across distributed, business-critical systems using automation, observability, and architectural guidance.
* Lead root cause analysis investigations on incidents and produce clear documentation of findings and remediation steps.
* Develop and maintain CI/CD pipelines and deployment automation to support reliable, measurable software delivery.
* Advance observability practices across production workloads, including metrics, logging, and distributed tracing.
* Influence system design upstream by operating with significant autonomy and engaging stakeholders on reliability outcomes.
* Contribute to scripting and automation using Java, Python, or comparable languages to reduce toil and improve system reliability.
* Support containerized and serverless workloads across a large-scale AWS environment spanning approximately 1,200 hosts.
Requirements
* Bachelor's degree in Computer Science, Engineering, or a related field.
* 6-10 years of experience in SRE, software engineering, platform engineering, or DevOps roles.
* Deep hands-on experience with AWS across multiple architecture patterns, including containerized applications, EC2, and serverless.
* Strong programming skills with professional experience in Java and/or Python for scripting and automation.
* Proven experience with observability tooling covering metrics, logs, and tracing.
* Solid experience with CI/CD pipelines, deployment automation, and root cause analysis investigation and documentation.
* Demonstrated ability to operate effectively in complex, regulated enterprise environments.
Preferred Qualifications:
* Experience with Infrastructure as Code, particularly CloudFormation.
* Familiarity with application frameworks such as Spring, Spring Boot, React, or Angular, and application servers such as Tomcat, Netty, Node.js, or Next.js.
* Experience with relational and non-relational databases and associated ORM frameworks or drivers.
* Familiarity with ITSM tools such as ServiceNow or similar platforms, and ITIL-aligned change and release processes.
* Experience working in Agile or Scrum delivery environments.
* Knowledge of security compliance frameworks such as ISO 27001 or SOC 2.