Lead Site Reliability Engineer (GTAM)
Job description
- Demonstrate and champion site reliability culture and practices, exerting technical influence across your team
- Lead initiatives to improve reliability and stability of applications and platforms using data-driven analytics
- Collaborate with team members to define service level indicators and work with stakeholders to establish service level objectives and error budgets
- Provide technical leadership and guidance for medium- to large-sized products
- Proactively identify and resolve technology-related bottlenecks in your areas of expertise
- Act as the main point of contact during major incidents, quickly identifying and solving issues to avoid financial losses
- Document and share knowledge within the organization through internal forums and communities of practice
Requirements
- Formal training or certification in software engineering concepts and 5+ years of applied experience
- At least 5 years as an SRE and at least 10 years in a highly regulated industry such as banking
- Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and site reliability best practices, with the ability to implement these practices within an application or platform
- Demonstrated experience designing, deploying, and supporting highly available services in a public cloud environment (AWS, Azure, or GCP); familiarity with cloud-native observability, auto-scaling, and infrastructure-as-code is essential
- Fluency in at least one programming language or framework (e.g., Python, Java with Spring Boot, .NET)
- Deep knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
- Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
- Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
- Experience troubleshooting common networking technologies and issues
Preferred qualifications, capabilities, and skills
- Ability to identify and solve problems related to complex data structures and algorithms
- Drive to self-educate and evaluate new technology
- Ability to teach new programming languages to team members
- Ability to extend your reach and collaborate across different levels and stakeholder groups
Benefits & conditions
We offer a competitive total rewards package, including a base salary determined by role, experience, skill set, and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching, and more. Additional details about total compensation and benefits will be provided during the hiring process.