Sr Site Reliability Engineer I, Global Commercial Services
Job description
- Mentors junior Site Reliability Engineers and a cross-functional team of colleagues, fostering a culture of excellence and innovation
- Provides guidance and support to junior engineers, fostering professional growth and development within the team, ensuring adherence to best practices in Site Reliability Engineering
- Manages and oversees collaboration with Software Engineering teams to design, develop, and implement advanced features that enhance system resilience, scalability, and performance, proactively identifying and resolving complex system bottlenecks and failure points
- Leads the development and refinement of sophisticated automation tools and frameworks, including advanced infrastructure as code (IaC) practices, to streamline complex operational workflows, deployment processes, and infrastructure management, significantly reducing manual intervention and ensuring high system efficiency
- Actively engages in and influences high-level architectural design discussions, ensuring that advanced reliability, scalability, and performance considerations are deeply integrated into strategic decision-making processes, and driving the adoption of innovative solutions
- Designs, executes, and oversees comprehensive chaos engineering experiments and advanced resiliency testing, analyzing results to implement improvements that enhance system robustness and recovery capabilities, and mentors colleagues in these practices
- Leads the development, optimization, and maintenance of comprehensive disaster recovery plans and business continuity strategies, ensuring systems can recover quickly and effectively from complex and unexpected disruptions
- Advocates for and implements advanced observability practices, including error budgeting, service-level objectives (SLOs), and service-level indicators (SLIs), contributing to a culture of continuous improvement and reliability, and mentoring colleagues in these practices
- Collaborates with cross-functional teams to enhance customer journeys, ensuring seamless and reliable technology experiences by addressing potential reliability and performance issues proactively, and leading initiatives to improve overall system reliability
- Collaborates and co-creates effectively with teams in product and the business to align technology initiatives with business objectives
Requirements
- Bachelor's degree in Computer Science, Information Technology, Engineering, and/or comparable experience; advanced degree preferred
- 3 years of experience with a modern observability stack (Splunk, Elasticsearch, Prometheus, Grafana)
- 3 years of experience with containerization technologies (e.g., Kubernetes, Docker) and microservices architecture
- 3 years of experience with container orchestration tools (Kubernetes, ECS, Docker Swarm)
- 3 years of experience with and knowledge of observability tools and methodologies, including logging, monitoring, tracing, and performance analysis platforms
- 1 year of experience with cloud-based Site Reliability Engineering (SRE) practices and public cloud platforms such as AWS, Azure, or Google Cloud
- Expert-level knowledge of service-based and event-driven systems and infrastructure (Streams, Topics, Queues, REST)
- Expert-level knowledge of IaC automation tools (Terraform, Ansible, CloudFormation, Puppet, Chef)
- Expert-level knowledge of CI/CD automation tools (GitHub Actions, AWS CodePipeline, Google Cloud Build)
- Expert-level knowledge of web architecture including networking, infrastructure configuration and provisioning, infrastructure scaling
- AWS Certified DevOps Engineer - Professional
- Google Cloud Professional Cloud DevOps Engineer Certification
Employment eligibility to work with American Express in the U.S. is required as the company will not pursue visa sponsorship for these positions
Benefits & conditions
We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:
- Competitive base salaries
- Bonus incentives
- 6% Company Match on retirement savings plan
- Free financial coaching and financial well-being support
- Comprehensive medical, dental, vision, life insurance, and disability benefits
- Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
- 20+ weeks paid parental leave for all parents, regardless of gender, offered for pregnancy, adoption or surrogacy
- Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
- Free and confidential counseling support through our Healthy Minds program
- Career development and training opportunities
For a full list of Team Amex benefits, visit our Colleague Benefits Site (https://www.americanexpress.com/en-us/colleagues/benefits).
American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. American Express will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with the requirements of applicable state and local laws, including the California Fair Chance Act, the Los Angeles County Fair Chance Ordinance for Employers, and the City of Los Angeles' Fair Chance Initiative for Hiring Ordinance. For positions covered by federal and/or state banking regulations, American Express will comply with such regulations as it relates to the consideration of applicants with criminal convictions.
We back our colleagues with the support they need to thrive, professionally and personally. That's why we have Amex Flex, our enterprise working model that provides greater flexibility to colleagues while ensuring we preserve the important aspects of our unique in-person culture. Depending on role and business needs, colleagues will either work onsite, in a hybrid model (combination of in-office and virtual days) or fully virtually.