Site Reliability Engineer II
Job description
- Apply strong software development and system design skills to build scalable, resilient systems.
- Contribute as a core member of an agile team, consistently following best practices (tools, common components, documentation).
- Demonstrate self-reliance in achieving team goals and driving technical outcomes.
- Participate actively in code reviews, automated testing, and debugging to ensure high-quality deliverables.
- Automate deployments, scaling, and runtime operations across test, integration, and production environments.
- Enable observability, monitoring, and proactive incident prevention by embedding SRE principles into the software lifecycle.
Responsibilities
- Perform hands-on software development (50-60% of the time), including new feature coding, unit testing, proofs of concept, and refactoring.
- Drive automation initiatives, ensuring repeatable deployments and reducing manual intervention.
- Design and implement resilient, observable, and self-healing systems that minimize downtime.
- Collaborate with product, infrastructure, and engineering teams to integrate SRE practices early in the SDLC.
- Mentor junior engineers while contributing as an individual technical expert.
- Continuously research, test, and introduce new technologies to enhance system performance and engineering velocity.
- Participate in incident response, root cause analysis, and remediation to maintain high availability and reliability.
We back our colleagues with the support they need to thrive, professionally and personally. That's why we have Amex Flex, our enterprise working model that provides greater flexibility to colleagues while ensuring we preserve the important aspects of our unique in-person culture. Depending on role and business needs, colleagues will work onsite, in a hybrid model (a combination of in-office and virtual days), or fully virtually.
Requirements
- Education: Bachelor's degree in Computer Science, Engineering, or an equivalent technical discipline.
- Experience: 5-10 years in software engineering/SRE with a proven individual contributor (IC) track record.
- Strong experience in software development (Java, Python, or Go) and REST API design.
- Expertise with system integration solutions (APIs, batch and real-time data pipelines).
- Hands-on experience with CI/CD pipelines (e.g., Git, Maven, Jenkins) and modern build tools.
- Solid troubleshooting skills and experience with APM, monitoring, and AIOps platforms.
- Strong problem-solving, collaboration, and communication skills.
- Exposure to cloud technologies (Google Cloud, Adobe Marketing Cloud preferred).
- Willingness to work flexible shifts and support production environments as needed.
Preferred Qualifications:
- 5+ years of professional experience in high-availability distributed systems.
- Practical experience with Splunk, Elasticsearch, Redis, Postgres, OracleDB, Grafana, and Java or Python.
- Hands-on expertise with frameworks and tools such as Spring Boot, Flask, or Django.
- Deep experience with performance tuning, debugging, and resiliency engineering.
- Strong knowledge of agile methodologies and continuous improvement practices.
- Proven track record of delivering scalable, innovative solutions that improve resiliency and engineering efficiency.
Soft Skills & Cultural Fit:
- Analytical, curious, and proactive, with a continuous learning mindset.
- Strong communicator, able to bridge technical and business discussions effectively.
- Brings a culture of innovation, challenges the status quo, and encourages creative problem-solving.
- Customer-focused with a "can-do" attitude and high integrity in execution.
Employment eligibility to work with American Express in the U.S. is required, as the company will not pursue visa sponsorship for these positions.