Sr. Site Reliability Engineer (Hybrid)

Broadridge Financial Solutions, Inc.
Newark, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$100K - $110K

Job location

Newark, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Systems Engineering
Cloud Computing
Cloud Engineering
Computer Programming
Continuous Integration
DevOps
Distributed Systems
Fault Tolerance
Monitoring of Systems
Identity and Access Management
Python
Key Management
Systems Development Life Cycle
Reliability Engineering
Site Reliability Engineering Practices
Software Engineering
Reliability of Systems
Kubernetes
Infrastructure Automation Frameworks
Cloud Migration
CloudWatch
API Gateway
Terraform

Job description

We are seeking a Senior Site Reliability Engineer (SRE) to design, build, and operate highly reliable, scalable, and secure platforms supporting business-critical applications across hybrid (on-prem and cloud) environments. This role blends software engineering, systems engineering, and operational excellence, with a strong focus on automation, resiliency, observability, and cost efficiency. The SRE will partner closely with application development, infrastructure, security, and product teams to reduce operational toil, improve system reliability, and enable faster, safer delivery of services.

Reliability & Resiliency Engineering

  • Design and implement high-availability, fault-tolerant architectures across on-prem and cloud platforms (AWS).
  • Lead multi-region DR planning, implementation, and testing, including RTO/RPO definition and validation.
  • Define and enforce SLOs, SLIs, and error budgets to balance reliability with delivery velocity.
  • Drive self-healing automation and proactive remediation strategies.

Automation & Infrastructure as Code

  • Build and maintain infrastructure using Terraform and configuration management tools (e.g., Chef).
  • Develop automation to eliminate manual, repetitive operational tasks (toil reduction).
  • Create reusable modules, pipelines, and guardrails for standardized deployments.
  • Automate certificate lifecycle management, key rotation, and security updates.

Observability & Monitoring

  • Design and implement end-to-end observability (metrics, logs, traces, synthetic monitoring).
  • Build dashboards, alerts, and runbooks to enable fast detection and resolution of incidents.
  • Improve signal-to-noise ratio in alerting to reduce operational fatigue.
  • Perform root cause analysis (RCA) and lead post-incident reviews with actionable follow-ups.

Cloud & Platform Engineering

  • Engineer and operate platforms on AWS, including services such as:
      • EKS, EC2, RDS/Aurora, Lambda, API Gateway
      • CloudFront, WAF, ALB/NLB
      • CloudWatch, X-Ray, IAM, Secrets Manager
  • Lead cloud migrations and modernization initiatives, including legacy system refactoring.
  • Implement secure networking patterns (VPCs, private subnets, controlled egress).

Performance, Scalability & Cost Optimization

  • Identify and resolve performance bottlenecks through testing and analysis.
  • Drive FinOps initiatives to optimize infrastructure cost without compromising reliability.
  • Implement capacity planning and autoscaling strategies.

CI/CD & SDLC Enablement

  • Design and support CI/CD pipelines enabling safe, repeatable deployments.
  • Embed reliability practices into the SDLC (testing, rollout strategies, rollback).
  • Partner with development teams to improve operability of applications before production.

Security & Compliance

  • Partner with security and legal teams to meet regulatory and compliance requirements (e.g., data residency, GDPR-related controls).
  • Implement secure access controls, secrets management, and encryption best practices.
  • Participate in security reviews, audits, and risk assessments.

Leadership & Collaboration

  • Act as a technical leader and mentor for engineers transitioning into SRE roles.
  • Influence architecture and design decisions across multiple teams.
  • Communicate effectively with engineering leadership, product owners, and non-technical stakeholders.
  • Drive a culture of operational excellence, blameless postmortems, and continuous improvement.

Requirements

  • 3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Systems Engineering
  • Strong programming experience in Python, Java, or similar languages
  • Deep experience with Linux/Unix systems
  • Hands-on expertise with AWS and cloud-native architectures
  • Proven experience with Terraform and Infrastructure as Code
  • Strong understanding of networking, security, and distributed systems
  • Experience operating mission-critical, high-volume platforms
  • Experience in financial services or highly regulated environments
  • Experience with EKS/Kubernetes at scale
  • Familiarity with Chaos Engineering and resilience testing
  • Experience leading cloud cost optimization (FinOps) initiatives
  • Prior experience transitioning traditional infrastructure teams into SRE practices

Compensation Range: The salary range for this position is between $100,000 and $110,000 USD. Broadridge considers various factors when evaluating a candidate's final salary, including, but not limited to, relevant experience, skills, and education.

About the company

We are dedicated to fostering a collaborative, engaging, and inclusive environment and are committed to providing a workplace that empowers associates to be authentic and bring their best to work. We believe that associates do their best when they feel safe, understood, and valued, and we work diligently and collaboratively to ensure Broadridge is a company, and ultimately a community, that recognizes and celebrates everyone's unique perspective.
