Sr. Site Reliability Engineer (Hybrid)

Broadridge Financial Solutions, Inc.

Newark, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$110K

Job location

Newark, United States of America

Tech stack

Java

Amazon Web Services (AWS)

Systems Engineering

Cloud Computing

Cloud Engineering

Computer Programming

Continuous Integration

DevOps

Distributed Systems

Fault Tolerance

Monitoring of Systems

Identity and Access Management

Python

Key Management

Systems Development Life Cycle

Reliability Engineering

Site Reliability Engineering Practices

Software Engineering

Reliability of Systems

Kubernetes

Infrastructure Automation Frameworks

Cloud Migration

CloudWatch

API Gateway

Terraform

Job description

We are seeking a Senior Site Reliability Engineer (SRE) to design, build, and operate highly reliable, scalable, and secure platforms supporting business-critical applications across hybrid (on-prem and cloud) environments. This role blends software engineering, systems engineering, and operational excellence, with a strong focus on automation, resiliency, observability, and cost efficiency. The SRE will partner closely with application development, infrastructure, security, and product teams to reduce operational toil, improve system reliability, and enable faster, safer delivery of services.

Reliability & Resiliency Engineering

Design and implement high-availability, fault-tolerant architectures across on-prem and cloud platforms (AWS).
Lead multi-region DR planning, implementation, and testing, including RTO/RPO definition and validation.
Define and enforce SLOs, SLIs, and error budgets to balance reliability with delivery velocity.
Drive self-healing automation and proactive remediation strategies.

Automation & Infrastructure as Code

Build and maintain infrastructure using Terraform and configuration management tools (e.g., Chef).
Develop automation to eliminate manual operational tasks (toil reduction).
Create reusable modules, pipelines, and guardrails for standardized deployments.
Automate certificate lifecycle management, key rotation, and security updates.

Observability & Monitoring

Design and implement end-to-end observability (metrics, logs, traces, synthetic monitoring).
Build dashboards, alerts, and runbooks to enable fast detection and resolution of incidents.
Improve signal-to-noise ratio in alerting to reduce operational fatigue.
Perform root cause analysis (RCA) and lead post-incident reviews with actionable follow-ups.

Cloud & Platform Engineering

Engineer and operate platforms on AWS, including services such as:

EKS, EC2, RDS/Aurora, Lambda, API Gateway
CloudFront, WAF, ALB/NLB
CloudWatch, X-Ray, IAM, Secrets Manager

Lead cloud migrations and modernization initiatives, including legacy system refactoring.
Implement secure networking patterns (VPCs, private subnets, controlled egress).

Performance, Scalability & Cost Optimization

Identify and resolve performance bottlenecks through testing and analysis.
Drive FinOps initiatives to optimize infrastructure cost without compromising reliability.
Implement capacity planning and autoscaling strategies.

CI/CD & SDLC Enablement

Design and support CI/CD pipelines enabling safe, repeatable deployments.
Embed reliability practices into the SDLC (testing, rollout strategies, rollback).
Partner with development teams to improve operability of applications before production.

Security & Compliance

Partner with security and legal teams to meet regulatory and compliance requirements (e.g., data residency, GDPR-related controls).
Implement secure access controls, secrets management, and encryption best practices.
Participate in security reviews, audits, and risk assessments.

Leadership & Collaboration

Act as a technical leader and mentor for engineers transitioning into SRE roles.
Influence architecture and design decisions across multiple teams.
Communicate effectively with engineering leadership, product owners, and non-technical stakeholders.
Drive a culture of operational excellence, blameless postmortems, and continuous improvement.

Requirements

3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Systems Engineering
Strong programming experience in Python, Java, or similar languages
Deep experience with Linux/Unix systems
Hands-on expertise with AWS and cloud-native architectures
Proven experience with Terraform and Infrastructure as Code
Strong understanding of networking, security, and distributed systems
Experience operating mission-critical, high-volume platforms
Experience in financial services or highly regulated environments
Experience with EKS/Kubernetes at scale
Familiarity with Chaos Engineering and resilience testing
Experience leading cloud cost optimization (FinOps) initiatives
Prior experience transitioning traditional infrastructure teams into SRE practices

Compensation Range: The salary range for this position is $100,000 to $110,000 USD. Broadridge considers various factors when evaluating a candidate's final salary, including, but not limited to, relevant experience, skills, and education.

About the company

We are dedicated to fostering a collaborative, engaging, and inclusive environment and are committed to providing a workplace that empowers associates to be authentic and bring their best to work. We believe that associates do their best when they feel safe, understood, and valued, and we work diligently and collaboratively to ensure Broadridge is a company, and ultimately a community, that recognizes and celebrates everyone's unique perspective.
