Senior SRE, Ads
Job description
The Ads Reliability team partners closely with Ads Engineering to improve reliability, scalability, operational excellence, and developer productivity across Reddit's advertising ecosystem. We help build and operate highly available services that drive revenue and maintain advertiser trust. We're looking for a Senior Site Reliability Engineer to build, operate, and scale the critical systems behind Reddit Ads.
- Partner with Ads Engineering teams to improve reliability, scalability, and operational excellence of ad-serving, auction, targeting, measurement, and billing systems.
- Design, build, and maintain infrastructure, tooling, and automation that improve service reliability and engineering productivity.
- Improve observability through monitoring, alerting, tracing, logging, and dashboards.
- Participate in on-call rotations and lead incident response efforts for critical production systems.
- Run root cause analysis and drive corrective actions following incidents.
- Collaborate with software engineers throughout the service lifecycle, from design reviews through production operations.
- Drive adoption of SRE best practices including SLIs, SLOs, error budgets, capacity planning, and operational readiness reviews.
- Reduce operational toil through automation and self-service tooling.
- Help define and measure advertiser-critical user journeys such as campaign creation, ad delivery, reporting, and billing.
- Scale Ads systems to support continued traffic growth, increased advertiser demand, and evolving business requirements.
Requirements
- 5+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Good understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Good programming skills in languages such as Go, Python, or similar.
- Demonstrated ability to troubleshoot complex issues across applications, infrastructure, networking, and services.
- Experience with observability platforms, monitoring systems, alerting, and incident response.
- Experience driving automation and operational improvements.
Benefits & conditions
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Private Pension Plan with Employer Matching
- 100% employer-sponsored group medical plan
- Income Replacement Programs
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave