Site Reliability Engineer

Red Ventures

New York, United States of America

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

$145K

Job location

New York, United States of America

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Business Analytics Applications

Bash

Cloud Computing

Cloud Computing Security

Data Infrastructure

Information Leak Prevention

Software Debugging

Noise Reduction

DevOps

Distributed Systems

Python

PostgreSQL

MySQL

Network Architecture

PCI Data Security Standards

Reliability Engineering

Prometheus

Systems Architecture

Google Cloud Platform

Delivery Pipeline

Large Language Models

Snowflake

Grafana

Database Optimization

Multi-Cloud

Reliability of Systems

GitLab CI

Kubernetes

Infrastructure Automation Frameworks

Deployment Automation

Machine Learning Operations

Terraform

Prisma Cloud Platform

Splunk

New Relic (SaaS)

ELK

Jenkins

Databricks

Vulnerability Analysis

Microservices

Job description

Ensure system reliability and performance across multi-cloud, multi-region platforms using first-principles thinking.
Build and maintain comprehensive observability solutions (OpenTelemetry, New Relic, Grafana, Prometheus) that provide actionable insights into system health and performance.
Automate infrastructure provisioning and deployments using Terraform and infrastructure-as-code practices
Define, implement, and monitor SLOs/SLIs that align with business-critical SLAs and drive accountability for reliability.
Manage and optimize Kubernetes clusters (EKS, GKE) with a focus on security hardening, performance, and operational excellence.
Lead incident response efforts, troubleshoot complex system issues, restore service quickly, and conduct thorough root cause analysis
Implement preventive measures and reliability improvements based on lessons learned from incidents and system behavior patterns.
Partner with platform engineers and developers to embed reliability best practices into system architecture and delivery pipelines
Proactively scale infrastructure capacity based on growth forecasts and traffic patterns.
Contribute to architecture reviews with a deep focus on reliability, performance, and operational sustainability.
Foster a culture of continuous improvement, systematic problem-solving, and operational excellence.

Work in a small, high-impact team where your contributions directly shape system reliability and operational practices.
Focus on strategic engineering rather than firefighting. We build monitoring, automation, and guardrails that prevent problems rather than just reacting to them.
Engage in first-principles thinking and a continuous-improvement culture that values thoughtful design over quick fixes.
Collaborate across a multi-cloud environment (AWS, Google Cloud Platform, Kubernetes) supporting diverse, mission-critical workloads.
Partner with platform engineers, developers, and principal engineers who provide technical guidance and collaboration
Own reliability for systems that directly impact business outcomes and customer experiences
Work alongside platform engineers to ensure the platforms they build are operationally sound and reliable at scale.

At Red Ventures, reliability isn't just about keeping systems running; it's about engineering resilience through thoughtful observability, automation, and operational discipline. You'll work with passionate engineers who value systematic problem-solving, learn from failures, and build reliability into every layer of the stack.

We are committed to providing equal employment opportunities to qualified individuals with disabilities. This includes providing reasonable accommodation where appropriate. Should you require a reasonable accommodation to apply or participate in the job application or interview process, please contact

If you are based in California, we encourage you to read this important information for California residents linked here.

At Red Ventures, we believe in real human connection. That's why we do not hire anyone through text, social media, or email alone. As part of the hiring process, you should expect live conversations with RV teammates before any offer is made. Also, keep an eye on the sender: we only use official @redventures.com email addresses at the portfolio level or business-specific email addresses (e.g., @thepointsguy.com), not ones like "redventurescareer.com." We will never ask candidates to send money, buy equipment, or share financial account info during your journey with us. You can always find our open roles on redventures.com. If you receive a message that seems suspicious, please use redventures.com to verify the opportunity.

For more, the U.S. Federal Trade Commission has published helpful articles to help individuals learn more about protecting themselves from recruiter scams. If you think you've been targeted, feel free to report it to your local authorities. Stay safe out there!

Requirements

3-5 years of experience in SRE, DevOps, or cloud infrastructure engineering roles
Experience leveraging AI/ML tools to enhance observability, including anomaly detection, alert noise reduction, and predictive incident identification
Experience using generative AI or LLM-based tools to accelerate debugging, runbook creation, and operational knowledge sharing
Strong hands-on experience with AWS and Google Cloud Platform
Deep Kubernetes expertise (EKS, GKE), including security, networking, and operational best practices
Proficiency with infrastructure-as-code using Terraform
Experience building and maintaining observability systems (New Relic, Grafana, Prometheus, OpenTelemetry, or similar)
Solid understanding of CI/CD pipelines and automated deployment strategies (Harness, Jenkins, GitLab CI, or similar)
Strong scripting and automation skills (Python, Bash, Go, or similar languages)
Proven track record of maintaining high-availability systems (99.9%+ uptime)
Deep understanding of distributed systems, microservices architectures, and scalability patterns
Experience with incident management, troubleshooting complex systems, and learning from failures
Strong first-principles thinking: the ability to reason from fundamentals rather than relying solely on existing patterns
Excellent written and verbal communication skills with the ability to explain complex technical concepts clearly

Bonus Points For:

Cloud certifications (AWS Solutions Architect, Google Cloud Platform Professional Cloud Architect, or equivalent)
Experience with data platform infrastructure (Databricks, Snowflake, or similar)
Familiarity with security scanning and remediation tools (Wiz, Aqua, Prisma Cloud, or similar)
Knowledge of compliance frameworks (SOC 2, PCI-DSS, HIPAA) and their operational implications
Experience with chaos engineering, resilience testing, or systematic failure injection
Database performance tuning and optimization expertise (PostgreSQL, MySQL, etc.)
Experience with log aggregation and analytics platforms (ELK Stack, Splunk, or similar)
Understanding of cloud security, network architecture, and multi-region deployment patterns
Familiarity with DLP (Data Loss Prevention) solutions (Netskope, Zscaler, or similar)
Background working with regulated industries or highly available consumer-facing applications

Benefits & conditions

This range reflects total cash compensation, which may include base salary only or base salary plus target bonus, depending on the role. Where eligible, equity may also be offered separately and not included below. Actual compensation varies based on location, experience, and qualifications.

Total Cash Compensation Range: $100,000 - $145,000 per year

Additionally, the following benefits are provided by Red Ventures, subject to eligibility requirements.

Health Insurance Coverage (medical, dental, and vision)
Life Insurance
Short and Long-Term Disability Insurance
Flexible Spending Accounts
Holiday Pay
401(k) with match
Employee Assistance Program
Paid Parental Bonding Benefit Program
Flexible Paid Time Off (PTO): We believe time to rest and recharge is essential. That's why we offer a generous and flexible PTO policy. Full-time employees accrue 20 days of PTO per full calendar year, with an increase to 25 days after five years of service.

We offer competitive salaries and a comprehensive benefits program for full-time employees, including medical, dental, and vision coverage, paid time off, life insurance, disability coverage, an employee assistance program, a 401(k) plan, and a paid parental leave program.

About the company

Red Ventures is a global portfolio of high-growth companies - spanning several U.S. businesses, a joint venture in the health services industry, and strategic investments in Europe and Puerto Rico. Their businesses include The Points Guy, Lonely Planet, Bankrate, the Allconnect Platform, RV Home Client Growth, RV Growth & Transformation, Sage Home Loans Corporation, RV Education and more. Across the portfolio, Red Ventures businesses deliver seamless digital experiences for consumers, help Fortune 100 clients solve large-scale digital growth challenges, and create world-class experiences and opportunities for employees. Learn more at redventures.com and follow @RedVentures on LinkedIn and Instagram. At Red Ventures, we believe diverse, inclusive teams are better. To help you better understand our core values and beliefs, we encourage you to watch this brief YouTube video: Our Belief Statements. This will give you insight into the principles that guide our work and our commitment to fostering an inclusive environment.
