Site Reliability Engineer

Red Ventures
New York, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$145K

Job location

New York, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Business Analytics Applications
Bash
Cloud Computing
Cloud Computing Security
Data Infrastructure
Information Leak Prevention
Software Debugging
Noise Reduction
DevOps
Distributed Systems
Python
PostgreSQL
MySQL
Network Architecture
PCI Data Security Standards
Reliability Engineering
Prometheus
Systems Architecture
Google Cloud Platform
Delivery Pipeline
Large Language Models
Snowflake
Grafana
Database Optimization
Multi-Cloud
Reliability of Systems
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Machine Learning Operations
Terraform
Prisma Cloud Platform
Splunk
New Relic (SaaS)
ELK
Jenkins
Databricks
Vulnerability Analysis
Go
Microservices

Job description

  • Ensure system reliability and performance across multi-cloud, multi-region platforms using first-principles thinking
  • Build and maintain comprehensive observability solutions (OpenTelemetry, New Relic, Grafana, Prometheus) that provide actionable insights into system health and performance.
  • Automate infrastructure provisioning and deployments using Terraform and infrastructure-as-code practices
  • Define, implement, and monitor SLOs/SLIs that align with business-critical SLAs and drive accountability for reliability.
  • Manage and optimize Kubernetes clusters (EKS, GKE) with a focus on security hardening, performance, and operational excellence.
  • Lead incident response efforts, troubleshoot complex system issues, restore service quickly, and conduct thorough root cause analysis
  • Implement preventive measures and reliability improvements based on lessons learned from incidents and system behavior patterns.
  • Partner with platform engineers and developers to embed reliability best practices into system architecture and delivery pipelines
  • Proactively scale infrastructure capacity based on growth forecasts and traffic patterns.
  • Contribute to architecture reviews with a deep focus on reliability, performance, and operational sustainability.
  • Foster a culture of continuous improvement, systematic problem-solving, and operational excellence.
  • Work in a small, high-impact team where your contributions directly shape system reliability and operational practices
  • Focus on strategic engineering rather than firefighting. We build monitoring, automation, and guardrails that prevent problems rather than just reacting to them.
  • Engage in first-principles thinking and a continuous-improvement culture that values thoughtful design over quick fixes.
  • Collaborate across a multi-cloud environment (AWS, Google Cloud Platform, Kubernetes) supporting diverse, mission-critical workloads.
  • Partner with platform engineers, developers, and principal engineers who provide technical guidance and collaboration
  • Own reliability for systems that directly impact business outcomes and customer experiences
  • Work alongside platform engineers to ensure the platforms they build are operationally sound and reliable at scale.

At Red Ventures, reliability isn't just about keeping systems running; it's about engineering resilience through thoughtful observability, automation, and operational discipline. You'll work with passionate engineers who value systematic problem-solving, learn from failures, and build reliability into every layer of the stack.

We are committed to providing equal employment opportunities to qualified individuals with disabilities. This includes providing reasonable accommodation where appropriate. Should you require a reasonable accommodation to apply or participate in the job application or interview process, please contact

If you are based in California, we encourage you to read this important information for California residents linked here.

At Red Ventures, we believe in real human connection. That's why we do not hire anyone through text, social media, or email only. As part of the hiring process, you should expect live conversations with RV teammates before any offer is made. Also, keep an eye on the sender: we only use official @redventures.com email addresses at the portfolio level or business-specific email addresses (e.g., @thepointsguy.com), not ones like "redventurescareer.com." We will never ask you to send money, buy equipment, or share financial account info during your journey with us. You can always find our open roles on redventures.com. If you receive a message that seems suspicious, please use redventures.com to verify the opportunity.

For more, the U.S. Federal Trade Commission has published articles to help individuals protect themselves from recruiter scams. If you think you've been targeted, feel free to report it to your local authorities. Stay safe out there!

Requirements

  • 3-5 years of experience in SRE, DevOps, or cloud infrastructure engineering roles
  • Experience leveraging AI/ML tools to enhance observability, including anomaly detection, alert noise reduction, and predictive incident identification
  • Experience using generative AI or LLM-based tools to accelerate debugging, runbook creation, and operational knowledge sharing
  • Strong hands-on experience with AWS and Google Cloud Platform
  • Deep Kubernetes expertise (EKS, GKE), including security, networking, and operational best practices
  • Proficiency with infrastructure-as-code using Terraform
  • Experience building and maintaining observability systems (New Relic, Grafana, Prometheus, OpenTelemetry, or similar)
  • Solid understanding of CI/CD pipelines and automated deployment strategies (Harness, Jenkins, GitLab CI, or similar)
  • Strong scripting and automation skills (Python, Bash, Go, or similar languages)
  • Proven track record of maintaining high-availability systems (99.9%+ uptime)
  • Deep understanding of distributed systems, microservices architectures, and scalability patterns
  • Experience with incident management, troubleshooting complex systems, and learning from failures
  • Strong first-principles thinking: the ability to reason from fundamentals rather than relying solely on existing patterns
  • Excellent written and verbal communication skills with the ability to explain complex technical concepts clearly

Bonus Points For:

  • Cloud certifications (AWS Solutions Architect, Google Cloud Platform Professional Cloud Architect, or equivalent)
  • Experience with data platform infrastructure (Databricks, Snowflake, or similar)
  • Familiarity with security scanning and remediation tools (Wiz, Aqua, Prisma Cloud, or similar)
  • Knowledge of compliance frameworks (SOC 2, PCI-DSS, HIPAA) and their operational implications
  • Experience with chaos engineering, resilience testing, or systematic failure injection
  • Database performance tuning and optimization expertise (PostgreSQL, MySQL, etc.)
  • Experience with log aggregation and analytics platforms (ELK Stack, Splunk, or similar)
  • Understanding of cloud security, network architecture, and multi-region deployment patterns
  • Familiarity with DLP (Data Loss Prevention) solutions (Netskope, Zscaler, or similar)
  • Background working with regulated industries or highly available consumer-facing applications

Benefits & conditions

This range reflects total cash compensation, which may include base salary only or base salary plus target bonus, depending on the role. Where eligible, equity may also be offered separately and is not included below. Actual compensation varies based on location, experience, and qualifications.

  • Total Cash Compensation Range: $100,000 - $145,000 per year

Additionally, the following benefits are provided by Red Ventures, subject to eligibility requirements.

  • Health Insurance Coverage (medical, dental, and vision)
  • Life Insurance
  • Short and Long-Term Disability Insurance
  • Flexible Spending Accounts
  • Holiday Pay
  • 401(k) with match
  • Employee Assistance Program
  • Paid Parental Bonding Benefit Program
  • Flexible Paid Time Off (PTO): We believe time to rest and recharge is essential. That's why we offer a generous and flexible PTO policy. Full-time employees accrue 20 days of PTO per full calendar year, with an increase to 25 days after five years of service.

We offer competitive salaries and a comprehensive benefits program for full-time employees, including medical, dental and vision coverage, paid time off, life insurance, disability coverage, employee assistance program, 401(k) plan and a paid parental leave program.

About the company

Red Ventures is a global portfolio of high-growth companies, spanning several U.S. businesses, a joint venture in the health services industry, and strategic investments in Europe and Puerto Rico. Their businesses include The Points Guy, Lonely Planet, Bankrate, the Allconnect Platform, RV Home Client Growth, RV Growth & Transformation, Sage Home Loans Corporation, RV Education and more. Across the portfolio, Red Ventures businesses deliver seamless digital experiences for consumers, help Fortune 100 clients solve large-scale digital growth challenges, and create world-class experiences and opportunities for employees. Learn more at redventures.com and follow @RedVentures on LinkedIn and Instagram.

At Red Ventures, we believe diverse, inclusive teams are better. To help you better understand our core values and beliefs, we encourage you to watch this brief YouTube video: Our Belief Statements. This will give you insight into the principles that guide our work and our commitment to fostering an inclusive environment.
