Lead SRE/DevOps Engineer

Launch Potato
9 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Runbook
Terraform
PagerDuty

Job description

  • Deliver SOC 2 Type I audit-ready infrastructure evidence package: own the technical controls implementation end-to-end.
  • Version and publish the Terraform module library (30+ modules) to a private registry to eliminate ad hoc git consumption by product teams.
  • Implement automated deployment rollback for ECS and Lambda: gate production on integration test passage.
  • Stand up monthly cost reporting to leadership: budget anomaly detection, savings plan recommendations, spend by service/team/environment.

Requirements

  • 5+ years of production AWS infrastructure experience with deep Terraform expertise.
  • Hands-on experience building an SRE function from scratch, with complete ownership of it.
  • Experience with a multi-site company where PaaS or microservices are required.
  • CI/CD pipeline ownership in one or more previous roles.
  • PagerDuty experience, including standing up an on-call rotation.

  • Experience: 5+ years hands-on with AWS, Terraform, CI/CD pipeline ownership, and SRE tooling (OpenTelemetry, Grafana, PagerDuty or equivalent) in a production environment.
  • Ownership orientation: You don't wait to be assigned a problem. If something is broken, undocumented, or a risk, you flag it and fix it. If the runbooks don't exist yet, you write them.
  • Documentation discipline: You write things down. Runbooks, decision rationale, architecture patterns, incident post-mortems. The next person should be able to understand your work without asking you.
  • Cost consciousness: You think about the business impact of infrastructure decisions. You can explain a spending anomaly to a CFO in plain language. You know what things cost before you build them.
  • Calm under pressure: Production incidents happen. You triage clearly, communicate proactively with technical and non-technical stakeholders, and run a tight post-mortem without blame. You've been woken up at 3am. You can handle it.
  • Cross-functional communication: You can work with product engineers, legal/compliance, and executive leadership in the same week without switching communication modes awkwardly. You speak both engineer and business.
  • Proactive reliability: A good SRE reacts to outages. A great SRE catches degradation before it becomes an outage. You build alerting against the patterns, not just the failures.

Benefits & conditions

Lead the development of Launch Potato's cloud infrastructure, establishing SRE practices including on-call rotations and monitoring systems, while ensuring cost efficiency and reliability.

The summary above was generated by AI.

Base salary is set according to market rates for the nearest major metro and varies based on Launch Potato's Levels Framework. Your compensation package includes a base salary, profit-sharing bonus, and competitive benefits. Launch Potato is a performance-driven company, which means once you are hired, future increases will be based on company and personal performance, not annual cost of living adjustments.

About the company

Launch Potato is a profitable digital media company that reaches 30M+ monthly visitors through brands such as FinanceBuzz, All About Cookies, and OnlyInYourState. As The Discovery and Conversion Company, our mission is to connect consumers with the world's leading brands through data-driven content and technology. Headquartered in South Florida with a remote-first team spanning over 15 countries, we've built a high-growth, high-performance culture where speed, ownership, and measurable impact drive success.
