Senior DevOps Engineer
Job description
We are looking for a Staff DevOps Engineer to join our DevOps team at K Health. You will own and evolve the infrastructure underpinning a healthcare AI platform serving patients and enterprise health system partners. This is a high-ownership role: you will architect and operate cloud environments across K Health and its enterprise partners, lead complex infrastructure migrations, drive disaster recovery programs, and help build the next generation of AI-powered operations tooling. You will also mentor junior engineers and collaborate closely with product and engineering teams across the company. This is a hybrid role based in New York City (4 days/week in office) and includes participation in a daytime on-call rotation.
- Own the design, implementation, and evolution of our GKE-based Kubernetes infrastructure across K Health and enterprise partner environments.
- Build and maintain our Terraform modular infrastructure library, including reusable modules with automated testing, across GCP, Cloudflare, and AWS.
- Architect, build, and maintain GitLab CI/CD shared pipeline templates used by all engineering teams (build, test, security scanning, deployment).
- Own and maintain self-hosted infrastructure software running in-cluster, including GitLab, ArgoCD, Langfuse, DependencyTrack, NGINX Ingress, and others.
- Implement and support security and compliance controls across infrastructure and the software supply chain - secrets management, pipeline secret detection, container scanning, SOC2 and HIPAA.
- Drive disaster recovery readiness: design failover scenarios, author runbooks, and lead periodic DR tests.
- Lead development of AI-powered operations tooling and agentic infrastructure.
- Monitor, troubleshoot, and improve production system reliability; respond to incidents during on-call shifts.
- Mentor junior DevOps engineers and establish team-wide engineering standards.
Requirements
- 5+ years of experience in DevOps, platform engineering, or site reliability engineering.
- Deep, hands-on experience with Kubernetes and the surrounding ecosystem - Helm, Helmfile, ArgoCD, Kyverno, cert-manager, and NGINX Ingress.
- Extensive experience with Google Cloud Platform - GKE, Cloud SQL, Memorystore, Cloud Storage, IAM, and Workload Identity.
- Strong Terraform expertise: modular architecture, multi-environment provisioning, and automated testing.
- Advanced knowledge of GitLab CI/CD and GitOps practices.
- Proficiency in Python and/or Go.
Plus:
- Advanced Bash scripting skills.
- Experience with secrets management solutions such as Akeyless or HashiCorp Vault.
- Experience with database administration across PostgreSQL, Redis, and MongoDB - including DR configuration and operational runbooks.
- Experience with Datadog or equivalent observability platform (APM, infrastructure, log management).
- Experience with Cloudflare for DNS, CDN, and security rules management.
- Demonstrated experience designing and executing disaster recovery programs, including failover testing and runbook authorship.
- Experience in highly regulated environments - SOC2 and HIPAA.
- Excellent communication skills with the ability to lead cross-functional infrastructure initiatives.
- Demonstrated leadership experience, including mentoring junior engineers.
- Experience with HPC or GPU cluster infrastructure, including Slurm.
- Experience building or operating AI agents or agentic infrastructure.
- Experience with microservices architecture and API gateway / reverse proxy patterns.
- Experience with AWS.
Benefits & conditions
New York, NY. Hybrid work. $135,000 - $180,000 a year.
Benefits & Perks:
- Hybrid work schedule with weekly lunches and stocked fridges
- Monthly social committees for company events
- 18 vacation days, 9 company holidays, 5 sick days, and 2 personal days
- Stock options for every full-time employee
- Paid parental leave
- 401k benefit
- Commuter Benefits
- Competitive health, dental, and vision insurance options
We offer competitive compensation packages based on industry benchmarks for function, level, and geographic location. Offer amounts are determined by multiple factors such as a candidate's experience and expertise.