Senior DevOps Engineer
Job description
We're looking for a Senior DevOps Engineer to own our cloud infrastructure end-to-end - from operating a large multi-tenant Kubernetes environment to building CI/CD pipelines that teams actually trust. You'll work across AWS, drive infrastructure-as-code standards, and lead our migration toward GitLab CI and a Grafana-based observability stack while keeping production environments stable.
What You'll Do
- Operate and scale a multi-tenant AWS EKS cluster where each client runs an isolated set of application services - owning tooling to onboard, scale, and observe hundreds of service instances reliably
- Build and improve CI/CD pipelines in GitLab CI and GitHub Actions with automated testing, static analysis, and build-gated releases; maintain ArgoCD GitOps workflows for production deployments
- Lead the migration from Datadog to a self-managed Grafana observability stack (Grafana, Loki, Mimir/Prometheus, Tempo) - dashboards, SLOs, alert routing, and on-call integration
- Manage secrets, IAM, and security scanning pipelines using AWS KMS, Secrets Manager, the external-secrets operator, and Auth0/Dex OIDC - enforcing least privilege across all environments
- Own and evolve the Redpanda (Kafka-compatible) streaming layer and its integrations with application workers
- Drive cloud cost optimization through right-sizing, autoscaling, and shared infrastructure patterns on EKS
- Document infrastructure with automated tooling (terraform-docs) and maintain standards that scale across teams
- Automate operational toil - certificate renewal, clinic environment provisioning, deployment validation, runbook automation
Requirements
- Experience with web application firewalls (WAF)
- 5+ years in DevOps or infrastructure engineering
- 3+ years operating Kubernetes in production - AWS EKS preferred - including CSI drivers, cluster autoscaling, network policy (Calico), and pod identity
- 3+ years hands-on with AWS core services (IAM, S3, KMS, Secrets Manager, STS, EKS, Load Balancer Controller, ECR)
- Strong Terraform experience; GitOps experience with ArgoCD or Flux
- Hands-on experience with GitLab CI and/or GitHub Actions
- Scripting proficiency in Python and Bash
- Experience with IAM design and security best practices (SAST/DAST, secret scanning, OIDC federation)
- Familiarity with streaming or message-queue infrastructure (Redpanda, Kafka, or equivalent)
Nice to Have
- Experience migrating from a SaaS observability tool (Datadog, New Relic) to a self-hosted Grafana stack
- Depth in the Grafana stack - Loki for logs, Mimir or Thanos for metrics, Tempo for traces, Alertmanager for routing
- Experience with Redpanda specifically, or deep Kafka operations knowledge
- Background in multi-tenant SaaS platforms or per-customer service isolation patterns
- AWS certification
- Familiarity with chaos engineering tooling (chaos-mesh or LitmusChaos)
- Background in software engineering or scripting-heavy roles
Tech Stack
Current production: AWS (EKS, S3, KMS, Secrets Manager, STS, Load Balancer) · Terraform · GitHub Actions · ArgoCD · Kubernetes · Traefik · Coraza WAF · Redis HA · MongoDB · Auth0 · Dex · external-secrets · Datadog · Docker · Python · Bash · Linux
Where we're going: GitLab CI · Redpanda · Grafana · Loki · Prometheus/Mimir · Tempo · Alertmanager
Platform components you'll operate: ArgoCD · Traefik · Coraza WAF · Auth0 · Dex · Redis HA · MongoDB · API servers · client-facing portals · internal tooling
Benefits & conditions
- Own infrastructure across a real multi-tenant platform serving production clinic environments
- Lead the observability and streaming migrations - greenfield decisions with lasting impact
- Collaborative engineering culture with high trust and low bureaucracy
- Competitive salary, benefits, and flexible work arrangements