Senior DevOps Engineer

Lumicity LLC

Millbrae, United States of America

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Millbrae, United States of America

Tech stack

Amazon Web Services (AWS)

Bash

Cloud Computing

Cloud Engineering

Continuous Integration

Data Systems

Software Debugging

DevOps

Distributed Systems

GitHub

Python

PostgreSQL

Reliability Engineering

Software Vulnerability Management

Datadog

Scripting (Bash/Python/Go/Ruby)

System Availability

Grafana

Backend

Sentry

Machine Learning Operations

Vertica

Terraform

Serverless Computing

Job description

We're hiring a Senior DevOps Engineer or Site Reliability Engineer - depending on where your experience and interests land.

Both roles sit within our engineering team, report into engineering leadership, and work closely with backend and ML engineers. The difference is in focus:

DevOps track: Infrastructure as code, CI/CD, deployment systems, developer experience, and platform reliability.
SRE track: Observability, incident management, SLO frameworks, and production reliability across distributed systems.

Whichever track you're on, this is a hands-on, high-ownership role. You'll have real production responsibility and real impact on how the platform performs at scale.

What you'll work on

Design and evolve AWS-based cloud infrastructure using Terraform
Own and improve CI/CD pipelines (GitHub Actions) for fast, safe deployments
Standardize deployment patterns across serverless workloads (Lambda), containerized services (ECS), and workflow orchestration systems
Define observability standards across metrics, logs, and traces using OpenTelemetry, Datadog, Grafana, and Sentry
Build proactive detection for reliability risks, latency regressions, and performance degradation
Partner with backend and ML teams to debug distributed system issues, including Postgres performance
Lead and support incident response and root cause analysis
Automate security and compliance workflows (access controls, audit readiness, vulnerability management)
Participate in on-call rotation

Modern cloud-native stack: AWS, Terraform, GitHub Actions, ECS, Lambda, Aurora Postgres, Datadog, OpenTelemetry

Requirements

Must have:

7+ years in DevOps, SRE, or infrastructure engineering in a B2B SaaS environment
Strong production AWS experience
Deep hands-on Terraform (IaC) experience
CI/CD pipeline ownership (GitHub Actions or equivalent)
Experience with serverless and containerized services in production
Postgres in production (performance, tuning, operations)
Observability tooling: metrics, logs, traces - and the ability to turn signals into action
Scripting fluency (Python, Bash, or similar)
High ownership mindset - you're not waiting to be assigned an incident, you're already thinking about failure modes

Nice to have:

Experience in healthcare, fintech, or other regulated environments
ClickHouse or high-scale analytics systems
OpenTelemetry and modern observability architecture
ML infrastructure experience

About the company

We are a Series B healthcare AI company with rapidly growing revenue. More than 100 enterprise healthcare organizations use our platform to automate complex, compliance-critical operational workflows - the kind of work that used to require large manual teams and still carries serious downstream risk if it breaks. We're about 100 people, well-funded, and at an inflection point: our platform is scaling fast, our engineering team is growing, and reliability is becoming mission-critical. We haven't been around long enough to accumulate decades of technical debt, so you'd be building the right foundation from the start.
