Senior Platform Engineer

STN, inc.

Oakland, United States of America

11 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Oakland, United States of America

Tech stack

Artificial Intelligence

Cloud Engineering

Computer Clusters

Configuration Management

Computer Programming

Image Management

Web Portals

Python

Open Source Technology

Cloud Services

Software Engineering

AI Infrastructure

Istio

Multi-Agent Systems

Core API

Build Management

Kubernetes

Information Technology

Linkerd (Service Mesh)

Slurm

Hardware Infrastructure

Job description

The Senior Platform Engineer builds and operates the multi-tenant orchestration, scheduling, and customer-facing platform layer that turns raw GPU infrastructure into a usable cloud service. This role is the software backbone of GPU One (GPUaaS).

* Design and build the orchestration layer (Kubernetes, Slurm, Run:ai, or comparable)
* Manage multi-tenant isolation, including namespaces, networking, storage, and quotas
* Build customer-facing platform APIs, CLIs, web portals, and SDKs
* Implement and operate image management, GPU operator, and node provisioning automation
* Drive infrastructure-as-code and automation across the platform stack
* Partner with SRE on platform reliability, SLO definition, and observability
* Support TAM and Support engineers on customer-impacting platform issues
* Maintain customer environment templates, configuration management, and rollout tooling
* Participate in architecture reviews, design discussions, and technical roadmap planning
* Drive continuous platform improvement and reduce operational toil

Requirements

Do you have experience in software engineering? Do you have a Bachelor's degree?

* 6+ years in platform engineering, SRE, or cloud engineering at scale
* Deep Kubernetes expertise including CRDs, operators, and multi-tenant patterns
* Strong programming skills in Go, Python, or both
* Experience operating GPU clusters or AI infrastructure at production scale
* Bachelor's degree in computer science or equivalent experience

* Experience with NVIDIA GPU Operator, MIG, MPS, and NCCL operator patterns
* Familiarity with Slurm operator, Run:ai, KubeRay, or comparable AI orchestration
* Service mesh experience (Istio, Linkerd) and multi-cluster networking
* Open source contributions in the cloud-native or AI infrastructure ecosystem
