Senior Platform Engineer
STN, inc.
Oakland, United States of America
11 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Remote, Oakland, United States of America
Tech stack
Artificial Intelligence
Cloud Engineering
Computer Clusters
Configuration Management
Computer Programming
Image Management
Web Portals
Python
Open Source Technology
Cloud Services
Software Engineering
AI Infrastructure
Istio
Multi-Agent Systems
Core API
Build Management
Kubernetes
Information Technology
Linkerd (Service Mesh)
Slurm
Hardware Infrastructure
Job description
The Senior Platform Engineer builds and operates the multi-tenant orchestration, scheduling, and customer-facing platform layer that turns raw GPU infrastructure into a usable cloud service. This role is the software backbone of GPU One (GPUaaS).
- Design and build the orchestration layer (Kubernetes, Slurm, Run:ai, or comparable)
- Manage multi-tenant isolation including namespaces, networking, storage, and quotas
- Build customer-facing platform APIs, CLIs, web portals, and SDKs
- Implement and operate image management, GPU operator, and node provisioning automation
- Drive infrastructure-as-code and automation across the platform stack
- Partner with SRE on platform reliability, SLO definition, and observability
- Support TAM and Support engineers on customer-impacting platform issues
- Maintain customer environment templates, configuration management, and rollout tooling
- Participate in architecture review, design discussions, and technical roadmap
- Drive continuous platform improvement and reduce operational toil
Requirements
Do you have experience in software engineering?
Do you have a Bachelor's degree?
- 6+ years in platform engineering, SRE, or cloud engineering at scale
- Deep Kubernetes expertise including CRDs, operators, and multi-tenant patterns
- Strong programming skills in Go, Python, or both
- Experience operating GPU clusters or AI infrastructure at production scale
- Bachelor's degree in computer science or equivalent experience

- Experience with NVIDIA GPU Operator, MIG, MPS, and NCCL operator patterns
- Familiarity with Slurm operator, Run:ai, KubeRay, or comparable AI orchestration
- Service mesh experience (Istio, Linkerd) and multi-cluster networking
- Open source contributions in the cloud-native or AI infrastructure ecosystem