AI Inference Infrastructure Software Engineer (Kubernetes / Cloud)
Job description
- Develop automation and tooling in Python, Bash, and Go to streamline deployments and scaling
- Partner with ML, runtime, and hardware teams to productionize new inference capabilities
- Contribute to capacity planning, cost optimization, and reliability engineering
- Participate in on-call rotation for critical services
Requirements
- 3-5 years of hands-on Kubernetes experience (EKS, GKE, or self-hosted)
- 2-3 years operating production workloads on AWS or GCP
- Experience running ML or accelerated inference services at scale
- Strong skills in Python, Bash, and Go
- Deep understanding of GPU/accelerator scheduling, device plugins, and cluster performance
- Experience with IaC (Terraform/Pulumi), config management (Ansible/Puppet/Salt), and GitOps (Argo/Flux)
- Comfortable operating in fast-moving, early-stage environments
Bonus Points
- Experience with inference servers (Triton, vLLM, TGI)
- Exposure to non-GPU accelerators (FPGAs, ASICs)
- Background in SRE, observability, or performance engineering
- Experience building customer-facing API platforms
Benefits & conditions
Prime Team Partners is an equal opportunity employer. Prime Team Partners does not discriminate on the basis of race, color, religion, national origin, pregnancy status, gender, age, marital status, disability, medical condition, sexual orientation, or any other characteristic protected by applicable state or federal civil rights laws.

For contract positions, hired candidates will be employed by Prime Team Partners for the duration of the contract period and will be eligible for our company benefits. Benefits include medical, dental, and vision coverage, with employees covered at 75%. A 401(k) is offered after 6 months of employment. We do not provide paid holidays or PTO; sick time is offered in accordance with local laws. This position is open until filled.