Software Development Engineer - CI/CD, Trainium Manufacturing Test Infrastructure

Amazon.com, Inc.
Cupertino, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$127.1K to $185K per year

Job location

Cupertino, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Automation of Tests
Azure
C#
C++
Databases
Continuous Integration
Firmware
GitHub
Hardware Design
Virtual Private Networks (VPN)
Python
Machine Learning
Network Configuration and Change Management
Cloud Services
Software Deployment
Software Engineering
System Software
TypeScript
Rust
CircleCI
Pulumi
Hardware Testing
CloudFormation
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Terraform
Network Server
Docker
Jenkins
Go
Microservices

Job description

  • Design, build, and maintain CI/CD pipelines (AWS CDK, Pipelines) that deploy containerized services to AWS Outposts at global manufacturing sites

  • Extend the manufacturing infrastructure platform (TypeScript CDK, Python microservices) to support new workflows for Trainium accelerator cards, baseboards, and rack-level integration

  • Build integration test frameworks and canary systems that validate service health across all production sites before and after deployments

  • Develop automated alarming, rollback mechanisms, and deployment wave strategies to ensure zero-downtime releases to active manufacturing lines

  • Develop infrastructure-as-code for containerized services, databases, artifact storage, messaging queues, and authentication systems deployed on Outposts

  • Collaborate with Test Engineering teams, Hardware Engineers, and Supply Chain to resolve bottlenecks in the manufacturing process

About the team

Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro, Graviton, Inferentia, and Trainium families of processors. Machine Learning Annapurna (MLA) functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization. We are the Training Servers and Systems organization under MLA focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability. This position is in the Manufacturing, Quality, and Reliability team.

Requirements

  • BS degree in computer science or equivalent

  • Experience with at least one general-purpose programming language such as Java, Python, C++, C#, Go, Rust, or TypeScript
  • Experience with CI/CD pipeline design and implementation (AWS Pipelines, CircleCI, GitLab CI, GitHub Actions, Jenkins, or similar)
  • Experience with cloud services (AWS, GCP, or Azure), particularly IaC tools such as CDK, CloudFormation, Terraform, or Pulumi

Preferred Qualifications

  • Experience deploying software to edge/hybrid environments (AWS Outposts, on-premises)
  • Experience with containerized microservice architectures (Docker, ECS/EKS, Kubernetes)
  • Familiarity with hardware test automation or manufacturing systems
  • Experience setting up CI/CD for system software
  • Familiarity with network configuration in constrained environments (VPN, CIDR management, site connectivity)

Benefits & conditions

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 127,100.00 - 185,000.00 USD annually

About the company

The Manufacturing Infrastructure Release Team within Annapurna ML builds and operates the software platform that orchestrates hardware testing and validation across multiple Trainium manufacturing sites worldwide. Our platform deploys containerized microservices to AWS Outposts at manufacturing partner factories, enabling component-level testing, card/board validation, server-level testing, and rack-level testing at scale. We directly enable the manufacturing ramp of AWS's custom AI training chips.

We are looking for a Software Development Engineer to own and evolve the CI/CD infrastructure that delivers software to Trainium manufacturing sites worldwide. You will build and maintain deployment pipelines that push tested, validated code to production Outpost environments across multiple manufacturing partners. Your work directly impacts how fast Trainium servers move from factory floor to customer: every hour of pipeline latency is lost customer revenue.
