Staff AI Platform Engineer
Job description
This is not a standard DevOps posting. We are looking for one unusually capable, AI-native engineer to own our entire platform engineering and SRE function - using autonomous agents, LLM-powered pipelines, and MCP-based tooling as force multipliers to do the work of a team, on-site, in close partnership with our engineering leadership.
You will inherit a mature, fully containerized AWS estate (9 EKS clusters, 27 accounts, 228 Kubernetes nodes), an Akamai CDN layer managing live traffic splits, GitHub Actions + Jenkins CI/CD pipelines for a Webpack 5 micro-frontend monorepo, and an operational AI agent platform - OpsWhisperer - already in production, monitoring 25 AWS accounts with a 91% autonomous resolution rate.
Your job is to extend all of it, automate what remains manual, and be the person who makes every deployment, incident, and infrastructure change happen with speed, precision, and intelligence.
SCOPE OF OWNERSHIP
What you'll own
AWS Multi-Account Infrastructure
- EKS clusters across dedicated AWS accounts
- EC2 worker nodes via Auto Scaling Groups
- SQS pipelines
- AWS Bedrock (Claude) for AI agent workloads
Kubernetes & Containerization
- EKS clusters
- Node group management
- Kops clusters alongside EKS
- Multiple environment tiers with full blast-radius isolation
CI/CD & Release Management
- Multiple repositories
- GitHub Actions workflows + Jenkins pipeline management
- Turbo build system across multiple micro-frontend packages
- Canary release gating and rollback automation
CDN & Traffic Management
- Akamai Property Manager config
- Phased Release Cloudlet for the canary/production traffic split
- Security, throttling, and monitoring
- Jenkins-driven cache invalidation
Observability & Incident Response
- Elastic/Kibana
- CloudWatch across all AWS accounts
- Business performance monitoring
- SQS backlog + pipeline health alerting
- On-call ownership and proactive, AI-assisted triage
AWS EKS · Kubernetes · Kops · AWS Organizations · Auto Scaling Groups · AWS SQS · AWS Bedrock · CloudWatch
CDN & Networking
Akamai Property Manager · Phased Release Cloudlet · Fast Purge · Content Protector
CI/CD & Frontend
GitHub Actions · Jenkins · Turbo (monorepo) · Webpack 5 Module Federation · Canary / Blue-Green Deployments
AI & Agentic
MCP (Model Context Protocol) · Claude API / AWS Bedrock · Azure Bot Service · Microsoft Entra ID · Operational AI Agents
Requirements
- 10+ years of hands-on DevOps, SRE, or platform engineering experience in production AWS cloud environments
- Deep AWS expertise: EKS, EC2, SQS, CloudWatch, IAM, Organizations, and multi-account architectures
- Strong Kubernetes skills: cluster operations, node group management, workload isolation, taints/tolerations, auto-scaling
- Experience with Akamai or equivalent enterprise CDN - configuration, purge operations, traffic routing rules
- CI/CD ownership: GitHub Actions and/or Jenkins pipeline design, monorepo build systems, release gating
- Production experience building or operating AI agents - LLM integration, autonomous workflow design, prompt engineering
- Proficiency in Node.js and/or Python for automation, tooling, and MCP server development
- Observability stack ownership: Elastic/Kibana, log analysis, alerting design, SLO/SLI instrumentation
- Comfortable owning on-call responsibility for a production e-commerce platform with significant revenue exposure
- Strong written and verbal communication - will interface with engineering leadership and present findings to executives
- Based in or willing to relocate to the Los Angeles / Long Beach area for on-site work
Benefits & conditions
Location: 4910 Airport Plaza Drive, Long Beach, CA 90815
Salary: $166,000 - $232,000 a year
- Opportunities for advancement