System Development Engineer II, AWS DC Central Operations
Job description
CIAT is the unified source for Infrastructure Operations data and BI solutions across Amazon's global data center fleet. We build and run the analytics platform that Central Ops leadership uses to manage rack install, decom, repair, logistics, capacity optimization, and network operations. The platform spans a large-scale datalake, multiple Redshift clusters, hundreds of Airflow pipelines, hundreds of AWS accounts, dozens of production QuickSight dashboards, and thousands of active users.
We need a System Development Engineer II to own platform infrastructure - the AWS accounts, application services, deployment automation, security posture, and emerging GenAI capabilities that the rest of the team builds on top of. You'll work with a senior SysDE who sets the technical direction, alongside other SysDEs and a cross-functional team of Data Engineers and BIEs who depend on your platform to ship their work.
The role is split between keeping production running (account governance, security remediation, deployment pipelines, on-call) and building new capabilities (Bedrock integration, QuickSight Q topic infrastructure, agent frameworks, self-service tooling). The GenAI platform work is early-stage - you'll help define the patterns, not just implement someone else's design.
Key job responsibilities
- Own AWS infrastructure across hundreds of accounts - cross-account access patterns, IAM governance, service control policies, LakeFormation permissions
- Build and maintain infrastructure-as-code (CDK/CloudFormation) for production services including LakeSQL, Validation Engine, TEMPO, Langley, CIAuth, and QuickSight
- Build deployment automation and CI/CD that lets Data Engineers and BIEs ship without waiting on a SysDE - the goal is self-service, not gatekeeping
- Stand up GenAI platform infrastructure - Bedrock integration, QuickSight Q topic configuration, agent systems (Spaces, Topics, Knowledge Bases, Actions), cross-account data access for AI workloads
- Drive security and compliance - Mirador/AppSec findings, patching, least-privilege IAM, security posture across production accounts
- Mentor junior SysDEs - break down complex problems into implementable pieces, review CRs, coach on architecture and operational thinking
- Reduce KTLO through automation, legacy system migration (Hammerstone to Airflow/NAWS), and better tooling
A day in the life
You might start the day investigating why a cross-account LakeFormation permission is blocking a QuickSight data source, then write a CDK construct so the same misconfiguration can't happen again. Review a CR from a teammate building a Lambda for automated QuickSight group provisioning. Pair with a DE to figure out why their Airflow DAG can't reach a Glue catalog in another account. After lunch, design the infrastructure for a new Bedrock-powered feature in Langley, or write a runbook for something you've seen break twice.
The through-line: you build systems that scale through automation, not through you personally doing things. When something breaks, you fix it and then fix the system. You're always asking "how do I make this self-service so I'm not the bottleneck?"
Requirements
- Experience in automating, deploying, and supporting large-scale infrastructure
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
- Experience with Linux/Unix
- Experience with CI/CD pipelines and build processes
- 3+ years of non-internship professional software development experience
- 2+ years of experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
- Knowledge of systems engineering fundamentals (networking, storage, operating systems)
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions
- Experience with enterprise-scale infrastructure or development-based cloud programs/projects in a related industry
- Experience contributing to the definition and implementation of automation opportunities within an operations environment
- Experience with Redshift or another big data architecture, and experience in technical support
- Experience working with Data & AI related technologies, including, but not limited to, AI/ML, GenAI, Analytics, Database, and/or Storage
- Experience architecting/operating solutions built on AWS
- Experience in mentoring, leading, or managing more junior engineers
- Experience with large distributed IT systems
Benefits & conditions
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, OH, Columbus - 129,200.00 - 174,800.00 USD annually
USA, WA, Seattle - 129,200.00 - 174,800.00 USD annually