ITS Systems Eng II (CIS), Corporate Infrastructure Services, IT
Job description
Corporate Infrastructure Services (CIS) builds and operates the systems that power Amazon's corporate offices globally - the networks, compute platforms, audio-visual systems, and operational tooling that every Amazonian depends on. Our Systems Engineering function owns the operational standards, fleet governance, observability, and engineering practices that keep this infrastructure reliable, secure, and cost-effective across a worldwide footprint.
We are looking for a Systems Engineer with a focus on fleet management and operational excellence. You will own the operational health, compliance, and lifecycle management of infrastructure fleets - ensuring that our infrastructure device estates are accurately inventoried, properly governed, and operating within defined standards. You will be the person who knows what we have, where it is, what state it's in, and what needs attention.
This role combines hands-on systems engineering with fleet-level operational thinking. You will troubleshoot difficult systems problems across hardware, software, networking, and cloud platforms. You will build automation that scales fleet operations - replacing manual processes with repeatable, reliable mechanisms. You will create and maintain the SOPs, runbooks, and documentation that enable consistent operations across a global footprint. You will identify patterns that affect fleet health, performance, and compliance, and drive improvements that deliver measurable results.
You will work autonomously, taking ownership of problems even when they cross domain boundaries. You will be proficient across multiple technology areas - operating systems, networking, compute hardware, cloud platforms, and monitoring tools.
You will use AWS services (EC2, Lambda, Systems Manager, CloudWatch, DynamoDB, S3, Fleet Manager) and scripting languages (Python, PowerShell, Bash) to build the automation and tooling that keeps fleet operations efficient and scalable.
- Own the operational health and compliance of infrastructure fleets - maintaining accurate inventory, tracking lifecycle status, and ensuring fleet governance standards are met
- Troubleshoot and resolve difficult systems problems across hardware, software, networking, and operating environments, driving root cause resolution
- Build automation that scales fleet operations - device discovery, compliance scanning, firmware tracking, health monitoring, and lifecycle reporting
- Create, review, and improve SOPs, runbooks, and documentation to ensure consistent, repeatable operations across sites and regions
- Identify patterns that affect fleet performance, reliability, availability, or compliance, and deliver automation that addresses them at scale
- Drive operational excellence initiatives that demonstrate measurable improvements to fleet health, compliance posture, and operational efficiency
- Provide insight to engineers across domains (software, hardware, networking, security) on how their components interact to form a system
- Participate in on-call rotations, diagnosing and resolving operational issues across the infrastructure estate
- Mentor junior engineers, helping them understand systems architecture, operational practices, and fleet management disciplines
- Contribute to team design, scoping, and prioritisation discussions with an informed operational and fleet perspective
Requirements
- Bachelor's degree in Systems Engineering, Computer Science, or a related field, or relevant work experience
- Experience in site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration
- Experience working with Linux
- Experience in systems engineering
- Experience in any of the following: Python, Java, Perl, PHP, Ruby, Bash, Shell or equivalent
Preferred Qualifications
- Knowledge of TCP/IP and networking protocols such as HTTP and DNS
- Experience designing and developing scripts to automate operational burdens and reviewing scripting changes to ensure they meet the standards for maintainability, scalability and security
- Experience working in a 24/7 production environment
- Experience with service-oriented architecture and web services