AWS AI Ops Infrastructure Engineer
Job description
Overview / Summary: The AWS AI/ML Cloud Architect is responsible for designing, building, and supporting scalable AWS infrastructure to enable AI and machine learning workloads. This role focuses on architecting robust cloud environments, deploying AI solutions, and supporting tenants in a multi-cloud ecosystem while ensuring performance, security, and operational efficiency.
- Design, develop, and enhance a scalable AWS infrastructure platform to support AI and machine learning workloads
- Support all phases of development and modernization in a multi-cloud environment
- Implement best practices for AWS cloud architecture, including security, performance, and high availability
- Integrate and configure AWS services such as Amazon SageMaker, AWS Lambda, Amazon Bedrock, and other AI-related services
- Deploy AWS AI solutions across the platform to meet tenant requirements
- Automate the deployment of AI services
- Monitor platform health, performance, and security using tools such as Amazon CloudWatch, AWS Security Hub, and AWS Config
- Support tenants by troubleshooting issues, optimizing workloads, and assisting with onboarding and scaling
- Provide ongoing platform support and resolve incidents and service requests in a timely manner
- Identify opportunities for performance improvement and implement optimization solutions
- Perform patching, updates, and enhancements to maintain platform security and currency
- Develop automation scripts to manage cloud infrastructure and improve operational efficiency
Requirements
- Bachelor's degree with 10+ years of experience, or Master's degree with 8+ years of experience
- 8+ years of experience designing, building, and monitoring AWS infrastructure
- Experience implementing AWS AI/ML solutions and deploying models
- Experience with AI/ML technologies
- Familiarity with managing cloud systems and infrastructure-as-code tools
- Solid understanding of networking, security, and system architecture
- Experience with AWS AI Ops tools including Amazon Bedrock, CloudWatch, X-Ray, Model Context Protocol development, and observability tools
- Experience with tools such as Moogsoft, BigPanda, Splunk, and ServiceNow
- AWS certifications preferred