Senior DevOps Engineer
Job description
background in DevOps workflows and platform engineering best practices to join our AI initiative. This is a high-visibility project focused on deploying and managing AI agents across our infrastructure. You will work closely with the Research & Data Science team, backend and frontend engineers, and other technical teams to build a secure, scalable, and cost-optimized platform for AI workloads.

This position supports AI Engineering and Data Science initiatives by focusing on infrastructure, operations, and platform reliability. The Platform Engineer will work closely with AI Engineers and Data Scientists to ensure they have robust, scalable infrastructure to deploy their work.

You Will:
- Design, deploy, and maintain AI agents on Agent Core MCP servers and MCP gateways.
- Implement and manage observability using OpenTelemetry for logs and traces, integrating with Datadog.
- Ensure security, high availability, and cost optimization across all AI platform components.
- Provide infrastructure and deployment support to AI researchers and engineering teams, enabling integration of cutting-edge technologies into production.
- Perform load testing, token cost measurement, and optimize resource utilization.
- Facilitate external vulnerability assessments and ensure compliance with security best practices.
- Troubleshoot and resolve platform issues promptly to maintain operational stability.
- Contribute to DevOps workflows, CI/CD pipelines, and automation for AI deployments.
- Support evaluation of third-party products related to hosting AI agents or enhancing project capabilities.
- Assist in external audits and maintain documentation for platform architecture and processes.
- Develop and execute automation scripts using the AWS Boto3 SDK to deploy, test, and validate AI platform components across multiple environments.
- Implement Infrastructure as Code (IaC) using Terraform to provision and manage cloud resources for AI workloads, ensuring consistency and scalability.
Qualifications

You Have:
- 5+ years of experience in Platform Engineering / DevOps practice.
- Deep understanding of DevOps principles, workflows, and best practices.
- Proven experience in platform engineering and full-stack development.
- Proficiency in API design and integration.
- Hands-on experience with AWS services.
- Familiarity with OpenTelemetry, Datadog, and observability tooling.
- Solid coding skills in languages commonly used for backend and automation (e.g., Python, Node.js, Go).
- Knowledge of microservices, container orchestration (Kubernetes/EKS), and cloud-native architectures.
- Extensive knowledge of security practices, cost optimization, and performance testing.
- Interest in and familiarity with the latest trends in MCP (Model Context Protocol) and AI agent frameworks.

Preferred Experience
- Working with AI/ML platforms or deploying AI agents in production environments.
- Exposure to high-scale distributed systems and cloud infrastructure.
- Experience in observability and monitoring for complex systems.
- AWS certifications.

Project Details
- High visibility within the organization.
- Opportunity to work with cutting-edge AI technologies and collaborate with leading experts.

Additional Information

What's in it for you to join MUFG Investor Services?

Take a look at our careers site and you'll find everything you'd expect working with one of the fastest-growing businesses at one of the world's largest financial groups. Now take another look. Because it's how we defy expectations that really defines us. You'll feel that difference in all kinds of ways. Our vibrant CULTURE. Connected team. Love of innovation, laser client focus. So, why settle for the ordinary? Apply now for your next Brilliantly Different opportunity.

We thank all candidates for applying; however, only those proceeding to the interview stage will be contacted.

MUFG is an equal opportunity employer.
Responsibilities
The Platform Engineer will design, deploy, and maintain AI agents while ensuring security, high availability, and cost optimization across all AI platform components. They will also provide infrastructure support to AI researchers and engineering teams, enabling the integration of cutting-edge technologies into production.
Requirements
DevOps, Platform Engineering, AWS, OpenTelemetry, Datadog, Python, Node.js, Go, Microservices, Kubernetes, Infrastructure as Code, Terraform, AI, Security, Cost Optimization, Performance Testing