DevOps Engineer
MUFG Investor Services
Job description
We are seeking a highly skilled Platform Engineer with a strong background in DevOps workflows and platform engineering best practices to join our AI initiative, a high-visibility project focused on deploying and managing AI agents across our infrastructure. You will work closely with the Research & Data Science team, backend and frontend engineers, and other technical teams to build a secure, scalable, and cost-optimized platform for AI workloads.
This position supports AI Engineering and Data Science initiatives by focusing on infrastructure, operations, and platform reliability. The Platform Engineer ensures that AI Engineers and Data Scientists have robust, scalable infrastructure on which to deploy their work.
You Will:
- Design, deploy, and maintain AI agents on Agent Core MCP servers and MCP gateways.
- Implement and manage observability using OpenTelemetry for logs and traces, integrating with Datadog.
- Ensure security, high availability, and cost optimization across all AI platform components.
- Provide infrastructure and deployment support to AI researchers and engineering teams, enabling integration of cutting-edge technologies into production.
- Perform load testing, measure token costs, and optimize resource utilization.
- Facilitate external vulnerability assessments and ensure compliance with security best practices.
- Troubleshoot and resolve platform issues promptly to maintain operational stability.
- Contribute to DevOps workflows, CI/CD pipelines, and automation for AI deployments.
- Support evaluation of third-party products related to hosting AI agents or enhancing project capabilities.
- Assist in external audits and maintain documentation for platform architecture and processes.
- Develop and execute automation scripts using the AWS Boto3 SDK to deploy, test, and validate AI platform components across multiple environments.
- Implement Infrastructure as Code (IaC) using Terraform to provision and manage cloud resources for AI workloads, ensuring consistency and scalability.
Requirements
You Have:
- 5+ years of experience in platform engineering or DevOps practice.
- Deep understanding of DevOps principles, workflows, and best practices.
- Proven experience in platform engineering and full-stack development.
- Proficiency in API design and integration.
- Hands-on experience with AWS services.
- Familiarity with OpenTelemetry, Datadog, and observability tooling.
- Solid coding skills in languages commonly used for backend and automation (e.g., Python, Node.js, Go).
- Knowledge of microservices, container orchestration (Kubernetes/EKS), and cloud-native architectures.
- Extensive knowledge of security practices, cost optimization, and performance testing.
- Interest in and familiarity with the latest trends in MCP (Model Context Protocol) and AI agent frameworks.
Preferred Experience
- Working with AI/ML platforms or deploying AI agents in production environments.
- Exposure to high-scale distributed systems and cloud infrastructure.
- Experience in observability and monitoring for complex systems.
- AWS certifications.
Project Details
- High visibility within the organization.
- Opportunity to work with cutting-edge AI technologies and collaborate with leading experts.