Platform Engineer
Job description
The Platform Engineer will support and enhance an AWS-based machine learning platform, with a primary focus on Amazon SageMaker. This role is responsible for maintaining and improving cloud infrastructure that enables data scientists and machine learning engineers to build, train, and deploy models efficiently and securely. The ideal candidate brings foundational cloud and infrastructure-as-code experience, strong troubleshooting abilities, and an interest in supporting machine learning workloads.
- Contribute to the development, maintenance, and enhancement of AWS infrastructure supporting Amazon SageMaker and related machine learning services.
- Develop and maintain Infrastructure as Code (IaC) using Terraform to provision, manage, and version cloud resources.
- Assist in building and maintaining CI/CD pipelines and automation for infrastructure and ML workflows.
- Support the design, implementation, and maintenance of core platform components following cloud architecture and security best practices.
- Monitor platform reliability, performance, and cost efficiency, and assist in troubleshooting infrastructure and deployment issues.
- Collaborate with data scientists and ML engineers to understand platform requirements and improve user experience.
- Participate in the creation of documentation, operational runbooks, and knowledge-sharing materials.
Requirements
Understanding of core AWS components and services, including:
- Networking: subnets, routing, DNS, security groups
- Security: IAM roles and policies, encryption, secrets management, least-privilege access models
- EC2/ECS
- S3
- Lambda
- SQS
Experience with:
- Python development (e.g., Lambda functions, scripting)
- Terraform or other infrastructure-as-code tools
- CI/CD tools such as Jenkins, GitHub Actions, or similar
Additional requirements:
- Exposure to Linux-based environments and shell scripting
- Strong problem-solving skills and a willingness to learn new technologies
- Ability to collaborate effectively with cross-functional teams
Preferred Qualifications
- Exposure to Amazon SageMaker or other machine learning platform tools
- Basic understanding of machine learning workflows (training, inference, data pipelines, model lifecycle) and execution patterns (batch, real-time)
- Familiarity with containerization and orchestration technologies such as Docker or Kubernetes
- Experience developing applications integrated with OIDC