Software Engineer
Job description
- Evaluate and adopt emerging technologies to simplify the ML ecosystem and improve developer experience and stability
- Drive pragmatic innovation by balancing future capabilities with production readiness and maintainability
- Increase team agility through reusable frameworks, clear abstractions, and automation that reduce downstream friction
- Resolve technical roadblocks and mitigate platform risks related to scalability, reliability, and integration
- Accelerate delivery and improve system reliability through robust CI/CD pipelines and automated operational workflows
Requirements
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field
- Strong Python software engineering experience building and maintaining production-grade libraries, services, and platforms; Linux, scripting, and automation required (Java/Groovy a plus)
- Experience building and operating cloud-native systems on AWS, including core services and managed ML platforms (e.g., SageMaker); Azure or Google Cloud Platform exposure beneficial
- Solid DevOps and CI/CD expertise with Jenkins, Git-based workflows, Docker, and scalable containerized deployments
- Infrastructure-as-Code experience using CloudFormation and/or Terraform/OpenTofu
- Hands-on experience operating ML systems in production, including deployment, inference, monitoring, and reliability-focused operations
- Working knowledge of applied ML concepts, data pipelines, and diverse data types to support scalable ML platforms
- Strong background in distributed systems, high-throughput workloads, asynchronous processing, and fault-tolerant architectures
- Proven ability to support business-critical systems, troubleshoot production issues, and drive stability and performance improvements
- 5+ years building Python-based cloud applications or platforms with ownership of complex production systems
- Excellent communication and collaboration skills, including technical documentation and cross-functional teamwork
- Ability to thrive in fast-paced, ambiguous environments within a broader AI and data ecosystem
Qualifications
- Partner with Data Scientists to package, scale, and operationalize models for secure, reliable production use
- Collaborate with application and platform engineers to integrate ML capabilities with enterprise gateways and services
- Design and operate enterprise-scale ML systems serving tens of millions of users with high reliability and performance
- Build platform tooling for model and data observability, including drift detection, quality monitoring, and automated diagnostics