Machine Learning Engineer
Job description
As a Machine Learning Engineer on the AI & ML Platform team at Atlassian, you will work on the core infrastructure to allow software engineers, ML engineers & data scientists to develop, train, evaluate, deploy, and operate Machine Learning models and pipelines. Along with that, you will build systems for product teams like Jira & Confluence to solve their specific challenges in building ML solutions. You will use your software development expertise to solve difficult problems, tackling infrastructure and architecture challenges.
About Central AI Org
Our organisation is designed to accelerate AI innovation across all our products and platform, provide cohesive AI experiences, and set up an Atlassian AI infrastructure for the future. Our purpose is to:
- Develop horizontal AI capabilities and infrastructure that can be leveraged across all products.
- Establish a centralized Search, Q&A, and Conversational AI system that integrates seamlessly with all Atlassian products.
- Explore integration of Atlassian products with AI products outside Atlassian.
About AI & ML Platform Team
Our team is building the foundations to democratise Machine Learning for Atlassian's teams, customers and ecosystem. Our goal is to create tools that are user-friendly and reliable, ensuring that Atlassian teams can easily embrace and utilize them. These tools will facilitate the development, deployment, measurement, and operation of AI & ML experiences. They will seamlessly integrate with the Atlassian Data Platform, enabling teams to efficiently and rapidly incorporate AI & ML capabilities into their workflows. The focus is on providing a smooth and hassle-free experience for Atlassian users, allowing them to leverage the power of AI & ML without any complications.
In this role, you'll get the chance to:
- Design, build, and operate large-scale backend and infrastructure services for ML training and inference.
- Collaborate with your teammates to solve complex problems, from technical design to launch.
- Deliver cutting-edge solutions that are used by other Atlassian teams and products to build AI features that reach millions of customers.
- Own services end-to-end - from design, implementation, infrastructure-as-code and CI/CD, through observability, on-call, and incident response.
Requirements
- 4+ years of experience building and operating large-scale backend, infrastructure, or ML systems in a cloud environment, with strong programming skills in at least one of Java/Kotlin, Go, or Python.
- ML lifecycle exposure: experience supporting or building systems for training, deployment, and serving of ML/LLM models (online or batch).
- Large-scale system design: strong experience designing and implementing distributed, fault-tolerant, high-throughput services, ideally for ML or data/compute platforms.
- Cloud infrastructure: hands-on experience with AWS and/or GCP, including networking, security, and compute services (EC2/GKE/EKS, GPUs, autoscaling, load balancing).
- MLOps and automation: experience automating deployment and operations of ML workloads - CI/CD, config and secret management, rollout/rollback strategies, monitoring and alerting. (nice to have)
- Kubernetes and containers: practical experience building and running services or control planes on Kubernetes (deployments, operators/controllers, scaling, observability). (nice to have)
About the company
Atlassian's mission is to unleash the potential of every team. We build agile, DevOps, IT service management, and work management software to help teams organize, discuss, and complete shared work. Over 300,000 companies worldwide rely on Atlassian to work better together and deliver results. With Atlassian Rovo, teams can now find, learn from, and act on organizational knowledge faster using AI-powered search, chat, and automation agents—boosting productivity and collaboration across all their tools.