Staff Machine Learning Ops Engineer
Job description
As a Staff Machine Learning Ops Engineer at Overstory, you will design and build the foundations of our machine learning operations, ensuring our models are reliable, maintainable, and deliver real value to customers. You'll help architect end-to-end systems for experiment tracking, data management, and scalable deployment. As one of our first dedicated MLOps hires, you'll have significant ownership and influence over our technical direction, balancing best practices with pragmatic delivery to help our teams move fast while maintaining trust and reliability in production. You'll also collaborate closely with data engineers, data scientists, and machine learning engineers, as well as future MLOps colleagues.
In collaboration with your data and ML colleagues, you will design, build, and maintain processes and systems such as:
- automated pipelines for training, testing, and deploying ML models
- experiment tracking systems for performance metrics, data and model versioning, and documentation
- processes and systems for the full model lifecycle, including registries, release and rollback strategies, and scalable model serving
- monitoring and alerting for prediction quality, system health, and cost optimization
You will also influence the direction of data and ML within Overstory by:
- advocating for a balance between MLOps best practices and quick slices of value
- aligning technical solutions with customer needs in collaboration with both engineering and product
- ensuring our MLOps systems support regulatory, privacy, and security requirements
We're always looking to diversify our team further, but we're proud of the fact that four out of the nine people on our leadership team are female, 46% of the overall team are female and 20% of the team are people of color. Our team speaks fifteen languages: English, Dutch, French, Spanish, German, Italian, Portuguese, Russian, Luxembourgish, Lithuanian, Bulgarian, Cantonese, Estonian, Danish and Korean.
Note: We are only able to hire candidates based in the United States, Canada, and the following European countries: Denmark, Estonia, France, Ireland, the Netherlands, Portugal, Sweden, Switzerland, and the United Kingdom. Some roles may also have additional location-specific requirements. Our team requires 3-4 hours of daily overlap with the CET time zone to collaborate effectively with our European team.
Requirements
- You love working in a remote-first, fast-moving environment where collaboration and adaptability are essential.
- 10+ years of experience designing and building production-grade ML pipelines and systems, but don't filter yourself out if you feel you're a strong candidate with 5+ years.
- Strong knowledge of experiment tracking, model deployment strategies, data versioning, and monitoring.
- Experience with ML infrastructure tools (e.g. MLflow, Kubeflow, Airflow, feature stores, model registries).
- Familiarity with GCP and Vertex AI preferred, but not required.
- Strong communication skills and ability to align technical solutions with business goals.
- Comfortable making architectural decisions and balancing best practices with practical trade-offs.
Nice-to-haves
- Experience in remote-first or globally distributed teams.
- Background in image processing, geospatial, or spatio-temporal data processing.
- Prior work on real-time prediction systems or active-learning loops.
- Knowledge of regulatory, privacy, or security considerations in ML.
- Experience optimizing cloud infrastructure costs for ML workloads.
- Familiarity with Overstory's mission domains (e.g. satellite imagery, forestry, utilities).
Benefits & conditions
- To be part of truly mission-driven work that reduces wildfires, protects Earth's natural resources, and helps solve our climate crisis.
- Flexible working environment with a lot of autonomy. We build our work days around our lives, not the other way around.
- Other benefits like a remote working budget, an educational budget, and time to develop new skills.
- To be surrounded by an excellent, vibrant, smart team who have each other's back and believe in a culture of openness, tolerance and respect.
- Equity and a competitive salary.