ML Infrastructure & MLOps Engineer

AllSTEM Connections

Ontario, United States of America

2 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Ontario, United States of America

Tech stack

Airflow

Program Optimization

Continuous Delivery

Continuous Integration

Information Engineering

Distributed Systems

Azure

Cloud Platform System

Delivery Pipeline

Reliability of Systems

Backend

AI Platforms

Kubernetes

Deployment Automation

Machine Learning Operations

Job description

ML Infrastructure & Container Orchestration

Distributed Clusters: Architect and maintain high-performance training and serving infrastructure using Google Kubernetes Engine (GKE).

Model Optimization: Design and implement high-efficiency optimization pipelines, including advanced knowledge distillation and foundational training tooling.

Platform Scaling: Build, monitor, and optimize shared ML systems to ensure maximum infrastructure uptime, pipeline reliability, and cloud cost-efficiency.

Data Engineering & Pipeline Automation

Workflow Automation: Build robust, automated pipelines for standardized model training, validation, and continuous deployment (CI/CD for ML).

Feature Platforms: Develop scalable data sampling and feature-generation platforms to accelerate research experimentation cycles.

Onboarding & Usability: Drive high platform adoption by building intuitive, standardized deployment tools that reduce onboarding time for research and engineering teams.

Collaboration & Governance

Cross-Functional Bridge: Collaborate closely with ML researchers and core software engineers to translate theoretical models into highly scalable production systems.

Methodical Execution: Apply a disciplined, data-backed approach to identify infrastructure bottlenecks, reduce time-to-market, and stabilize complex deployments.

Requirements

Experience:

- 5 to 10+ years of hands-on experience designing and operating large-scale distributed ML platforms.

- Proven track record of supporting production-grade ML workflows in cloud environments.

Technical Mastery:

- Deep expertise in container orchestration, specifically GKE (Google Kubernetes Engine) or equivalent enterprise Kubernetes environments.

- Hands-on experience building scalable ML pipelines (e.g., Kubeflow, Airflow, TFX).

- Strong proficiency in distributed training strategies, feature store management, and model serving infrastructure.

Soft Skills & Attributes:

- Pragmatic Mindset: Strong ownership-driven work style focused on consistency, system reliability, and cost-awareness.

- Effective Communicator: Ability to collaborate seamlessly with highly technical researchers and platform engineers alike.

Preferred Qualifications

Prior experience working within dedicated, tier-1 enterprise ML/AI platform teams.

Deep knowledge of distributed systems backend optimization and infrastructure-as-code (IaC).

About the company

For temporary assignments lasting 13 weeks or longer, AllSTEM Connections is pleased to offer major medical, dental, vision, 401(k), and any statutory sick pay where required.

We are committed to working with and providing reasonable accommodations to individuals with disabilities. If you need a reasonable accommodation for any part of the employment process, please contact your staffing representative, who will reach out to our HR team.

AllSTEM Connections participates in the E-Verify program in certain locations as required by law.

We also consider for employment qualified applicants regardless of criminal histories, consistent with legal requirements, including, if applicable, the City of Los Angeles' Fair Chance Initiative for Hiring Ordinance. Pursuant to applicable state and municipal Fair Chance Laws and Ordinances, we will consider for employment qualified applicants with arrest and conviction records, including, if applicable, under the San Francisco Fair Chance Ordinance. For Los Angeles, CA applicants: Qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.
