Senior AI/ML Engineer - MLOps & Production AI Systems - Remote or Hybrid in MN/D
Job description
- Design, build, and maintain end-to-end ML platforms and pipelines (training, validation, deployment, and monitoring)
- Productionize ML models using batch and real-time inference architectures (APIs, streaming, event-driven systems)
- Develop and manage ML lifecycle workflows using tools such as MLflow, Kubeflow, SageMaker, or Azure ML
- Build and maintain CI/CD pipelines for ML (CI/CT/CD), including automated testing, validation, and model promotion
- Containerize and deploy ML workloads using Docker and Kubernetes, ensuring scalability and reliability
- Implement infrastructure-as-code (Terraform or equivalent) for reproducible and secure ML environments
- Develop monitoring and observability solutions for model performance, drift, latency, and data quality
- Automate retraining and redeployment workflows based on performance degradation or new data availability
- Partner with cross-functional teams to define and enforce ML engineering standards and best practices
- Ensure compliance with enterprise governance, security, and Responsible AI requirements
You'll be rewarded and recognized for your performance in an environment that will challenge you, give you clear direction on what it takes to succeed in your role, and provide development for other roles you may be interested in.
Requirements
- Bachelor's degree in Computer Science, Engineering, or related field OR 4+ years of equivalent experience
- 5+ years of experience in ML Engineering / MLOps with production deployment of machine learning systems
- 3+ years of experience with ML lifecycle tools (MLflow, Kubeflow, SageMaker, Azure ML, or similar)
- 3+ years of experience with Docker and Kubernetes in production environments
- 3+ years of experience building CI/CD pipelines for ML using Git-based workflows and automation tools
- 2+ years of experience with cloud platforms (AWS, Azure, or GCP) for ML workloads
- Experience with real-time and batch inference systems (e.g., Kafka, Kinesis, Event Hubs)
- Solid programming experience in Python (5+ years) with ML frameworks (PyTorch, TensorFlow, or scikit-learn)
- 7+ years of experience in ML engineering or distributed systems
- Experience with feature stores (e.g., Feast) and data versioning systems
- Hands-on experience with distributed data processing frameworks (Spark, Ray)
- Experience with workflow orchestration tools (Airflow, Dagster, Prefect)
- Experience with multi-cloud or hybrid cloud ML deployments
- Knowledge of Responsible AI, bias detection, and model explainability techniques
- Familiarity with observability tools (Prometheus, Grafana, OpenTelemetry)
- Proven contributions to open-source ML or MLOps projects
*All employees working remotely will be required to adhere to UnitedHealth Group's Telecommuter Policy.
Benefits & conditions
Pay is based on several factors, including but not limited to local labor markets, education, work experience, and certifications. In addition to your salary, we offer a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far-reaching choice of benefits and incentives. The salary for this role will range from $91,700 to $163,700 annually based on full-time employment. We comply with all minimum wage laws as applicable.