Machine Learning Operations Engineer

The University of Texas MD Anderson Cancer Center
Houston, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$220K

Job location

Remote
Houston, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Computer Vision
Azure
Health Informatics
Cloud Computing
Directed Acyclic Graph (DAG)
Information Engineering
DevOps
GitHub
Machine Learning
NumPy
Software Engineering
Enterprise Software Applications
PyTorch
Containerization
Kubernetes
Information Technology
Machine Learning Operations
Software Version Control
Data Pipelines
Docker

Job description

Within this mission-driven environment, the Senior Machine Learning Operations Engineer plays a critical role in building, deploying, and sustaining production-quality machine learning systems. The role partners closely with data scientists, engineers, clinicians, and business stakeholders to ensure that AI solutions are scalable, secure, reliable, and aligned with responsible AI principles across UT MD Anderson.

AI Model Lifecycle & MLOps

  • Oversee end-to-end AI model lifecycles including training, evaluation, deployment, monitoring, and maintenance of production-quality machine learning models

  • Design and implement CI/CD pipelines for model training, deployment, monitoring, and retraining with a focus on security, scalability, reliability, reproducibility, and performance

  • Implement rigorous testing, versioning, and documentation practices to support reproducibility, risk mitigation, and measurable impact

  • Maintain comprehensive experiment tracking, data lineage, model lineage, and model scorecards

  • Design fallback, rollback, and decommissioning strategies to ensure operational continuity of AI solutions

Responsible AI & Governance

  • Promote responsible AI practices by minimizing bias, enhancing fairness, and maximizing transparency in machine learning models

  • Ensure AI lifecycle management aligns with institutional standards and best practices

  • Support assessment, validation, and onboarding of external machine learning models and AI-driven products to minimize organizational risk and maximize value

Platform, Infrastructure & Tooling

  • Develop and maintain scalable data pipelines, feature stores, and artifact management systems

  • Deploy and operate ML workloads across cloud platforms (Azure, AWS, or GCP) and on-premises environments

  • Use containerization and orchestration technologies such as Docker, Kubernetes, and DAG-based workflow tools

  • Apply DevOps and MLOps tools, including Azure DevOps, GitHub Actions, and version control systems

Stakeholder Engagement & Enablement

  • Collaborate with stakeholders to gather requirements, translate AI concepts into understandable terms, and incorporate feedback

  • Partner with data scientists, ML engineers, and software engineers to integrate models into enterprise systems

  • Deliver training and knowledge sharing to enhance AI understanding and adoption across the organization

  • Report project progress, impact, risks, and recommendations to leadership

Innovation & Continuous Learning

  • Stay current with emerging technology trends in AI, MLOps, and healthcare analytics

  • Contribute to internal and external technical communities

  • Foster a culture of continuous improvement, innovation, and learning across teams

  • Perform other duties as assigned

This position may be responsible for maintaining the security and integrity of critical infrastructure, as defined in Section 113.001(2) of the Texas Business and Commerce Code, and therefore may require routine reviews and screening. The ability to satisfy and maintain all requirements necessary to ensure the continued security and integrity of such infrastructure is a condition of hire and continued employment.

Requirements

The ideal candidate is a seasoned machine learning or software engineering professional with a strong foundation in MLOps, cloud and on-premises AI platforms, and healthcare-focused AI lifecycle management. This individual typically holds a Bachelor's degree in a relevant technical discipline, with a Master's degree preferred, and brings significant hands-on experience developing, deploying, and maintaining machine learning systems in production environments. Experience leading or designing shared ML services, evaluating third-party AI solutions, and applying responsible AI practices within regulated or clinical settings is highly valued.

Education Required: Bachelor's degree in Computer Science, Software Engineering, Data Science, Physics, Math & Statistics, or another related engineering discipline.

Preferred Education: Master's degree

Experience Required: Five years of experience in machine learning engineering, data science, data engineering, and/or software engineering. With a Master's degree, three years of experience are required; with a PhD, one year.

Preferred Experience:

  • Experience developing MLOps pipelines for computer vision AI models
  • Hands-on experience developing custom machine learning algorithms from scratch (e.g., in NumPy or PyTorch)
  • Designed and implemented a shared machine learning service used across multiple teams or production projects
  • Led the development of systems that automate the deployment and maintenance of multiple machine learning models in user-facing products
  • Five years of industry experience in data science, including at least three years as a Senior Machine Learning Engineer

Benefits & conditions

Salary: Minimum $146,500, Midpoint $183,000, Maximum $219,500, based on a 40-hour work week. Work Location: Remote within Texas only.

Why Us? This role offers the opportunity to directly influence how artificial intelligence is responsibly scaled across UT MD Anderson, contributing to meaningful, long-lasting improvements in cancer care while working alongside experts in data science, engineering, and clinical innovation. The Senior Machine Learning Operations Engineer is supported by an environment that values continuous learning, technical excellence, and sustainable work practices while enabling professional growth and enterprise-level impact.

  • Employer-paid medical coverage starting day one for employees working 30+ hours/week, plus optional group dental, vision, life, AD&D, and disability insurance.
  • Accruals for PTO and Extended Illness Bank, plus paid holidays, wellness, childcare, and other leave options.
  • Tuition Assistance Program after six months of service and access to extensive wellness, fitness, and employee resource groups.
  • Defined-benefit pension through the Teacher Retirement System of Texas, voluntary retirement plans, and employer-paid life and reduced salary protection programs.

The University of Texas MD Anderson Cancer Center offers excellent benefits, including medical, dental, paid time off, retirement, tuition benefits, educational opportunities, and individual and team recognition.

About the company

The University of Texas MD Anderson Cancer Center is seeking a Senior Machine Learning Operations Engineer to support enterprise-wide artificial intelligence initiatives within Data Impact & Governance. The Senior Machine Learning Operations Engineer will join a multidisciplinary environment that integrates multidimensional data, advanced analytics, and machine learning to drive sustainable, responsible AI solutions that improve cancer care outcomes.
