Lina Weichbrodt
Is my AI alive but brain-dead? How monitoring can tell you if your machine learning stack is still performing
#1 about 2 minutes
Why defining the business problem is crucial for monitoring
Machine learning projects often have vague requirements, making it essential to define success KPIs before implementing monitoring.
#2 about 3 minutes
A real-world use case for loan rejection prediction
A machine learning model predicts upfront which loan applications will be rejected, so the credit agency query can be skipped for them, saving significant monthly costs.
#3 about 3 minutes
Using precision and recall for model training
Precision and recall are chosen as the key metrics: precision measures how often a predicted rejection is correct, and recall measures what share of all rejections the model identifies, so together they balance accuracy against volume.
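To make the two metrics concrete, here is a minimal Python sketch that computes them from lists of labels, assuming a rejection is encoded as the positive class `1`. The function name `precision_recall` and the sample data are illustrative, not taken from the talk.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class "rejected" (label 1).

    y_true: actual outcomes, 1 = application was rejected
    y_pred: model predictions, 1 = model predicted a rejection
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: of the applications flagged as rejections, how many really were rejected?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of all real rejections, how many did the model flag?
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Raising the model's decision threshold typically trades recall for precision, which is the balance the chapter describes.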
#4 about 2 minutes
Choosing gradient boosted trees for tabular data
Gradient boosted trees are selected over deep learning for this tabular data problem because they offer comparable performance with much faster training times.
#5 about 2 minutes
Using existing tools like Grafana for ML monitoring
You can reuse your existing software monitoring stack, such as Grafana and Prometheus, for machine learning; it is often sufficient and avoids adopting immature tools.
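As a sketch of how an ML metric can flow into an existing Prometheus and Grafana setup, the function below renders gauges in the Prometheus text exposition format using only the standard library. The metric names (`ml_model_precision`, `ml_model_recall`) and the `model` label are hypothetical; in practice you would more likely use the official `prometheus_client` library, and the output could be served on a `/metrics` endpoint or written to a file for node_exporter's textfile collector.

```python
def escape_label_value(value: str) -> str:
    # The exposition format requires escaping backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def render_gauges(metrics: dict, labels: dict) -> str:
    """Render gauge metrics in the Prometheus text exposition format.

    metrics: metric name -> (help text, current value); help text is assumed
             to contain no backslashes or newlines
    labels:  label name -> label value, attached to every sample
    """
    label_str = ",".join(
        f'{name}="{escape_label_value(str(value))}"' for name, value in sorted(labels.items())
    )
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

print(render_gauges(
    {
        "ml_model_precision": ("Precision on the holdout set", 0.91),
        "ml_model_recall": ("Recall on the holdout set", 0.64),
    },
    {"model": "loan_rejection"},
))
```

Once Prometheus scrapes these gauges, Grafana can chart and alert on them like any other service metric.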
#6 about 6 minutes
Monitoring model outcomes with a holdout set
When the model's intervention prevents the true outcome from being observed, a holdout set of live traffic, on which the model's decision is not acted upon, is used to calculate production metrics such as precision and recall.
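A minimal sketch of the holdout idea, assuming each application has a string ID and that roughly 5% of traffic is held out (the fraction, `in_holdout`, and `holdout_precision_recall` are all hypothetical). Hashing the ID gives a stable assignment; for holdout applications the prediction is only logged, so the normal credit check still runs and the true outcome becomes known.

```python
import hashlib

def in_holdout(application_id: str, fraction: float = 0.05) -> bool:
    """Deterministically assign an application to the holdout group.

    Hashing the ID instead of calling random() keeps the assignment stable
    if the same application is scored more than once.
    """
    digest = hashlib.sha256(application_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # roughly uniform in [0, 1)
    return bucket < fraction

def holdout_precision_recall(records):
    """Compute precision and recall from holdout traffic only.

    records: (predicted_rejection, actually_rejected) boolean pairs. For holdout
    applications the prediction is logged but not acted on, so the credit check
    still runs and the true outcome becomes known.
    """
    records = list(records)  # allow generators; we iterate several times
    tp = sum(1 for pred, actual in records if pred and actual)
    fp = sum(1 for pred, actual in records if pred and not actual)
    fn = sum(1 for pred, actual in records if not pred and actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Share of (synthetic) application IDs that land in the holdout group.
ids = [f"app-{i}" for i in range(10_000)]
share = sum(in_holdout(app_id) for app_id in ids) / len(ids)
print(f"holdout share: {share:.3f}")  # close to 0.05
```

The holdout costs money (those applications still trigger credit checks), so its size is a trade-off between metric accuracy and savings.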
#7 about 3 minutes
Translating stakeholder fears into monitoring signals
Address stakeholder concerns by identifying their worst-case scenarios and creating specific metrics to monitor and alert on those potential issues.
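One way to keep a stakeholder's fear attached to the signal that detects it is to store both in the alert rule. The sketch below is illustrative: the `AlertRule` class, the metric names, and the thresholds are hypothetical values for the loan rejection use case, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertRule:
    fear: str     # the stakeholder's worst case, in their own words
    metric: str   # hypothetical metric that would reveal that worst case
    lower: float  # alert if the metric drops below this value...
    upper: float  # ...or rises above this one

    def check(self, value: float) -> Optional[str]:
        """Return an alert message if value is outside [lower, upper], else None."""
        if value < self.lower or value > self.upper:
            return f"ALERT {self.metric}={value:.2f}: {self.fear}"
        return None

# Hypothetical fears and thresholds for the loan rejection use case.
rules = [
    AlertRule("The model rejects far too many applicants", "predicted_rejection_rate", 0.0, 0.30),
    AlertRule("Most of the model's rejections are wrong", "holdout_precision", 0.80, 1.0),
]

current = {"predicted_rejection_rate": 0.42, "holdout_precision": 0.91}
for rule in rules:
    message = rule.check(current[rule.metric])
    if message:
        print(message)
```

In a Grafana and Prometheus setup each rule would more likely become a native alert rule; the sketch only shows the mapping from worst case to metric and threshold.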
#8 about 4 minutes
Monitoring the model's response distribution for drift
Track the distribution of model outputs over time using statistical distance metrics like the D1 distance to detect shifts that indicate a problem.
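A sketch of a distance between score distributions, assuming "D1 distance" here means the L1 distance: the sum of absolute differences between the binned relative frequencies of a reference window and the current window. The bin edges, window names, and sample scores are hypothetical.

```python
import bisect

def relative_histogram(values, edges):
    """Relative frequency of values per bin, for bins defined by sorted edges.

    Values below edges[0] are counted in the first bin and values at or above
    edges[-1] in the last bin, so no observation is dropped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]

def d1_distance(reference, current, edges):
    """Sum of absolute differences between binned relative frequencies (0 = identical, 2 = disjoint)."""
    p = relative_histogram(reference, edges)
    q = relative_histogram(current, edges)
    return sum(abs(a - b) for a, b in zip(p, q))

edges = [0.0, 0.25, 0.5, 0.75, 1.0]                    # bins for model scores in [0, 1]
last_week = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # reference window
today = [0.3, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]      # scores shifted upward
print(d1_distance(last_week, today, edges))  # 0.75
```

An alert threshold on this value would have to be tuned against the day-to-day variation seen in historical data.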
#9 about 2 minutes
Creating quality heuristics as sanity checks
Develop simple, human-understandable heuristics, such as the average rank of a user's favorite item, to serve as an intuitive quality indicator.
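The favorite-item heuristic can be computed in a few lines. This sketch assumes a recommender that serves each user a ranked list of item IDs and that one favorite item per user is known; the data layout and `average_favorite_rank` are hypothetical, and counting a missing favorite one position past the end is an arbitrary choice for illustration.

```python
def average_favorite_rank(recommendations, favorites):
    """Mean 1-based rank of each user's favorite item in their recommendation list.

    recommendations: user id -> ranked list of item ids (best first)
    favorites:       user id -> that user's favorite item id
    Lower is better. A favorite missing from the list is counted one position
    past the end (an assumption made for this sketch).
    """
    ranks = []
    for user, favorite in favorites.items():
        ranked = recommendations.get(user)
        if not ranked:
            continue  # no recommendations served to this user
        if favorite in ranked:
            ranks.append(ranked.index(favorite) + 1)
        else:
            ranks.append(len(ranked) + 1)
    return sum(ranks) / len(ranks) if ranks else float("nan")

recommendations = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
favorites = {"u1": "a", "u2": "f", "u3": "x"}
print(f"{average_favorite_rank(recommendations, favorites):.2f}")  # 2.67
```

A sudden jump in this number is easy to explain to non-specialists, which is the point of such a heuristic.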
#10 about 2 minutes
Monitoring input data to detect training-serving skew
Compare the distribution of input features between the training environment and live production to identify and debug training-serving skew.
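A simplified sketch of a training-serving skew check. Rather than comparing full distributions, it swaps in two cheap per-feature statistics, the mean and the missing-value rate; a binned distance between histograms could be used per feature instead. The tolerances, feature names, and the "income sent in thousands" bug are hypothetical.

```python
import math

def feature_stats(rows, feature):
    """Missing-value rate and mean of one feature across a list of dict rows."""
    values = [r.get(feature) for r in rows]
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values) if values else 0.0
    mean = sum(present) / len(present) if present else math.nan
    return {"missing_rate": missing_rate, "mean": mean}

def skew_report(train_rows, serving_rows, features, rel_tol=0.2, missing_tol=0.05):
    """Flag features whose serving mean differs from training by more than rel_tol
    (relative) or whose missing rate changed by more than missing_tol (absolute)."""
    flagged = {}
    for f in features:
        t = feature_stats(train_rows, f)
        s = feature_stats(serving_rows, f)
        reasons = []
        if abs(s["missing_rate"] - t["missing_rate"]) > missing_tol:
            reasons.append(f"missing rate {t['missing_rate']:.2f} -> {s['missing_rate']:.2f}")
        if not math.isclose(s["mean"], t["mean"], rel_tol=rel_tol):
            reasons.append(f"mean {t['mean']:.2f} -> {s['mean']:.2f}")
        if reasons:
            flagged[f] = reasons
    return flagged

train_rows = [{"income": 3000, "age": 30}, {"income": 4000, "age": 40}, {"income": 5000, "age": 50}]
# Hypothetical serving bug: income arrives in thousands and age is sometimes missing.
serving_rows = [{"income": 3.5, "age": 35}, {"income": 4.5, "age": None}, {"income": 4.0, "age": 45}]
print(skew_report(train_rows, serving_rows, ["income", "age"]))
```

Each flagged reason points directly at the feature to debug in the serving pipeline.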
#11 about 4 minutes
Key takeaways for practical machine learning monitoring
Monitoring in production focuses on detecting problems with indicator KPIs, not measuring absolute quality, and can be done by working backwards from business impact.
#12 about 15 minutes
Q&A on career paths and delayed outcomes
The Q&A session covers topics such as career entry points into machine learning, handling delayed outcomes in business processes, and stakeholder communication.
Matching moments
35:33 MIN
Ensuring AI reliability with monitoring and data governance
Navigating the AI Revolution in Software Development
00:11 MIN
The challenge of operationalizing production machine learning systems
Model Governance and Explainable AI as tools for legal compliance and risk management
02:38 MIN
Common challenges in developing machine learning applications
Data Fabric in Action - How to enhance a Stock Trading App with ML and Data Virtualization
24:42 MIN
Overcoming the challenges of productionizing AI models
Navigating the AI Revolution in Software Development
09:51 MIN
Understanding the machine learning development lifecycle
Leverage Cloud Computing Benefits with Serverless Multi-Cloud ML
00:20 MIN
The lifecycle for operationalizing AI models in business
Detecting Money Laundering with AI
06:34 MIN
Understanding the machine learning workflow and MLOps
Machine Learning in ML.NET
01:01 MIN
Understanding the role and challenges of MLOps
The Road to MLOps: How Verivox Transitioned to AWS
Related Videos
Deployed ML models need your feedback too
David Mosen
The state of MLOps - machine learning in production at enterprise scale
Bas Geerdink
Detecting Money Laundering with AI
Stefan Donsa & Lukas Alber
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow
Aarno Aukia
From Traction to Production: Maturing your LLMOps step by step
Maxim Salnikov
How AI Models Get Smarter
Ankit Patel
You are not my model anymore - understanding LLM model behavior
Andreas Erben
Overview of Machine Learning in Python
Adrian Schmitt