Lina Weichbrodt

Is my AI alive but brain-dead? How monitoring can tell you if your machine learning stack is still performing

Machine learning models can become "brain-dead" in production without triggering a single alert. Learn how to spot the signs before it impacts your bottom line.

#1 · about 2 minutes

Why defining the business problem is crucial for monitoring

Machine learning projects often have vague requirements, making it essential to define success KPIs before implementing monitoring.

#2 · about 3 minutes

A real-world use case for loan rejection prediction

A machine learning model is used to predict loan application rejections upfront, saving significant monthly costs from credit agency queries.

#3 · about 3 minutes

Using precision and recall for model training

Precision and recall are chosen as the key metrics, trading off how reliable the model's rejection predictions are (precision) against how many of the actual rejections it catches (recall).
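
As a minimal illustration (not code from the talk), precision and recall for a binary rejection classifier can be computed with scikit-learn; the labels below are made up:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = application rejected, 0 = approved.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual outcomes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

# Precision: of the applications flagged as rejections, how many really were?
print(precision_score(y_true, y_pred))  # 0.75

# Recall: of the applications that really were rejected, how many were caught?
print(recall_score(y_true, y_pred))  # 0.75
```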

#4 · about 2 minutes

Choosing gradient boosted trees for tabular data

Gradient boosted trees are selected over deep learning for this tabular data problem because they offer comparable performance with much faster training times.
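
A minimal sketch of such a model, assuming scikit-learn and synthetic stand-in data (the talk's actual features and library choice are not given here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for loan-application features.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```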

#5 · about 2 minutes

Using existing tools like Grafana for ML monitoring

You can reuse your existing software monitoring stack, such as Grafana and Prometheus, for machine learning; it is often sufficient and avoids adopting immature specialized tools.
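
One common pattern (a sketch, assuming the prometheus_client Python library; the metric names are invented) is to export prediction counts, scores, and latency as Prometheus metrics and build Grafana dashboards on top:

```python
from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Predictions served",
                      ["predicted_class"])
LATENCY = Histogram("model_inference_seconds", "Inference latency")
SCORES = Histogram("model_score", "Predicted rejection probability",
                   buckets=[i / 10 for i in range(11)])

@LATENCY.time()
def predict(features):
    score = 0.42  # stand-in for the real model call
    SCORES.observe(score)
    PREDICTIONS.labels(predicted_class=str(int(score >= 0.5))).inc()
    return score

start_http_server(8000)  # Prometheus scrapes /metrics on port 8000
```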

#6 · about 6 minutes

Monitoring model outcomes with a holdout set

When the model's intervention prevents the true outcome from ever being observed, a holdout set of live traffic is exempted from the intervention and used to calculate production metrics such as precision and recall.
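
A sketch of the routing logic, with invented names: a small random slice of traffic bypasses the model's decision so the true label can still be observed.

```python
import random

HOLDOUT_RATE = 0.05  # share of live traffic exempt from the model's decision

def route_application(application, model_score):
    if random.random() < HOLDOUT_RATE:
        # Holdout: run the normal process regardless of the prediction, so
        # the true outcome becomes known. Comparing model_score against that
        # outcome later yields production precision and recall.
        return {"action": "process_normally", "holdout": True}
    action = "reject_upfront" if model_score >= 0.5 else "process_normally"
    return {"action": action, "holdout": False}

# A score of 0.8 normally triggers an upfront rejection,
# except for the ~5% holdout slice.
print(route_application({"applicant_id": 1}, model_score=0.8))
```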

#7 · about 3 minutes

Translating stakeholder fears into monitoring signals

Address stakeholder concerns by identifying their worst-case scenarios and creating specific metrics to monitor and alert on those potential issues.
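
For example (an invented scenario, not from the talk): if the stakeholder's worst case is "the model suddenly rejects almost everyone", that fear maps directly onto a rejection-rate metric with an alert threshold.

```python
from prometheus_client import Gauge

REJECTION_RATE = Gauge("model_rejection_rate",
                       "Share of applications rejected in the last window")

def update_rejection_rate(recent_scores, threshold=0.5):
    rate = sum(s >= threshold for s in recent_scores) / len(recent_scores)
    REJECTION_RATE.set(rate)
    return rate

print(update_rejection_rate([0.9, 0.2, 0.7, 0.1]))  # 0.5
# A Prometheus/Grafana alert rule would then fire on e.g.
# model_rejection_rate > 0.6 for 15 minutes (threshold illustrative).
```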

#8 · about 4 minutes

Monitoring the model's response distribution for drift

Track the distribution of model outputs over time using statistical distance metrics like the D1 distance to detect shifts that indicate a problem.
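
One common reading of the d1 distance, assumed here, is the L1 distance between binned histograms of the model's outputs from two time windows; a minimal sketch with synthetic scores:

```python
import numpy as np

def d1_distance(scores_a, scores_b, bins=10):
    """L1 distance between binned score histograms: 0.0 means identical,
    2.0 means completely disjoint. The bin count is an illustrative choice."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(scores_a, bins=edges)
    q, _ = np.histogram(scores_b, bins=edges)
    return np.abs(p / p.sum() - q / q.sum()).sum()

rng = np.random.default_rng(0)
yesterday = rng.beta(2, 5, size=10_000)  # synthetic score distribution
today = rng.beta(5, 2, size=10_000)      # shifted: something changed
print(d1_distance(yesterday, today))     # large value -> investigate
```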

#9 · about 2 minutes

Creating quality heuristics as sanity checks

Develop simple, human-understandable heuristics, such as the average rank of a user's favorite item, to serve as an intuitive quality indicator.
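
A toy version of that heuristic (data shapes and names are invented): track the average list position of each user's known favorite item, and alert when the average suddenly worsens.

```python
def average_favorite_rank(recommendations, favorites):
    """Average 1-based rank of each user's favorite item in their list.
    A sudden jump in this number is an intuitive sign of degraded quality."""
    ranks = [items.index(favorites[user]) + 1
             for user, items in recommendations.items()
             if favorites.get(user) in items]
    return sum(ranks) / len(ranks) if ranks else float("nan")

recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
favorites = {"u1": "b", "u2": "z"}
print(average_favorite_rank(recs, favorites))  # (2 + 3) / 2 = 2.5
```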

#10 · about 2 minutes

Monitoring input data to detect training-serving skew

Compare the distribution of input features between the training environment and live production to identify and debug training-serving skew.
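
A sketch of such a comparison on a single numeric feature, reusing the histogram-distance idea from above (synthetic data, invented feature):

```python
import numpy as np

def input_skew(train_values, live_values, bins=20):
    """Histogram distance between a feature as seen at training time and in
    live serving; a large value points at training-serving skew."""
    lo = min(train_values.min(), live_values.min())
    hi = max(train_values.max(), live_values.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(train_values, bins=edges)
    q, _ = np.histogram(live_values, bins=edges)
    return np.abs(p / p.sum() - q / q.sum()).sum()

rng = np.random.default_rng(1)
train_income = rng.normal(50_000, 15_000, 10_000)
live_income = rng.normal(65_000, 15_000, 10_000)  # drifted in production
print(input_skew(train_income, live_income))
```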

#11 · about 4 minutes

Key takeaways for practical machine learning monitoring

Monitoring in production focuses on detecting problems with indicator KPIs rather than measuring absolute quality, and it can be designed by working backwards from business impact.

#12 · about 15 minutes

Q&A on career paths and delayed outcomes

The Q&A session covers topics such as career entry points into machine learning, handling delayed outcomes in business processes, and stakeholder communication.
