Dora Petrella

How We Built a Machine Learning-Based Recommendation System (And Survived to Tell the Tale)

How do you find the perfect substitute for an out-of-stock item? Learn how we adapted a natural language model to solve this critical e-commerce challenge.

#1 about 5 minutes

Defining the business need for product recommendations

A recommendation system for substitute products is needed across multiple touchpoints to prevent lost sales from out-of-stock items.

#2 about 2 minutes

Analyzing the limitations of the existing recommender

The previous system, based on the Jaccard coefficient, produced low-quality recommendations, particularly for new or unpopular items.
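
To see why a Jaccard-based recommender struggles with new or unpopular items, here is a minimal sketch. It assumes each product is represented by the set of session IDs it appeared in; the product names and data are hypothetical and not taken from the talk.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0  # two empty sets: define the similarity as 0
    return len(a & b) / len(union)


# Hypothetical data: session IDs in which each product was seen.
sessions_by_product = {
    "milk_1l": {1, 2, 3, 4},
    "oat_milk_1l": {2, 3, 4, 5},
    "new_item": set(),  # a new product with no interaction history yet
}

popular_score = jaccard(sessions_by_product["milk_1l"], sessions_by_product["oat_milk_1l"])
cold_score = jaccard(sessions_by_product["milk_1l"], sessions_by_product["new_item"])
```

With no shared sessions, the new item scores 0 against every product, so it can never be recommended regardless of how suitable it is as a substitute.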

#3 about 5 minutes

Using the Prod2Vec algorithm for recommendations

The Prod2Vec algorithm, adapted from Word2Vec, learns product relationships by analyzing co-occurrence within user session context windows.
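
The context-window idea can be illustrated with a short sketch that turns one user session into skip-gram style (target, context) training pairs, the same kind of pairs Word2Vec builds from sentences. The function name, the window size, and the product IDs are assumptions for illustration; the talk does not show its actual training code.

```python
def skipgram_pairs(session, window=2):
    """Return (target, context) pairs for products within `window` positions of each other."""
    pairs = []
    for i, target in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a product is not its own context
                pairs.append((target, session[j]))
    return pairs


# Hypothetical session: products viewed in order by one user.
session = ["p1", "p2", "p3", "p4"]
pairs = skipgram_pairs(session, window=1)
```

A model trained on such pairs places products that share contexts close together in the embedding space, which is what makes nearest-neighbour lookup usable for substitutes.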

#4 about 2 minutes

Improving predictions with Meta-Prod2Vec and metadata

Incorporating product metadata like category and brand into the model (Meta-Prod2Vec) significantly improves recommendation quality for long-tail items.
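
One way to picture the metadata extension is to add metadata tokens as extra contexts when building training pairs. The sketch below is only in the spirit of Meta-Prod2Vec and is not the talk's implementation: the token format ("cat:...", "brand:..."), the function name, and the pairing scheme are all assumptions.

```python
def meta_pairs(session, metadata, window=1):
    """Build (target, context) pairs where contexts include neighbouring
    products and metadata tokens of both the target and its neighbours."""
    pairs = []
    for i, target in enumerate(session):
        # Tie each product to its own metadata tokens.
        for token in metadata.get(target, ()):
            pairs.append((target, token))
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            neighbour = session[j]
            pairs.append((target, neighbour))
            # Tie each product to the metadata of its neighbours as well.
            for token in metadata.get(neighbour, ()):
                pairs.append((target, token))
    return pairs


# Hypothetical products and metadata tokens.
metadata = {
    "a": ["cat:dairy"],
    "b": ["cat:dairy", "brand:x"],
}
pairs_with_meta = meta_pairs(["a", "b"], metadata, window=1)
```

Because a rarely seen product still shares category and brand tokens with popular ones, it receives a training signal even with few sessions, which is consistent with the improvement reported for long-tail items.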

#5 about 2 minutes

Implementing the end-to-end MLOps pipeline

The production system uses dbt for data transformation, a Vertex AI pipeline for model training, and Elasticsearch for efficient vector similarity search.
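
The serving step boils down to a nearest-neighbour lookup over product embeddings. The following brute-force sketch shows only the underlying computation, using cosine similarity as an assumed metric; it does not use the Elasticsearch API, and the embeddings are hypothetical two-dimensional values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query_id, embeddings, k=2):
    """Return the k products most similar to `query_id`, best first."""
    query = embeddings[query_id]
    scored = [
        (pid, cosine(query, vec))
        for pid, vec in embeddings.items()
        if pid != query_id  # never recommend the product itself
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings.
embeddings = {
    "milk": [1.0, 0.0],
    "oat_milk": [0.9, 0.1],
    "hammer": [0.0, 1.0],
}
substitutes = top_k_similar("milk", embeddings, k=1)
```

A linear scan like this grows with catalogue size, which is the usual reason to delegate the search to a dedicated vector index in production.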

#6 about 3 minutes

Evaluating model performance with offline and online metrics

Offline metrics like NDCG confirmed model quality, while mirror traffic analysis showed a 45% increase in product recommendation coverage.
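
For readers unfamiliar with these metrics, here is a minimal sketch of both. NDCG follows the standard definition with a log2 discount. Coverage is assumed here to mean the share of catalogue products that receive at least one recommendation; the talk summary does not define it, so treat that as an assumption. All data is hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(relevances, k):
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    if ideal == 0:
        return 0.0
    return dcg(relevances[:k]) / ideal


def coverage(recommendations, catalog):
    """Share of catalogue products with at least one recommendation (assumed definition)."""
    if not catalog:
        return 0.0
    covered = sum(1 for pid in catalog if recommendations.get(pid))
    return covered / len(catalog)


# Hypothetical ranking: relevant, irrelevant, relevant.
score = ndcg_at_k([1, 0, 1], k=3)

# Hypothetical recommender output for a four-product catalogue.
recs = {"a": ["b"], "b": [], "c": ["a"]}
share = coverage(recs, ["a", "b", "c", "d"])
```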

#7 about 3 minutes

Visualizing product relationships with the Embedding Projector

Using TensorFlow's Embedding Projector tool reveals how the model groups similar products into distinct clusters in a high-dimensional space.
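
The Embedding Projector loads embeddings from tab-separated files: one file of vectors without a header, and an optional metadata file that needs a header row when it has more than one column. The sketch below builds both as strings; the function name, column names, and data are assumptions for illustration.

```python
def to_projector_tsv(embeddings, categories):
    """Return (vectors_tsv, metadata_tsv) strings with rows in matching order."""
    vector_lines = []
    metadata_lines = ["product_id\tcategory"]  # header required for multi-column metadata
    for pid, vec in embeddings.items():
        vector_lines.append("\t".join(str(x) for x in vec))
        metadata_lines.append(f"{pid}\t{categories.get(pid, 'unknown')}")
    return "\n".join(vector_lines) + "\n", "\n".join(metadata_lines) + "\n"


# Hypothetical embeddings and category labels.
product_vectors = {"milk": [1.0, 0.0], "hammer": [0.0, 1.0]}
product_categories = {"milk": "dairy"}
vectors_tsv, metadata_tsv = to_projector_tsv(product_vectors, product_categories)
```

Writing the two strings to files and loading them in the projector lets you colour points by category and check whether clusters match business intuition.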

#8 about 3 minutes

Adopting pragmatic baselines and automated data analysis

Key project takeaways include using simple business-logic baselines for benchmarking and automating exploratory data analysis within the ML pipeline itself.
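
As an example of what a business-logic baseline might look like, the sketch below recommends the most popular other products from the same category. This particular rule is an assumption chosen for illustration; the talk summary does not say which baseline the team used.

```python
def category_baseline(product_id, category_of, popularity, k=2):
    """Recommend the k most popular other products in the same category."""
    category = category_of[product_id]
    candidates = [
        pid
        for pid, cat in category_of.items()
        if cat == category and pid != product_id
    ]
    candidates.sort(key=lambda pid: popularity.get(pid, 0), reverse=True)
    return candidates[:k]


# Hypothetical catalogue and sales counts.
category_of = {"milk": "dairy", "oat_milk": "dairy", "yogurt": "dairy", "hammer": "tools"}
popularity = {"milk": 100, "oat_milk": 40, "yogurt": 70, "hammer": 5}
baseline_recs = category_baseline("milk", category_of, popularity, k=2)
```

Scoring such a rule with the same offline metrics as the learned model gives a reference point: if the model cannot beat it, the added complexity is hard to justify.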

#9 about 1 minute

Understanding the project team and final timeline

The project was completed in nine months by a cross-functional team of data engineers, data scientists, and software developers.
