Intelligent Data Selection for Continual Learning of AI Functions

What if the key to better AI isn't the data you collect, but the vast amount of data you discard? This talk explores intelligent on-device data selection.

#1 about 3 minutes

Understanding the core use cases for data selection

Data selection is crucial for creating diverse datasets, enabling active learning, detecting corner cases, and building new AI functions.

#2 about 4 minutes

Comparing data sources for machine learning models

Data can be sourced from data lakes backed by heavy compute, from targeted test fleets, or from the much larger customer fleet, which offers real-world scenarios but has limited on-board compute.

#3 about 2 minutes

Identifying informative data in long-tail distributions

Informative data lies in the long tail of the data distribution, including rare scenarios, weak sensor signals, and atypical class distributions.
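One way to make "atypical class distribution" concrete is to compare the classes seen in a scene against a reference distribution. The sketch below is only an illustration, not the method from the talk: the label set, the add-one smoothing, and the choice of KL divergence as the score are all assumptions.

```python
import math
from collections import Counter

def class_distribution(labels, classes):
    """Relative frequency of each class, with add-one smoothing to avoid zeros."""
    counts = Counter(labels)
    total = len(labels) + len(classes)
    return {c: (counts[c] + 1) / total for c in classes}

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same classes."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)

# Hypothetical label set and made-up counts, for illustration only.
CLASSES = ["car", "pedestrian", "cyclist", "animal"]
reference = class_distribution(
    ["car"] * 80 + ["pedestrian"] * 15 + ["cyclist"] * 5, CLASSES
)
typical_scene = class_distribution(["car"] * 8 + ["pedestrian"] * 2, CLASSES)
rare_scene = class_distribution(["animal"] * 6 + ["cyclist"] * 4, CLASSES)

# A higher score means the scene's class mix is further from the reference.
typical_score = kl_divergence(typical_scene, reference)
rare_score = kl_divergence(rare_scene, reference)
```

Under these assumptions the scene dominated by rarely seen classes scores higher than the car-heavy scene, so it would be ranked first for recording.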

#4 about 3 minutes

Overview of methods for intelligent data selection

Key methods for selecting valuable data include uncertainty estimation, temporal analysis of predictions, anomaly detection, and using model ensembles.
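Two of these signals are simple enough to sketch in a few lines. The functions below are hypothetical helpers, not code from the talk: one counts how often a tracked object's predicted label changes over time, the other measures how strongly an ensemble disagrees on a single input.

```python
from collections import Counter

def label_flips(labels):
    """Number of times the predicted label changes between consecutive frames."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)

def ensemble_disagreement(votes):
    """Fraction of ensemble members that disagree with the majority vote."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)

# Made-up prediction sequences for one tracked traffic light.
stable_track = ["red", "red", "red", "green", "green"]
flickering_track = ["red", "green", "red", "off", "red"]

stable_flips = label_flips(stable_track)
flickering_flips = label_flips(flickering_track)
disagreement = ensemble_disagreement(["red", "red", "green"])
```

A track whose label flips far more often than the scene plausibly changes, or an input on which ensemble members split, is a candidate for recording.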

#5 about 3 minutes

Using softmax uncertainty for traffic light detection

An uncertainty trigger aggregates softmax scores from a traffic light detection model to identify and record challenging images like false positives or distant objects.
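A minimal sketch of such a trigger follows. The summary does not say how the scores are aggregated, so the per-detection score (one minus the top softmax probability), the mean over detections, and the threshold of 0.4 are all assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def detection_uncertainty(probs):
    """1 - top softmax score: close to 0 for a confident detection."""
    return 1.0 - max(probs)

def uncertainty_trigger(detections, threshold=0.4):
    """Fire if the mean per-detection uncertainty in one image exceeds threshold.

    `detections` is a list of logit vectors, one per detected traffic light.
    """
    if not detections:
        return False
    scores = [detection_uncertainty(softmax(d)) for d in detections]
    return sum(scores) / len(scores) > threshold

# Made-up logits over three hypothetical classes.
confident_image = [[8.0, 0.0, 0.0], [7.0, 1.0, 0.0]]
ambiguous_image = [[1.0, 0.9, 0.8], [0.5, 0.4, 0.6]]

record_confident = uncertainty_trigger(confident_image)
record_ambiguous = uncertainty_trigger(ambiguous_image)
```

Here only the ambiguous image would be recorded. Other aggregations, such as the maximum uncertainty or the entropy of each score vector, would fit the same interface.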

#6 about 4 minutes

Evaluating model improvements from selected data

Proper model evaluation requires testing against not just random data but also corner-case datasets to prevent performance regressions in specific scenarios.
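The evaluation idea can be sketched as a per-dataset regression check. The dataset names, metric values, and tolerance below are invented for illustration and are not results from the talk.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return test sets where the candidate is worse than baseline by more than tolerance.

    Both arguments map a test-set name to a higher-is-better metric.
    """
    return {
        name: (baseline[name], candidate[name])
        for name in baseline
        if candidate[name] < baseline[name] - tolerance
    }

# Hypothetical scores: one random test set plus two corner-case sets.
baseline = {"random": 0.91, "night": 0.78, "distant_lights": 0.64}
candidate = {"random": 0.93, "night": 0.71, "distant_lights": 0.70}

regressions = find_regressions(baseline, candidate)
```

In this made-up example the candidate improves on the random set and still regresses on the night set, which is exactly the case a random-only evaluation would miss.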

#7 about 5 minutes

Deploying data selection triggers to the vehicle fleet

An in-vehicle module called "Instinct" filters data streams in real time, enabling continual learning by collecting data from new regions to expand a model's operational domain.
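The filtering step can be pictured as a generator that passes through only the frames a trigger selects. This is a generic sketch and does not reflect the actual interface of the in-vehicle module; the frame format, the trigger, and the small pre-trigger history buffer are assumptions.

```python
from collections import deque

def select_frames(stream, trigger, pre_frames=1):
    """Yield frames around trigger events, keeping a short pre-trigger history."""
    history = deque(maxlen=pre_frames)
    for frame in stream:
        if trigger(frame):
            # Emit the buffered context first, then the triggering frame itself.
            yield from history
            history.clear()
            yield frame
        else:
            # Not selected: keep only as potential context, bounded by maxlen.
            history.append(frame)

# Hypothetical stream: each frame carries an id and an uncertainty score.
scores = [0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1]
stream = [{"id": i, "score": s} for i, s in enumerate(scores)]

selected = list(select_frames(stream, lambda f: f["score"] > 0.5))
selected_ids = [f["id"] for f in selected]
```

Because the history buffer is bounded, memory use stays constant regardless of stream length, which matters on hardware with limited compute.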

#8 about 5 minutes

Building a universal data selection framework

A universal framework uses a plugin architecture to support various trigger types and treats perception functions as black boxes by using a framework-independent format like ONNX.
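A plugin architecture of this kind could look roughly like the sketch below. All names are hypothetical. The perception function is modelled as an opaque callable that returns class scores; in a real setup that callable might wrap a model exported to ONNX, but a stub is used here so the example stays self-contained.

```python
from typing import Callable, Dict, List, Sequence

# The perception function is a black box: sample in, class scores out.
PerceptionFn = Callable[[Sequence[float]], List[float]]
Trigger = Callable[[List[float]], bool]

TRIGGER_REGISTRY: Dict[str, Trigger] = {}

def register_trigger(name: str):
    """Decorator that adds a trigger plugin to the registry under `name`."""
    def decorator(fn: Trigger) -> Trigger:
        TRIGGER_REGISTRY[name] = fn
        return fn
    return decorator

@register_trigger("low_confidence")
def low_confidence(scores: List[float]) -> bool:
    """Fire when no class reaches a confidence of 0.6 (assumed threshold)."""
    return max(scores) < 0.6

@register_trigger("two_way_tie")
def two_way_tie(scores: List[float]) -> bool:
    """Fire when the top two classes are within 0.1 of each other."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second < 0.1

def run_triggers(model: PerceptionFn, sample: Sequence[float]) -> List[str]:
    """Run the black-box model once and return the names of all triggers that fire."""
    scores = model(sample)
    return [name for name, trig in TRIGGER_REGISTRY.items() if trig(scores)]

def stub_model(sample: Sequence[float]) -> List[float]:
    """Stand-in for a real perception function; always returns the same scores."""
    return [0.45, 0.40, 0.15]

fired = run_triggers(stub_model, [0.0])
```

New trigger types are added by registering another function, and the model can be swapped without touching any trigger, since triggers only ever see its outputs.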

#9 about 21 minutes

Overcoming challenges in automotive software deployment

Deploying data science code to vehicles requires bridging Python and C++, ensuring high code quality, and maintaining full traceability from requirements to artifacts.