On the straight and narrow path - How to get cars to drive themselves using reinforcement learning and trajectory optimization

From erratic exploration to a perfectly optimized path in 150 laps. See how a race car learns to drive itself using reinforcement learning.

#1 about 5 minutes

A novel approach to self-driving cars

This project uses reinforcement learning so that a car learns while it drives, unlike pre-trained models that rely on static data.

#2 about 3 minutes

Setting a baseline with a human driver

A human driver completes three laps on the physical racetrack to establish a benchmark time for the AI to compete against.

#3 about 5 minutes

Observing the AI learn across 1500 laps

The AI's driving behavior evolves from random and unstable after 15 laps to smooth and optimized after 150, showing diminishing returns by 1500 laps.

#4 about 6 minutes

Understanding the core concepts of reinforcement learning

A recap of the demo's results leads into an explanation of reinforcement learning's core ideas like agents, environments, actions, and maximizing rewards.
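
As a rough illustration of how agent, environment, action, and reward fit together, the loop below runs one episode on a toy one-dimensional track. Everything here is an assumption made for the sketch (the `ToyTrack` class, the reward values, the two policies); it is not the setup used in the demo.

```python
import random


class ToyTrack:
    """Hypothetical 1-D track: positions 0..length, finish line at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (back) or +1 (forward); the position is clamped to the track
        self.position = max(0, min(self.length, self.position + action))
        done = self.position == self.length
        # assumed reward scheme: every step costs time, finishing pays off
        reward = 10.0 if done else -1.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent (the policy) act in the environment and sum up its rewards."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    return random.choice([-1, 1])


def forward_policy(state):
    return 1
```

Comparing `run_episode(ToyTrack(), random_policy)` with `run_episode(ToyTrack(), forward_policy)` shows why the agent is said to maximize reward: on this toy track the always-forward policy scores at least as well as the random one.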

#5 about 5 minutes

Applying Q-learning with states, actions, and Q-tables

Q-learning uses a table of states and actions to store learned values, making it easy to inspect and update the agent's knowledge.
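
A Q-table can be sketched as a plain mapping from each state to one value per action, which is what makes it easy to inspect. The state name `"curve_ahead"` and the action set below are hypothetical placeholders, not the ones used in the demo.

```python
from collections import defaultdict

ACTIONS = ["left", "straight", "right"]  # hypothetical action set

# Q-table: maps a state to one learned value per action; unseen states start at 0.0
q_table = defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def best_action(state):
    """Return the action with the highest stored value for this state."""
    values = q_table[state]
    return max(values, key=values.get)


# Writing to the table is a plain assignment, reading is a plain lookup
q_table["curve_ahead"]["left"] = 0.8
q_table["curve_ahead"]["straight"] = -0.5
```

Because the table is an ordinary dictionary, printing `q_table["curve_ahead"]` shows exactly what the agent currently believes about that state.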

#6 about 4 minutes

Key parameters for tuning the Q-learning algorithm

The algorithm's behavior is controlled by parameters such as the learning rate (alpha), the discount factor, and the exploration factor (epsilon).
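
The sketch below shows where each parameter enters the standard textbook form of Q-learning: epsilon in action selection, alpha and the discount factor in the value update. The name `GAMMA` for the discount factor and all numeric values are assumptions for illustration, not settings from the talk.

```python
import random

ALPHA = 0.1    # learning rate: how far each update moves the stored value
GAMMA = 0.9    # discount factor (name assumed): weight of future reward
EPSILON = 0.2  # exploration factor: probability of trying a random action


def choose_action(q_values, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def q_update(old_value, reward, best_next_value, alpha=ALPHA, gamma=GAMMA):
    """Move the old value a fraction alpha toward reward + gamma * best next value."""
    target = reward + gamma * best_next_value
    return old_value + alpha * (target - old_value)
```

With `epsilon=0.0` the agent always picks the best known action, and with `alpha=1.0` each update overwrites the old value with the new target, which gives a feel for the two extremes of these settings.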

#7 about 1 minute

The technical architecture of the race track demo

The demo integrates PS4 controllers, an Arduino, the Watson IoT Platform, a Node.js backend, and a React.js frontend.

#8 about 3 minutes

Real-world application with thyssenkrupp

A collaboration with thyssenkrupp applies these reinforcement learning concepts to a full-size vehicle so that it can learn and adapt its driving style.

#9 about 7 minutes

Q&A on data, constraints, and local optima

The speakers answer audience questions about the importance of data quality, how the car stays on the track, and how the algorithm avoids local optima.