Reinforcement Learning

Chart: reward on each try (most recent on the right). Watch it climb 📈

Grid key:
arrow = the best move the robot has learned for that square
greener square = “a good place to be”
redder square = “a bad place to be”
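The arrows and colors can be read straight off the robot's scores. Here is a minimal Python sketch, assuming the robot stores one score per move for each square. The arrow is the highest-scored move, and that best score says how good the square is. The function name `square_display` and the choice to split green from red at 0 are hypothetical. The page may shade its squares differently.

```python
# Hypothetical symbols for drawing the learned move on each square.
ARROWS = {"up": "↑", "down": "↓", "left": "←", "right": "→"}


def square_display(scores):
    """Turn one square's move scores into (arrow, value, color hint).

    `scores` maps each move name to its learned score, e.g. {"up": -1.0, ...}.
    Splitting green from red at 0 is an assumption; a real display might shade smoothly.
    """
    best_move = max(scores, key=scores.get)   # the arrow: highest-scored move
    value = scores[best_move]                 # how good it is to stand on this square
    if value > 0:
        color = "greener"
    elif value < 0:
        color = "redder"
    else:
        color = "neutral"
    return ARROWS[best_move], value, color
```

For example, `square_display({"up": -1.0, "down": 2.5, "left": -0.5, "right": 4.1})` returns `("→", 4.1, "greener")`.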

🤖 In a real AI: this is the third way to learn

There's no teacher giving answers (that's supervised learning, like classification), and we're not just finding groups (that's unsupervised learning, like clustering). Here the robot learns from rewards by trial and error. This is Reinforcement Learning.

The same idea teaches computers to play chess and video games, helps robots learn to walk, and even helps train chatbots to give better answers (it's the “RL” you hear about in how modern AI is trained).

The robot keeps a little scorebook called a Q-table. For every square, it stores a score for each possible move. After each try, moves that led toward a reward score a bit higher and moves that led to trouble score lower, so next time the robot is more likely to pick the good ones.
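To make the scorebook idea concrete, here is a minimal Python sketch of tabular Q-learning, the standard algorithm that fills in a Q-table. Everything about the world is an assumption made up for illustration: a 4x4 grid, a start square, a goal worth +10, a pit worth -10, a cost of -1 per step, and the learning settings `alpha`, `gamma` and `epsilon`. The names `step`, `train` and `best_path` are hypothetical, not taken from this page.

```python
import random

# Hypothetical 4x4 world: start top-left, goal bottom-right, one pit.
# The layout and reward values are assumptions for illustration only.
SIZE = 4
START = (0, 0)
GOAL = (3, 3)
PIT = (1, 1)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, move):
    """Apply a move; bumping into a wall leaves the robot where it is."""
    dr, dc = MOVES[move]
    nxt = (min(max(state[0] + dr, 0), SIZE - 1), min(max(state[1] + dc, 0), SIZE - 1))
    if nxt == GOAL:
        return nxt, 10.0, True    # reached the goal: big reward, try ends
    if nxt == PIT:
        return nxt, -10.0, True   # fell in the pit: big penalty, try ends
    return nxt, -1.0, False       # every ordinary step costs a little


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # The Q-table ("scorebook"): one score per (square, move), all 0 at first.
    q = {((r, c), m): 0.0 for r in range(SIZE) for c in range(SIZE) for m in MOVES}
    rewards = []  # total reward collected on each try
    for _ in range(episodes):
        state, total = START, 0.0
        for _ in range(100):  # cap each try so it cannot wander forever
            # Sometimes explore a random move; otherwise take the best-scored one.
            if rng.random() < epsilon:
                move = rng.choice(list(MOVES))
            else:
                move = max(MOVES, key=lambda m: q[(state, m)])
            nxt, reward, done = step(state, move)
            # Q-learning update: nudge the score toward reward + discounted best future score.
            best_next = 0.0 if done else max(q[(nxt, m)] for m in MOVES)
            q[(state, move)] += alpha * (reward + gamma * best_next - q[(state, move)])
            state, total = nxt, total + reward
            if done:
                break
        rewards.append(total)
    return q, rewards


def best_path(q, limit=20):
    """Follow the highest-scored move from START until the goal (or give up)."""
    state, path = START, [START]
    while state != GOAL and len(path) <= limit:
        move = max(MOVES, key=lambda m: q[(state, m)])
        state, _, done = step(state, move)
        path.append(state)
        if done and state != GOAL:
            break
    return path
```

Calling `q, rewards = train()` returns the finished scorebook plus the reward from each try. Plotting `rewards` should show an upward climb like the chart's, and `best_path(q)` follows the highest-scored move from the start square. The small `epsilon` keeps the robot occasionally trying random moves so it does not lock onto the first path it happens to find.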
