🕹️ Reinforcement Learning: learning by trial & reward
No labelled examples, no neat piles of dots, just a robot 🤖 dropped into a world. It
tries things, bumps around, and gets rewards (good!) or penalties (⚡ ouch!). Over many
tries it remembers what paid off, much like training a pet with treats. Press “Auto-train” and watch
it get smarter.
Pick a world for the robot
Tries trained: 0 | Last reward: - | Best path: -
Reward on each try (right = most recent): watch it climb
arrow = best move it has learned for that square
greener square = “a good place to be”
redder square = “a bad place to be”
🤖 In a real AI: this is the third way to learn
There's no teacher giving answers (that's supervised / classification) and we're not just finding
groups (that's unsupervised / clustering). Here the robot learns from rewards by
trial and error: this is Reinforcement Learning. The same idea teaches computers to play
chess and video games, helps robots walk, and even helps train chatbots to give better answers (it's the
“RL” in RLHF, reinforcement learning from human feedback). The robot keeps a little scorebook
(a Q-table): for every square, how good is each move? Moves that lead to rewards get a higher score,
so the robot is more likely to pick them next time.
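
For the curious, here is a minimal sketch of how such a scorebook could be filled in, using the standard tabular Q-learning update. It is not the code behind this page. The tiny world (five squares in a row with a treat on the last one), the reward numbers, and the names `step`, `train`, `best_moves`, `ALPHA`, `GAMMA` and `EPSILON` are all assumptions made up for illustration.

```python
import random

# Hypothetical toy world: squares 0..4 in a row, the treat sits on square 4.
N_SQUARES = 5
MOVES = {"left": -1, "right": +1}

ALPHA = 0.5    # how strongly one try changes a score (assumed value)
GAMMA = 0.9    # how much future rewards count (assumed value)
EPSILON = 0.2  # how often the robot tries a random move (assumed value)


def step(square, move):
    """Apply a move and return (new_square, reward, done)."""
    new_square = min(max(square + MOVES[move], 0), N_SQUARES - 1)
    if new_square == N_SQUARES - 1:
        return new_square, 10.0, True   # reached the treat
    return new_square, -1.0, False      # every other step costs a little


def train(tries=200, max_steps=1000, seed=0):
    """Fill in the scorebook (Q-table) by trial and error."""
    rng = random.Random(seed)
    # The scorebook: one score per move, for every square, all starting at 0.
    q = {s: {m: 0.0 for m in MOVES} for s in range(N_SQUARES)}
    for _ in range(tries):
        square = 0
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                move = rng.choice(list(MOVES))            # explore
            else:
                move = max(q[square], key=q[square].get)  # use best known move
            new_square, reward, done = step(square, move)
            # Q-learning update: nudge the score toward
            # "reward now + discounted best score from the next square".
            best_next = 0.0 if done else max(q[new_square].values())
            target = reward + GAMMA * best_next
            q[square][move] += ALPHA * (target - q[square][move])
            square = new_square
            if done:
                break
    return q


def best_moves(q):
    """The 'arrow' for each non-goal square: the move with the highest score."""
    return {s: max(q[s], key=q[s].get) for s in range(N_SQUARES - 1)}


if __name__ == "__main__":
    scorebook = train()
    print(best_moves(scorebook))
```

In this toy world the scorebook should end up preferring "right" on every square, since that is the only way to the treat. The occasional random move is what lets the robot discover moves it would otherwise never try.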