A big language model has two lives. First it goes to "school" for a long time — this is training, where it actually learns. Later, when you chat with it, it just uses what it learned to write a reply — this is inference. Play with both and spot the difference.
🏫 The model goes to school
It reads text, covers the next word, guesses, checks the real answer, and adjusts its knobs. Millions of times.
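Here is a toy version of that loop, just a sketch and not how a real model is built. It assumes the simplest possible "model": a table of knobs with one score for every pair of words, so it only looks at the current word to guess the next one. The names `knobs`, `softmax`, and `train`, the practice sentence, and the step size are all made up for illustration.

```python
import math

# A tiny "school book": the only text this toy model ever reads.
text = "the cat sat on the mat the cat sat on the mat".split()
vocab = sorted(set(text))
index = {word: i for i, word in enumerate(vocab)}

# The knobs: one score for every (current word, next word) pair, all starting at 0.
knobs = [[0.0] * len(vocab) for _ in vocab]

def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(knobs, words, rounds=200, step=0.5):
    """Cover the next word, guess, check the real answer, adjust the knobs."""
    for _ in range(rounds):
        for current, real_next in zip(words, words[1:]):
            row = knobs[index[current]]
            guess = softmax(row)  # the model's guess for the covered word
            for j in range(len(row)):
                target = 1.0 if j == index[real_next] else 0.0
                # Nudge each knob so the real answer scores higher next time.
                row[j] -= step * (guess[j] - target)
    return knobs

train(knobs, text)

# After school: which word does the model expect after "sat"?
probs_after_sat = softmax(knobs[index["sat"]])
best_guess = vocab[probs_after_sat.index(max(probs_after_sat))]
print(best_guess)  # prints "on"
```

In the practice text, "sat" is always followed by "on", so every round pushes that knob up until "on" becomes the clear favorite. A real model has billions of knobs and looks at far more than one word, but the guess, check, adjust rhythm is the same idea.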
💬 The model writes a reply
Its knobs are now frozen. It reads your words, scores the next token, picks one, adds it, and repeats — one token at a time.
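The reply loop can be sketched in a few lines too. This is a toy under stated assumptions: the frozen knobs are a hand-written score table (`FROZEN_SCORES`, invented for this example), the model only looks at the last token instead of the whole prompt, and it always picks the top-scoring token, while real chat models often pick with some randomness. The function name `reply` is also made up.

```python
import copy

# Frozen knobs: made-up next-token scores, standing in for what training produced.
FROZEN_SCORES = {
    "the": {"cat": 2.0, "mat": 1.0},
    "cat": {"sat": 3.0},
    "sat": {"on": 3.0},
    "on": {"the": 3.0},
}

def reply(prompt, max_new_tokens=3):
    """Read the prompt, then add one token at a time without changing any knob."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        scores = FROZEN_SCORES.get(tokens[-1])  # score the possible next tokens
        if scores is None:
            break  # this toy has no scores for that token, so it stops
        next_token = max(scores, key=scores.get)  # pick the highest score
        tokens.append(next_token)  # add it, then repeat
    return " ".join(tokens)

before = copy.deepcopy(FROZEN_SCORES)
answer = reply("the cat")
print(answer)  # prints "the cat sat on the"
knobs_unchanged = FROZEN_SCORES == before
```

Notice that `reply` only reads from `FROZEN_SCORES` and never writes to it. That is the whole difference from school time: the knobs get used, not adjusted.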
| | TRAINING (learning) | INFERENCE (answering) |
|---|---|---|
| Goal | learn to predict words well | use what it learned to reply |
| What it reads | huge piles of text (large parts of the internet, books…) | just your prompt |
| Its knobs (weights) | change & improve | frozen — never change |
| Is it learning? | ✅ yes | ❌ no, just using |
| Speed & cost | very slow: weeks to months on huge computers, very costly | fast, one token at a time, far cheaper per reply |
| When it happens | once, before you ever use it | every time you chat |