A dataset is a big group of data we use to teach and test an AI model.
Each tiny piece (a record or data point) has attributes, like an animal photo's colour,
size and shape. Datasets come in many flavours (numbers, text, pictures, sound, time, places), and we usually
split them into three parts. Play below to see how!
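To make "record" and "attribute" concrete, here is a tiny sketch in Python. The class name `AnimalPhoto`, its fields and the three example animals are made up for illustration; a real dataset would hold many more records.

```python
from dataclasses import dataclass


@dataclass
class AnimalPhoto:
    """One record (data point). Each field is an attribute."""
    colour: str
    size_cm: float
    shape: str


# A dataset is just a collection of records (invented examples).
dataset = [
    AnimalPhoto(colour="orange", size_cm=45.0, shape="long"),
    AnimalPhoto(colour="grey", size_cm=300.0, shape="round"),
    AnimalPhoto(colour="green", size_cm=12.0, shape="small"),
]

# Read one attribute from every record.
colours = [photo.colour for photo in dataset]
print(len(dataset), colours)
```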
Playground 1 · Split the dataset
Here are 100 animal photos. Slide the controls to share them between
Training, Validation and
Test. The three must always add up to 100!
Training: 60%
Validation: 20%
Test: 20%
60 training · 20 validation · 20 test
Why three parts? Think like a student.
Training = learning the chapter. Validation = practising sums to get better while you still
study. Test = the final exam on questions you've never seen. The computer never peeks
at the test photos while training. That's the only way to know if it really learned, instead of
just memorising!
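Here is one way the three-way split could look in Python. This is a minimal sketch: the function name `split_dataset`, the photo names and the fixed shuffle seed are assumptions for this example, not part of the playground itself.

```python
import random


def split_dataset(items, train_pct=60, val_pct=20, test_pct=20, seed=0):
    """Shuffle the items, then cut them into training, validation and test parts."""
    if train_pct + val_pct + test_pct != 100:
        raise ValueError("the three percentages must add up to 100")

    shuffled = list(items)                 # copy, so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same shuffle every run

    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # everything left over
    return train, validation, test


# 100 pretend animal photos (hypothetical names).
photos = [f"photo_{i:03d}" for i in range(100)]
train, validation, test = split_dataset(photos)
print(len(train), len(validation), len(test))  # 60 20 20
```

Each photo lands in exactly one part, so nothing from the test part is seen during training.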
Playground 2 · Sort the data
Datasets are named by what's inside them. Tap an item, then tap the
bucket it belongs to.
1. Pick an item
2. Drop it in the right bucket
Structured vs Unstructured
Structured data sits in neat rows and columns, like a spreadsheet, so it is super easy to search.
Unstructured data has no fixed format, like a messy pile of photos, songs or videos. Some
datasets are hybrid: a bit of both!
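A small Python sketch of the difference, using invented animals and sentences. The structured rows all share the same columns, so searching is a one-line filter. The unstructured items are just raw text with no columns to filter on.

```python
# Structured: every row has the same named columns, like a spreadsheet.
structured = [
    {"name": "cat", "legs": 4, "weight_kg": 4.5},
    {"name": "sparrow", "legs": 2, "weight_kg": 0.03},
    {"name": "dog", "legs": 4, "weight_kg": 20.0},
]

# Searching is easy: filter on the "legs" column.
four_legged = [row["name"] for row in structured if row["legs"] == 4]
print(four_legged)  # ['cat', 'dog']

# Unstructured: free text with no fixed format (made-up examples).
unstructured = [
    "A fluffy cat sat on the mat.",
    "We heard a sparrow singing outside this morning!",
]

# There is no "legs" column here, so the best we can do without
# extra processing is a rough keyword search.
mentions_cat = [text for text in unstructured if "cat" in text.lower()]
print(len(mentions_cat))  # 1
```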