Test It, See the Mistakes, Make It Better
Your digit recognizer scores high — but a number isn’t the whole story. A real machine-learning engineer looks at what the model got wrong and asks how to do better. Today you’ll watch your AI make predictions, spot its mistakes, and try to improve it.
💡 In Colab. Start by training the model from last lesson:
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)
Watch it predict
Ask the model to label the test images, then look at a few — the picture, what the AI guessed, and the true answer:
predictions = model.predict(X_test)
for i in range(5):
    plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
    plt.title("AI guessed " + str(predictions[i]) + " — really " + str(y_test[i]))
    plt.show()
Each test example is 64 numbers; .reshape(8, 8) folds them back into an 8×8 picture so you can see what the AI saw.
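To see the folding on its own, here is a small sketch that reshapes just one example and checks the shapes. The names row and picture are made up for this illustration; they are not part of the lesson's code.

```python
from sklearn.datasets import load_digits

digits = load_digits()

row = digits.data[0]          # one example: a flat row of 64 pixel values
picture = row.reshape(8, 8)   # the same 64 values, arranged as 8 rows of 8

print(row.shape)      # (64,)
print(picture.shape)  # (8, 8)

# The top row of the picture is simply the first 8 numbers of the flat row.
print((picture[0] == row[:8]).all())  # True
```

Nothing is added or thrown away by reshape; the numbers are only arranged differently.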
Hunt for the mistakes
Mostly it’s right — but find the ones it missed:
for i in range(len(X_test)):
    if predictions[i] != y_test[i]:
        plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
        plt.title("AI said " + str(predictions[i]) + ", but it's " + str(y_test[i]))
        plt.show()
Look closely at the ones it got wrong. Often you’d struggle too — a sloppy 4 that looks like a 9, a 3 that could be an 8. The AI’s mistakes usually happen on genuinely confusing, messy digits. That’s reassuring: it’s failing where the data is hard, not randomly.
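One way to check which digits get mixed up, without looking at every picture, is a confusion matrix: a table that counts how often each true digit was given each guess. The sketch below uses scikit-learn's confusion_matrix and rebuilds the same model so it can run on its own. The variable name table is just for this example.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are the true digit, columns are what the AI guessed.
table = confusion_matrix(y_test, predictions, labels=list(range(10)))

# Print only the mix-ups (true digit and guess are different).
for true_digit in range(10):
    for guess in range(10):
        count = table[true_digit, guess]
        if true_digit != guess and count > 0:
            print("A real", true_digit, "was called", guess, "-", count, "time(s)")
```

If the same pair shows up more than once, those two digits are probably easy to confuse in this dataset.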
Try it 🎯
How many did it get wrong out of the test set? Count them:
wrong = 0
for i in range(len(X_test)):
    if predictions[i] != y_test[i]:
        wrong = wrong + 1
print("Got", wrong, "wrong out of", len(X_test))
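The same count can be done in one line, because comparing two NumPy arrays gives a True/False for every position. This sketch, which retrains the model so it runs by itself, also shows how the count connects to the accuracy score: accuracy is the number right divided by the total.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# (predictions != y_test) is True wherever the AI was wrong; sum() counts the Trues.
wrong = int((predictions != y_test).sum())
total = len(y_test)
accuracy = (total - wrong) / total

print("Got", wrong, "wrong out of", total)
print("Accuracy worked out by hand:", accuracy)
print("Accuracy from model.score:  ", model.score(X_test, y_test))
```

The last two lines should print the same number, which is a handy check that the counting is right.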
Try to do better
Two simple ways to push accuracy up:
# 1. Try a different number of neighbors
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("With 3 neighbors:", model.score(X_test, y_test))
# 2. Give it MORE training data (smaller test set)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
model.fit(X_train, y_train)
print("With more training data:", model.score(X_test, y_test))
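More examples and the right settings usually help, but not every time: the second experiment also changed which images are in the test set, so a small change in score can be luck. This tuning, asking “what makes it better?”, is a big part of real machine-learning work.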
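Instead of trying one setting at a time, a loop can try several and keep track of the best. The sketch below keeps the same train/test split for every try so the scores are fair to compare. The list of values and the names best_k and best_score are choices made for this example, not something the lesson requires.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

best_k = None
best_score = 0.0

# Try a few odd numbers of neighbors on the SAME split.
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    print("n_neighbors =", k, "-> accuracy", round(score, 3))
    if score > best_score:
        best_k = k
        best_score = score

print("Best setting: n_neighbors =", best_k, "with accuracy", round(best_score, 3))
```

One caution: picking the winner by its test score makes that score look slightly better than it really is. In larger projects, people often hold back a separate set of examples just for choosing settings.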
Think about it 🔮
The AI’s mistakes are mostly messy, hard-to-read digits. Why is that actually a good sign? (It means the model learned the real pattern — it only trips on genuinely ambiguous cases, the same ones that would fool a person, rather than failing on clear, easy digits.)
Your mission 🚀
Investigate your recognizer: show 5 of its mistakes as pictures, count how many it got wrong total, then try to beat your accuracy by changing n_neighbors and test_size. Record your best score and what settings produced it.
What you learned today
- Don’t stop at the score — look at the actual mistakes.
- .reshape(8, 8) turns a row of pixels back into a viewable image.
- A good model’s errors cluster on genuinely hard, messy examples.
- More data and better settings (n_neighbors) can improve accuracy; tuning is real ML work.
Next time is the capstone — train a model of your own, and have an important conversation about using AI wisely. 🤖