Capstone: A Data Detective Report


You’ve reached the end of Phase III! You can load real data, explore it, group it, and chart it. Now you’ll put it all together into a data report — exactly what a real data scientist produces: a few good questions, answered with numbers and pictures, plus what you learned.


💡 Work in your notebook, one cell at a time. Load the data and tools first:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the penguins table and drop any rows with missing values
penguins = sns.load_dataset("penguins").dropna()

The shape of a report

A data report answers a handful of questions. For each one: state the question, compute an answer, and (when it helps) draw a chart. Here’s a full mini-report on the penguins — read it, then build your own.

Question 1: How many of each species?

print("Penguins per species:")
print(penguins["species"].value_counts())

penguins["species"].value_counts().plot(kind="bar", color="teal")
plt.title("Penguins per species")
plt.xlabel("Species")
plt.ylabel("Count")
plt.show()

Question 2: Which species is heaviest on average?

avg_mass = penguins.groupby("species")["body_mass_g"].mean()
print("Average body mass (g) per species:")
print(avg_mass)

avg_mass.plot(kind="bar", color="coral")
plt.title("Average weight by species")
plt.xlabel("Species")
plt.ylabel("Body mass (grams)")
plt.show()

Question 3: Do longer flippers mean heavier penguins?

penguins.plot(kind="scatter", x="flipper_length_mm", y="body_mass_g", figsize=(8, 5))
plt.title("Longer flippers, heavier penguins")
plt.xlabel("Flipper length (mm)")
plt.ylabel("Body mass (grams)")
plt.show()
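
A report is stronger when a chart comes with a number. One possible way to back up the scatter plot is pandas' .corr(), which measures how closely two columns rise together (from -1 to 1). This is only a sketch: the small table below is made up so the cell runs anywhere, and .corr() may be a tool you have not met yet. To try it on the real data, use the same two columns from penguins.

```python
import pandas as pd

# Made-up toy numbers for illustration only (NOT the real penguins data).
toy = pd.DataFrame({
    "flipper_length_mm": [180, 185, 190, 200, 210, 220],
    "body_mass_g": [3500, 3600, 3800, 4200, 4800, 5400],
})

# .corr() returns a number from -1 to 1.
# Close to 1 means the two columns tend to go up together.
r = toy["flipper_length_mm"].corr(toy["body_mass_g"])
print("Correlation between flipper length and body mass:", round(r, 2))
```

A value near 1 supports "longer flippers, heavier penguins"; a value near 0 would mean the chart shows no clear pattern.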

The findings

Finish with a short write-up — just print statements stating what you discovered:

print("WHAT I DISCOVERED:")
print("1. Adelie penguins are the most common in this data.")
print("2. Gentoo penguins are the heaviest species on average.")
print("3. Penguins with longer flippers tend to be heavier.")

That’s a complete data report: questions, evidence (numbers + charts), and conclusions.
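
If you like, the findings can come straight from the data instead of being typed by hand, so they stay correct even if the data changes. Here is a minimal sketch of that idea using .idxmax() (the "record-holder" trick) and f-strings. The table is a made-up toy, not the real penguins data, and the variable names are just suggestions.

```python
import pandas as pd

# Made-up toy table for illustration only (NOT the real penguins data).
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700, 3800, 3600, 5000, 5200, 3750],
})

# idxmax() returns the label (here, the species name) that holds the biggest value.
most_common = toy["species"].value_counts().idxmax()
heaviest = toy.groupby("species")["body_mass_g"].mean().idxmax()

print("WHAT I DISCOVERED:")
print(f"1. {most_common} penguins are the most common in this data.")
print(f"2. {heaviest} penguins are the heaviest species on average.")
```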

Build your own report 🚀

Now make yours. Pick a dataset: the penguins, tips/titanic/mpg from the last lesson, or a CSV an adult helps you find. Then:

  1. Choose three questions you’re curious about.
  2. Answer each with a stat (value_counts, groupby, .mean(), .max()…) and a labeled chart where it helps.
  3. Save your best chart with plt.savefig("my_chart.png"). Call it before plt.show(), or the saved file may come out blank.
  4. Write your findings as a few print lines — three things you discovered.

Tip: build it one question at a time, in its own cells, and run as you go. When it’s done, you’ve made something real — show a parent or friend your discoveries.
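
To get started, here is one way a single question from your report might look, from question to saved chart. It is only a sketch: the pets table and its column names are invented for this example, so swap in your own dataset and columns.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up toy table for illustration only; swap in your own dataset.
pets = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "fish", "cat", "dog"],
    "age_years": [3, 5, 7, 1, 2, 4],
})

# Question: which animal shows up most often?
counts = pets["animal"].value_counts()
print("Pets per animal:")
print(counts)

counts.plot(kind="bar", color="teal")
plt.title("Pets per animal")
plt.xlabel("Animal")
plt.ylabel("Count")
plt.tight_layout()            # keeps the labels from being cut off in the file
plt.savefig("my_chart.png")   # save first...
plt.show()                    # ...then show
```

Saving before showing matters because a notebook usually clears the chart once it has been shown.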

What you learned in Phase III

You started Phase III barely able to open a notebook. Now you can:

  • Work in Colab/Jupyter with cells.
  • Load real data with pandas (read_csv, load_dataset) and clean it (dropna).
  • Explore it: head, shape, columns, select columns, filter rows, sort.
  • Analyze it: value_counts, groupby, .mean()/.max()/.min(), new columns, record-holders.
  • Chart it: bar, histogram, scatter, line — with titles, labels, and color — and save it.
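
As a quick refresher, several of the skills above fit in one short cell. This is a sketch on a made-up toy table (the names and numbers are invented), not a full recap of every tool.

```python
import pandas as pd

# Made-up toy table for illustration only.
toy = pd.DataFrame({
    "name": ["Pip", "Waddles", "Frost", "Pebble"],
    "species": ["Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700, 5100, 4900, 3800],
})

print(toy.shape)                                           # explore: (rows, columns)
heavy = toy[toy["body_mass_g"] > 4000]                     # filter rows
ranked = toy.sort_values("body_mass_g", ascending=False)   # sort, heaviest first
toy["body_mass_kg"] = toy["body_mass_g"] / 1000            # new column
record_holder = toy.loc[toy["body_mass_g"].idxmax()]       # record-holder row
print("Heaviest penguin:", record_holder["name"])
```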

That’s genuine data science, done with the same tools professionals use.

What’s next: Phase IV

In the final phase, you’ll meet Artificial Intelligence. You’ll use AI that already exists (such as a computer recognizing a photo or reading the mood of a sentence), then teach a computer to learn from examples, and even train your own mini-AI, all in Colab. And here’s the secret: AI is built on exactly the data skills you just mastered. The detective becomes the teacher. See you in Phase IV! 🤖
