Bring Your Own Data


Here’s the exciting part: everything you learned works on any data, not just penguins. Load a different dataset and the same head, value_counts, groupby, and .plot tools just work. Today you explore new data — including, with an adult’s help, data about something you love.


💡 Load pandas and the two chart tools (seaborn and matplotlib) first:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

More built-in datasets

Seaborn comes with several practice datasets. See the list:

print(sns.get_dataset_names())

You’ll spot names like tips, titanic, mpg, flights, and diamonds. Load any of them the same way you loaded penguins. Let’s try tips — data from a restaurant about bills and tips:

tips = sns.load_dataset("tips")
tips.head()

Columns include total_bill, tip, day, time, and size (how many people). Brand-new data, same moves.

The same skills, instantly

Watch your penguin skills work on tips with no changes:

# How big is the table?
print("Rows and columns:", tips.shape)
# How many rows are there for each day?
print(tips["day"].value_counts())
# Average tip for each day
print(tips.groupby("day")["tip"].mean())
# Scatter chart: one dot per bill
tips.plot(kind="scatter", x="total_bill", y="tip", figsize=(8, 5))
plt.title("Bigger bills, bigger tips?")
plt.xlabel("Total bill")
plt.ylabel("Tip")
plt.show()

In just a few lines you found which day is busiest, the average tip per day, and whether bigger bills tend to bring bigger tips. You can be a data detective on any dataset now.
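
If you want the computer to say the answers out loud instead of reading them off the printout, here is a small sketch. It assumes a tiny made-up table called mini_tips (not the real restaurant data), so it runs even without downloading anything:

```python
import pandas as pd

# A tiny made-up table shaped like tips (NOT the real restaurant data).
mini_tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 25.0],
    "tip": [1.0, 3.0, 5.0, 6.0, 4.0],
    "day": ["Sat", "Sat", "Sun", "Sat", "Sun"],
})

# value_counts sorts from most common to least common,
# so the first label in the result is the busiest day.
day_counts = mini_tips["day"].value_counts()
busiest_day = day_counts.index[0]

# groupby + mean gives one average tip per day.
avg_tip_by_day = mini_tips.groupby("day")["tip"].mean()

print("Busiest day:", busiest_day)
print(avg_tip_by_day)
```

Swap mini_tips for the real tips table and the same lines should work, because the column names match.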

Try it 🎯

  1. Load the titanic dataset and run .head() and .shape.
  2. On tips, find the average total_bill for each time (Lunch vs Dinner) with groupby.

Loading a CSV from the web

Real data often lives in CSV files online. With an adult’s help finding a link, load any of them with read_csv:

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
mydata = pd.read_csv(url)
mydata.head()

That’s exactly how professionals pull in real-world data — sports stats, weather, movies, anything. Ask a grown-up to help you find a kid-safe CSV about a topic you love, then explore it with head, value_counts, groupby, and a chart.
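
No link yet? Here is a practice sketch you can run first. It assumes a few lines of made-up CSV text (the teams and players are invented for this example) and feeds them to read_csv through io.StringIO, which makes the text behave like a file:

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on the web (hypothetical data).
csv_text = """team,player,goals
Red,Ana,3
Red,Ben,2
Blue,Cy,4
Blue,Dee,2
Blue,Eli,0
"""

# read_csv accepts a web link, a file path, or a file-like object such as StringIO.
mydata = pd.read_csv(io.StringIO(csv_text))

# The same explore moves as before.
print(mydata.head())
print("Rows and columns:", mydata.shape)
team_counts = mydata["team"].value_counts()
avg_goals = mydata.groupby("team")["goals"].mean()
print(team_counts)
print(avg_goals)
```

Once you have a real link, replace io.StringIO(csv_text) with the link in quotes and the rest stays the same.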

Predict it 🔮

You switch from penguins to tips. Do you need to learn new commands to explore it? (No! .head(), .shape, value_counts, groupby, .plot all work the same — the skills transfer to any dataset.)
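
One way to see the skills transfer is to wrap the steps in a function and run it on two different tables. This is only a sketch: quick_look is a made-up helper name, and both tables are tiny invented examples, not the real penguins or tips data:

```python
import pandas as pd

def quick_look(df, category, number):
    """Hypothetical helper: run the same explore steps on any table."""
    print("Rows and columns:", df.shape)
    print(df[category].value_counts())
    averages = df.groupby(category)[number].mean()
    print(averages)
    return averages

# Two tiny made-up tables with completely different columns.
birds = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo"],
    "mass": [3700, 3900, 5000],
})
meals = pd.DataFrame({
    "time": ["Lunch", "Dinner", "Dinner"],
    "tip": [2.0, 3.0, 5.0],
})

# Same function, different data: only the column names change.
bird_avg = quick_look(birds, "species", "mass")
meal_avg = quick_look(meals, "time", "tip")
```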

Fix the bug 🐞

This tries to explore the tips data but raises a KeyError, because it asks for a penguin column that tips doesn’t have:

tips = sns.load_dataset("tips")
print(tips["body_mass_g"].mean())

(tips has no body_mass_g — that was a penguin column. Check this dataset’s columns with tips.columns and use one that exists, like tips["total_bill"].)
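
Here is one possible way to check before you use a column. It is a sketch, not the only fix: column_mean is a made-up helper, and mini_tips is a small stand-in table with invented numbers so the example runs without a download:

```python
import pandas as pd

# Stand-in for tips with two of its columns (made-up numbers).
mini_tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.0, 3.0, 5.0],
})

def column_mean(df, column):
    """Hypothetical helper: return the column's mean, or None if it is missing."""
    if column not in df.columns:
        # Show the columns that do exist instead of crashing with a KeyError.
        print(f"No column named {column!r}. Try one of: {list(df.columns)}")
        return None
    return df[column].mean()

missing = column_mean(mini_tips, "body_mass_g")  # penguin column, not in this table
found = column_mean(mini_tips, "total_bill")     # this column exists
print("Average total_bill:", found)
```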

Your mission 🚀

Pick a new dataset (tips, titanic, or mpg) and fully explore it: print its shape and columns, run a value_counts on a category, compute a groupby average, and make one labeled chart. You’re choosing your own investigation now.

What you learned today

  • Your data skills work on any dataset, not just penguins.
  • sns.get_dataset_names() lists built-in datasets; sns.load_dataset("name") loads one.
  • pd.read_csv("https://...") loads any CSV from the web — how pros get real data.
  • Always check a new dataset’s .columns before using them.

Next time is your capstone: a real data report combining everything — questions, charts, and findings. 🕵️
