Bring Your Own Data
Here’s the exciting part: everything you learned works on any data, not just penguins. Load a different dataset and the same head, value_counts, groupby, and .plot tools just work. Today you explore new data — including, with an adult’s help, data about something you love.
💡 Load pandas, seaborn (for the datasets), and the chart tool first:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
More built-in datasets
Seaborn comes with several practice datasets. See the list:
print(sns.get_dataset_names())
You’ll spot names like tips, titanic, mpg, flights, and diamonds. Load any of them the same way you loaded penguins. Let’s try tips — data from a restaurant about bills and tips:
tips = sns.load_dataset("tips")
tips.head()
Columns include total_bill, tip, day, time, and size (how many people). Brand-new data, same moves.
The same skills, instantly
Watch your penguin skills work on tips with no changes:
print("Rows and columns:", tips.shape)
print(tips["day"].value_counts())
print(tips.groupby("day")["tip"].mean())
tips.plot(kind="scatter", x="total_bill", y="tip", figsize=(8, 5))
plt.title("Bigger bills, bigger tips?")
plt.xlabel("Total bill")
plt.ylabel("Tip")
plt.show()
In just a few lines you found which day has the most bills in the data, the average tip on each day, and whether bigger bills bring bigger tips. You’re a data detective on any data now.
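To see that these moves don't care where the data came from, here is a small sketch on a tiny table typed in by hand. The table name `mini` and all of its numbers are made up for illustration; they are not rows from the real tips dataset, and nothing needs to be downloaded.

```python
import pandas as pd

# A tiny made-up restaurant table (NOT the real tips data)
mini = pd.DataFrame({
    "day": ["Sat", "Sun", "Sat", "Fri", "Sun", "Sat"],
    "total_bill": [20.0, 30.0, 10.0, 15.0, 25.0, 30.0],
    "tip": [3.0, 5.0, 2.0, 2.5, 4.0, 6.0],
})

# Same move as with penguins: how big is the table?
print("Rows and columns:", mini.shape)

# How many bills were recorded on each day?
day_counts = mini["day"].value_counts()
print(day_counts)

# Average tip on each day
avg_tip = mini.groupby("day")["tip"].mean()
print(avg_tip)
```

Swap `mini` for `tips` and the exact same three commands run on the real data.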
Try it 🎯
- Load the titanic dataset and run .head() and .shape.
- On tips, find the average total_bill for each time (Lunch vs Dinner) with groupby.
Loading a CSV from the web
Real data often lives in CSV files online. With an adult’s help finding a link, load any of them with read_csv:
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
mydata = pd.read_csv(url)
mydata.head()
That’s exactly how professionals pull in real-world data — sports stats, weather, movies, anything. Ask a grown-up to help you find a kid-safe CSV about a topic you love, then explore it with head, value_counts, groupby, and a chart.
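pd.read_csv reads from a URL or a file path, and it also accepts a file-like object. This sketch wraps a few lines of CSV text in `io.StringIO` so it runs without the internet. The table name `games` and its team names and points are invented for illustration.

```python
import io
import pandas as pd

# Made-up CSV text: pretend it came from a file about a favorite topic
csv_text = """team,points
Rockets,10
Comets,7
Rockets,4
Comets,12
"""

# read_csv treats the StringIO object just like a downloaded file
games = pd.read_csv(io.StringIO(csv_text))

print(games.head())
print(games["team"].value_counts())

# Average points for each team
team_avg = games.groupby("team")["points"].mean()
print(team_avg)
```

Once the data is loaded, it does not matter whether it came from a URL, a file, or text: head, value_counts, and groupby work the same.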
Predict it 🔮
You switch from penguins to tips. Do you need to learn new commands to explore it? (No! .head(), .shape, value_counts, groupby, .plot all work the same — the skills transfer to any dataset.)
Fix the bug 🐞
This tries to explore the tips data but stops with a KeyError, because it uses a penguin column that tips doesn’t have:
tips = sns.load_dataset("tips")
print(tips["body_mass_g"].mean())
(tips has no body_mass_g — that was a penguin column. Check this dataset’s columns with tips.columns and use one that exists, like tips["total_bill"].)
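A habit that prevents this bug is to check whether a column exists before using it. This sketch uses a small made-up table called `mini_tips` that shares two column names with tips (the values are invented) so it runs without downloading anything. It also shows the KeyError you get when the column is missing.

```python
import pandas as pd

# A small made-up table with two of the same column names as tips
mini_tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.0, 2.0, 3.0],
})

# Step 1: look at which columns this dataset actually has
print(list(mini_tips.columns))

# Step 2: check before you use a column
wanted = "body_mass_g"
if wanted in mini_tips.columns:
    print(mini_tips[wanted].mean())
else:
    print(f"No column named {wanted!r}; try one of {list(mini_tips.columns)}")

# Asking for a missing column directly raises a KeyError
try:
    mini_tips["body_mass_g"]
except KeyError as err:
    print("KeyError:", err)

# Using a column that exists works fine
avg_bill = mini_tips["total_bill"].mean()
print(avg_bill)
```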
Your mission 🚀
Pick a new dataset (tips, titanic, or mpg) and fully explore it: print its shape and columns, run a value_counts on a category, compute a groupby average, and make one labeled chart. You’re choosing your own investigation now.
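If you want to reuse the same checks on several datasets, one option is to wrap them in a function. The name `explore` is a hypothetical helper made up for this sketch (it is not part of pandas or seaborn), and the `cars` table is invented data that borrows two column names from mpg. The chart is left out so you can design and label your own.

```python
import pandas as pd

def explore(df, category, number):
    """Hypothetical helper: print shape, columns, counts, and group averages."""
    print("Shape:", df.shape)
    print("Columns:", list(df.columns))
    counts = df[category].value_counts()   # how many rows in each category
    print(counts)
    averages = df.groupby(category)[number].mean()  # average number per category
    print(averages)
    return counts, averages

# Made-up example data (NOT the real mpg dataset)
cars = pd.DataFrame({
    "origin": ["usa", "japan", "usa", "europe"],
    "mpg": [18.0, 30.0, 22.0, 26.0],
})

counts, averages = explore(cars, "origin", "mpg")
```

With the real data you could try explore(sns.load_dataset("mpg"), "origin", "mpg"), since mpg has both of those columns.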
What you learned today
- Your data skills work on any dataset, not just penguins.
- sns.get_dataset_names() lists the built-in datasets; sns.load_dataset("name") loads one.
- pd.read_csv("https://...") loads any CSV from the web, which is how pros get real data.
- Always check a new dataset’s .columns before using them.
Next time is your capstone: a real data report combining everything — questions, charts, and findings. 🕵️