New Columns and Record-Holders
So far you’ve read the data. Today you start adding to it — creating new columns calculated from the ones you have — and you’ll learn to find the actual record-holder, not just the average.
💡 Load the data first:
import pandas as pd
import seaborn as sns
penguins = sns.load_dataset("penguins").dropna()
Making a new column
The body mass is in grams, which gives big numbers. Let’s add a kilograms column. You make a new column just by assigning to it:
penguins["mass_kg"] = penguins["body_mass_g"] / 1000
penguins[["species", "body_mass_g", "mass_kg"]].head()
pandas divided every row’s mass by 1000 at once and stored the results in a brand-new column called mass_kg. New data, computed from old.
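If you want to check the arithmetic by hand, here is a minimal sketch on a tiny table. The three masses are made up for illustration (they are not rows from the penguins data), and the name tiny is just a placeholder:

```python
import pandas as pd

# Hypothetical three-row table, invented for this example (not real penguin data).
tiny = pd.DataFrame({"body_mass_g": [3750, 5000, 4200]})

# One assignment divides every row by 1000 and stores the results in a new column.
tiny["mass_kg"] = tiny["body_mass_g"] / 1000

print(tiny)
```

With only three rows you can confirm each result yourself: 3750 g becomes 3.75 kg, 5000 g becomes 5.0 kg, and 4200 g becomes 4.2 kg.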
Try it 🎯
Add a column flipper_cm that is flipper_length_mm divided by 10 (millimeters to centimeters). Then peek at it.
A column from two columns
You can combine columns. Here’s a rough “chunkiness” score — body mass compared to flipper length:
penguins["chunk"] = penguins["body_mass_g"] / penguins["flipper_length_mm"]
penguins[["species", "chunk"]].head()
Every row gets its mass ÷ its flipper length. (It’s a made-up stat, but inventing measurements is exactly what data scientists do.)
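To see how pandas pairs the two columns up row by row, here is a small sketch on a hand-made table. The numbers are invented so the division comes out clean, and tiny is a placeholder name:

```python
import pandas as pd

# Hypothetical values chosen so each division is easy to check by hand.
tiny = pd.DataFrame({
    "body_mass_g": [3600, 5000, 4200],
    "flipper_length_mm": [180, 200, 210],
})

# Row 0's mass is divided by row 0's flipper length, row 1's by row 1's, and so on.
tiny["chunk"] = tiny["body_mass_g"] / tiny["flipper_length_mm"]

print(tiny)
```

The first row works out to 3600 ÷ 180 = 20.0. No row ever borrows a value from a different row.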
Who holds the record?
.mean() gives the average, but sometimes you want the actual record-holding penguin. .idxmax() finds the row label (the index value shown at the left edge of the table) of the biggest value, and .loc[...] fetches that whole row:
heaviest = penguins.loc[penguins["body_mass_g"].idxmax()]
print(heaviest)
That prints the single heaviest penguin — its species, island, every measurement. Use .idxmin() for the smallest.
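One detail worth seeing for yourself: .idxmax() returns the row's index label, which is not always its position in the table. Because we called .dropna() when loading, some labels are missing, so the two can differ. A minimal sketch, using a made-up four-row table with hypothetical penguin names:

```python
import pandas as pd

# Hypothetical table; the row with the missing mass is removed by dropna().
tiny = pd.DataFrame({
    "name": ["Pip", "Momo", "Tux", "Waddle"],
    "body_mass_g": [3700.0, None, 5400.0, 4100.0],
}).dropna()

label = tiny["body_mass_g"].idxmax()  # index label of the heaviest row
print(label)                          # 2 (the original label survives dropna)
print(tiny.loc[label])                # Tux's row, looked up by label
print(tiny.iloc[1])                   # the same row, looked up by position
```

This is why the lesson pairs .idxmax() with .loc[...]: .loc looks rows up by label, which is exactly what .idxmax() hands back. .iloc counts positions instead, so it would need 1 here, not 2.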
Try it 🎯
Find the penguin with the longest flipper (use idxmax() on flipper_length_mm).
Quick stats recap
All on a column, no loop:
print("Average mass (kg):", penguins["mass_kg"].mean())
print("Heaviest (kg):", penguins["mass_kg"].max())
print("Lightest (kg):", penguins["mass_kg"].min())
Predict it 🔮
After you add mass_kg, what’s roughly the average — closer to 4 or to 400? (Around 4 — penguins average about 4 kg. The gram value was ~4200, and dividing by 1000 gives ~4.2.)
Fix the bug 🐞
This tries to find the heaviest penguin’s row but only prints a single number, not the whole penguin. It’s missing the .loc[...] step:
heaviest = penguins["body_mass_g"].idxmax()
print(heaviest)
(.idxmax() gives only the row label. To get the whole penguin, look it up: penguins.loc[penguins["body_mass_g"].idxmax()].)
Your mission 🚀
In your notebook: (1) add a mass_kg column, (2) print the average mass in kg, and (3) find and print the lightest penguin (the whole row) using .idxmin().
What you learned today
- Make a new column by assigning: penguins["new"] = penguins["a"] / 1000.
- pandas computes it for every row at once.
- You can build a column from two columns.
- .idxmax() / .idxmin() + .loc[...] find the actual record-holding row, not just the number.
You can find any answer in the data now. Next time, we make those answers visible — your first chart. 📊