New Columns and Record-Holders


So far you’ve read the data. Today you start adding to it — creating new columns calculated from the ones you have — and you’ll learn to find the actual record-holder, not just the average.


💡 Load the data first:

import pandas as pd
import seaborn as sns
penguins = sns.load_dataset("penguins").dropna()

Making a new column

Body mass is recorded in grams, so the numbers run into the thousands. Let’s add a kilograms column. You make a new column just by assigning to it:

penguins["mass_kg"] = penguins["body_mass_g"] / 1000
penguins[["species", "body_mass_g", "mass_kg"]].head()

pandas divided every row’s mass by 1000 at once and stored the results in a brand-new column called mass_kg. New data, computed from old.
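To see what "at once" means, here is a small sketch on a hand-made table. The name toy and its three masses are invented for illustration and are not part of the penguins data. It compares the one-line division with the same work written as a loop:

```python
import pandas as pd

# Hypothetical mini-table; these masses are made up for illustration.
toy = pd.DataFrame({"body_mass_g": [3750, 5000, 4200]})

# One line: pandas divides every row for you.
toy["mass_kg"] = toy["body_mass_g"] / 1000

# The same work written as a loop, only for comparison.
looped = []
for grams in toy["body_mass_g"]:
    looped.append(grams / 1000)

print(toy)
print(looped)
```

Both approaches give the same numbers. The one-liner is shorter and is the usual way to do this in pandas.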

Try it 🎯

Add a column flipper_cm that is flipper_length_mm divided by 10 (millimeters to centimeters). Then peek at it.

A column from two columns

You can combine columns. Here’s a rough “chunkiness” score — body mass compared to flipper length:

penguins["chunk"] = penguins["body_mass_g"] / penguins["flipper_length_mm"]
penguins[["species", "chunk"]].head()

Every row gets its mass ÷ its flipper length. (It’s a made-up stat, but inventing measurements is exactly what data scientists do.)
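If you want to check the arithmetic by hand, a tiny made-up table helps. In this sketch the name toy and both rows are invented, with numbers chosen so the division comes out even:

```python
import pandas as pd

# Hypothetical two-row table; the values are made up so the math is easy to check.
toy = pd.DataFrame({
    "body_mass_g": [3600, 5500],
    "flipper_length_mm": [180, 220],
})

# Row by row: 3600 / 180 and 5500 / 220.
toy["chunk"] = toy["body_mass_g"] / toy["flipper_length_mm"]
print(toy)
```

The first row's score is 20.0 and the second's is 25.0. Since this is grams divided by millimeters, the score's unit is grams per millimeter of flipper.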

Who holds the record?

.mean() gives the average, but sometimes you want the actual record-holding penguin. .idxmax() finds the index label of the row with the biggest value (the number pandas shows at the left edge of the table), and .loc[...] fetches that whole row by its label:

heaviest = penguins.loc[penguins["body_mass_g"].idxmax()]
print(heaviest)

That prints the single heaviest penguin — its species, island, every measurement. Use .idxmin() for the smallest.
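One detail worth seeing in action: because .dropna() removed some rows when the data was loaded, the index labels can have gaps, so a label is not always the same as a row's position. The sketch below uses a made-up table (the name toy, its labels 0, 2, 5, and its masses are all invented) to show that .idxmax() hands back a label, which is exactly what .loc[...] expects:

```python
import pandas as pd

# Hypothetical mini-table. The labels 0, 2, 5 imitate what is left
# after some rows have been dropped.
toy = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "body_mass_g": [3700, 5400, 3800],
    },
    index=[0, 2, 5],
)

# idxmax() returns the index label of the largest value.
label = toy["body_mass_g"].idxmax()
print(label)

# .loc looks a row up by that label and returns the whole row.
heaviest = toy.loc[label]
print(heaviest)
```

Here the heaviest penguin is the second row, but its label is 2. If two penguins tie for the record, .idxmax() returns the first one it finds.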

Try it 🎯

Find the penguin with the longest flipper (use idxmax() on flipper_length_mm).

Quick stats recap

Each of these works on a whole column at once, no loop needed:

print("Average mass (kg):", penguins["mass_kg"].mean())
print("Heaviest (kg):", penguins["mass_kg"].max())
print("Lightest (kg):", penguins["mass_kg"].min())
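The average usually prints with a long tail of decimals. One optional way to tidy it is Python's built-in round(). This is a sketch on a made-up column (the name toy and its three values are invented for illustration):

```python
import pandas as pd

# Hypothetical mini-table; these masses are made up for illustration.
toy = pd.DataFrame({"mass_kg": [3.75, 5.0, 4.2]})

average = toy["mass_kg"].mean()
heaviest = toy["mass_kg"].max()
lightest = toy["mass_kg"].min()

# round(value, 2) keeps two digits after the decimal point.
print("Average mass (kg):", round(average, 2))
print("Heaviest (kg):", heaviest)
print("Lightest (kg):", lightest)
```

Rounding only changes what gets printed here. The column itself keeps its full precision.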

Predict it 🔮

After you add mass_kg, what’s roughly the average — closer to 4 or to 400? (Around 4 — penguins average about 4 kg. The gram value was ~4200, and dividing by 1000 gives ~4.2.)

Fix the bug 🐞

This tries to find the heaviest penguin’s row but only prints a single number, not the whole penguin. It’s missing the .loc[...] step:

heaviest = penguins["body_mass_g"].idxmax()
print(heaviest)

(.idxmax() gives only the row's index label, not the row itself. To get the whole penguin, look it up by that label: penguins.loc[penguins["body_mass_g"].idxmax()].)

Your mission 🚀

In your notebook: (1) add a mass_kg column, (2) print the average mass in kg, and (3) find and print the lightest penguin (the whole row) using .idxmin().

What you learned today

  • Make a new column by assigning: penguins["new"] = penguins["a"] / 1000.
  • pandas computes it for every row at once.
  • You can build a column from two columns.
  • .idxmax() / .idxmin() give the index label of the biggest / smallest value, and .loc[...] fetches that whole record-holding row, not just the value.

You can find any answer in the data now. Next time, we make those answers visible — your first chart. 📊
