Looking Around the Data


You’ve loaded the penguins. Now let’s poke around and pull out exactly what we want. A data detective rarely needs the whole table at once — usually just a column or two.

💡 Start each lesson by loading the data in your first cell:

import pandas as pd
import seaborn as sns
penguins = sns.load_dataset("penguins").dropna()

Grabbing one column

Use square brackets with a column name to pull out that column:

penguins["species"].head()

A single column is called a Series — basically a labeled list (hello again, Phase II!). It holds one value per penguin.
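To see what "labeled list" means without downloading anything, here is a minimal sketch on a tiny made-up table. The name toy and its values are hypothetical stand-ins for penguins; only the column names match the real data.

```python
import pandas as pd

# Hypothetical stand-in for the penguins table (made-up values).
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700.0, 5000.0, 3800.0],
})

species = toy["species"]

print(type(species).__name__)   # the kind of object a single column is
print(len(species))             # one value per row
print(list(species.index))      # the labels attached to each value
```

The labels here are just the row numbers 0, 1, 2, because the toy table was built without any custom labels.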

Grabbing several columns

Want more than one column? Pass a list of names (note the double brackets):

penguins[["species", "body_mass_g"]].head()

The outer brackets say “select”; the inner brackets are the list of columns. The result is a smaller table with just those columns.
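One way to convince yourself that the inner brackets really are an ordinary Python list is to store that list in a variable first. This is a small sketch on a made-up table; toy, wanted, and the values are hypothetical, not part of the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the penguins table (made-up values).
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "body_mass_g": [3700.0, 5000.0, 3800.0],
})

wanted = ["species", "body_mass_g"]      # the inner brackets: a plain list
small = toy[wanted]                      # the outer brackets: "select"

same = toy[["species", "body_mass_g"]]   # the one-line version
print(small)
print(small.equals(same))                # both ways give the same table
```

Keeping the list in a variable can be handy when you want to reuse the same set of columns in several cells.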

Try it 🎯

  1. Show just the island column: penguins["island"].head().
  2. Show two columns together: species and flipper_length_mm.
  3. Show three columns of your choice.

What values appear?

.unique() lists the different values in a column — great for categories like species or island:

print(penguins["species"].unique())
print(penguins["island"].unique())

You’ll see three species (Adelie, Chinstrap, Gentoo) and three islands (Biscoe, Dream, Torgersen). Now you know which categories this data contains.
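Here is a small sketch of what .unique() does when values repeat. The table toy and its rows are made up for illustration; the real penguins data has many more rows.

```python
import pandas as pd

# Hypothetical stand-in for the penguins table (made-up rows with repeats).
toy = pd.DataFrame({
    "island": ["Torgersen", "Biscoe", "Biscoe", "Dream", "Torgersen"],
})

islands = toy["island"].unique()

print(islands)        # each island appears once, in order of first appearance
print(len(islands))   # how many different islands there are
```

Five rows go in, but only three distinct values come out, because the repeats are dropped.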

A column is just a list of numbers

A numeric column can do math on all of its values at once. Each of these methods takes the whole column and gives back a single number:

print("Heaviest penguin:", penguins["body_mass_g"].max())
print("Lightest penguin:", penguins["body_mass_g"].min())
print("Average weight:", penguins["body_mass_g"].mean())

No loop needed: pandas applies the calculation to every value for you. (You could write a for loop as in Phase II, but pandas does it in one line.)
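For comparison, here is a sketch of the same average computed both ways: with a Phase II style for loop and with .mean(). The table toy and its weights are made up so the example runs without loading the real data.

```python
import pandas as pd

# Hypothetical stand-in for the penguins table (made-up weights).
toy = pd.DataFrame({
    "body_mass_g": [3700.0, 5000.0, 3800.0, 4500.0],
})
masses = toy["body_mass_g"]

# The long way: add up every value with a loop, then divide by the count.
total = 0
for mass in masses:
    total += mass
loop_average = total / len(masses)

# The pandas way: one method call on the whole column.
pandas_average = masses.mean()

print("Loop average:  ", loop_average)
print("pandas average:", pandas_average)
```

Both lines print the same number; the pandas version is simply shorter to write and easier to read.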

Try it 🎯

  1. Find the longest flipper_length_mm.
  2. Find the average bill_length_mm.

Predict it 🔮

What’s the difference between penguins["species"] and penguins[["species"]]? Run both. (One pair of brackets gives a single column, a Series. Two pairs give a small table, a DataFrame, that happens to have just one column. Double brackets = “a list of columns to select.”)

Fix the bug 🐞

This is supposed to grab two columns, but it raises a KeyError instead. Look at the brackets:

penguins["species", "island"].head()

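(With single brackets, pandas treats "species", "island" as the name of one column, and no column has that name. Selecting several columns needs a list inside, which means double brackets: penguins[["species", "island"]].)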
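If you want to see the error and the fix side by side without crashing your notebook, a try/except sketch like this one can help. The table toy is a made-up stand-in, and the variable caught is only there to record what happened.

```python
import pandas as pd

# Hypothetical stand-in for the penguins table (made-up values).
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
})

caught = False
try:
    toy["species", "island"]          # single brackets: looked up as ONE column name
except KeyError as err:
    caught = True
    print("KeyError:", err)           # pandas could not find a column with that name

fixed = toy[["species", "island"]]    # double brackets: a list of two column names
print(fixed)
```

The error message shows the pair of names in parentheses, which is the clue that pandas read them as a single key.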

Your mission 🚀

Make a smaller table with three columns you find interesting (say species, body_mass_g, flipper_length_mm), show its first 8 rows, and then print the average body mass for the whole dataset.

What you learned today

  • penguins["col"] grabs one column (a Series — a labeled list).
  • penguins[["a", "b"]] grabs several columns (note the double brackets).
  • .unique() shows the distinct values in a column.
  • A number column does .max(), .min(), .mean() all at once — no loop needed.

Next time we get pickier: keeping only the rows that match a condition — like “only the heavy penguins.” 🏋️
