
Build a Neural Network for Classification from Scratch with PyTorch

Venelin Valkov · 5 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Clean the dataset by removing missing/irregular rows before converting to tensors, since PyTorch training expects consistent numeric inputs.

Briefing

A penguin-species classifier built from scratch in PyTorch hinges on three practical steps: turning a cleaned pandas dataset into numeric tensors, splitting data into train/test sets to avoid misleading accuracy, and defining a small feed-forward neural network with linear layers plus a ReLU activation to introduce non-linearity.

The workflow starts with environment setup in Google Colab: installing PyTorch 2.0 and torchview (pinned to version 0.26) for model visualization. The penguins.csv file is downloaded from Google Drive using gdown, then loaded with pandas. Rows with missing or irregular values are removed, leaving 333 records. Features used for prediction are four numeric columns (bill length, bill depth, and flipper length in millimeters, plus body mass in grams), while the target label is the penguin species. Species counts are plotted with Seaborn, revealing class imbalance: Chinstrap is noticeably underrepresented compared with the other two species, Adelie and Gentoo.
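
A minimal sketch of the loading and cleaning step, assuming the standard Palmer Penguins column names (species, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g):

```python
import pandas as pd
import seaborn as sns

# Load the raw CSV (filename and column names assumed from the Palmer Penguins dataset)
df = pd.read_csv("penguins.csv")

# Drop rows with missing or irregular values; 333 rows remain
df = df.dropna().reset_index(drop=True)

# Inspect the class balance: Chinstrap has roughly half as many rows as the others
print(df["species"].value_counts())
sns.countplot(data=df, x="species")
```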

That imbalance matters because a classifier can achieve decent accuracy while failing badly on the rare class. To measure performance honestly, the dataset is split into training and testing subsets using sklearn’s train_test_split with a test_size of 0.2 (20% held out, yielding 266 training examples and 67 test examples). The split is followed by index resets to keep the data tidy. Since PyTorch can’t consume pandas DataFrames directly, a custom create_dataset function converts each subset into tensors: feature tensors are float32, and labels are mapped from species strings to integer IDs via a species_map (Adelie→0, Chinstrap→1, Gentoo→2) stored as torch.long.
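
A hedged sketch of the split (the random seed is an assumption; the video’s exact seed isn’t captured here):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the 333 cleaned rows -> 266 train / 67 test
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Reset indices so each subset is a tidy, contiguous DataFrame
train_df = train_df.reset_index(drop=True)
test_df = test_df.reset_index(drop=True)

print(len(train_df), len(test_df))  # 266 67
```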

With tensors in hand, the neural network is defined as a PyTorch module called PenguinClassifier. It takes four input features and outputs logits for three classes. The architecture is intentionally simple: a first linear layer maps 4→8 neurons, then a ReLU activation is applied, followed by a second linear layer mapping 8→3. The forward pass runs features through linear1, applies ReLU to break pure linear behavior, and then feeds the result into linear2 to produce class scores. Before training, predictions on sample inputs are essentially random—expected because weights haven’t been optimized yet.
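
A minimal sketch of the module as described; the class and attribute names (PenguinClassifier, linear1, linear2) follow the summary, and other details are assumptions:

```python
import torch
import torch.nn as nn

class PenguinClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(4, 8)  # 4 input features -> 8 hidden neurons
        self.relu = nn.ReLU()           # non-linearity between the two layers
        self.linear2 = nn.Linear(8, 3)  # 8 hidden neurons -> 3 class logits

    def forward(self, features):
        x = self.relu(self.linear1(features))
        return self.linear2(x)

model = PenguinClassifier()
# With untrained (randomly initialized) weights, these logits are essentially noise
print(model(torch.rand(1, 4)))
```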

To make the model structure tangible, torchview’s draw_graph and visual_graph generate an exported diagram (PNG/SVG) showing the input tensor, hidden layer, ReLU activation, and output layer. The tutorial then zooms in on why ReLU is used: compared with a plain linear function, ReLU clips negative values to zero. A small demonstration plots ReLU versus a linear function and also visualizes how a linear layer’s outputs change once ReLU is applied, showing that negative activations become zero while non-negative values pass through.
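
A sketch of the visualization call using torchview’s draw_graph; the output filename is a placeholder, and the export goes through the underlying graphviz Digraph:

```python
from torchview import draw_graph

# Trace the model with a dummy (1, 4) input to build the graph
model_graph = draw_graph(model, input_size=(1, 4))

# visual_graph is a graphviz Digraph; render() exports PNG or SVG
model_graph.visual_graph.render("penguin_classifier", format="png")
```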

By the end, the pipeline is complete up to model definition: data cleaning, tensor conversion, train/test splitting, a two-layer network with ReLU, and visualization of the architecture—setting up the next step of training and evaluating the classifier for correct species prediction.

Cornell Notes

The penguin classifier pipeline turns a cleaned penguins.csv dataset into PyTorch tensors, splits it into train/test sets, and defines a small neural network for 3-class classification. Features come from four numeric columns (bill length, bill depth, flipper length, body mass), while species strings are mapped to integer labels (Adelie=0, Chinstrap=1, Gentoo=2). The model uses two linear layers (4→8 and 8→3) with a ReLU activation in between to introduce non-linearity. ReLU’s role is demonstrated by comparing linear outputs to ReLU-clipped outputs, where negative values become zero. This matters because non-linearity is what lets the network learn more complex patterns than linear models.

Why does the tutorial split the dataset into train and test subsets, and what failure mode does it prevent?

Training and evaluating on the same data can produce overly optimistic results because the model may memorize examples rather than learn general patterns. When deployed on new penguins, performance can drop sharply if the model only learned the training set. Using sklearn’s train_test_split with test_size=0.2 creates separate train (266 rows) and test (67 rows) subsets so the test set measures performance on data the model hasn’t seen.

How are penguin species labels converted into something PyTorch can train on?

Species are originally strings in the pandas DataFrame, which PyTorch can’t use directly as targets. A species_map assigns integers to each class: Adelie→0, Chinstrap→1, Gentoo→2. Labels are then stored as a torch.long tensor so they can be used with classification losses later.
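
A sketch of that conversion, using the create_dataset and species_map names from the summary (implementation details are assumptions):

```python
import torch

species_map = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
feature_columns = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

def create_dataset(df):
    # Features: the four numeric columns as a float32 tensor
    X = torch.tensor(df[feature_columns].values, dtype=torch.float32)
    # Labels: species strings mapped to integer IDs, stored as torch.long
    y = torch.tensor(df["species"].map(species_map).values, dtype=torch.long)
    return X, y

X_train, y_train = create_dataset(train_df)
X_test, y_test = create_dataset(test_df)
```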

What exactly are the model inputs and outputs in this classifier?

Inputs are four numeric features per penguin: bill length (mm), bill depth (mm), flipper length (mm), and body mass (grams). The network outputs three logits—one for each species class—produced by the final linear layer mapping 8 hidden neurons to 3 output neurons.

What is the network architecture, layer by layer?

The PenguinClassifier module defines linear1 (4→8), applies ReLU, then applies linear2 (8→3). The forward pass computes x = ReLU(linear1(features)) and returns linear2(x). This structure is a basic feed-forward network suitable for classification.
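
For instance, a quick shape check (a sketch; the batch size of 5 is arbitrary):

```python
import torch

batch = torch.rand(5, 4)            # 5 penguins, 4 features each
logits = PenguinClassifier()(batch)
print(logits.shape)                 # torch.Size([5, 3]): one logit per class
```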

Why is ReLU necessary, and how does it change the behavior of a linear layer?

Without an activation function, stacking linear layers remains effectively linear, limiting the patterns the network can learn. ReLU (rectified linear unit) clips negative values to zero while leaving non-negative values unchanged. The tutorial demonstrates this by plotting a linear function versus ReLU and by visualizing how a linear layer’s outputs become non-negative after applying ReLU.
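
A small sketch reproducing both demonstrations (plotting details are assumptions):

```python
import torch
import matplotlib.pyplot as plt

# Linear function vs. ReLU: identical for x >= 0, clipped to zero for x < 0
x = torch.linspace(-3, 3, 100)
plt.plot(x, x, label="linear")
plt.plot(x, torch.relu(x), label="ReLU")
plt.legend()
plt.show()

# A linear layer's raw outputs vs. the same outputs after ReLU
out = torch.nn.Linear(4, 8)(torch.rand(1, 4))
print(out)              # mixed positive and negative activations
print(torch.relu(out))  # negatives clipped to zero
```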

How does class imbalance affect classification, and what evidence is shown?

If one species is underrepresented (Chinstrap is shown as much rarer), a model can achieve high accuracy by mostly predicting the dominant classes while performing poorly on the rare class. The tutorial visualizes species counts with bar charts to show that Chinstrap has roughly half as many examples as the dominant species, motivating careful evaluation beyond overall accuracy.

Review Questions

  1. What tensor dtypes and shapes are created for features versus labels, and why does labels need torch.long?
  2. How does ReLU introduce non-linearity compared with using only linear layers?
  3. What are the specific train/test sizes produced by the chosen test_size setting, and how does that impact evaluation reliability?

Key Points

  1. Clean the dataset by removing missing/irregular rows before converting to tensors, since PyTorch training expects consistent numeric inputs.

  2. Use a train/test split (here via train_test_split with test_size=0.2) to prevent memorization from inflating evaluation results.

  3. Convert four numeric feature columns into a float32 tensor and map species strings to integer class IDs stored as torch.long.

  4. Define a classification network that outputs logits for all classes (8→3 here) rather than a single prediction value.

  5. Insert ReLU between linear layers to break linearity; negative activations become zero, enabling the network to learn more complex decision boundaries.

  6. Visualize the architecture with torchview to verify layer connections and tensor flow before training.

Highlights

The classifier’s core structure is simple but complete: 4 input features → Linear(4→8) → ReLU → Linear(8→3) for 3-class logits.
Species imbalance is explicitly quantified and visualized, warning that accuracy can look good even when the rare class is misclassified.
ReLU is demonstrated both mathematically (clipping negatives to zero) and practically (changing linear-layer outputs into non-negative activations).
torchview is used to generate a diagram of the network, making it easier to sanity-check the model before training.

Topics

  • Penguin Classification
  • PyTorch Tensors
  • Train/Test Split
  • Neural Network Architecture
  • ReLU Activation

Mentioned

  • Venelin Valkov
  • ReLU
  • torch
  • scikit-learn
  • nn.Module