Building our Neural Network - Deep Learning and Neural Networks with Python and Pytorch p.3
Based on sentdex's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
The core work in this installment is building a complete feed-forward neural network in PyTorch: defining a model class, wiring fully connected layers, specifying how data flows through them, and producing class-probability outputs via log-softmax. The practical payoff is immediate—once the forward pass is in place, random “image-like” tensors can be pushed through the network to generate 10-class predictions, setting the stage for training in the next tutorial.
The model is implemented as a subclass of `nn.Module`, with an `__init__` method that constructs four linear layers. The input size is set to 784 because the network expects flattened MNIST-style images: 28×28 pixels are reshaped into a single vector of length 784. The hidden layers are configured as three stages of 64 neurons each (`784 → 64 → 64 → 64`), using `nn.Linear` for each fully connected transform. The final layer maps to 10 outputs (`64 → 10`), matching ten classes labeled 0 through 9.
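The layer definitions described above can be sketched as follows (the class name `Net` follows the tutorial's convention):

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()  # register this module with PyTorch
        # 784 = 28*28 flattened MNIST-style pixels
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)  # 10 output classes (digits 0-9)
```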
A key PyTorch detail is the need to call `super().__init__()` during initialization. Omitting it triggers a cryptic error (`AttributeError: cannot assign module before Module.__init__() call`), which the tutorial uses as a cautionary example of what goes wrong when the parent `nn.Module` initialization isn’t executed.
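A small sketch reproducing that pitfall (the class name `Broken` is illustrative, not from the tutorial):

```python
import torch.nn as nn

class Broken(nn.Module):
    def __init__(self):
        # super().__init__() deliberately omitted
        self.fc1 = nn.Linear(784, 64)  # assignment raises AttributeError

try:
    Broken()
except AttributeError as e:
    print(e)  # → cannot assign module before Module.__init__() call
```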
Data flow is defined in a `forward(self, X)` method. The input tensor is passed through `fc1`, `fc2`, and `fc3`, with a ReLU activation applied after each linear layer using `F.relu`. The output layer (`fc4`) is treated differently: it does not use ReLU. Instead, the network returns `F.log_softmax(X, dim=1)` to produce a log probability distribution over classes. The tutorial emphasizes that ReLU is appropriate for hidden layers to prevent values from exploding, while the output layer should be constrained to a probability-like interpretation suitable for multi-class classification.
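Putting the pieces together, a self-contained sketch of the model with its `forward` pass (names follow the tutorial's conventions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, X):
        X = F.relu(self.fc1(X))  # ReLU after each hidden layer
        X = F.relu(self.fc2(X))
        X = F.relu(self.fc3(X))
        X = self.fc4(X)          # no ReLU on the output layer
        # log probabilities across the 10 classes (dim=1 is the class axis)
        return F.log_softmax(X, dim=1)
```

Because `log_softmax` normalizes over `dim=1`, each row of the output exponentiates and sums to 1, i.e. a valid probability distribution per sample.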
To prove the wiring works, the tutorial generates random data shaped like a batch of 28×28 images. Passing a raw 28×28 tensor into the model causes a size mismatch because the network expects flattened vectors of length 784. The fix is reshaping with `view(-1, 28*28)` (or the explicit `view(1, 28*28)` for a single image), where `-1` tells PyTorch to infer the batch dimension. After reshaping, the model produces outputs for each of the 10 classes.
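A short sketch of the reshape fix, using the shapes described above:

```python
import torch

X = torch.rand(28, 28)        # one random 28x28 "image"
flat = X.view(-1, 28*28)      # -1 lets PyTorch infer the batch dimension
print(flat.shape)             # torch.Size([1, 784])

# the same call works for a whole batch of images
batch = torch.rand(5, 28, 28).view(-1, 28*28)
print(batch.shape)            # torch.Size([5, 784])
```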
Finally, the tutorial notes that the network’s first passes are effectively untrained: the weights are still at their random initial values, so predictions amount to guesses. Still, the forward pass now returns something usable for the later steps of computing loss and gradients, so the next stage can adjust weights to improve accuracy. It also highlights PyTorch’s flexibility: logic can be embedded inside `forward`, enabling more complex conditional architectures later, while gradients are handled automatically.
Cornell Notes
A PyTorch neural network is built by subclassing `nn.Module`, defining four fully connected layers, and implementing a `forward` method that controls how tensors move through the network. The input is flattened from 28×28 into 784 features, then passed through hidden layers of size 64 with ReLU activations. The output layer produces 10 class scores, converted into log probabilities using `F.log_softmax(X, dim=1)` for multi-class classification. A common pitfall is forgetting `super().__init__()`, which causes module initialization errors. Another pitfall is feeding unflattened image tensors, which triggers size mismatch until reshaping with `view(-1, 28*28)` is applied.
- Why does the network expect 784 inputs instead of 28×28 images directly?
- What does `super().__init__()` do in a PyTorch `nn.Module` subclass, and what happens if it’s omitted?
- How is the forward pass structured, and where does ReLU belong?
- Why use `F.log_softmax` on the output layer instead of ReLU?
- What does `dim=1` mean in `F.log_softmax(X, dim=1)`?
- Why does reshaping with `view(-1, 28*28)` fix the size mismatch error?
Review Questions
- What exact tensor shape must be fed into the network for the first linear layer to work, and how does `view(-1, 28*28)` ensure it?
- Explain why ReLU is applied after `fc1`, `fc2`, and `fc3` but not after the final `fc4` layer.
- In `F.log_softmax(X, dim=1)`, what axis is normalized, and why is that axis the correct one for multi-class outputs?
Key Points
1. Define the model by subclassing `nn.Module` and call `super().__init__()` so PyTorch can register layers correctly.
2. Use `nn.Linear` layers to map `784 → 64 → 64 → 64 → 10`, where 784 comes from flattening 28×28 images.
3. Implement a `forward(self, X)` method that applies ReLU activations after hidden layers to control activation growth.
4. Convert final class scores into log probabilities with `F.log_softmax(X, dim=1)` for multi-class classification.
5. Flatten inputs before passing them to the network; unflattened 28×28 tensors cause size mismatch errors.
6. Reshape batches with `view(-1, 28*28)` so the batch dimension is flexible while features match the expected 784 input size.