What is Transfer Learning? Transfer Learning in Keras | Fine Tuning Vs Feature Extraction
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Transfer learning reuses a pretrained CNN to avoid costly manual labeling and slow training from scratch.
Briefing
Transfer learning is presented as the practical fix for two bottlenecks in deep learning: collecting and labeling huge datasets, and waiting days for models to train from scratch. Instead of training a CNN on a brand-new dataset, a model pretrained on a large benchmark dataset (like ImageNet) is reused on a new task. The core payoff is immediate—less data is needed and training time drops—because the pretrained network already learned general visual features.
The discussion starts with why building your own model is hard. Training a CNN typically requires thousands of labeled images, and labeling is manual and costly. Even when enough data exists, training from scratch on a large dataset can take days, which discourages teams from building models from the ground up. Pretrained models address both problems by transferring knowledge from a previously trained CNN to the new dataset.
A key example anchors the concept: ImageNet pretraining. The talk references well-known architectures trained on ImageNet—VGG16, ResNet, and Inception—highlighting that these models were trained on roughly 1,000 classes and millions of images. The pretrained CNN contains two major parts: a convolutional base that extracts features from images, and fully connected layers that perform classification for the original task. Transfer learning works by keeping the convolutional base (which captures reusable, general features like edges and textures) and replacing or adapting the classification layers to match the new problem.
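To make that two-part structure concrete, here is a minimal Keras sketch (not from the transcript) that loads VGG16 both with and without its original ImageNet classifier head; the 150x150 input size is an assumed target-image size, not a requirement of the architecture.

```python
from tensorflow.keras.applications import VGG16

# Full pretrained model: convolutional base + the original 1000-class ImageNet head
full_model = VGG16(weights="imagenet", include_top=True)

# Convolutional base only: the reusable feature extractor, with the head dropped
# so it can be replaced for the new task (150x150x3 is an assumed input size).
conv_base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
conv_base.summary()  # five convolution blocks, no dense classification layers
```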
Two transfer-learning strategies are then contrasted: feature extraction and fine-tuning. In feature extraction, the convolutional base is frozen so its weights are not updated; only new classification layers are trained on the target dataset. This is framed as ideal when the target task is similar to the pretrained domain, because early layers learned “primitive” features that tend to generalize.
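A minimal feature-extraction sketch under those assumptions: the convolutional base is frozen and only a small new classifier head is trained. The 256-unit dense layer and the single sigmoid output are illustrative choices for a two-class target task, not values dictated by the method.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
conv_base.trainable = False  # freeze: the pretrained weights are never updated

model = models.Sequential([
    conv_base,                              # reused, frozen feature extractor
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # new head, trained from random init
    layers.Dense(1, activation="sigmoid"),  # single unit for binary classification
])
```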
Fine-tuning is the more flexible approach. It still starts from the pretrained model, but unfreezes some of the later convolutional layers (the earliest layers stay frozen) so the network can adapt its higher-level features to the new task. The transcript uses a concrete scenario, phone versus tablet classification, to argue that the more the target classes differ from ImageNet's categories, the more important fine-tuning becomes. The tradeoff is cost: fine-tuning usually takes longer because more layers are trainable.
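Continuing from the model built in the sketch above, a hedged fine-tuning sketch: only VGG16's last convolutional block (`block5`) is unfrozen, and the model is recompiled with RMSprop at a deliberately small learning rate so the pretrained weights shift only gently. The block choice and the 1e-5 rate are illustrative assumptions; in practice this step follows an initial feature-extraction phase like the one sketched in the implementation part below.

```python
from tensorflow.keras import optimizers

# Unfreeze only the last convolutional block; earlier blocks keep their ImageNet weights.
conv_base.trainable = True
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith("block5")

# Recompile with a low learning rate so fine-tuning nudges, rather than destroys,
# the pretrained features (1e-5 is an illustrative value).
model.compile(
    optimizer=optimizers.RMSprop(learning_rate=1e-5),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```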
The second half moves into implementation details in Keras using VGG16. The workflow imports the VGG16 model with pretrained ImageNet weights, freezes the convolutional base, adds custom dense layers for the new binary classification task, normalizes image pixel values, and trains with binary cross-entropy loss and the Adam optimizer (the fine-tuning run later switches to RMSprop with a lower learning rate). Results are reported for both approaches: feature extraction reaches about 91.4% test accuracy once data augmentation is added to reduce overfitting, while fine-tuning pushes test accuracy to around 95.2%, with training accuracy near 99.8% signaling a risk of overfitting.
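A sketch of that training setup, assuming a hypothetical directory layout with one subfolder per class; the specific augmentation layers, image size, and epoch count are illustrative, while the rescaling, binary cross-entropy loss, and Adam optimizer follow the described workflow.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical directory layout: "train/" and "test/", each with one subfolder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "train", image_size=(150, 150), batch_size=32, label_mode="binary")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "test", image_size=(150, 150), batch_size=32, label_mode="binary")

# Scale pixel values from [0, 255] to [0, 1].
rescale = layers.Rescaling(1.0 / 255)

# Simple augmentation (flips, rotations, zooms) on the training set to curb overfitting.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

train_ds = train_ds.map(lambda x, y: (augment(rescale(x), training=True), y))
test_ds = test_ds.map(lambda x, y: (rescale(x), y))

# Feature-extraction phase: train only the new classifier head.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(train_ds, validation_data=test_ds, epochs=10)
```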
Overall, the transcript frames transfer learning as a “don’t reinvent the wheel” method: reuse pretrained feature extractors, then either train only the classifier head (feature extraction) or adapt deeper layers (fine-tuning) depending on how closely the new task matches the original training domain.
Cornell Notes
Transfer learning reuses a CNN pretrained on a large dataset (commonly ImageNet) to solve a new classification task with less labeled data and faster training. The pretrained model’s convolutional base learns general visual features, while the final classification layers are replaced to match the new labels. Feature extraction freezes the convolutional base and trains only the new dense layers; fine-tuning unfreezes some later convolutional layers so the model adapts to the target domain. In the Keras/VGG16 example, feature extraction with data augmentation improves test accuracy to about 91.4%, while fine-tuning (using RMSprop with a lower learning rate) reaches about 95.2%, at the cost of greater overfitting risk.
- Why does transfer learning reduce both data requirements and training time?
- What are the two main parts of a pretrained CNN, and how are they reused?
- How does feature extraction differ from fine-tuning in practice?
- When should fine-tuning be preferred over feature extraction?
- What training setup details were used in the Keras/VGG16 example?
- What accuracy results were reported for the two approaches?
Review Questions
- In transfer learning, which layers are typically frozen for feature extraction, and why?
- What changes when moving from feature extraction to fine-tuning, and how does that affect training time and overfitting risk?
- How do data augmentation and learning-rate choice influence the reported accuracy gap between training and validation?
Key Points
1. Transfer learning reuses a pretrained CNN to avoid costly manual labeling and slow training from scratch.
2. A pretrained model’s convolutional base captures general visual features that often transfer well to new tasks.
3. Feature extraction freezes the convolutional base and trains only new classification layers on the target dataset.
4. Fine-tuning unfreezes some later convolutional layers to adapt higher-level features when the target task differs from the pretrained domain.
5. In the Keras/VGG16 example, data augmentation reduced overfitting and improved feature-extraction test accuracy to about 91.4%.
6. Fine-tuning improved test accuracy further to about 95.2%, but training accuracy rose to near 99.8%, indicating overfitting risk.
7. Normalization, binary cross-entropy loss, and optimizer choice (Adam for feature extraction; RMSprop with a lower learning rate for fine-tuning) are key practical details in the implementation.