
Supervised vs Unsupervised vs Semi / Self Supervised vs Reinforcement Learning | Machine Learning

Ciara Feely · 6 min read

Based on Ciara Feely's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Machine learning is described as applied statistics: models improve with experience by learning statistical patterns from data.

Briefing

Machine learning is essentially applied statistics that lets systems improve from experience—either from labeled examples provided by humans or from patterns found in data without labels. The practical payoff is that computers can learn to recognize speech, filter spam, generate text, and recommend content, often by extracting “features” (input variables) and mapping them to an outcome (the label or target). That framing matters because it clarifies why different learning approaches exist: the availability and cost of labels, the structure of the data, and whether feedback arrives immediately or only after actions.

The transcript starts by defining machine learning as a branch of artificial intelligence that uses statistical methods to improve with experience from data. Deep learning is introduced as a subcategory that relies on multi-layer networks to make decisions. Features are described as the input variables—like housing attributes such as number of rooms, bathrooms, and square footage—while the output variable is the target, such as housing price. This input-output setup becomes the backbone for later distinctions among learning types.

Several real-world examples tied to YouTube illustrate where machine learning shows up in everyday products. Auto-generated captions rely on speech recognition that converts spoken audio into text, with accuracy affected by accents. Spam classification in YouTube Studio filters comments into “likely spam” or “needs review” by learning from previously labeled examples. Text generation appears in suggested replies, where the system learns what responses fit based on patterns from prior interactions. Recommender systems—highlighted as the creator’s research area—power personalized video suggestions on platforms like Netflix, Amazon, and YouTube, using signals such as “people who liked this also liked that,” video similarity, and preferences from similar users.

From there, the transcript lays out the main learning categories. Supervised learning trains on a dataset with labels, meaning the “right answer” is known for each example. The goal is to learn a general mapping from features to labels so predictions can be made for new inputs without known outcomes. A key limitation is that labeled data can be expensive or impossible to exhaustively collect—speech recognition, for instance, faces an effectively infinite space of possible sentences.
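The feature-to-label mapping can be made concrete with a minimal sketch. Everything here is illustrative: the square-footage figures and prices are invented, and a one-variable least-squares fit stands in for whatever model is actually used; the point is only that training data pairs each input with a known "right answer."

```python
# Hedged sketch: supervised learning as a feature-to-label mapping.
# Hypothetical data: square footage (feature) -> sale price (label).

def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b on labeled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Labeled training set: each input has a known "right answer".
sqft   = [800, 1000, 1200, 1500, 2000]
prices = [160_000, 200_000, 240_000, 300_000, 400_000]

a, b = fit_linear(sqft, prices)
predict = lambda x: a * x + b

# The learned mapping generalizes to a new input with no known outcome.
print(round(predict(1100)))  # → 220000
```

The same idea scales up to many features and richer models; what stays constant is that supervision comes from the labels.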

Unsupervised learning flips the assumption: the data has no labels, so the task is to find structure such as clusters or anomalies. The transcript uses anomaly detection as an example, where most points form clusters representing normal behavior and outliers fall outside those clusters. Because human intuition breaks down in high-dimensional spaces, algorithms help identify outliers that are not visually obvious.

Self-supervised learning is presented as a way to manufacture labels from unlabeled data by masking part of the input and training the model to predict the missing piece. The same idea is applied to text (fill in a missing word or phrase) and images (reconstruct a missing region), often using deep learning due to the complexity of learning these underlying properties.

Semi-supervised learning addresses situations where labels are scarce but unlabeled data is abundant, such as web content tagging or speech tasks. It relies on assumptions like input similarity implying output similarity, and it works best when the labeling step is simpler than full human annotation.

Finally, reinforcement learning is introduced as a reward-driven approach rather than a label-driven one. An agent takes actions in an environment, receives a reward signal (often delayed), and learns through trial and error to maximize long-term success. Chess is used as the archetype: the agent explores moves repeatedly, learns how board states and opponent responses affect winning, and updates its strategy based on outcomes that may only become clear after sequences of actions. The transcript closes by emphasizing that reinforcement learning’s delayed rewards and repeated interaction are central to how it learns.

Cornell Notes

Machine learning is framed as applied statistics: systems improve from experience by learning patterns in data. Supervised learning uses labeled training data to learn a mapping from features (inputs) to labels (outputs), but labels can be costly or infeasible to obtain at scale. Unsupervised learning uses unlabeled data to discover structure such as clusters and anomalies. Self-supervised learning creates labels from unlabeled data by masking part of the input and training the model to predict the missing piece, often using deep learning. Semi-supervised learning sits between the two extremes by using a small labeled set plus a large unlabeled set under assumptions like “similar inputs lead to similar outputs.” Reinforcement learning differs again: an agent interacts with an environment, receives reward feedback (often delayed), and learns via trial and error to maximize long-term reward.

How does supervised learning differ from unsupervised learning in terms of data and goals?

Supervised learning trains on a dataset where each input has an associated label—meaning the “right answer” is known. The goal is to learn a general function from features to labels so predictions can be made for new inputs without known outcomes. Unsupervised learning has no labels; instead, it searches for interesting structure in the data, such as clusters of normal points or outliers for anomaly detection. The transcript’s housing example illustrates supervised learning (features like rooms and square footage map to housing price), while the anomaly detection example illustrates unsupervised learning (clusters represent normal behavior and distant points represent noise/outliers).

Why is labeled data often the bottleneck in supervised learning?

The transcript highlights that labels can be extremely hard to collect when the space of possible outputs is huge. In speech recognition, there are effectively endless combinations of words that could be spoken, so generating labels for every possible sentence is not practical. By contrast, some domains naturally produce labels at scale—housing prices are recorded when houses are sold—so supervised learning becomes more feasible when outcomes are observable and plentiful.

What makes self-supervised learning “self-supervised” rather than purely unsupervised?

Self-supervised learning starts with unlabeled data but manufactures a training signal by transforming the input. A common setup is masking: remove part of a sentence or image and train the model to predict the missing piece. That creates input-output pairs (the missing part becomes the label) even though no human labeled dataset was provided. The transcript links this approach to speech/text semantics (fill in blanks) and image reconstruction (predict missing regions), noting that deep learning is typically used because simpler models struggle with the complexity of learning these underlying properties.

What assumptions does semi-supervised learning rely on to work with few labels?

Semi-supervised learning assumes there is a large amount of unlabeled data and only a small amount of labeled data. It works under a proximity assumption: if two inputs are similar, their outputs should also be similar. It also requires that the intermediate labeling step be relatively simple compared with full human annotation—otherwise it would be easier to just label everything manually. The transcript frames this as useful for tasks like web page tagging, where human labeling is time-consuming and expensive.

How does reinforcement learning’s feedback mechanism change what the agent learns?

Reinforcement learning focuses on maximizing a reward signal rather than predicting labeled outputs. The agent takes actions in an environment, observes the resulting state, and learns through trial and error. A key feature is delayed rewards: the consequences of actions may only become clear after several steps. The chess example shows this: a move may not immediately determine the outcome, but it can shift the probability of winning later. The transcript also notes that learning typically requires many repeated interactions—hundreds or thousands—to discover which sequences of actions lead to success.

Review Questions

  1. In your own words, what is the role of labels in supervised learning, and why does that create a practical limitation?
  2. Give one example each of a supervised, unsupervised, and self-supervised task, and explain how labels (or their absence) drive the learning objective.
  3. What does “delayed reward” mean in reinforcement learning, and why does it matter for learning strategies?

Key Points

  1. Machine learning is described as applied statistics: models improve with experience by learning statistical patterns from data.
  2. Features are the input variables (e.g., rooms, bathrooms, square footage), while labels/targets are the outputs to predict (e.g., housing price).
  3. Supervised learning requires labeled training data and learns a feature-to-label mapping, but label collection can be infeasible in domains with enormous output variety.
  4. Unsupervised learning uses unlabeled data to discover structure such as clusters and anomalies, which becomes especially important in high-dimensional spaces.
  5. Self-supervised learning turns unlabeled data into a labeled training problem by masking parts of inputs and predicting the missing content.
  6. Semi-supervised learning combines a small labeled set with a large unlabeled set, relying on assumptions like similar inputs producing similar outputs.
  7. Reinforcement learning trains an agent to maximize reward through trial and error, often with delayed feedback that depends on action sequences.

Highlights

YouTube’s everyday ML uses include auto-generated captions (speech recognition), spam filtering for comments, suggested replies (text generation), and personalized video recommendations (recommender systems).
Supervised learning’s core challenge is not the algorithm—it’s the difficulty of obtaining labels, especially when the space of possible outputs is effectively infinite (as in speech).
Self-supervised learning creates its own labels by masking inputs and training models to predict the missing parts, enabling learning from unlabeled data.
Semi-supervised learning depends on the idea that similar inputs should lead to similar outputs, letting a small labeled set guide learning across a large unlabeled pool.
Reinforcement learning differs fundamentally by optimizing reward through interaction, with delayed rewards that make multi-step strategy essential.
