
Sentence Transformers (SBERT) with PyTorch: Similarity and Semantic Search

Venelin Valkov · 4 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Sentence Transformers score semantic similarity by embedding sentences into fixed-length vectors and comparing them with cosine similarity.

Briefing

Sentence Transformers (SBERT) turn sentences into fixed-length embeddings and then use cosine similarity to score semantic closeness—making it practical to run semantic search and rank results by meaning rather than keywords. The core idea is simple: encode each sentence into a vector, compare vectors with cosine similarity, and treat the resulting score as a measure of how related two texts are. That approach matters because it avoids the slowness of running full BERT-style models for every pair during search, enabling fast similarity scoring across a corpus.
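
As a minimal sketch of that loop, assuming the sentence-transformers library, an illustrative checkpoint name (all-mpnet-base-v2), and made-up example sentences:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Load a pretrained embedding model (checkpoint name is illustrative).
model = SentenceTransformer("all-mpnet-base-v2")

sentences = [
    "Bitcoin prices surged again this week.",
    "I added 10 kg to my deadlift this month.",
]

# Encode each sentence into a fixed-length vector.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two embeddings: higher means more related.
score = util.cos_sim(embeddings[0], embeddings[1])
print(score.item())
```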

The transcript traces this capability back to SBERT’s Siamese-network training approach. Instead of processing one sentence at a time, Siamese setups feed two inputs through the same neural network, then apply a scoring head (often with a sigmoid) to output a similarity score in a bounded range. This design is what allows SBERT-style models to reuse precomputed embeddings: once sentence vectors are generated for the corpus, queries can be embedded once and compared efficiently against all stored vectors.

A key performance motivation comes from the original work on Siamese BERT embeddings: pairwise scoring can be dramatically accelerated. The transcript cites an implementation result where a Siamese approach reduced runtime from roughly 65 hours to about five seconds, illustrating why embedding-based similarity is attractive for real-world semantic search.

For the hands-on portion, the workflow starts by installing Sentence Transformers and loading a strong pretrained model from the library’s leaderboard: mpnet-base-v2 (a sentence-embedding model built on Microsoft’s MPNet). The model’s maximum sequence length is checked (384 tokens), and it is then used to embed a small corpus of sentences. Each sentence becomes a 768-dimensional vector, and cosine similarity is used to compare a query embedding against the corpus embeddings.
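
A hedged sketch of that setup step; all-mpnet-base-v2 is the checkpoint identifier assumed here, and the corpus sentences are placeholders:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

# Inputs longer than this many tokens are truncated.
print(model.max_seq_length)  # 384 for this checkpoint

corpus = [
    "Ethereum hit a new all-time high today.",
    "Bitcoin mining difficulty keeps rising.",
    "She set a new squat personal record at the meet.",
    "Proper deadlift form protects your lower back.",
]

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
print(corpus_embeddings.shape)  # (4, 768): one 768-dimensional vector per sentence
```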

The example queries show how semantic search behaves in practice. A cryptocurrency-related query ranks crypto sentences near the top, while powerlifting-related sentences land at the bottom. Another query about deadlifting similarly surfaces powerlifting content and pushes unrelated cryptocurrency material down—demonstrating that similarity is driven by meaning and context, not surface wording.
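 
A sketch of how such a query could be run end to end; util.semantic_search is the library utility, while the corpus, query text, and model name are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

corpus = [
    "Bitcoin dropped 10% overnight.",
    "Crypto exchanges saw record trading volume.",
    "He pulled a 250 kg deadlift at the powerlifting meet.",
    "Squats and deadlifts build full-body strength.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "What is happening in the cryptocurrency market?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank every corpus sentence against the query by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=len(corpus))[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
# Crypto sentences should score highest; the powerlifting ones fall to the bottom.
```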

Finally, the transcript shows how to fine-tune an existing model to match a custom similarity notion. Using a small labeled training setup, it constructs an InputExample with two sentences and a target similarity score (e.g., 0.9). Training is run with model.fit for multiple epochs using a cosine-similarity-based loss. After saving and reloading the trained model, the similarity score between the chosen sentence pair increases to align with the target label, confirming that the embedding space can be adjusted for a specific task.
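
A condensed sketch of that fine-tuning setup using the library’s training API; the sentence pair, label, hyperparameters, and save path are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-mpnet-base-v2")

# One labeled pair with a target similarity of 0.9 (a real setup would use many pairs).
train_examples = [
    InputExample(texts=["Bitcoin is a cryptocurrency.",
                        "Ethereum is a digital currency."], label=0.9),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1)

# Loss that pushes the cosine similarity of each pair toward its label.
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=10, warmup_steps=10)

model.save("fine-tuned-similarity-model")
```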

Overall, the takeaway is that SBERT-style models provide a fast, embedding-first route to semantic similarity and search, and they can be tuned to better reflect domain-specific judgments of what “similar” should mean.

Cornell Notes

Sentence Transformers (SBERT) convert sentences into 768-dimensional embeddings and use cosine similarity to score semantic closeness. The method relies on Siamese networks: the same model processes two sentences, and a scoring head produces a bounded similarity score. This embedding approach enables efficient semantic search because corpus embeddings can be precomputed, then compared to a query embedding without rerunning a full pairwise transformer each time. The transcript demonstrates semantic search with the pretrained model mpnet-base-v2, including ranking results for cryptocurrency and powerlifting queries. It also shows fine-tuning by training on labeled sentence pairs (e.g., target similarity 0.9) and verifying that the similarity score increases after training.

Why do Siamese networks make sentence similarity search faster than running BERT-style models for every pair?

Siamese setups run the same encoder on two inputs and then score their relationship, which supports an embedding-first workflow. With Sentence Transformers, each sentence in the corpus is encoded once into a vector. Later, a query sentence is encoded once, and cosine similarity is computed between the query vector and all stored corpus vectors. That avoids repeated expensive pairwise transformer inference during search.
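
To make the cost split concrete, a sketch in which the corpus is embedded once up front and each incoming query only pays for a single encoder pass (names and corpus contents are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

# One-time cost: embed the whole corpus and keep the vectors around.
corpus = [
    "Crypto markets rallied after the announcement.",
    "A good deadlift starts with a tight back and braced core.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

def search(query: str, top_k: int = 5):
    # Per-query cost: one encoder forward pass plus cheap vector comparisons.
    query_embedding = model.encode(query, convert_to_tensor=True)
    return util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
```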

How does cosine similarity fit into SBERT-style semantic search?

After encoding sentences into embeddings (the transcript reports a 768-dimensional vector for each sentence), cosine similarity measures the angle-based closeness between vectors. Higher cosine similarity corresponds to more semantic overlap. The transcript uses the library’s cosine similarity utility and then the built-in semantic_search function to rank corpus items by similarity score.
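
For reference, cosine similarity is the dot product of the two vectors divided by the product of their norms; a small PyTorch sketch of the same quantity the library’s cosine similarity utility computes:

```python
import torch

def cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # cos(theta) = (a . b) / (|a| * |b|), in [-1, 1]; higher means more semantic overlap.
    return torch.dot(a, b) / (a.norm() * b.norm())

a = torch.randn(768)  # stand-ins for two 768-dimensional sentence embeddings
b = torch.randn(768)
print(cosine_similarity(a, b).item())
```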

What does the mpnet-base-v2 model contribute in the example workflow?

The transcript selects mpnet-base-v2 from the Sentence Transformers leaderboard and loads it through the library. It checks the model’s max sequence length (384) and then uses the model to embed a corpus and a query. The resulting embeddings drive the ranking behavior: cryptocurrency queries surface crypto-related sentences, while powerlifting queries surface powerlifting-related sentences.

What does the semantic_search output structure represent?

semantic_search returns a list of lists: one ranked list per query. Each inner list contains candidate matches, each carrying a similarity score and a corpus ID that indexes back into the original corpus. The transcript notes that results are sorted by score (not by original corpus order), and it iterates through the ranked items to print each score and the corresponding corpus sentence.
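
Illustratively, the returned structure looks like this (the scores and corpus here are made up):

```python
# util.semantic_search(...) returns one ranked list per query;
# each hit is a dict with a 'corpus_id' and a 'score'.
results = [
    [
        {"corpus_id": 2, "score": 0.71},  # best match (index into the corpus list)
        {"corpus_id": 0, "score": 0.35},
        {"corpus_id": 1, "score": 0.12},
    ]
]

corpus = ["sentence 0", "sentence 1", "sentence 2"]
for hit in results[0]:
    print(f"{hit['score']:.2f}  {corpus[hit['corpus_id']]}")
```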

How does fine-tuning change similarity scores in the transcript’s example?

Fine-tuning is done by creating an InputExample containing two sentences and a target similarity label (e.g., 0.9). A training dataset and a cosine-similarity-based loss are set up, then model.fit runs for several epochs. After saving and reloading the trained model, the similarity score between the same sentence pair increases toward the target label, showing the embedding space has been adjusted.
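
A sketch of that verification step, assuming the fine-tuned model was saved to a local directory as in the earlier training sketch (path and sentences are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

# Reload the fine-tuned model from the directory it was saved to.
model = SentenceTransformer("fine-tuned-similarity-model")

pair = ["Bitcoin is a cryptocurrency.", "Ethereum is a digital currency."]
embeddings = model.encode(pair, convert_to_tensor=True)

# After training toward a 0.9 label, this score should sit closer to 0.9
# than the pretrained model's score for the same pair.
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(score)
```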

Review Questions

  1. In an embedding-based semantic search pipeline, which computations can be precomputed for the corpus, and which computations must be done per query?
  2. How would you expect the ranking to change if you replaced cosine similarity with a different similarity metric (e.g., dot product) while keeping embeddings the same?
  3. What kinds of labeled examples would you need to fine-tune SBERT effectively for a domain-specific definition of “semantic similarity”?

Key Points

  1. Sentence Transformers score semantic similarity by embedding sentences into fixed-length vectors and comparing them with cosine similarity.
  2. Siamese-network training enables an embedding-first workflow where corpus embeddings can be precomputed for fast search.
  3. The transcript demonstrates semantic search using mpnet-base-v2, embedding each sentence into a 768-dimensional vector.
  4. semantic_search ranks corpus candidates by similarity score and returns results as score/corpus ID pairs.
  5. Fine-tuning can align similarity scores with custom labels by training on labeled sentence pairs using a cosine-similarity-based loss.
  6. After fine-tuning, re-encoding with the saved model can noticeably increase similarity for the targeted sentence pair.

Highlights

SBERT-style similarity turns sentences into embeddings once, then uses cosine similarity to rank matches quickly—avoiding expensive pairwise transformer runs at query time.
The Siamese setup feeds two sentences through the same encoder and uses a scoring head to produce a bounded similarity score.
Using mpnet-base-v2, a cryptocurrency query ranks crypto-related sentences highest while powerlifting content drops to the bottom.
A simple fine-tuning loop with a labeled pair (target similarity 0.9) can push the model’s similarity score toward that label after training.
