To-Do List Embeddings with TensorFlow in JavaScript
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Turn each to-do phrase into a dense embedding vector so semantic similarity can be computed numerically.
Briefing
A practical path to “icon suggestions” for a to-do app hinges on turning short task text into numeric embeddings and then measuring similarity between tasks. Instead of trying to match words directly, the approach encodes each to-do phrase as a vector and uses vector similarity to decide which unseen tasks should share icons with previously seen ones—like mapping “read book” to a book icon and “go for a run” to a running icon.
The core problem starts with how computers handle language. Text and images don’t naturally fit into numeric computation, so the workflow needs a transformation from words into numbers. One-hot encoding is presented as a simple baseline: each word becomes a vector with a single 1 and the rest 0s across a vocabulary. But that representation is both inefficient (mostly zeros) and poorly suited for similarity—two related phrases don’t automatically end up “close” in vector space.
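The sparsity problem is easy to see in a few lines. This is a minimal sketch over a tiny made-up vocabulary (not code from the video): each word maps to a vector with a single 1, and the dot product of any two distinct words is 0, so "read" and "book" look exactly as unrelated as "read" and "run".

```javascript
// Minimal one-hot sketch over a tiny illustrative vocabulary.
const vocabulary = ["read", "book", "go", "for", "a", "run"];

// A word becomes a vector with a single 1 at its vocabulary index.
function oneHot(word) {
  return vocabulary.map((v) => (v === word ? 1 : 0));
}

// Dot product of two one-hot vectors is 1 only for identical words.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

console.log(oneHot("read")); // a single 1, everything else 0
console.log(dot(oneHot("read"), oneHot("book"))); // 0: no shared dimension
```

Every distinct word pair scores 0, which is why one-hot vectors carry no notion of relatedness.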
Embeddings address both issues by representing each word (and, in this implementation, each sentence/phrase) as a dense vector with a developer-chosen dimensionality. In the example explanation, embeddings use multiple real-valued dimensions (not just 0s and 1s), which makes it possible to compute similarity between words or phrases. The intuition is that semantically related tasks should produce vectors that align more strongly, enabling an algorithm to group “similar” to-dos even when the wording differs.
To demonstrate the mechanics, the project uses TensorFlow.js’s Universal Sentence Encoder, a pre-trained model available via the TensorFlow models GitHub repository. The model converts text into 512-dimensional embeddings (the transcript notes the output shape as [1, 512] for a single input). The choice is practical: Universal Sentence Encoder is trained for text similarity and sentiment-related tasks, and it works well for short inputs—matching the typical length of to-do entries.
After installing dependencies (the TensorFlow.js Universal Sentence Encoder package and a scaling library used when rendering the similarity matrix), the code loads the model within the React project and calls its embed method to generate an embedding for each to-do phrase. Similarity between two tasks is computed as the dot product of their embedding vectors, implemented with TensorFlow.js tensor operations (a dot product with the appropriate transpose). The resulting score is treated as a numeric measure of relatedness.
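Stripped of the TensorFlow.js tensor plumbing, the similarity step is just a dot product. The sketch below uses short made-up vectors as stand-ins for real 512-dimensional Universal Sentence Encoder outputs, purely to show the arithmetic:

```javascript
// Toy 4-dimensional vectors standing in for real 512-dim embeddings.
const gym = [0.9, 0.1, 0.2, 0.0];
const run = [0.8, 0.2, 0.1, 0.1];
const math = [0.1, 0.9, 0.0, 0.3];

// Dot product: the same operation tf.dot performs on tensors.
function similarity(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

console.log(similarity(gym, run));  // higher: related tasks
console.log(similarity(gym, math)); // lower: unrelated tasks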
Concrete checks show the method producing higher similarity for pairs expected to be in the same category—such as “hit the gym” paired with “go for a run”—and lower similarity for less related pairings—like “hit the gym” paired with “study math.” Finally, a render function draws a similarity matrix across all task pairs, turning the embedding-based similarity scores into a grid that visually confirms which to-dos cluster together.
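The full matrix is just that pairwise check run over every combination. A dependency-free sketch with the same kind of toy stand-in embeddings (real code would use the model's 512-dimensional outputs):

```javascript
// Toy embeddings keyed by task; stand-ins for real model outputs.
const tasks = {
  "hit the gym": [0.9, 0.1, 0.2],
  "go for a run": [0.8, 0.2, 0.1],
  "study math": [0.1, 0.9, 0.0],
};

const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);

// Build the full pairwise similarity matrix.
const names = Object.keys(tasks);
const matrix = names.map((row) =>
  names.map((col) => dot(tasks[row], tasks[col]))
);

console.table(matrix); // symmetric; related pairs score highest off-diagonal
```

Related pairs stand out as bright off-diagonal cells, which is the clustering pattern the rendered grid makes visible.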
The takeaway is that once embeddings and similarity scoring are in place, the “cute list” icon suggestion system can be built on top: tasks that land near each other in embedding space can be assigned the same or similar icons. The transcript positions the embedding and similarity pipeline as the foundation for later videos that connect similarity scores to actual icon recommendations.
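One hedged way that icon layer could look (this is an assumption, not code from the video): keep the embeddings of previously seen tasks with their icons, and give a new task the icon of its nearest neighbor by dot-product similarity. All names and vectors here are hypothetical.

```javascript
// Hypothetical seen tasks with assigned icons and toy embeddings.
const seen = [
  { text: "read book", icon: "book", vec: [0.1, 0.2, 0.9] },
  { text: "go for a run", icon: "running", vec: [0.8, 0.2, 0.1] },
];

const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);

// Suggest an icon for a new task via its nearest seen embedding.
function suggestIcon(vec) {
  let best = seen[0];
  for (const s of seen) {
    if (dot(vec, s.vec) > dot(vec, best.vec)) best = s;
  }
  return best.icon;
}

console.log(suggestIcon([0.9, 0.1, 0.2])); // "running"
```

In practice the new task's vector would come from the same embed call as the seen tasks, so all vectors live in one shared space.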
Cornell Notes
The workflow for suggesting icons in a to-do app starts by converting each task phrase into a dense numeric embedding, then scoring how similar two tasks are in vector space. One-hot encoding is shown as a weak baseline because it’s sparse and doesn’t naturally support similarity. The implementation uses TensorFlow.js’s Universal Sentence Encoder, which maps short text into 512-dimensional embeddings. Similarity is computed with a dot product between embedding vectors, producing scores that are higher for related task pairs and lower for unrelated ones. Rendering a similarity matrix across all pairs provides a quick sanity check before using the scores to drive icon recommendations.
Why is one-hot encoding a poor fit for “task similarity” in a to-do app?
What changes when using embeddings instead of one-hot vectors?
Why does the Universal Sentence Encoder model fit short to-do phrases?
How does the code compute similarity between two to-do items?
What does the similarity matrix add beyond single pair scores?
Review Questions
- How would one-hot encoding likely fail to group “hit the gym” with “go for a run” if the vocabulary doesn’t share the same words?
- What tensor shapes should you expect when embedding a single to-do phrase with Universal Sentence Encoder, and why does that matter for later dot-product similarity?
- If similarity scores for unrelated tasks are unexpectedly high, what parts of the pipeline (text preprocessing, embedding generation, or dot-product computation) would you inspect first?
Key Points
1. Turn each to-do phrase into a dense embedding vector so semantic similarity can be computed numerically.
2. One-hot encoding is sparse and doesn’t naturally support similarity; embeddings create a geometry where related phrases align.
3. Use TensorFlow.js Universal Sentence Encoder to embed short task text into 512-dimensional vectors.
4. Compute similarity between two tasks via a dot product between their embedding vectors using TensorFlow.js tensor operations.
5. Validate behavior with targeted pairwise similarity checks before building higher-level features.
6. Render a similarity matrix across all task pairs to confirm clustering patterns visually.
7. Use the similarity scores as the basis for mapping unseen to-dos to icons learned from similar tasks.