Sam Witteveen — Person Summaries
AI-powered summaries of 95 videos about Sam Witteveen.
DeepSeek OCR - More than OCR
DeepSeek OCR’s headline idea isn’t better document reading—it’s a new way to compress long text into far fewer “vision tokens,” then decode it back...
Introducing Gemini CLI
Google’s Gemini team is rolling out Gemini CLI, a command-line interface that turns Gemini Code Assist into an agent-like workflow for editing files,...
Opal - Google Labs Killer NEW App
Google Labs’ Opal is a no-code workflow builder aimed at turning natural-language requests into working LLM “mini apps,” with built-in steps for web...
LangChain Retrieval QA Over Multiple Files with ChromaDB
LangChain retrieval QA becomes practical at scale once the document embeddings live in a persistent ChromaDB vector store on disk. Instead of...
LangChain - Using Hugging Face Models locally (code walkthrough)
Running Hugging Face models locally inside LangChain is the practical workaround when Hugging Face Hub access fails—especially for conversational...
LangGraph Crash Course with code examples
LangGraph is positioned as a new way to run LangChain-based LLM agents by modeling agent behavior as a graph-driven state machine rather than a fixed...
Microsoft Launches 10 NEW AI Agents
Microsoft’s Ignite announcements push autonomous AI agents deep into enterprise software—especially Dynamics 365—by rolling out 10 new agent...
LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab
Large language models are powerful at generating text, but they don’t plug cleanly into the rest of software—especially when apps need state,...
Gemini 3 Pro - The Model You've Been Waiting For
Gemini 3 Pro is positioned as a long-horizon, tool-using model built for “clever, concise, direct” work—less about a flashy personality and more...
Anthropic's Meta Prompt: A Must-try!
Anthropic’s “Metaprompt” tool turns weak, one-off prompts into a structured, model-ready instruction set—by using Claude itself to generate the final...
LangExtract - Google's New Library for NLP Tasks
Standard NLP tasks—like sentiment classification, named-entity extraction, and entity disambiguation—have long relied on fine-tuned BERT-style...
DeepSeek's New Image Model - Janus Pro
DeepSeek’s Janus Pro stands out for combining two capabilities in one multimodal system: it can answer questions about images (using a SigLIP-based...
LlamaOCR - Building your Own Private OCR System
LlamaOCR turns screenshots and scanned documents into editable Markdown by using a vision-capable Llama 3.2 model hosted via Together AI. The...
Understanding ReACT with LangChain
ReACT (Reasoning and Acting) is a prompting-and-agent pattern designed to make large language models do multi-step problem solving by alternating...
Mistral OCR - Multimodal & Multilingual OCR
Mistral has launched a non-open-source OCR system delivered through an API that turns scanned documents, PDFs, and images into structured, multimodal...
Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)
Custom tools are the key lever for making LangChain conversational agents more useful—and the biggest practical lesson is that tool use often...
Gemini 1.5 Pro for Code - Part 01
Gemini 1.5 Pro for Code can ingest a real GitHub-style repository, then generate working multi-agent Python code that interacts with the repo’s...
Testing Gemini 1.5 and a 1 Million Token Window
Gemini 1.5 Pro marks a major step up for long-context AI: it pairs a newly updated model with a dramatically expanded context window—up to 1,048,576...
Google Stitch Just Became an AI Figma (And It's Free)
Google Labs’ Stitch has shifted from simple screenshot-to-design experiments into an agentic, Figma-like workflow for generating full UI...
Gemini Browser Use
Google’s Gemini 2.0 push for “browser use” is starting to look less like a closed, proprietary demo and more like a buildable automation...
Advanced RAG 01 - Self Querying Retrieval
RAG systems break down when everything gets shoved through semantic search—even fields that should behave like exact lookups. The core fix is...
smolagents - HuggingFace's NEW Agent Framework
Hugging Face’s new “smolagents” framework pushes agent building toward “code agents”: instead of forcing an LLM to emit JSON-style plans, it can...
Ollama - Loading Custom Models
Ollama can run fine-tuned models that aren’t already listed locally—by downloading the right quantized weights (GGUF) and creating a small Ollama...
The Gemini Interactions API
Google’s new Gemini Interactions API reframes how developers build with Gemini models by shifting from simple, stateless “prompt in, text out” calls...
LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!
Getting rid of OpenAI entirely for Retrieval QA with LangChain is feasible, but the quality hinges on the local LLM’s context limits, prompt format...
Introducing Gemma - 2B 7B 6Trillion Tokens
Google’s new Gemma model suite brings open-weight, English text-only large language models in two sizes, 2B and 7B, each available in base and...
Creating an AI Agent with LangGraph Llama 3 & Groq
LangGraph is positioned as the “middle layer” for building AI agents that need structure, state, and controllable decision points—without handing...
Building a LangChain Custom Medical Agent with Memory
A LangChain “medical advice” agent can be built to answer questions using a site-restricted search (WebMD) and to carry context across multiple turns...
Gemini RAG - File Search Tool
Gemini’s API team introduced the File Search Tool, a built-in, automated RAG pipeline that turns uploaded documents into a ready-to-query vector...
Getting Started with Gemini Pro on Google AI Studio
Gemini Pro is now broadly available, and getting started is mostly a matter of creating an API key in Google AI Studio, pasting it into a Colab...
Anthropic's New Agent Protocol!
Anthropic’s Model Context Protocol (MCP) aims to turn LLMs into practical “agents” by standardizing how models connect to external tools and...
Using LangChain Output Parsers to get what you want out of LLMs
LLM apps fail most often when they accept whatever text a model happens to generate instead of forcing that output into a structure the application...
Introducing Gemini 3.1 Pro
Google is rolling out Gemini 3.1 Pro, a “0.1” update that marks a noticeable jump in reasoning and benchmark performance—and, crucially, brings finer...
Gemini 2.0 Flash
Google’s Gemini 2.0 Flash marks a shift from “multimodal input” to “multimodal output,” with the model able to generate audio and images...
Kimi K2.5- The Agent Swarm
Moonshot AI’s Kimi K2.5 positions itself less as a single “bigger model” and more as a platform for task-specialized reasoning—especially through an...
The Improved Gemini 2.5 Pro - A Coding Powerhouse
Google’s new Gemini 2.5 Pro preview version is being positioned as a major step up for coding—less about generic “reasoning” gains and more about...
Gemini 2.5 Pro for Audio Transcription
Gemini 2.5 Pro’s jump to a 64,000-token generation limit is the practical unlock for high-quality podcast transcription at scale—long enough to turn...
How to make a custom dataset like Alpaca7B
A practical path to building an “Alpaca-style” instruction dataset is to start with a small set of human-written seed tasks, then use GPT-3 to expand...
Gemini 3 Flash - Your Daily Workhorse Upgraded
Gemini 3 Flash arrives as a faster, more cost-efficient “workhorse” model that, in many benchmarks, lands near Gemini 3 Pro and sometimes beats...
Auto-GPT - How to Automate a Task Based AI with GPT-4
Auto-GPT is positioned as an autonomous AI agent that can carry out multi-step tasks end-to-end—searching the web, browsing pages, extracting...
OpenAI Functions + LangChain : Building a Multi Tool Agent
OpenAI’s function-calling system, wired through LangChain, can turn a plain chat model into a finance assistant that reliably selects the right API...
PydanticAI - The NEW Agent Builder on the Block
PydanticAI positions itself as a new, Pydantic-first agent and LLM application framework built to make model outputs reliably conform to structured...
Advanced RAG 03 - Hybrid Search BM25 & Ensembles
Hybrid search in retrieval-augmented generation (RAG) combines two retrieval styles: keyword matching and semantic matching. The core idea is to pair...
DeepSeek R1 for Structured Agents
DeepSeek’s R1 reasoning model can’t natively produce the structured, tool-friendly outputs that most agent frameworks rely on—no function calling, no...
Talking to Alpaca with LangChain - Creating an Alpaca Chatbot
Hooking Alpaca to LangChain is straightforward: build a local Hugging Face text-generation pipeline around the Alpaca/LLaMA-compatible model, wrap it...
Gemini Embedding 2 - Audio, Text, Images, Docs, Videos
Gemini Embedding 2 is positioned as a single, natively multimodal embedding model that collapses what used to require many separate pipelines—one...
Introducing Swarm with Code Examples: OpenAI's Groundbreaking Agent Framework
OpenAI’s Swarm has landed as a lightweight framework for building multi-agent systems, and the core idea is simple: model behavior as small...
Kyutai STT & TTS - A Perfect Local Voice Solution?
Kyutai’s latest local speech stack (Kyutai TTS for text-to-speech and Kyutai STT for speech-to-text) pairs fast, small models with voice conditioning that can...
The 5 Types of LLM Apps
LLM apps can be sorted into five practical categories—ranging from chat-style assistants to fully autonomous agents—so builders can more clearly...
Claude 3 Vs Gemini Vs GPT-4: Who Can Make Amazing Powerpoints?
LLMs can reliably generate the *facts* and basic slide structure for a presentation, but they still struggle to produce consistently sleek, polished...
Ollama Launch + Claude Code + GLM Flash
Ollama has introduced “Ollama launch,” a one-command way to run Anthropic API–compatible coding assistants locally—making it possible to use Claude...
How can GPT-4.5 be So Bad?
GPT-4.5 arrives with a “bigger and more natural” pitch, but benchmark results and practical tradeoffs paint it as an also-ran: stronger than GPT-4 in...
GPT-4o: What They Didn't Say!
OpenAI’s GPT-4o (“Omni”) marks a shift toward a single, more capable multimodal system—one that can take in text, images, and audio and produce...
Mistral Small 3 - The NEW Mini Model Killer
Mistral has released “Mistral Small 3,” a new 24B-parameter open-weight model positioned as a fast, capable “workhorse” for everyday tasks—aimed at...
Running Gemma using HuggingFace Transformers or Ollama
Running Gemma locally or in a notebook is straightforward, but the biggest practical takeaway is that Gemma’s chat behavior depends heavily on its...
Qwen QwQ 32B - The Best Local Reasoning Model?
QwQ 32B is being positioned as a top-tier “local reasoning” model that can run on personal hardware, and the core claim is that it delivers...
Ollama - Libraries, Vision and Updates
Ollama’s latest updates push local AI further into “build-and-automate” territory: new Python/JavaScript libraries, expanded vision model support,...
Anthropic Does The Unthinkable with Haiku 3.5
Claude 3.5 Haiku arrives with a major price jump—$1 per million tokens in and $5 per million tokens out—turning what used to be a budget-friendly...
Gemini Pro + LangChain - Chains, Mini RAG, PAL + Multimodal
Gemini Pro becomes a practical building block inside LangChain, enabling everything from simple prompt-to-response chains to mini RAG, PAL...
NVIDIA NemoCLAW!! - GTC 2026
NVIDIA’s biggest GTC 2026 announcement isn’t new space hardware or flashy modules—it’s a push to bring OpenClaw-style “agent” software into...
Building a Summarization System with LangChain and GPT-3 - Part 1
Summarization quality no longer has to rely on training bespoke models for every writing style. With modern instruction-tuned and RLHF-tuned large...
Qwen 3.5 - The next NEXT model
Qwen 3.5 lands as a major shift in how fast, capable AI can be—pairing a large mixture-of-experts model with a reported up to 19x decoding speed...
Clone ANY Voice for Free — Qwen Just Changed Everything
Open-source voice cloning and “voice design” have moved from closed, API-only systems into the open TTS ecosystem: Qwen has released its Qwen 3 TTS...
Information Extraction with LangChain & Kor
Turning messy text into structured data is the bottleneck for many NLP workflows—especially when there’s no labeled dataset to train a named-entity...
Advanced RAG 05 - HyDE - Hypothetical Document Embeddings
HyDE (Hypothetical Document Embeddings) improves retrieval in RAG by using a large language model to draft a “hypothetical answer,” embedding that...
Meta's Code World Model
Meta’s researchers at FAIR released “Code World Model” (CWM), a 32B open-weights model aimed at code generation that goes beyond copying syntax. The...
Qwen3 Multimodal Embeddings: Finally, RAG That Sees
Qwen 3 VL’s multimodal embedding models aim to make RAG retrieval “see” beyond text by mapping text, images, and video-like content into a shared...
Advanced RAG 04 - Contextual Compressors & Filters
RAG systems often fail not because retrieval misses everything, but because they bring back too much irrelevant text—or the right facts buried inside...
Qwen 3 Embeddings & Rerankers
A new open suite of text embedding and reranking models from Qwen is aimed squarely at retrieval-augmented generation (RAG) use cases—especially...
SmolDocling - The SmolOCR Solution?
SmolDocling—an IBM-partnered document understanding model on Hugging Face—aims to do more than “plain OCR” by converting documents into a structured,...
GeminiCLI - The Deep Dive with MCPs
Gemini CLI’s built-in tools and MCP integrations can turn “rough” app scaffolding into a working, deployable project—especially when developers lean...
Google's Agent Upgrade
Google’s latest “Opal” upgrade shifts agent building from fixed, step-by-step workflows toward goal-driven, interactive experiences—complete with...
vLLM - Turbo Charge your LLM Inference
Local and cloud deployments of large language models often feel unusably slow, even on strong hardware, because inference bottlenecks pile up around...
Gemini 2.0 - How to use the Live Bidirectional API
Gemini 2.0’s Live Bidirectional API is built for real-time, two-way multimodal interaction—letting users talk back and forth with voice, stream...
Gemini 2.0 - Video Analyzer with Code
Gemini’s “Video Analyzer” turns uploaded videos into structured, time-coded outputs—captions, spoken transcripts, visual scene descriptions, key...
OpenAI's New OPEN Models - GPT-OSS 120B & 20B
OpenAI has released two open-weights language models under an Apache 2.0 license: a 120B-parameter model and a 20B-parameter model. The headline...
How to use BGE Embeddings for LangChain and RAG
BGE embeddings from the Beijing Academy of AI have surged to the top of major embedding benchmarks while dramatically shrinking model size—making...
CrewAI - Building a Custom Crew
A custom CrewAI workflow can reliably turn a user-chosen topic into a researched, saved markdown article—but the “process shape” matters. In a...
Cohere's Command-R a Strong New Model for RAG
Cohere’s Command-R arrives as a purpose-built model for retrieval-augmented generation (RAG) and tool/function calling, not as a bid to replace top...
How to use Custom Prompts for RetrievalQA on LLaMA-2 7B
RetrievalQA with LLaMA-2 can produce “correct-but-then-junk” outputs—answers that start right and then trail off into unhelpful or incorrect text....
LLaMA2 Tokenizer and Prompt Tricks
LLaMA 2’s behavior hinges less on “magic prompting” and more on two concrete levers: the tokenizer’s limited vocabulary size and, especially, the...
Camel + LangChain for Synthetic Data & Market Research
Camel—an “autonomous GPT” approach built around two agents talking to each other—gets positioned as a practical engine for synthetic data and market...
NEW - Anthropic Updated Claude Models & Computer Use Agents!!
Anthropic’s latest release pairs two upgraded Claude models with a new “computer use” capability that lets Claude interact with a user’s computer...
HOW to Make Conversational Form with LangChain | LangChain TUTORIAL
Conversational forms don’t have to feel like web-page data entry. By extracting structured fields from free-form chat and then asking only what’s...
MiroThinker 1.5 - The 30B That Outperforms 1T Models
MiroThinker 1.5 is positioned as a practical shift in agent design: instead of relying on a single, information-heavy model, it’s built to...
SmolLMv3 - A Small Reasoner with Tool Use.
Hugging Face has released SmolLMv3, a 3B-parameter language model aimed at “small” local deployment without giving up reasoning and tool use. The...
KittenTTS - The Nano TTS
Kitten ML’s “KittenTTS” pushes text-to-speech into a new size category: multiple TTS models that fit under 25 MB, are optimized for CPU-only use, and...
Haiku 4.5 - Small Beats Big
Claude Haiku 4.5 is arriving with higher prices, but it’s also delivering a rare mix of speed and task performance that makes it a strong candidate...
The "Token Muncher" Problem: Is Sonnet 4.6 Actually Cheaper?
Claude Sonnet 4.6 is positioned as a cheaper, more capable step up from earlier Sonnet models—especially for knowledge work and “computer use”...
EmbeddingGemma - Micro Embeddings for Mobile Devices
EmbeddingGemma is a family of tiny, text-only embedding models designed to run on-device, enabling retrieval, semantic search, clustering, and “micro...
Qwen3 Next - Behind the Curtain
Qwen 3 Next is an 80B Mixture of Experts (MoE) model built to run with only 3B active parameters per inference—an efficiency leap that still lands it...
OpenAI's Agent Builder
OpenAI’s Agent Builder turns agent design into a node-and-guardrail workflow: prompts, classification, conditional routing, and tool use are...
Building Single-User vs Multi-User Agents: What Actually Changes
The biggest shift in building agent systems isn’t “one agent vs many agents.” It’s “one user’s private world vs a shared, multi-tenant world,” and...
Tiny Aya - Cohere's Mini Multilingual Models
Choosing a language model for non-English languages is often a guessing game—especially for low-resource languages with limited internet data and...
Sora 2 - OpenAI's TikTok
OpenAI’s Sora 2 is arriving not just as a better video-generation model, but as the foundation for a TikTok-style social network—complete with an iOS...