Sam Witteveen — Channel Summaries
AI-powered summaries of 183 videos from Sam Witteveen's channel.
DeepSeek OCR - More than OCR
DeepSeek OCR’s headline idea isn’t better document reading—it’s a new way to compress long text into far fewer “vision tokens,” then decode it back...
Fine-tuning LLMs with PEFT and LoRA
Fine-tuning large language models is expensive because it requires updating massive weight tensors, which drives up both compute needs and checkpoint...
Introducing Gemini CLI
Google’s Gemini team is rolling out Gemini CLI, a command-line interface that turns Gemini Code Assist into an agent-like workflow for editing files,...
Opal - Google Labs Killer NEW App
Google Labs’ Opal is a no-code workflow builder aimed at turning natural-language requests into working LLM “mini apps,” with built-in steps for web...
Ollama - Local Models on your machine
Ollama is a user-friendly way to run large language models locally on a Mac or Linux machine by downloading them and serving them through a local...
Google's New Universal Commerce Protocol
Google’s Universal Commerce Protocol (UCP) is positioned as the missing retail layer in agentic commerce: a shared standard that lets AI agents...
LangChain Retrieval QA Over Multiple Files with ChromaDB
LangChain retrieval QA becomes practical at scale once the document embeddings live in a persistent ChromaDB vector store on disk. Instead of...
LangChain - Using Hugging Face Models locally (code walkthrough)
Running Hugging Face models locally inside LangChain is the practical workaround when Hugging Face Hub access fails—especially for conversational...
LangGraph Crash Course with code examples
LangGraph is positioned as a new way to run LangChain-based LLM agents by modeling agent behavior as a graph-driven state machine rather than a fixed...
Talk to your CSV & Excel with LangChain
LangChain can turn natural-language questions into accurate, on-the-fly analysis of CSV and Excel data by using a “CSV agent” that runs a Python REPL...
olmOCR - The Open OCR System
OCR for PDFs is getting a practical upgrade: Allen AI’s olmOCR is a fine-tuned vision-language model designed to turn rasterized PDF pages (including...
Microsoft Launches 10 NEW AI Agents
Microsoft’s Ignite announcements push autonomous AI agents deep into enterprise software—especially Dynamics 365—by rolling out 10 new agent...
LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab
Large language models are powerful at generating text, but they don’t plug cleanly into the rest of software—especially when apps need state,...
Gemini 3 Pro - The Model You've Been Waiting For
Gemini 3 Pro is positioned as a long-horizon, tool-using model built for “clever, concise, direct” work—less about a flashy personality and more...
GPT 5 - What They Didn't Say
OpenAI’s GPT-5 rollout is being framed less as a single leap in raw intelligence and more as a cost-and-workflow upgrade: ChatGPT-5 behaves like a...
Anthropic's Meta Prompt: A Must-try!
Anthropic’s “Metaprompt” tool turns weak, one-off prompts into a structured, model-ready instruction set—by using Claude itself to generate the final...
LangExtract - Google's New Library for NLP Tasks
Standard NLP tasks—like sentiment classification, named-entity extraction, and entity disambiguation—have long relied on fine-tuned BERT-style...
DeepSeek's New Image Model - Janus Pro
DeepSeek’s Janus Pro stands out for combining two capabilities in one multimodal system: it can answer questions about images (using a SigLIP-based...
LangChain Agents - Joining Tools and Chains with Decisions
LangChain agents let a language model choose—at runtime—which tools to use (or whether to use any tools at all) to answer a user’s question. Instead...
Mesop - Google's New UI Maker
Building LLM apps often stalls on two fronts: getting a working interface in front of real users fast enough to collect feedback, and validating...
LlamaOCR - Building your Own Private OCR System
LlamaOCR turns screenshots and scanned documents into editable Markdown by using a vision-capable Llama 3.2 model hosted via Together AI. The...
LangChain - Conversations with Memory (explanation & code walkthrough)
Memory is the difference between a chat agent that feels coherent and one that repeatedly “forgets” what a user meant earlier—especially when people...
Master CrewAI: Your Ultimate Beginner's Guide!
High-quality AI agents hinge on consistency: they must deliver the right outcome reliably, not just “work” most of the time. The framework presented...
Google Launches an Agent SDK - Agent Development Kit
Google has launched an “Agent Development Kit” (Agent SDK) aimed at building deployable AI agents in the cloud, with built-in support for evaluation,...
Understanding ReACT with LangChain
ReAct (Reasoning and Acting) is a prompting-and-agent pattern designed to make large language models do multi-step problem solving by alternating...
LiteParse - The Local Document Parser
Coding agents can generate impressive Python at scale, but document-heavy workflows expose a persistent failure mode: PDFs, spreadsheets, and charts...
Mistral OCR - Multimodal & Multilingual OCR
Mistral has launched a non-open-source OCR system delivered through an API that turns scanned documents, PDFs, and images into structured, multimodal...
Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)
Custom tools are the key lever for making LangChain conversational agents more useful—and the biggest practical lesson is that tool use often...
Master PDF Chat with LangChain - Your essential guide to queries on documents
Building a “chat with your PDF” system hinges on one practical fix: plain prompting can’t reliably handle long books because the context window is...
Gemini 1.5 Pro for Code - Part 01
Gemini 1.5 Pro for Code can ingest a real GitHub-style repository, then generate working multi-agent Python code that interacts with the repo’s...
Testing Gemini 1.5 and a 1 Million Token Window
Gemini 1.5 Pro marks a major step up for long-context AI: it pairs a newly updated model with a dramatically expanded context window—up to 1,048,576...
Google Stitch Just Became an AI Figma (And It's Free)
Google Labs’ Stitch has shifted from simple screenshot-to-design experiments into an agentic, Figma-like workflow for generating full UI...
Ollama meets LangChain
Running Ollama models locally turns LangChain into an on-device workflow: Python code can call a local LLaMA-2 instance through an API, generate...
Llama3 + CrewAI + Groq = Email AI Agent
A practical recipe for turning Llama 3 into an email-reply agent with CrewAI is built around Groq’s fast inference—using the Llama 3 70B model with...
Gemini Browser Use
Google’s Gemini 2.0 push for “browser use” is starting to look less like a closed, proprietary demo and more like a buildable automation...
Advanced RAG 01 - Self Querying Retrieval
RAG systems break down when everything gets shoved through semantic search—even fields that should behave like exact lookups. The core fix is...
smolagents - HuggingFace's NEW Agent Framework
Hugging Face’s new “smolagents” framework pushes agent building toward “code agents”: instead of forcing an LLM to emit JSON-style plans, it can...
Ollama - Loading Custom Models
Ollama can run fine-tuned models that aren’t already listed locally—by downloading the right quantized weights (GGUF) and creating a small Ollama...
The Gemini Interactions API
Google’s new Gemini Interactions API reframes how developers build with Gemini models by shifting from simple, stateless “prompt in, text out” calls...
LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!
Getting rid of OpenAI entirely for Retrieval QA with LangChain is feasible, but the quality hinges on the local LLM’s context limits, prompt format...
Introducing Gemma - 2B 7B 6Trillion Tokens
Google’s new Gemma model suite brings open-weight, English text-only large language models in four variants: two sizes, 2B and 7B, each available in base and...
Creating an AI Agent with LangGraph Llama 3 & Groq
LangGraph is positioned as the “middle layer” for building AI agents that need structure, state, and controllable decision points—without handing...
MetaVoice 1B - TTS & Voice Cloning
MetaVoice has released MetaVoice 1B, a 1.2B-parameter, Apache-licensed text-to-speech model aimed at open experimentation—along with a GitHub repo...
Building a LangChain Custom Medical Agent with Memory
A LangChain “medical advice” agent can be built to answer questions using a site-restricted search (WebMD) and to carry context across multiple turns...
Gemini RAG - File Search Tool
Gemini’s API team introduced the File Search Tool, a built-in, automated RAG pipeline that turns uploaded documents into a ready-to-query vector...
Gemini 1.5 Pro for Video Analysis
Gemini 1.5 Pro can extract highly specific information from a long video—down to approximate timestamps for when key topics appear—making video-based...
Getting Started with Gemini Pro on Google AI Studio
Gemini Pro is now broadly available, and getting started is mostly a matter of creating an API key in Google AI Studio, pasting it into a Colab...
Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3
Running function calling and structured JSON outputs locally is practical with smaller open models—especially Llama 3 8B on Ollama—and it enables...
Anthropic's New Agent Protocol!
Anthropic’s Model Context Protocol (MCP) aims to turn LLMs into practical “agents” by standardizing how models connect to external tools and...
Using LangChain Output Parsers to get what you want out of LLMs
LLM apps fail most often when they accept whatever text a model happens to generate instead of forcing that output into a structure the application...
Gemini TTS - Native Audio Out
Google’s “native audio out” for Gemini is now available in preview, letting developers generate speech directly from Gemini models with controllable...
Introducing Gemini 3.1 Pro
Google is rolling out Gemini 3.1 Pro, a “0.1” update that marks a noticeable jump in reasoning and benchmark performance—and, crucially, brings finer...
Claude Skills - SOPs For Agents
Standard operating procedures are becoming a core building block for AI agents—not just a human workflow tool. The shift matters because LLM output...
Gemini 2.0 Flash
Google’s Gemini 2.0 Flash marks a shift from “multimodal input” to “multimodal output,” with the model able to generate audio and images...
Mistral 8x7B Part 1- So What is a Mixture of Experts Model?
Mistral’s newly released “8x7B” model is a Mixture of Experts (MoE) system: eight separate expert networks, each roughly the size of Mistral 7B, are...
DeepSeekR1 - Full Breakdown
DeepSeek has released open weights for its reasoning model family, led by DeepSeek R1, along with a set of distilled smaller models that can...
Image Annotation with LLava & Ollama
A practical way to turn a cluttered screenshot folder into a searchable archive is to run a local vision-language model over each image and save the...
Kimi K2.5- The Agent Swarm
Moonshot AI’s Kimi K2.5 positions itself less as a single “bigger model” and more as a platform for task-specialized reasoning—especially through an...
... there's more to Sonnet 4.5
Claude Sonnet 4.5 is being positioned as more than a faster, better coding model: it’s a stepping stone toward Anthropic’s “virtual collaborator,” an...
The Improved Gemini 2.5 Pro - A Coding Powerhouse
Google’s new Gemini 2.5 Pro preview version is being positioned as a major step up for coding—less about generic “reasoning” gains and more about...
Unveiling Meta's Impressive CV Model: Sam 2
Meta’s SAM 2 pushes “segment anything” from still images into real-time video—letting users prompt what to track and then generating precise...
Gemini 2.5 Pro for Audio Transcription
Gemini 2.5 Pro’s jump to a 64,000-token generation limit is the practical unlock for high-quality podcast transcription at scale—long enough to turn...
MedGemma - An Open Doctor Model?
Google’s newly released MedGemma models put open-source medical AI within reach for researchers and developers—complete with multimodal (image+text)...
How to make a custom dataset like Alpaca7B
A practical path to building an “Alpaca-style” instruction dataset is to start with a small set of human-written seed tasks, then use GPT-3 to expand...
Explaining OpenAI's o1 Reasoning Models
OpenAI’s o1 and o1-mini are reasoning-first models that trade speed for deeper problem solving by spending substantially more compute during...
Is GPT4All your new personal ChatGPT?
A new open-weight chat model called “GPT4All” is drawing attention as a potential “personal ChatGPT” alternative, but hands-on tests show it’s closer...
Gemini 3 Flash - Your Daily Workhorse Upgraded
Gemini 3 Flash arrives as a faster, more cost-efficient “workhorse” model that, in many benchmarks, comes close to Gemini 3 Pro and sometimes beats...
Auto-GPT - How to Automate a Task Based AI with GPT-4
Auto-GPT is positioned as an autonomous AI agent that can carry out multi-step tasks end-to-end—searching the web, browsing pages, extracting...
Google's NEW Agent Money Protocol
Google is rolling out an “Agent Payments Protocol” designed to let AI agents handle money—making purchases, paying merchants, and managing financial...
OpenAI Functions + LangChain : Building a Multi Tool Agent
OpenAI’s function-calling system, wired through LangChain, can turn a plain chat model into a finance assistant that reliably selects the right API...
PydanticAI - The NEW Agent Builder on the Block
PydanticAI positions itself as a new, Pydantic-first agent and LLM application framework built to make model outputs reliably conform to structured...
Google's NEW Agent2Agent Protocol
Google’s new “agent-to-agent” (A2A) protocol is aimed at turning today’s tool-using AI agents into a network of collaborating agents that can...
The Rise of WebMCP
WebMCP is poised to replace today’s “guess-and-scrape” web interaction for AI agents by letting websites expose structured, callable tools directly...
Gemini Deep Think
Google’s “Deep Think” reasoning model is now publicly available after helping DeepMind reach gold-medal-level performance at the International...
Llama 3 - 8B & 70B Deep Dive
Meta’s Llama 3 release centers on two new open-weight language models—8B and 70B—that aim to outperform last generation’s Llama 2 while matching or...
Gemini 2.0 Pro - The Family Expands
Google’s Gemini model lineup expands with a new Gemini 2.0 Pro—an experimental, fully multimodal model with a 2 million token context...
BabyAGI: Discover the Power of Task-Driven Autonomous Agents!
Task-driven autonomous agents are moving from “chat” to structured, tool-using workflows: a large language model takes an objective, breaks it into a...
Advanced RAG 03 - Hybrid Search BM25 & Ensembles
Hybrid search in retrieval-augmented generation (RAG) combines two retrieval styles: keyword matching and semantic matching. The core idea is to pair...
DeepSeek R1 for Structured Agents
DeepSeek’s R1 reasoning model can’t natively produce the structured, tool-friendly outputs that most agent frameworks rely on—no function calling, no...
Talking to Alpaca with LangChain - Creating an Alpaca Chatbot
Hooking Alpaca to LangChain is straightforward: build a local Hugging Face text-generation pipeline around the Alpaca/LLaMA-compatible model, wrap it...
Gemini Embedding 2 - Audio, Text, Images, Docs, Videos
Gemini Embedding 2 is positioned as a single, natively multimodal embedding model that collapses what used to require many separate pipelines—one...
Building a LangGraph ReAct Mini Agent
A simple LangGraph pattern—one “reasoner” node plus a single prebuilt “tools” node—can replace sprawling agent graphs full of separate nodes for each...
Gemini 1.5 for Summarization
A long-context model can summarize, extract, and answer questions from a brand-new book—without relying on prior training on that specific text—by...
Introducing Swarm with Code Examples: OpenAI's Groundbreaking Agent Framework
OpenAI’s Swarm has landed as a lightweight framework for building multi-agent systems, and the core idea is simple: model behavior as small...
Kyutai STT & TTS - A Perfect Local Voice Solution?
Kyutai’s latest local speech stack, Kyutai TTS for text-to-speech and Kyutai STT for speech recognition, pairs fast, small models with voice conditioning that can...
Open Reasoning vs OpenAI
OpenAI’s “o1” reasoning models may not keep their edge for long: within roughly two to two and a half months, multiple open-weights labs released...
PydanticAI - Building a Research Agent
A research agent built with PydanticAI can turn a single question into multiple targeted web searches, run them concurrently, and return a...
The 5 Types of LLM Apps
LLM apps can be sorted into five practical categories—ranging from chat-style assistants to fully autonomous agents—so builders can more clearly...
Claude 3 Vs Gemini Vs GPT-4: Who Can Make Amazing Powerpoints?
LLMs can reliably generate the *facts* and basic slide structure for a presentation, but they still struggle to produce consistently sleek, polished...
What is an LLM Router?
LLM routing is emerging as a practical way to cut inference costs without giving up much quality: instead of sending every prompt to the most capable...
Vicuna - 90% of ChatGPT quality by using a new dataset?
Vicuna is being positioned as an open chat model that delivers roughly “90% of ChatGPT quality” by fine-tuning a LLaMA base model on...
Ollama Launch + Claude Code + GLM Flash
Ollama has introduced “Ollama launch,” a one-command way to run Anthropic API–compatible coding assistants locally—making it possible to use Claude...
How can GPT-4.5 be So Bad?
GPT-4.5 arrives with a “bigger and more natural” pitch, but benchmark results and practical tradeoffs paint it as an also-ran: stronger than GPT-4 in...
Unlock Open Multimodality with Phi-4
Microsoft’s Phi-4 family just got more practical for local, multimodal work: the Phi-4 3.8B “mini instruct” lineup now includes function calling and...
Gemini 2.0 Flash Thinking
Google has released an experimental Gemini 2.0 Flash model branded “Gemini 2.0 Flash Thinking,” notable for exposing full reasoning traces...
GPT-4o: What They Didn't Say!
OpenAI’s GPT-4o (“Omni”) marks a shift toward a single, more capable multimodal system—one that can take in text, images, and audio and produce...
Using LangChain with DuckDuckGO Wikipedia & PythonREPL Tools
LangChain agents can be made to choose among three built-in tools—Wikipedia, DuckDuckGo search, and Python’s Read-Evaluate-Print Loop (Python...
Mistral Small 3 - The NEW Mini Model Killer
Mistral has released “Mistral Small 3,” a new 24B-parameter open-weight model positioned as a fast, capable “workhorse” for everyday tasks—aimed at...
Running Gemma using HuggingFace Transformers or Ollama
Running Gemma locally or in a notebook is straightforward, but the biggest practical takeaway is that Gemma’s chat behavior depends heavily on its...
Advanced RAG 02 - Parent Document Retriever
Parent document retrievers fix a common RAG tradeoff: embeddings need to be specific enough to find the right passage, but the language model needs...