
Sam Witteveen — Channel Summaries — Page 2

AI-powered summaries of 183 videos from Sam Witteveen's channel.



Qwen QwQ 32B - The Best Local Reasoning Model?

Sam Witteveen · 2 min read

QwQ 32B is being positioned as a top-tier “local reasoning” model that can run on personal hardware, and the core claim is that it delivers...

Local Reasoning Models · Mixture of Experts · Reinforcement Learning

Ollama - Libraries, Vision and Updates

Sam Witteveen · 3 min read

Ollama’s latest updates push local AI further into “build-and-automate” territory: new Python/JavaScript libraries, expanded vision model support,...

Ollama Libraries · Vision Models · OpenAI Compatibility

MPT-7B - The First Commercially Usable Fully Trained LLaMA Style Model

Sam Witteveen · 3 min read

Mosaic’s MPT-7B is being positioned as the first fully trained, commercially usable, open LLaMA-style model that’s ready for real deployment—not just...

MPT-7B Release · Long-Context Fine-Tuning · Commercial Licensing

Anthropic Does The Unthinkable with Haiku 3.5

Sam Witteveen · 3 min read

Claude 3.5 Haiku arrives with a major price jump—$1 per million tokens in and $5 per million tokens out—turning what used to be a budget-friendly...

Claude 3.5 Haiku Pricing · Agentic Workflows · Cost vs Quality

Multi-Agent AI EXPLAINED: How Magentic-One Works

Sam Witteveen · 3 min read

Multi-agent systems are shifting toward “generalist” agents that can handle many tasks without hand-coding every step—and Microsoft’s Magentic-One is...

Multi-Agent Orchestration · Progress Ledgers · Sub-Agent Tooling

Gemini Pro + LangChain - Chains, Mini RAG, PAL + Multimodal

Sam Witteveen · 3 min read

Gemini Pro becomes a practical building block inside LangChain, enabling everything from simple prompt-to-response chains to mini RAG, PAL...

Gemini Pro · LangChain Chains · Mini RAG

Advanced RAG 06 - RAG Fusion

Sam Witteveen · 3 min read

RAG Fusion aims to narrow the gap between what users type and what they actually mean by turning one user query into several targeted search queries,...

RAG Fusion · Query Rewriting · Reciprocal Rank Fusion

NVIDIA NemoCLAW!! - GTC 2026

Sam Witteveen · 3 min read

NVIDIA’s biggest GTC 2026 announcement isn’t new space hardware or flashy modules—it’s a push to bring OpenClaw-style “agent” software into...

OpenClaw · Nemo Claw · OpenShell

Building a Summarization System with LangChain and GPT-3 - Part 1

Sam Witteveen · 2 min read

Summarization quality no longer has to rely on training bespoke models for every writing style. With modern instruction-tuned and RLHF-tuned large...

LangChain Summarization · Token Limits · MapReduce

Qwen 3.5 - The next NEXT model

Sam Witteveen · 3 min read

Qwen 3.5 lands as a major shift in how fast, capable AI can be—pairing a large mixture-of-experts model with a reported up to 19x decoding speed...

Qwen 3.5 · Mixture of Experts · Multimodal Training

Mistral Large with Function Calling - Review and Code

Sam Witteveen · 3 min read

Mistral Large positions itself as a strong alternative to top closed models by pairing solid reasoning performance with native function calling—while...

Mistral Large · Function Calling · On-Prem Deployment

Gemma 2 - Local RAG with Ollama and LangChain

Sam Witteveen · 3 min read

Running a fully local RAG pipeline with Gemma 2 is practical—and the fastest path starts with a clean indexing step, local embeddings, and a...

Local RAG · Gemma 2 · Ollama

Mistral 3: Europe's Answer to DeepSeek or Too Little, Too Late?

Sam Witteveen · 3 min read

Mistral has returned with a major open-model push—four new releases led by Mistral Large 3, plus smaller “Ministral 3” models that include both base...

Mistral 3 · MoE Models · Open-Source LLMs

Moshi The Talking AI

Sam Witteveen · 3 min read

Moshi is a duplex, open-domain “talking AI” system built to hold real-time conversations without the usual stop-and-go pattern of speech-to-text...

Duplex Speech AI · Audio Tokenization · Open Source Models

NanoNets OCR-s

Sam Witteveen · 3 min read

A newly released “OCR Small” model from Nanonets, built on the open-weight Qwen 2.5-VL base, turns a roughly 3B parameter vision-language model into a...

OCR · Vision-Language Models · Document Extraction

Clone ANY Voice for Free — Qwen Just Changed Everything

Sam Witteveen · 3 min read

Open-source voice cloning and “voice design” have moved from closed, API-only systems into the open TTS ecosystem: Qwen has released its Qwen 3 TTS...

Open-Source TTS · Voice Cloning · Voice Design

Information Extraction with LangChain & Kor

Sam Witteveen · 2 min read

Turning messy text into structured data is the bottleneck for many NLP workflows—especially when there’s no labeled dataset to train a named-entity...

Information Extraction · LangChain · Kor

Advanced RAG 05 - HyDE - Hypothetical Document Embeddings

Sam Witteveen · 3 min read

HyDE (Hypothetical Document Embeddings) improves retrieval in RAG by using a large language model to draft a “hypothetical answer,” embedding that...

HyDE · Hypothetical Document Embeddings · RAG Retrieval

Meta's Code World Model

Sam Witteveen · 2 min read

Meta’s researchers at FAIR released “Code World Model” (CWM), a 32B open-weights model aimed at code generation that goes beyond copying syntax. The...

Code World Model · World Models · Code Execution Traces

Qwen3 Multimodal Embeddings: Finally, RAG That Sees

Sam Witteveen · 3 min read

Qwen 3 VL’s multimodal embedding models aim to make RAG retrieval “see” beyond text by mapping text, images, and video-like content into a shared...

Multimodal Embeddings · Multimodal RAG · Reranking

Advanced RAG 04 - Contextual Compressors & Filters

Sam Witteveen · 3 min read

RAG systems often fail not because retrieval misses everything, but because they bring back too much irrelevant text—or the right facts buried inside...

Contextual Compression · RAG Filtering · LLM Extractors

Qwen 3 Embeddings & Rerankers

Sam Witteveen · 2 min read

A new open suite of text embedding and reranking models from Qwen is aimed squarely at retrieval-augmented generation (RAG) use cases—especially...

Text Embeddings · Reranking · RAG

SmolDocling - The SmolOCR Solution?

Sam Witteveen · 2 min read

SmolDocling—an IBM-partnered document understanding model on Hugging Face—aims to do more than “plain OCR” by converting documents into a structured,...

Document Conversion · Structured OCR · Vision-Language Models

GeminiCLI - The Deep Dive with MCPs

Sam Witteveen · 3 min read

Gemini CLI’s built-in tools and MCP integrations can turn “rough” app scaffolding into a working, deployable project—especially when developers lean...

Gemini CLI · MCP Servers · Streaming Chat

LangChain Reaches 1.0 - Whats new?

Sam Witteveen · 3 min read

LangChain’s leap to “1.0” and “LangGraph 1.0,” paired with a $125 million Series B at a $1.25 billion valuation, signals a shift from experimental...

LangChain 1.0 · LangGraph 1.0 · Agent Engineering

Mistral Agents API - The NEW Agent System

Sam Witteveen · 2 min read

Mistral has launched an “agents API” designed to let developers build agentic systems that run against Mistral models through a cloud-based...

Mistral Agents API · Persistent Memory · Built-in Connectors

Tagging and Extraction - Classification using OpenAI Functions

Sam Witteveen · 3 min read

OpenAI “functions” can be used in LangChain not to trigger external code, but to force large language models to return structured JSON...

OpenAI Functions · LangChain Tagging · LangChain Extraction

Anthropic's Latest Winner - Workbench

Sam Witteveen · 2 min read

Anthropic has overhauled its developer “Workbench” inside the Anthropic console, turning prompt building into a full testing and benchmarking...

Anthropic Workbench · Prompt Engineering · Prompt Evaluation

Google's RAG Experiment - NotebookLM

Sam Witteveen · 2 min read

NotebookLM is Google’s early, product-shaped experiment in retrieval-augmented generation (RAG): upload your own documents, ask questions, and get...

NotebookLM · Retrieval Augmented Generation · Gemini 1.5 Pro

Bard can now code and put that code in Colab for you.

Sam Witteveen · 3 min read

Google’s Bard has gained a practical new capability: it can generate Python code and export that code directly into Google Colab, turning prompts...

Bard Code Export · Google Colab · SQLite and Python

Google's Agent Upgrade

Sam Witteveen · 2 min read

Google’s latest “Opal” upgrade shifts agent building from fixed, step-by-step workflows toward goal-driven, interactive experiences—complete with...

Opal Agent Builder · Interactive Agent Steps · Persistent Memory

vLLM - Turbo Charge your LLM Inference

Sam Witteveen · 2 min read

Local and cloud deployments of large language models often feel unusably slow, even on strong hardware, because inference bottlenecks pile up around...

LLM Inference · vLLM Serving · PagedAttention

Gemini 2.0 - How to use the Live Bidirectional API

Sam Witteveen · 2 min read

Gemini 2.0’s Live Bidirectional API is built for real-time, two-way multimodal interaction—letting users talk back and forth with voice, stream...

Live Bidirectional API · Multimodal Streaming · AI Studio Setup

The 4 Stacks of LLM Apps & Agents

Sam Witteveen · 3 min read

Building useful LLM apps and agents comes down to assembling four distinct “stacks” in the right places: the model itself, the data/search/memory...

LLM App Architecture · LLM Agents · Vector Stores

Falcon Soars to the Top - The NEW 40B LLM Rises above the rest.

Sam Witteveen · 3 min read

Falcon has arrived as a new, from-scratch large language model family—anchored by a 40B parameter model—and it’s already topping Hugging Face’s Open...

Falcon LLM · Hugging Face Leaderboard · Model Licensing

How to make Multi-Agent Apps with smolagents

Sam Witteveen · 3 min read

Multi-agent apps built with smolagents work best when the system leans on tool-calling and strong hosted models—small local “code agents” tend to...

Multi-Agent Apps · smolagents · Tool Calling

Gemini 2.0 - Video Analyzer with Code

Sam Witteveen · 2 min read

Gemini’s “Video Analyzer” turns uploaded videos into structured, time-coded outputs—captions, spoken transcripts, visual scene descriptions, key...

Video Analysis · Function Calling · Timecoded Captions

Agent Skills: Code Beats Markdown (Here's Why)

Sam Witteveen · 3 min read

Agent Skills—an open standard for “agent skills” used by models and coding harnesses—are gaining momentum because they let systems do tasks with code...

Agent Skills · Context Engineering · Sandbox Scripts

AgentHQ by Github

Sam Witteveen · 2 min read

GitHub’s Universe pitch centers on a shift from “AI that helps write code” to “AI that runs software work under governance.” The centerpiece is Agent...

Agent HQ · AI Governance · Multi-Vendor Agents

OpenAI's New OPEN Models - GPT-OSS 120B & 20B

Sam Witteveen · 3 min read

OpenAI has released two open-weights language models under an Apache 2.0 license: a 120B-parameter model and a 20B-parameter model. The headline...

Open-Weights Models · Apache 2.0 Licensing · Agentic Tool Use

How to OPTIMIZE your prompts for better Reasoning!

Sam Witteveen · 3 min read

Prompt quality in large language model (LLM) work depends heavily on context and input design, not just the question. Microsoft’s new “PromptWizard”...

Prompt Optimization · In-Context Learning · Chain of Thought

StarCoder - The LLM to make you a coding star?

Sam Witteveen · 2 min read

StarCoder is positioned as a serious open-source coding model family—built for long-context code generation and fine-tuned into chat-style...

StarCoder Family · Long-Context Coding · Fill In The Middle

RetrievalQA with LLaMA 2 70b & Chroma DB

Sam Witteveen · 2 min read

Retrieval-augmented QA with LLaMA-2 70B works cleanly when answers are grounded in a local Chroma vector database built from a set of research PDFs....

Retrieval QA · RAG Pipeline · Chroma Vector Store

LLaMA2 for Multilingual Fine Tuning?

Sam Witteveen · 3 min read

Multilingual fine-tuning with LLaMA 2 hinges less on the model weights and more on whether its tokenizer breaks your target language into efficient...

Tokenizer Efficiency · Multilingual Fine-Tuning · Unicode Tokenization

FunctionGemma - Function Calling at the Edge

Sam Witteveen · 2 min read

FunctionGemma brings customizable function calling to a compact Gemma model designed for edge deployment, so apps and games can run locally on phones...

Function Calling · Edge Deployment · Model Fine-Tuning

Building a Vision App with Ollama Structured Outputs

Sam Witteveen · 3 min read

Structured outputs in Ollama make it practical to turn both text and images into validated, schema-shaped data—locally—using Python classes...

Structured Outputs · Pydantic Schemas · Vision Extraction

How to use BGE Embeddings for LangChain and RAG

Sam Witteveen · 2 min read

BGE embeddings from the Beijing Academy of AI have surged to the top of major embedding benchmarks while dramatically shrinking model size—making...

BGE Embeddings · LangChain RAG · Chroma Vector Store

CrewAI - Building a Custom Crew

Sam Witteveen · 3 min read

A custom CrewAI workflow can reliably turn a user-chosen topic into a researched, saved markdown article—but the “process shape” matters. In a...

CrewAI Custom Crew · Sequential vs Hierarchical · Agent Tools

Generative Agents - Deep Dive and GPT-4 Recreation

Sam Witteveen · 3 min read

Generative agents for “interactive simulacra” are built around a practical loop: each character continuously turns observations into memories,...

Generative Agents · Interactive Simulacra · Memory Retrieval

Cohere's Command-R a Strong New Model for RAG

Sam Witteveen · 3 min read

Cohere’s Command-R arrives as a purpose-built model for retrieval-augmented generation (RAG) and tool/function calling, not as a bid to replace top...

Command-R · Retrieval Augmented Generation · Tool Use

Gemini 2.5 Pro for YouTube Analysis

Sam Witteveen · 3 min read

Gemini 2.5 Pro can analyze YouTube videos directly—either by uploading a video file or, more conveniently, by passing a public YouTube URL into...

YouTube URL Analysis · Gemini 2.5 Pro · Files API

The 4 Big Changes in LLMs

Sam Witteveen · 3 min read

LLMs are improving on multiple fronts at once—smarter reasoning, faster token generation, cheaper inference, and ever-larger context—and product...

LLM Product Strategy · Synthetic Data · Multimodality

NEW LangChain Expression Language!!

Sam Witteveen · 3 min read

LangChain’s new Expression Language is a more declarative way to build LLM “chains,” making the flow of data through prompts, models, tools, and...

LangChain Expression Language · Declarative Chains · OpenAI Function Calling

Comparing LLMs with LangChain

Sam Witteveen · 3 min read

Choosing a “good for production” large language model isn’t about picking the biggest name—it’s about matching model behavior to the task. A...

Model Evaluation · LangChain · Instruction Tuning

How to use Custom Prompts for RetrievalQA on LLaMA-2 7B

Sam Witteveen · 2 min read

RetrievalQA with LLaMA-2 can produce “correct-but-then-junk” outputs—answers that start right and then trail off into unhelpful or incorrect text....

RetrievalQA · LLaMA-2 Prompting · RAG Reliability

Mistral 7B - The New 7B LLaMA Killer?

Sam Witteveen · 3 min read

Mistral AI’s newly released Mistral 7B is being positioned as a “7B LLaMA killer” because it delivers stronger benchmark performance than larger...

Mistral 7B · LLaMA Benchmarks · Instruction Tuning

Gradio 5 - Building a Quick Chatbot UI for LangChain

Sam Witteveen · 2 min read

Gradio 5 makes it straightforward to build a shareable, streaming chat UI on top of LangChain—so people can try an LLM-powered chatbot in a browser...

Gradio 5 · LangChain Streaming · Chat UI

LLaMA2 Tokenizer and Prompt Tricks

Sam Witteveen · 3 min read

LLaMA 2’s behavior hinges less on “magic prompting” and more on two concrete levers: the tokenizer’s limited vocabulary size and, especially, the...

LLaMA 2 Tokenizer · System Prompt Tokens · Prompt Steering

Camel + LangChain for Synthetic Data & Market Research

Sam Witteveen · 3 min read

Camel—an “autonomous GPT” approach built around two agents talking to each other—gets positioned as a practical engine for synthetic data and market...

Camel Multi-Agent · Inception Prompting · Role-Playing Prompts

NEW - Anthropic Updated Claude Models & Computer Use Agents!!

Sam Witteveen · 3 min read

Anthropic’s latest release pairs two upgraded Claude models with a new “computer use” capability that lets Claude interact with a user’s computer...

Claude 3.5 Sonnet · Claude 3.5 Haiku · Computer Use API

HOW to Make Conversational Form with LangChain | LangChain TUTORIAL

Sam Witteveen · 2 min read

Conversational forms don’t have to feel like web-page data entry. By extracting structured fields from free-form chat and then asking only what’s...

Conversational Form · LangChain · Pydantic Extraction

Investigating Alpaca 7B - Finetuned LLaMa LLM

Sam Witteveen · 2 min read

Alpaca 7B is a newly released instruction-tuned 7-billion-parameter model built by Stanford that aims to match the quality of OpenAI’s...

Instruction Tuning · LLaMA Fine-Tuning · Model Evaluation

Microsoft's Phi 3.5 - The latest SLMs

Sam Witteveen · 2 min read

Microsoft has expanded its Phi 3 lineup with three new Phi 3.5 models—two instruction-tuned language models and an updated vision model—pushing...

Phi 3.5 Models · Local LLMs · Mixture of Experts

MiroThinker 1.5 - The 30B That Outperforms 1T Models

Sam Witteveen · 3 min read

MiroThinker 1.5 is positioned as a practical shift in agent design: instead of relying on a single, information-heavy model, it’s built to...

Tool-Using Agents · MiroThinker 1.5 · Mixture of Experts

Raven - RWKV-7B RNN's LLM Strikes Back

Sam Witteveen · 3 min read

RWKV is a rare attempt to bring RNNs back into the large-language-model conversation by fixing the two biggest pain points that pushed transformers...

RWKV Architecture · RNN vs Transformers · Long Context Generation

SmolLMv3 - A Small Reasoner with Tool Use.

Sam Witteveen · 3 min read

Hugging Face has released SmolLMv3, a 3B-parameter language model aimed at “small” local deployment without giving up reasoning and tool use. The...

SmolLMv3 Release · Tool Calling · Dual Think Reasoning

KittenTTS - The Nano TTS

Sam Witteveen · 2 min read

Kitten ML’s “KittenTTS” pushes text-to-speech into a new size category: multiple TTS models that fit under 25 MB, are optimized for CPU-only use, and...

Edge TTS · Model Quantization · CPU-Optimized Inference

Haiku 4.5 - Small Beats Big

Sam Witteveen · 3 min read

Claude Haiku 4.5 is arriving with higher prices, but it’s also delivering a rare mix of speed and task performance that makes it a strong candidate...

Claude Haiku 4.5 · Agent Workflows · Model Pricing

OpenAI DevDay 2025 - What Hit What Missed

Sam Witteveen · 3 min read

OpenAI’s DevDay 2025 keynote centered on four practical product moves: apps inside ChatGPT, a new Agent Kit for building agentic systems, Codex...

ChatGPT Apps SDK · Agent Kit · Agent Builder

Caught Distilling from Claude?

Sam Witteveen · 3 min read

A fresh wave of allegations claims Chinese AI labs are running large-scale “distillation attacks” to copy capabilities from Claude—using fleets of...

Distillation Attacks · Claude · Reinforcement Learning

Beating Cowork with Open Source Cowork

Sam Witteveen · 2 min read

Anthropic’s release of “Cowork” triggered a high-stakes scramble in the AI startup world, but one company, Camel AI, responded with a pivot that...

Open Source Pivot · Multi-Agent Architecture · Task Decomposition

The "Token Muncher" Problem: Is Sonnet 4.6 Actually Cheaper?

Sam Witteveen · 2 min read

Claude Sonnet 4.6 is positioned as a cheaper, more capable step up from earlier Sonnet models—especially for knowledge work and “computer use”...

Model Pricing · Adaptive Thinking · Token Usage

EmbeddingGemma - Micro Embeddings for Mobile Devices

Sam Witteveen · 2 min read

EmbeddingGemma is a family of tiny, text-only embedding models designed to run on-device, enabling retrieval, semantic search, clustering, and “micro...

Embedding Models · On-Device AI · Micro RAG

The Qwen Avalanche

Sam Witteveen · 3 min read

Alibaba’s Apsara keynote kicked off a wave of new model releases from Qwen, but the most consequential thread running through the announcements is a...

Qwen Model Releases · Agentic Tool Calling · Multimodal Vision-Language

Open Responses - The NEW Standard API for Open Models

Sam Witteveen · 3 min read

OpenAI’s push for an “open responses” standard aims to make today’s agent-style features—tool calling, streaming, multimodal inputs, and structured...

Open Responses Standard · Agentic Tool Calling · Reasoning Tokens

Qwen3 Next - Behind the Curtain

Sam Witteveen · 3 min read

Qwen 3 Next is an 80B Mixture of Experts (MoE) model built to run with only 3B active parameters per inference—an efficiency leap that still lands it...

Mixture of Experts · Multi-Token Prediction · Inference Efficiency

OpenAI's Agent Builder

Sam Witteveen · 3 min read

OpenAI’s Agent Builder turns agent design into a node-and-guardrail workflow: prompts, classification, conditional routing, and tool use are...

Agent Builder · Guardrails · Conditional Routing

Junie The Anti-Vibe Coding IDE

Sam Witteveen · 3 min read

AI coding tools often fail hardest when a developer already has a real repository to maintain—because they tend to “go wild,” generate large diffs,...

Smart Coding Agent · Ask Mode · JetBrains IDE Integration

Is Meta killing FAIR?

Sam Witteveen · 2 min read

Meta’s AI job cuts are hitting FAIR, Meta’s long-running open research lab tied to Facebook AI Research and associated with Yann LeCun’s leadership....

FAIR · Meta AI · Open-Weight Models

Building Single-User vs Multi-User Agents: What Actually Changes

Sam Witteveen · 3 min read

The biggest shift in building agent systems isn’t “one agent vs many agents.” It’s “one user’s private world vs a shared, multi-tenant world,” and...

Single-User Agents · Multi-User Agents · Agent Harness

Tiny Aya - Cohere's Mini Multilingual Models

Sam Witteveen · 3 min read

Choosing a language model for non-English languages is often a guessing game—especially for low-resource languages with limited internet data and...

Multilingual Language Models · Tokenization · Low-Resource Languages

Sora 2 - OpenAI's TikTok

Sam Witteveen · 3 min read

OpenAI’s Sora 2 is arriving not just as a better video-generation model, but as the foundation for a TikTok-style social network—complete with an iOS...

Sora 2 · Cameos · Short-Form Video

The Future of AI Coding with Aja Hammerly

Sam Witteveen · 3 min read

AI coding is moving from flashy “one-shot” demos toward an iterative, pair-programming style workflow—where tools like Firebase Studio treat the...

AI Coding · Firebase Studio · Prompting