Sam Witteveen — Person Summaries

AI-powered summaries of 95 videos about Sam Witteveen.

DeepSeek OCR - More than OCR

Sam Witteveen · 3 min read

DeepSeek OCR’s headline idea isn’t better document reading—it’s a new way to compress long text into far fewer “vision tokens,” then decode it back...

Context Optimal Compression · Vision Token Compression · Long-Context Memory

Introducing Gemini CLI

Sam Witteveen · 2 min read

Google’s Gemini team is rolling out Gemini CLI, a command-line interface that turns Gemini Code Assist into an agent-like workflow for editing files,...

Gemini CLI · Gemini Code Assist · MCP Integration

Opal - Google Labs Killer NEW App

Sam Witteveen · 3 min read

Google Labs’ Opal is a no-code workflow builder aimed at turning natural-language requests into working LLM “mini apps,” with built-in steps for web...

Opal · Google Labs · No-Code LLM Workflows

LangChain Retrieval QA Over Multiple Files with ChromaDB

Sam Witteveen · 2 min read

LangChain retrieval QA becomes practical at scale once the document embeddings live in a persistent ChromaDB vector store on disk. Instead of...

LangChain Retrieval QA · ChromaDB Persistence · Vector Store Retrieval

LangChain - Using Hugging Face Models locally (code walkthrough)

Sam Witteveen · 2 min read

Running Hugging Face models locally inside LangChain is the practical workaround when Hugging Face Hub access fails—especially for conversational...

LangChain · Hugging Face Hub · Local Transformers

LangGraph Crash Course with code examples

Sam Witteveen · 3 min read

LangGraph is positioned as a new way to run LangChain-based LLM agents by modeling agent behavior as a graph-driven state machine rather than a fixed...

LangGraph Crash Course · Agent State Machine · Tool Calling Loops

Microsoft Launches 10 NEW AI Agents

Sam Witteveen · 3 min read

Microsoft’s Ignite announcements push autonomous AI agents deep into enterprise software—especially Dynamics 365—by rolling out 10 new agent...

Autonomous Agents · Dynamics 365 · Sales Automation

LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab

Sam Witteveen · 3 min read

Large language models are powerful at generating text, but they don’t plug cleanly into the rest of software—especially when apps need state,...

LangChain Basics · Prompt Templates · Few-Shot Prompting

Gemini 3 Pro - The Model You've Been Waiting For

Sam Witteveen · 3 min read

Gemini 3 Pro is positioned as a long-horizon, tool-using model built for “clever, concise, direct” work—less about a flashy personality and more...

Gemini 3 Pro · Agentic Coding · AI Studio

Anthropic's Meta Prompt: A Must-try!

Sam Witteveen · 2 min read

Anthropic’s “Metaprompt” tool turns weak, one-off prompts into a structured, model-ready instruction set—by using Claude itself to generate the final...

Metaprompt · Prompt Engineering · Claude 3

LangExtract - Google's New Library for NLP Tasks

Sam Witteveen · 2 min read

Standard NLP tasks—like sentiment classification, named-entity extraction, and entity disambiguation—have long relied on fine-tuned BERT-style...

Information Extraction · Source Grounding · Few-Shot Prompting

DeepSeek's New Image Model - Janus Pro

Sam Witteveen · 3 min read

DeepSeek’s Janus Pro stands out for combining two capabilities in one multimodal system: it can answer questions about images (using a SigLIP-based...

Multimodal AI · Image Understanding · Text-to-Image Generation

LlamaOCR - Building your Own Private OCR System

Sam Witteveen · 3 min read

LlamaOCR turns screenshots and scanned documents into editable Markdown by using a vision-capable Llama 3.2 model hosted via Together AI. The...

LlamaOCR · Vision OCR · Together AI

Understanding ReACT with LangChain

Sam Witteveen · 3 min read

ReACT (Reasoning and Action) is a prompting-and-agent pattern designed to make large language models do multi-step problem solving by alternating...

ReACT Prompting · Chain-of-Thought · Tool-Using Agents

Mistral OCR - Multimodal & Multilingual OCR

Sam Witteveen · 3 min read

Mistral has launched a non-open-source OCR system delivered through an API that turns scanned documents, PDFs, and images into structured, multimodal...

Multimodal OCR · Multilingual OCR · Structured JSON Extraction

Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)

Sam Witteveen · 3 min read

Custom tools are the key lever for making LangChain conversational agents more useful—and the biggest practical lesson is that tool use often...

Custom Tools · LangChain Agents · ReAct Tool Selection

Gemini 1.5 Pro for Code - Part 01

Sam Witteveen · 2 min read

Gemini 1.5 Pro for Code can ingest a real GitHub-style repository, then generate working multi-agent Python code that interacts with the repo’s...

Gemini 1.5 Pro for Code · crewAI multi-agent bots · LangChain integration

Testing Gemini 1.5 and a 1 Million Token Window

Sam Witteveen · 2 min read

Gemini 1.5 Pro marks a major step up for long-context AI: it pairs a newly updated model with a dramatically expanded context window—up to 1,048,576...

Gemini 1.5 Pro · Million Token Context · Mixture of Experts

Google Stitch Just Became an AI Figma (And It's Free)

Sam Witteveen · 3 min read

Google Labs’ Stitch has shifted from simple screenshot-to-design experiments into an agentic, Figma-like workflow for generating full UI...

Agentic Design · Design Systems · URL-Based Styling

Gemini Browser Use

Sam Witteveen · 3 min read

Google’s Gemini 2.0 push for “browser use” is starting to look less like a closed, proprietary demo and more like a buildable automation...

Gemini Browser Use · Browser Automation · LangChain Models

Advanced RAG 01 - Self Querying Retrieval

Sam Witteveen · 3 min read

RAG systems break down when everything gets shoved through semantic search—even fields that should behave like exact lookups. The core fix is...

Self-Querying Retrieval · Hybrid RAG · Metadata Filtering

smolagents - HuggingFace's NEW Agent Framework

Sam Witteveen · 2 min read

Hugging Face’s new “smolagents” framework pushes agent building toward “code agents”: instead of forcing an LLM to emit JSON-style plans, it can...

Agent Framework · Code Agents · Tool Calling

Ollama - Loading Custom Models

Sam Witteveen · 2 min read

Ollama can run fine-tuned models that aren’t already listed locally—by downloading the right quantized weights (GGUF) and creating a small Ollama...

Ollama Custom Models · GGUF Quantization · LLaMA.cpp Integration

The Gemini Interactions API

Sam Witteveen · 2 min read

Google’s new Gemini Interactions API reframes how developers build with Gemini models by shifting from simple, stateless “prompt in, text out” calls...

Gemini Interactions API · Server-Side State · Agent Background Execution

LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!

Sam Witteveen · 2 min read

Getting rid of OpenAI entirely for Retrieval QA with LangChain is feasible, but the quality hinges on the local LLM’s context limits, prompt format...

Retrieval QA · LangChain · Local LLMs

Introducing Gemma - 2B 7B 6Trillion Tokens

Sam Witteveen · 2 min read

Google’s new Gemma model suite brings open-weight, English text-only large language models in four sizes—2B and 7B, each available in base and...

Gemma Models · Open-Weight LLMs · Training Tokens

Creating an AI Agent with LangGraph Llama 3 & Groq

Sam Witteveen · 2 min read

LangGraph is positioned as the “middle layer” for building AI agents that need structure, state, and controllable decision points—without handing...

LangGraph Agents · State Machines · Conditional Routing

Building a LangChain Custom Medical Agent with Memory

Sam Witteveen · 3 min read

A LangChain “medical advice” agent can be built to answer questions using a site-restricted search (WebMD) and to carry context across multiple turns...

LangChain Agents · Tool Wrappers · Custom Output Parsing

Gemini RAG - File Search Tool

Sam Witteveen · 3 min read

Gemini’s API team introduced the File Search Tool, a built-in, automated RAG pipeline that turns uploaded documents into a ready-to-query vector...

Gemini API · RAG · File Search Tool

Getting Started with Gemini Pro on Google AI Studio

Sam Witteveen · 3 min read

Gemini Pro is now broadly available, and getting started is mostly a matter of creating an API key in Google AI Studio, pasting it into a Colab...

API Key Setup · Gemini Pro Text · Safety Settings

Anthropic's New Agent Protocol!

Sam Witteveen · 3 min read

Anthropic’s Model Context Protocol (MCP) aims to turn LLMs into practical “agents” by standardizing how models connect to external tools and...

Model Context Protocol · Agent Tool Use · Claude Desktop

Using LangChain Output Parsers to get what you want out of LLMs

Sam Witteveen · 2 min read

LLM apps fail most often when they accept whatever text a model happens to generate instead of forcing that output into a structure the application...

Output Parsing · LangChain · Pydantic Schemas

Introducing Gemini 3.1 Pro

Sam Witteveen · 3 min read

Google is rolling out Gemini 3.1 Pro, a “0.1” update that marks a noticeable jump in reasoning and benchmark performance—and, crucially, brings finer...

Gemini 3.1 Pro · Thinking Levels · RL Training

Gemini 2.0 Flash

Sam Witteveen · 2 min read

Google’s Gemini 2.0 Flash marks a shift from “multimodal input” to “multimodal output,” with the model able to generate audio and images...

Gemini 2.0 Flash · Native Audio Output · Image Generation

Kimi K2.5- The Agent Swarm

Sam Witteveen · 2 min read

Moonshot AI’s Kimi K2.5 positions itself less as a single “bigger model” and more as a platform for task-specialized reasoning—especially through an...

Kimi K2.5 · Agent Swarm · Vision Coding

The Improved Gemini 2.5 Pro - A Coding Powerhouse

Sam Witteveen · 3 min read

Google’s new Gemini 2.5 Pro preview version is being positioned as a major step up for coding—less about generic “reasoning” gains and more about...

Gemini 2.5 Pro · Coding Agents · Google Agent Development Kit

Gemini 2.5 Pro for Audio Transcription

Sam Witteveen · 3 min read

Gemini 2.5 Pro’s jump to a 64,000-token generation limit is the practical unlock for high-quality podcast transcription at scale—long enough to turn...

Audio Transcription · Diarization · Token Limits

How to make a custom dataset like Alpaca7B

Sam Witteveen · 3 min read

A practical path to building an “Alpaca-style” instruction dataset is to start with a small set of human-written seed tasks, then use GPT-3 to expand...

Instruction Fine-Tuning · Dataset Generation · GPT-3 Prompt Expansion

Gemini 3 Flash - Your Daily Workhorse Upgraded

Sam Witteveen · 3 min read

Gemini 3 Flash lands as a faster, more cost-efficient “workhorse” model that, in many benchmarks, lands near Gemini 3 Pro—and sometimes beats...

Gemini 3 Flash · Token Efficiency · Structured Outputs

Auto-GPT - How to Automate a Task Based AI with GPT-4

Sam Witteveen · 3 min read

Auto-GPT is positioned as an autonomous AI agent that can carry out multi-step tasks end-to-end—searching the web, browsing pages, extracting...

Autonomous AI Agents · Task Automation · Web Scraping

OpenAI Functions + LangChain : Building a Multi Tool Agent

Sam Witteveen · 3 min read

OpenAI’s function-calling system, wired through LangChain, can turn a plain chat model into a finance assistant that reliably selects the right API...

OpenAI Function Calling · LangChain Agents · Yahoo Finance API

PydanticAI - The NEW Agent Builder on the Block

Sam Witteveen · 2 min read

PydanticAI positions itself as a new, Pydantic-first agent and LLM application framework built to make model outputs reliably conform to structured...

PydanticAI · Structured Output · Tool Calling

Advanced RAG 03 - Hybrid Search BM25 & Ensembles

Sam Witteveen · 2 min read

Hybrid search in retrieval-augmented generation (RAG) combines two retrieval styles: keyword matching and semantic matching. The core idea is to pair...

Hybrid Search · BM25 Retrieval · Ensemble Retrievers

DeepSeek R1 for Structured Agents

Sam Witteveen · 3 min read

DeepSeek’s R1 reasoning model can’t natively produce the structured, tool-friendly outputs that most agent frameworks rely on—no function calling, no...

Structured Agents · DeepSeek R1 · Pydantic AI

Talking to Alpaca with LangChain - Creating an Alpaca Chatbot

Sam Witteveen · 3 min read

Hooking Alpaca to LangChain is straightforward: build a local Hugging Face text-generation pipeline around the Alpaca/LLaMA-compatible model, wrap it...

Alpaca Chatbot · LangChain Memory · Hugging Face Pipeline

Gemini Embedding 2 - Audio, Text, Images, Docs, Videos

Sam Witteveen · 3 min read

Gemini Embedding 2 is positioned as a single, natively multimodal embedding model that collapses what used to require many separate pipelines—one...

Multimodal Embeddings · Cross-Modal Search · RAG Indexing

Introducing Swarm with Code Examples: OpenAI's Groundbreaking Agent Framework

Sam Witteveen · 3 min read

OpenAI’s Swarm has landed as a lightweight framework for building multi-agent systems, and the core idea is simple: model behavior as small...

Swarm Framework · Multi-Agent Handoffs · Agent Routines

Kyutai STT & TTS - A Perfect Local Voice Solution?

Sam Witteveen · 2 min read

Kyutai’s latest local speech stack (Kyutai TTS for text-to-speech, Kyutai STT for speech-to-text) pairs fast, small models with voice conditioning that can...

Local Speech · Kyutai TTS · Speech-to-Text

The 5 Types of LLM Apps

Sam Witteveen · 3 min read

LLM apps can be sorted into five practical categories—ranging from chat-style assistants to fully autonomous agents—so builders can more clearly...

LLM App Categories · Conversational Agents · Copilots

Claude 3 Vs Gemini Vs GPT-4: Who Can Make Amazing Powerpoints?

Sam Witteveen · 3 min read

LLMs can reliably generate the *facts* and basic slide structure for a presentation, but they still struggle to produce consistently sleek, polished...

LLM Slide Generation · Python Deck Code · Design Constraints

Ollama Launch + Claude Code + GLM Flash

Sam Witteveen · 2 min read

Ollama has introduced “Ollama launch,” a one-command way to run Anthropic API–compatible coding assistants locally—making it possible to use Claude...

Ollama Launch · Claude Code · GLM 4.7 Flash

How can GPT-4.5 be So Bad?

Sam Witteveen · 2 min read

GPT-4.5 arrives with a “bigger and more natural” pitch, but benchmark results and practical tradeoffs paint it as an also-ran: stronger than GPT-4 in...

GPT-4.5 · LLM Scaling · Benchmarks

GPT-4o: What They Didn't Say!

Sam Witteveen · 3 min read

OpenAI’s GPT-4o (“Omni”) marks a shift toward a single, more capable multimodal system—one that can take in text, images, and audio and produce...

GPT-4o · Multimodal AI · Voice and Prosody

Mistral Small 3 - The NEW Mini Model Killer

Sam Witteveen · 2 min read

Mistral has released “Mistral Small 3,” a new 24B-parameter open-weight model positioned as a fast, capable “workhorse” for everyday tasks—aimed at...

Mistral Small 3 · Open-Weight Models · Function Calling

Running Gemma using HuggingFace Transformers or Ollama

Sam Witteveen · 3 min read

Running Gemma locally or in a notebook is straightforward, but the biggest practical takeaway is that Gemma’s chat behavior depends heavily on its...

Gemma Inference · Hugging Face Setup · 4-bit Quantization

Qwen QwQ 32B - The Best Local Reasoning Model?

Sam Witteveen · 2 min read

QwQ 32B is being positioned as a top-tier “local reasoning” model that can run on personal hardware, and the core claim is that it delivers...

Local Reasoning Models · Mixture of Experts · Reinforcement Learning

Ollama - Libraries, Vision and Updates

Sam Witteveen · 3 min read

Ollama’s latest updates push local AI further into “build-and-automate” territory: new Python/JavaScript libraries, expanded vision model support,...

Ollama Libraries · Vision Models · OpenAI Compatibility

Anthropic Does The Unthinkable with Haiku 3.5

Sam Witteveen · 3 min read

Claude 3.5 Haiku arrives with a major price jump—$1 per million tokens in and $5 per million tokens out—turning what used to be a budget-friendly...

Claude 3.5 Haiku Pricing · Agentic Workflows · Cost vs Quality

Gemini Pro + LangChain - Chains, Mini RAG, PAL + Multimodal

Sam Witteveen · 3 min read

Gemini Pro becomes a practical building block inside LangChain, enabling everything from simple prompt-to-response chains to mini RAG, PAL...

Gemini Pro · LangChain Chains · Mini RAG

NVIDIA NemoCLAW!! - GTC 2026

Sam Witteveen · 3 min read

NVIDIA’s biggest GTC 2026 announcement isn’t new space hardware or flashy modules—it’s a push to bring OpenClaw-style “agent” software into...

OpenClaw · Nemo Claw · OpenShell

Building a Summarization System with LangChain and GPT-3 - Part 1

Sam Witteveen · 2 min read

Summarization quality no longer has to rely on training bespoke models for every writing style. With modern instruction-tuned and RLHF-tuned large...

LangChain Summarization · Token Limits · MapReduce

Qwen 3.5 - The next NEXT model

Sam Witteveen · 3 min read

Qwen 3.5 lands as a major shift in how fast, capable AI can be—pairing a large mixture-of-experts model with a reported up to 19x decoding speed...

Qwen 3.5 · Mixture of Experts · Multimodal Training

Clone ANY Voice for Free — Qwen Just Changed Everything

Sam Witteveen · 3 min read

Open-source voice cloning and “voice design” have moved from closed, API-only systems into the open TTS ecosystem: Qwen has released its Qwen 3 TTS...

Open-Source TTS · Voice Cloning · Voice Design

Information Extraction with LangChain & Kor

Sam Witteveen · 2 min read

Turning messy text into structured data is the bottleneck for many NLP workflows—especially when there’s no labeled dataset to train a named-entity...

Information Extraction · LangChain · Kor

Advanced RAG 05 - HyDE - Hypothetical Document Embeddings

Sam Witteveen · 3 min read

HyDE (Hypothetical Document Embeddings) improves retrieval in RAG by using a large language model to draft a “hypothetical answer,” embedding that...

HyDE · Hypothetical Document Embeddings · RAG Retrieval

Meta's Code World Model

Sam Witteveen · 2 min read

Meta’s researchers at FAIR released “Code World Model” (CWM), a 32B open-weights model aimed at code generation that goes beyond copying syntax. The...

Code World Model · World Models · Code Execution Traces

Qwen3 Multimodal Embeddings: Finally, RAG That Sees

Sam Witteveen · 3 min read

Qwen 3 VL’s multimodal embedding models aim to make RAG retrieval “see” beyond text by mapping text, images, and video-like content into a shared...

Multimodal Embeddings · Multimodal RAG · Reranking

Advanced RAG 04 - Contextual Compressors & Filters

Sam Witteveen · 3 min read

RAG systems often fail not because retrieval misses everything, but because they bring back too much irrelevant text—or the right facts buried inside...

Contextual Compression · RAG Filtering · LLM Extractors

Qwen 3 Embeddings & Rerankers

Sam Witteveen · 2 min read

A new open suite of text embedding and reranking models from Qwen is aimed squarely at retrieval-augmented generation (RAG) use cases—especially...

Text Embeddings · Reranking · RAG

SmolDocling - The SmolOCR Solution?

Sam Witteveen · 2 min read

SmolDocling—an IBM-partnered document understanding model on Hugging Face—aims to do more than “plain OCR” by converting documents into a structured,...

Document Conversion · Structured OCR · Vision-Language Models

GeminiCLI - The Deep Dive with MCPs

Sam Witteveen · 3 min read

Gemini CLI’s built-in tools and MCP integrations can turn “rough” app scaffolding into a working, deployable project—especially when developers lean...

Gemini CLI · MCP Servers · Streaming Chat

Google's Agent Upgrade

Sam Witteveen · 2 min read

Google’s latest “Opal” upgrade shifts agent building from fixed, step-by-step workflows toward goal-driven, interactive experiences—complete with...

Opal Agent Builder · Interactive Agent Steps · Persistent Memory

vLLM - Turbo Charge your LLM Inference

Sam Witteveen · 2 min read

Local and cloud deployments of large language models often feel unusably slow, even on strong hardware, because inference bottlenecks pile up around...

LLM Inference · vLLM Serving · PagedAttention

Gemini 2.0 - How to use the Live Bidirectional API

Sam Witteveen · 2 min read

Gemini 2.0’s Live Bidirectional API is built for real-time, two-way multimodal interaction—letting users talk back and forth with voice, stream...

Live Bidirectional API · Multimodal Streaming · AI Studio Setup

Gemini 2.0 - Video Analyzer with Code

Sam Witteveen · 2 min read

Gemini’s “Video Analyzer” turns uploaded videos into structured, time-coded outputs—captions, spoken transcripts, visual scene descriptions, key...

Video Analysis · Function Calling · Timecoded Captions

OpenAI's New OPEN Models - GPT-OSS 120B & 20B

Sam Witteveen · 3 min read

OpenAI has released two open-weights language models under an Apache 2.0 license: a 120B-parameter model and a 20B-parameter model. The headline...

Open-Weights Models · Apache 2.0 Licensing · Agentic Tool Use

How to use BGE Embeddings for LangChain and RAG

Sam Witteveen · 2 min read

BGE embeddings from the Beijing Academy of AI have surged to the top of major embedding benchmarks while dramatically shrinking model size—making...

BGE Embeddings · LangChain RAG · Chroma Vector Store

CrewAI - Building a Custom Crew

Sam Witteveen · 3 min read

A custom CrewAI workflow can reliably turn a user-chosen topic into a researched, saved markdown article—but the “process shape” matters. In a...

CrewAI Custom Crew · Sequential vs Hierarchical · Agent Tools

Cohere's Command-R a Strong New Model for RAG

Sam Witteveen · 3 min read

Cohere’s Command-R arrives as a purpose-built model for retrieval-augmented generation (RAG) and tool/function calling, not as a bid to replace top...

Command-R · Retrieval Augmented Generation · Tool Use

How to use Custom Prompts for RetrievalQA on LLaMA-2 7B

Sam Witteveen · 2 min read

RetrievalQA with LLaMA-2 can produce “correct-but-then-junk” outputs—answers that start right and then trail off into unhelpful or incorrect text....

RetrievalQA · LLaMA-2 Prompting · RAG Reliability

LLaMA2 Tokenizer and Prompt Tricks

Sam Witteveen · 3 min read

LLaMA 2’s behavior hinges less on “magic prompting” and more on two concrete levers: the tokenizer’s limited vocabulary size and, especially, the...

LLaMA 2 Tokenizer · System Prompt Tokens · Prompt Steering

Camel + LangChain for Synthetic Data & Market Research

Sam Witteveen · 3 min read

Camel—an “autonomous GPT” approach built around two agents talking to each other—gets positioned as a practical engine for synthetic data and market...

Camel Multi-Agent · Inception Prompting · Role-Playing Prompts

NEW - Anthropic Updated Claude Models & Computer Use Agents!!

Sam Witteveen · 3 min read

Anthropic’s latest release pairs two upgraded Claude models with a new “computer use” capability that lets Claude interact with a user’s computer...

Claude 3.5 Sonnet · Claude 3.5 Haiku · Computer Use API

HOW to Make Conversational Form with LangChain | LangChain TUTORIAL

Sam Witteveen · 2 min read

Conversational forms don’t have to feel like web-page data entry. By extracting structured fields from free-form chat and then asking only what’s...

Conversational Form · LangChain · Pydantic Extraction

MiroThinker 1.5 - The 30B That Outperforms 1T Models

Sam Witteveen · 3 min read

MiroThinker 1.5 is positioned as a practical shift in agent design: instead of relying on a single, information-heavy model, it’s built to...

Tool-Using Agents · MiroThinker 1.5 · Mixture of Experts

SmolLMv3 - A Small Reasoner with Tool Use.

Sam Witteveen · 3 min read

Hugging Face has released SmolLMv3, a 3B-parameter language model aimed at “small” local deployment without giving up reasoning and tool use. The...

SmolLMv3 Release · Tool Calling · Dual Think Reasoning

KittenTTS - The Nano TTS

Sam Witteveen · 2 min read

Kitten ML’s “KittenTTS” pushes text-to-speech into a new size category: multiple TTS models that fit under 25 MB, are optimized for CPU-only use, and...

Edge TTS · Model Quantization · CPU-Optimized Inference

Haiku 4.5 - Small Beats Big

Sam Witteveen · 3 min read

Claude Haiku 4.5 is arriving with higher prices, but it’s also delivering a rare mix of speed and task performance that makes it a strong candidate...

Claude Haiku 4.5 · Agent Workflows · Model Pricing

The "Token Muncher" Problem: Is Sonnet 4.6 Actually Cheaper?

Sam Witteveen · 2 min read

Claude Sonnet 4.6 is positioned as a cheaper, more capable step up from earlier Sonnet models—especially for knowledge work and “computer use”...

Model Pricing · Adaptive Thinking · Token Usage

EmbeddingGemma - Micro Embeddings for Mobile Devices

Sam Witteveen · 2 min read

EmbeddingGemma is a family of tiny, text-only embedding models designed to run on-device, enabling retrieval, semantic search, clustering, and “micro...

Embedding Models · On-Device AI · Micro RAG

Qwen3 Next - Behind the Curtain

Sam Witteveen · 3 min read

Qwen 3 Next is an 80B Mixture of Experts (MoE) model built to run with only 3B active parameters per inference—an efficiency leap that still lands it...

Mixture of Experts · Multi-Token Prediction · Inference Efficiency

OpenAI's Agent Builder

Sam Witteveen · 3 min read

OpenAI’s Agent Builder turns agent design into a node-and-guardrail workflow: prompts, classification, conditional routing, and tool use are...

Agent Builder · Guardrails · Conditional Routing

Building Single-User vs Multi-User Agents: What Actually Changes

Sam Witteveen · 3 min read

The biggest shift in building agent systems isn’t “one agent vs many agents.” It’s “one user’s private world vs a shared, multi-tenant world,” and...

Single-User Agents · Multi-User Agents · Agent Harness

Tiny Aya - Cohere's Mini Multilingual Models

Sam Witteveen · 3 min read

Choosing a language model for non-English languages is often a guessing game—especially for low-resource languages with limited internet data and...

Multilingual Language Models · Tokenization · Low-Resource Languages

Sora 2 - OpenAI's TikTok

Sam Witteveen · 3 min read

OpenAI’s Sora 2 is arriving not just as a better video-generation model, but as the foundation for a TikTok-style social network—complete with an iOS...

Sora 2 · Cameos · Short-Form Video