Sam Witteveen — Channel Summaries

AI-powered summaries of 183 videos by Sam Witteveen.

DeepSeek OCR - More than OCR

Sam Witteveen · 3 min read

DeepSeek OCR’s headline idea isn’t better document reading—it’s a new way to compress long text into far fewer “vision tokens,” then decode it back...

Context Optical Compression · Vision Token Compression · Long-Context Memory

Fine-tuning LLMs with PEFT and LoRA

Sam Witteveen · 3 min read

Fine-tuning large language models is expensive because it requires updating massive weight tensors, which drives up both compute needs and checkpoint...

Parameter Efficient Fine-Tuning · LoRA Adapters · Catastrophic Forgetting

Introducing Gemini CLI

Sam Witteveen · 2 min read

Google’s Gemini team is rolling out Gemini CLI, a command-line interface that turns Gemini Code Assist into an agent-like workflow for editing files,...

Gemini CLI · Gemini Code Assist · MCP Integration

Opal - Google Labs Killer NEW App

Sam Witteveen · 3 min read

Google Labs’ Opal is a no-code workflow builder aimed at turning natural-language requests into working LLM “mini apps,” with built-in steps for web...

Opal · Google Labs · No-Code LLM Workflows

Ollama - Local Models on your machine

Sam Witteveen · 2 min read

Ollama is a user-friendly way to run large language models locally on a Mac or Linux machine by downloading them and serving them through a local...

Local LLMs · Ollama Setup · Model Downloads

Google's New Universal Commerce Protocol

Sam Witteveen · 3 min read

Google’s Universal Commerce Protocol (UCP) is positioned as the missing retail layer in agentic commerce: a shared standard that lets AI agents...

Universal Commerce Protocol · Agentic Commerce · Product Discovery

LangChain Retrieval QA Over Multiple Files with ChromaDB

Sam Witteveen · 2 min read

LangChain retrieval QA becomes practical at scale once the document embeddings live in a persistent ChromaDB vector store on disk. Instead of...

LangChain Retrieval QA · ChromaDB Persistence · Vector Store Retrieval

LangChain - Using Hugging Face Models locally (code walkthrough)

Sam Witteveen · 2 min read

Running Hugging Face models locally inside LangChain is the practical workaround when Hugging Face Hub access fails—especially for conversational...

LangChain · Hugging Face Hub · Local Transformers

LangGraph Crash Course with code examples

Sam Witteveen · 3 min read

LangGraph is positioned as a new way to run LangChain-based LLM agents by modeling agent behavior as a graph-driven state machine rather than a fixed...

LangGraph Crash Course · Agent State Machine · Tool Calling Loops

Talk to your CSV & Excel with LangChain

Sam Witteveen · 3 min read

LangChain can turn natural-language questions into accurate, on-the-fly analysis of CSV and Excel data by using a “CSV agent” that runs a Python REPL...

LangChain CSV Agent · Natural Language Data Querying · pandas DataFrame Filtering

olmOCR - The Open OCR System

Sam Witteveen · 2 min read

OCR for PDFs is getting a practical upgrade: Allen AI’s olmOCR is a fine-tuned vision-language model designed to turn rasterized PDF pages (including...

OCR for PDFs · Vision-Language Models · Handwriting Recognition

Microsoft Launches 10 NEW AI Agents

Sam Witteveen · 3 min read

Microsoft’s Ignite announcements push autonomous AI agents deep into enterprise software—especially Dynamics 365—by rolling out 10 new agent...

Autonomous Agents · Dynamics 365 · Sales Automation

LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab

Sam Witteveen · 3 min read

Large language models are powerful at generating text, but they don’t plug cleanly into the rest of software—especially when apps need state,...

LangChain Basics · Prompt Templates · Few-Shot Prompting

Gemini 3 Pro - The Model You've Been Waiting For

Sam Witteveen · 3 min read

Gemini 3 Pro is positioned as a long-horizon, tool-using model built for “clever, concise, direct” work—less about a flashy personality and more...

Gemini 3 Pro · Agentic Coding · AI Studio

GPT 5 - What They Didn't Say

Sam Witteveen · 3 min read

OpenAI’s GPT-5 rollout is being framed less as a single leap in raw intelligence and more as a cost-and-workflow upgrade: ChatGPT-5 behaves like a...

GPT-5 System · Agentic Coding · Benchmark Methodology

Anthropic's Meta Prompt: A Must-try!

Sam Witteveen · 2 min read

Anthropic’s “Metaprompt” tool turns weak, one-off prompts into a structured, model-ready instruction set—by using Claude itself to generate the final...

Metaprompt · Prompt Engineering · Claude 3

LangExtract - Google's New Library for NLP Tasks

Sam Witteveen · 2 min read

Standard NLP tasks—like sentiment classification, named-entity extraction, and entity disambiguation—have long relied on fine-tuned BERT-style...

Information Extraction · Source Grounding · Few-Shot Prompting

DeepSeek's New Image Model - Janus Pro

Sam Witteveen · 3 min read

DeepSeek’s Janus Pro stands out for combining two capabilities in one multimodal system: it can answer questions about images (using a SigLIP-based...

Multimodal AI · Image Understanding · Text-to-Image Generation

LangChain Agents - Joining Tools and Chains with Decisions

Sam Witteveen · 3 min read

LangChain agents let a language model choose—at runtime—which tools to use (or whether to use any tools at all) to answer a user’s question. Instead...

LangChain Agents · Tool Selection · Zero-Shot React

Mesop - Google's New UI Maker

Sam Witteveen · 3 min read

Building LLM apps often stalls on two fronts: getting a working interface in front of real users fast enough to collect feedback, and validating...

LLM UI Prototyping · Mesop Framework · LangChain Memory

LlamaOCR - Building your Own Private OCR System

Sam Witteveen · 3 min read

LlamaOCR turns screenshots and scanned documents into editable Markdown by using a vision-capable Llama 3.2 model hosted via Together AI. The...

LlamaOCR · Vision OCR · Together AI

LangChain - Conversations with Memory (explanation & code walkthrough)

Sam Witteveen · 3 min read

Memory is the difference between a chat agent that feels coherent and one that repeatedly “forgets” what a user meant earlier—especially when people...

LangChain Memory · Conversation Buffer · Conversation Summary

Master CrewAI: Your Ultimate Beginner's Guide!

Sam Witteveen · 3 min read

High-quality AI agents hinge on consistency: they must deliver the right outcome reliably, not just “work” most of the time. The framework presented...

Agent Reliability · Tool Augmentation · CrewAI Core Concepts

Google Launches an Agent SDK - Agent Development Kit

Sam Witteveen · 2 min read

Google has launched an “Agent Development Kit” (Agent SDK) aimed at building deployable AI agents in the cloud, with built-in support for evaluation,...

Agent Development Kit · Cloud Deployment · Tool Integration

Understanding ReACT with LangChain

Sam Witteveen · 3 min read

ReACT (Reasoning and Acting) is a prompting-and-agent pattern designed to make large language models do multi-step problem solving by alternating...

ReACT Prompting · Chain-of-Thought · Tool-Using Agents

LiteParse - The Local Document Parser

Sam Witteveen · 3 min read

Coding agents can generate impressive Python at scale, but document-heavy workflows expose a persistent failure mode: PDFs, spreadsheets, and charts...

Document Parsing · OCR Accuracy · Agent Tooling

Mistral OCR - Multimodal & Multilingual OCR

Sam Witteveen · 3 min read

Mistral has launched a non-open-source OCR system delivered through an API that turns scanned documents, PDFs, and images into structured, multimodal...

Multimodal OCR · Multilingual OCR · Structured JSON Extraction

Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)

Sam Witteveen · 3 min read

Custom tools are the key lever for making LangChain conversational agents more useful—and the biggest practical lesson is that tool use often...

Custom Tools · LangChain Agents · ReAct Tool Selection

Master PDF Chat with LangChain - Your essential guide to queries on documents

Sam Witteveen · 3 min read

Building a “chat with your PDF” system hinges on one practical fix: plain prompting can’t reliably handle long books because the context window is...

PDF Question Answering · Vector Stores · Embeddings

Gemini 1.5 Pro for Code - Part 01

Sam Witteveen · 2 min read

Gemini 1.5 Pro for Code can ingest a real GitHub-style repository, then generate working multi-agent Python code that interacts with the repo’s...

Gemini 1.5 Pro for Code · crewAI multi-agent bots · LangChain integration

Testing Gemini 1.5 and a 1 Million Token Window

Sam Witteveen · 2 min read

Gemini 1.5 Pro marks a major step up for long-context AI: it pairs a newly updated model with a dramatically expanded context window—up to 1,048,576...

Gemini 1.5 Pro · Million Token Context · Mixture of Experts

Google Stitch Just Became an AI Figma (And It's Free)

Sam Witteveen · 3 min read

Google Labs’ Stitch has shifted from simple screenshot-to-design experiments into an agentic, Figma-like workflow for generating full UI...

Agentic Design · Design Systems · URL-Based Styling

Ollama meets LangChain

Sam Witteveen · 2 min read

Running Ollama models locally turns LangChain into an on-device workflow: Python code can call a local LLaMA-2 instance through an API, generate...

Ollama · LangChain · Local LLM

Llama3 + CrewAI + Groq = Email AI Agent

Sam Witteveen · 3 min read

A practical recipe for turning Llama 3 into an email-reply agent with CrewAI is built around Groq’s fast inference—using the Llama 3 70B model with...

Email AI Agents · CrewAI · Llama 3

Gemini Browser Use

Sam Witteveen · 3 min read

Google’s Gemini 2.0 push for “browser use” is starting to look less like a closed, proprietary demo and more like a buildable automation...

Gemini Browser Use · Browser Automation · LangChain Models

Advanced RAG 01 - Self Querying Retrieval

Sam Witteveen · 3 min read

RAG systems break down when everything gets shoved through semantic search—even fields that should behave like exact lookups. The core fix is...

Self-Querying Retrieval · Hybrid RAG · Metadata Filtering

smolagents - HuggingFace's NEW Agent Framework

Sam Witteveen · 2 min read

Hugging Face’s new “smolagents” framework pushes agent building toward “code agents”: instead of forcing an LLM to emit JSON-style plans, it can...

Agent Framework · Code Agents · Tool Calling

Ollama - Loading Custom Models

Sam Witteveen · 2 min read

Ollama can run fine-tuned models that aren’t already listed locally—by downloading the right quantized weights (GGUF) and creating a small Ollama...

Ollama Custom Models · GGUF Quantization · LLaMA.cpp Integration

The Gemini Interactions API

Sam Witteveen · 2 min read

Google’s new Gemini Interactions API reframes how developers build with Gemini models by shifting from simple, stateless “prompt in, text out” calls...

Gemini Interactions API · Server-Side State · Agent Background Execution

LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!

Sam Witteveen · 2 min read

Getting rid of OpenAI entirely for Retrieval QA with LangChain is feasible, but the quality hinges on the local LLM’s context limits, prompt format...

Retrieval QA · LangChain · Local LLMs

Introducing Gemma - 2B 7B 6Trillion Tokens

Sam Witteveen · 2 min read

Google’s new Gemma model suite brings open-weight, English text-only large language models in four variants: 2B and 7B, each available in base and...

Gemma Models · Open-Weight LLMs · Training Tokens

Creating an AI Agent with LangGraph Llama 3 & Groq

Sam Witteveen · 2 min read

LangGraph is positioned as the “middle layer” for building AI agents that need structure, state, and controllable decision points—without handing...

LangGraph Agents · State Machines · Conditional Routing

MetaVoice 1B - TTS & Voice Cloning

Sam Witteveen · 2 min read

MetaVoice has released MetaVoice 1B, a 1.2B-parameter, Apache-licensed text-to-speech model aimed at open experimentation—along with a GitHub repo...

Text To Speech · Voice Cloning · Open Source

Building a LangChain Custom Medical Agent with Memory

Sam Witteveen · 3 min read

A LangChain “medical advice” agent can be built to answer questions using a site-restricted search (WebMD) and to carry context across multiple turns...

LangChain Agents · Tool Wrappers · Custom Output Parsing

Gemini RAG - File Search Tool

Sam Witteveen · 3 min read

Gemini’s API team introduced the File Search Tool, a built-in, automated RAG pipeline that turns uploaded documents into a ready-to-query vector...

Gemini API · RAG · File Search Tool

Gemini 1.5 Pro for Video Analysis

Sam Witteveen · 2 min read

Gemini 1.5 Pro can extract highly specific information from a long video—down to approximate timestamps for when key topics appear—making video-based...

Video Analysis · Gemini 1.5 Pro · Long Context Window

Getting Started with Gemini Pro on Google AI Studio

Sam Witteveen · 3 min read

Gemini Pro is now broadly available, and getting started is mostly a matter of creating an API key in Google AI Studio, pasting it into a Colab...

API Key Setup · Gemini Pro Text · Safety Settings

Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3

Sam Witteveen · 2 min read

Running function calling and structured JSON outputs locally is practical with smaller open models—especially Llama 3 8B on Ollama—and it enables...

Local Function Calling · Ollama · LangChain

Anthropic's New Agent Protocol!

Sam Witteveen · 3 min read

Anthropic’s Model Context Protocol (MCP) aims to turn LLMs into practical “agents” by standardizing how models connect to external tools and...

Model Context Protocol · Agent Tool Use · Claude Desktop

Using LangChain Output Parsers to get what you want out of LLMs

Sam Witteveen · 2 min read

LLM apps fail most often when they accept whatever text a model happens to generate instead of forcing that output into a structure the application...

Output Parsing · LangChain · Pydantic Schemas

Gemini TTS - Native Audio Out

Sam Witteveen · 2 min read

Google’s “native audio out” for Gemini is now available in preview, letting developers generate speech directly from Gemini models with controllable...

Gemini TTS · Native Audio Out · Multi-Speaker Speech

Introducing Gemini 3.1 Pro

Sam Witteveen · 3 min read

Google is rolling out Gemini 3.1 Pro, a “0.1” update that marks a noticeable jump in reasoning and benchmark performance—and, crucially, brings finer...

Gemini 3.1 Pro · Thinking Levels · RL Training

Claude Skills - SOPs For Agents

Sam Witteveen · 2 min read

Standard operating procedures are becoming a core building block for AI agents—not just a human workflow tool. The shift matters because LLM output...

Agent SOPs · Claude Skills · Skill Creator

Gemini 2.0 Flash

Sam Witteveen · 2 min read

Google’s Gemini 2.0 Flash marks a shift from “multimodal input” to “multimodal output,” with the model able to generate audio and images...

Gemini 2.0 Flash · Native Audio Output · Image Generation

Mistral 8x7B Part 1- So What is a Mixture of Experts Model?

Sam Witteveen · 2 min read

Mistral’s newly released “8x7B” model is a Mixture of Experts (MoE) system: eight separate expert networks, each roughly the size of Mistral 7B, are...

Mixture of Experts · Gating Networks · Mistral 8x7B

DeepSeekR1 - Full Breakdown

Sam Witteveen · 3 min read

DeepSeek has released open weights for its reasoning model family, led by DeepSeek R1, along with a set of distilled smaller models that can...

DeepSeek R1 · Model Distillation · Mixture of Experts

Image Annotation with LLava & Ollama

Sam Witteveen · 2 min read

A practical way to turn a cluttered screenshot folder into a searchable archive is to run a local vision-language model over each image and save the...

Screenshot Annotation · Ollama · LLaVA 1.6

Kimi K2.5- The Agent Swarm

Sam Witteveen · 2 min read

Moonshot AI’s Kimi K2.5 positions itself less as a single “bigger model” and more as a platform for task-specialized reasoning—especially through an...

Kimi K2.5 · Agent Swarm · Vision Coding

... there's more to Sonnet 4.5

Sam Witteveen · 3 min read

Claude Sonnet 4.5 is being positioned as more than a faster, better coding model: it’s a stepping stone toward Anthropic’s “virtual collaborator,” an...

Claude Sonnet 4.5 · Virtual Collaborator · Agentic Coding

The Improved Gemini 2.5 Pro - A Coding Powerhouse

Sam Witteveen · 3 min read

Google’s new Gemini 2.5 Pro preview version is being positioned as a major step up for coding—less about generic “reasoning” gains and more about...

Gemini 2.5 Pro · Coding Agents · Google Agent Development Kit

Unveiling Meta's Impressive CV Model: Sam 2

Sam Witteveen · 3 min read

Meta’s SAM 2 pushes “segment anything” from still images into real-time video—letting users prompt what to track and then generating precise...

Video Segmentation · Promptable AI · Temporal Memory

Gemini 2.5 Pro for Audio Transcription

Sam Witteveen · 3 min read

Gemini 2.5 Pro’s jump to a 64,000-token generation limit is the practical unlock for high-quality podcast transcription at scale—long enough to turn...

Audio Transcription · Diarization · Token Limits

MedGemma - An Open Doctor Model?

Sam Witteveen · 2 min read

Google’s newly released MedGemma models put open-source medical AI within reach for researchers and developers—complete with multimodal (image+text)...

MedGemma · Medical AI · MedQA Benchmark

How to make a custom dataset like Alpaca7B

Sam Witteveen · 3 min read

A practical path to building an “Alpaca-style” instruction dataset is to start with a small set of human-written seed tasks, then use GPT-3 to expand...

Instruction Fine-Tuning · Dataset Generation · GPT-3 Prompt Expansion

Explaining OpenAI's o1 Reasoning Models

Sam Witteveen · 3 min read

OpenAI’s o1 and o1 mini are reasoning-first models that trade speed for deeper problem solving by spending substantially more compute during...

Reasoning Models · Reinforcement Learning · Inference-Time Compute

Is GPT4All your new personal ChatGPT?

Sam Witteveen · 2 min read

A new open-weight chat model called “GPT4All” is drawing attention as a potential “personal ChatGPT” alternative, but hands-on tests show it’s closer...

GPT4All · LoRA Fine-Tuning · Nomic.ai Filtering

Gemini 3 Flash - Your Daily Workhorse Upgraded

Sam Witteveen · 3 min read

Gemini 3 Flash arrives as a faster, more cost-efficient “workhorse” model that, in many benchmarks, lands near Gemini 3 Pro and sometimes beats...

Gemini 3 Flash · Token Efficiency · Structured Outputs

Auto-GPT - How to Automate a Task Based AI with GPT-4

Sam Witteveen · 3 min read

Auto-GPT is positioned as an autonomous AI agent that can carry out multi-step tasks end-to-end—searching the web, browsing pages, extracting...

Autonomous AI Agents · Task Automation · Web Scraping

Google's NEW Agent Money Protocol

Sam Witteveen · 2 min read

Google is rolling out an “Agent Payments Protocol” designed to let AI agents handle money—making purchases, paying merchants, and managing financial...

Agent Payments Protocol · Agent2Agent · MCP

OpenAI Functions + LangChain : Building a Multi Tool Agent

Sam Witteveen · 3 min read

OpenAI’s function-calling system, wired through LangChain, can turn a plain chat model into a finance assistant that reliably selects the right API...

OpenAI Function Calling · LangChain Agents · Yahoo Finance API

PydanticAI - The NEW Agent Builder on the Block

Sam Witteveen · 2 min read

PydanticAI positions itself as a new, Pydantic-first agent and LLM application framework built to make model outputs reliably conform to structured...

PydanticAI · Structured Output · Tool Calling

Google's NEW Agent2Agent Protocol

Sam Witteveen · 2 min read

Google’s new “agent-to-agent” (A2A) protocol is aimed at turning today’s tool-using AI agents into a network of collaborating agents that can...

Agent-to-Agent Protocol · Agent Discovery · Agent Marketplaces

The Rise of WebMCP

Sam Witteveen · 3 min read

WebMCP is poised to replace today’s “guess-and-scrape” web interaction for AI agents by letting websites expose structured, callable tools directly...

WebMCP · AI Agents · Browser Tools

Gemini Deep Think

Sam Witteveen · 3 min read

Google’s “Deep Think” reasoning model is now publicly available after helping DeepMind reach gold-medal-level performance at the International...

International Mathematical Olympiad · Deep Think · Reasoning Latency

Llama 3 - 8B & 70B Deep Dive

Sam Witteveen · 3 min read

Meta’s Llama 3 release centers on two new open-weight language models—8B and 70B—that aim to outperform last generation’s Llama 2 while matching or...

Llama 3 Release · Model Benchmarks · Hugging Face License

Gemini 2.0 Pro - The Family Expands

Sam Witteveen · 2 min read

Google’s Gemini model lineup expands with a new Gemini 2.0 Pro—an experimental, fully multimodal model with a 2 million token context...

Gemini 2.0 Pro · Model Availability · 2 Million Token Context

BabyAGI: Discover the Power of Task-Driven Autonomous Agents!

Sam Witteveen · 3 min read

Task-driven autonomous agents are moving from “chat” to structured, tool-using workflows: a large language model takes an objective, breaks it into a...

Task Queues · Autonomous Agents · Vector Memory

Advanced RAG 03 - Hybrid Search BM25 & Ensembles

Sam Witteveen · 2 min read

Hybrid search in retrieval-augmented generation (RAG) combines two retrieval styles: keyword matching and semantic matching. The core idea is to pair...

Hybrid Search · BM25 Retrieval · Ensemble Retrievers

DeepSeek R1 for Structured Agents

Sam Witteveen · 3 min read

DeepSeek’s R1 reasoning model can’t natively produce the structured, tool-friendly outputs that most agent frameworks rely on—no function calling, no...

Structured Agents · DeepSeek R1 · Pydantic AI

Talking to Alpaca with LangChain - Creating an Alpaca Chatbot

Sam Witteveen · 3 min read

Hooking Alpaca to LangChain is straightforward: build a local Hugging Face text-generation pipeline around the Alpaca/LLaMA-compatible model, wrap it...

Alpaca Chatbot · LangChain Memory · Hugging Face Pipeline

Gemini Embedding 2 - Audio, Text, Images, Docs, Videos

Sam Witteveen · 3 min read

Gemini Embedding 2 is positioned as a single, natively multimodal embedding model that collapses what used to require many separate pipelines—one...

Multimodal Embeddings · Cross-Modal Search · RAG Indexing

Building a LangGraph ReAct Mini Agent

Sam Witteveen · 3 min read

A simple LangGraph pattern—one “reasoner” node plus a single prebuilt “tools” node—can replace sprawling agent graphs full of separate nodes for each...

LangGraph · ReAct · Function Calling

Gemini 1.5 for Summarization

Sam Witteveen · 3 min read

A long-context model can summarize, extract, and answer questions from a brand-new book—without relying on prior training on that specific text—by...

Long-Context Summarization · Chapter and Interview Extraction · Resource Indexing

Introducing Swarm with Code Examples: OpenAI's Groundbreaking Agent Framework

Sam Witteveen · 3 min read

OpenAI’s Swarm has landed as a lightweight framework for building multi-agent systems, and the core idea is simple: model behavior as small...

Swarm Framework · Multi-Agent Handoffs · Agent Routines

Kyutai STT & TTS - A Perfect Local Voice Solution?

Sam Witteveen · 2 min read

Kyutai’s latest local speech stack (Kyutai TTS for text-to-speech and Kyutai STT for speech-to-text) pairs fast, small models with voice conditioning that can...

Local Speech · Kyutai TTS · Speech-to-Text

Open Reasoning vs OpenAI

Sam Witteveen · 3 min read

OpenAI’s “o1” reasoning models may not keep their edge for long: within roughly two to two and a half months, multiple open-weights labs released...

Reasoning Models · Test-Time Compute · Open Weights

PydanticAI - Building a Research Agent

Sam Witteveen · 3 min read

A research agent built with PydanticAI can turn a single question into multiple targeted web searches, run them concurrently, and return a...

Research Agent · PydanticAI · Structured Output

The 5 Types of LLM Apps

Sam Witteveen · 3 min read

LLM apps can be sorted into five practical categories—ranging from chat-style assistants to fully autonomous agents—so builders can more clearly...

LLM App Categories · Conversational Agents · Copilots

Claude 3 Vs Gemini Vs GPT-4: Who Can Make Amazing Powerpoints?

Sam Witteveen · 3 min read

LLMs can reliably generate the *facts* and basic slide structure for a presentation, but they still struggle to produce consistently sleek, polished...

LLM Slide Generation · Python Deck Code · Design Constraints

What is an LLM Router?

Sam Witteveen · 3 min read

LLM routing is emerging as a practical way to cut inference costs without giving up much quality: instead of sending every prompt to the most capable...

LLM Routing · Cost Optimization · Model Selection

Vicuna - 90% of ChatGPT quality by using a new dataset?

Sam Witteveen · 3 min read

Vicuna is being positioned as an open-source-style chat model that delivers roughly “90% of ChatGPT quality” by fine-tuning a LLaMA base model on...

Vicuna · LLaMA Fine-Tuning · ShareGPT Dataset

Ollama Launch + Claude Code + GLM Flash

Sam Witteveen · 2 min read

Ollama has introduced “Ollama launch,” a one-command way to run Anthropic API–compatible coding assistants locally—making it possible to use Claude...

Ollama Launch · Claude Code · GLM 4.7 Flash

How can GPT-4.5 be So Bad?

Sam Witteveen · 2 min read

GPT-4.5 arrives with a “bigger and more natural” pitch, but benchmark results and practical tradeoffs paint it as an also-ran: stronger than GPT-4 in...

GPT-4.5 · LLM Scaling · Benchmarks

Unlock Open Multimodality with Phi-4

Sam Witteveen · 3 min read

Microsoft’s Phi-4 family just got more practical for local, multimodal work: the Phi-4 3.8B “mini instruct” lineup now includes function calling and...

Phi-4 Release · Function Calling · ONNX Runtime

Gemini 2.0 Flash Thinking

Sam Witteveen · 3 min read

Google has released an experimental Gemini 2.0 Flash model branded “Gemini 2.0 Flash Thinking,” notable for exposing full reasoning traces...

Gemini 2.0 Flash Thinking · Chain of Thought Traces · Test-Time Compute

GPT-4o: What They Didn't Say!

Sam Witteveen · 3 min read

OpenAI’s GPT-4o (“Omni”) marks a shift toward a single, more capable multimodal system—one that can take in text, images, and audio and produce...

GPT-4o · Multimodal AI · Voice and Prosody

Using LangChain with DuckDuckGO Wikipedia & PythonREPL Tools

Sam Witteveen · 3 min read

LangChain agents can be made to choose among three built-in tools—Wikipedia, DuckDuckGo search, and Python’s Read-Evaluate-Print Loop (Python...

LangChain Tools · Wikipedia Lookup · DuckDuckGo Search

Mistral Small 3 - The NEW Mini Model Killer

Sam Witteveen · 2 min read

Mistral has released “Mistral Small 3,” a new 24B-parameter open-weight model positioned as a fast, capable “workhorse” for everyday tasks—aimed at...

Mistral Small 3 · Open-Weight Models · Function Calling

Running Gemma using HuggingFace Transformers or Ollama

Sam Witteveen · 3 min read

Running Gemma locally or in a notebook is straightforward, but the biggest practical takeaway is that Gemma’s chat behavior depends heavily on its...

Gemma Inference · Hugging Face Setup · 4-bit Quantization

Advanced RAG 02 - Parent Document Retriever

Sam Witteveen · 3 min read

Parent document retrievers fix a common RAG tradeoff: embeddings need to be specific enough to find the right passage, but the language model needs...

Parent Document Retriever · Advanced RAG · Chunking Strategy