Venelin Valkov — Person Summaries

AI-powered summaries of 66 videos about Venelin Valkov.

Fine-Tuning Llama 3 on a Custom Dataset: Training LLM for a RAG Q&A Use Case on a Single GPU

Venelin Valkov · 3 min read

Fine-tuning Meta’s Llama 3 8B Instruct on a domain-specific Q&A dataset can be done on a single GPU by combining 4-bit quantization with a LoRA-style...

LoRA Fine-Tuning · 4-Bit Quantization · Chat Template Formatting

Chat With Your Database! Build a Local SQL AI Agent to Query Databases (LangChain & Ollama)

Venelin Valkov · 3 min read

A fully local “chat with your database” agent can translate natural-language questions into SQL, run the queries against a local SQLite database, and...

Local SQL Agent · LangChain Tools · Ollama Tool Calling

Private GPT4All : Chat with PDF with Local & Free LLM using GPT4All, LangChain & HuggingFace

Venelin Valkov · 2 min read

Running a local, privacy-friendly “chat with your PDF” pipeline is practical with GPT4All—provided the workflow is built around retrieval (embeddings...

Local PDF Q&A · Retrieval-Augmented Generation · GPT4AllJ

SQL AI Agents: Analyze Relational Databases with Natural Language using Llama 3 (LLM) and CrewAI

Venelin Valkov · 3 min read

AI agents can turn natural-language questions into SQL queries, pull results from a relational database, and then generate a readable analysis and...

SQL Agent Teams · Natural Language to SQL · CrewAI Orchestration

MCP Complete Tutorial - Connect Local AI Agent (Ollama) to Tools with MCP Server and Client

Venelin Valkov · 3 min read

Model Context Protocol (MCP) is positioned as a standardized way to connect AI models to external tools and data without hand-building bespoke...

Model Context Protocol · MCP Tools · JSON-RPC

Use DeepSeek-R1 to Chat with Your Files Privately: 100% Local AI Assistant with Ollama

Venelin Valkov · 2 min read

A fully local “chat with your files” assistant is now practical by combining DeepSeek-R1 running locally with a lightweight app that ingests PDFs and...

Local AI Assistant · DeepSeek-R1 · Ollama Deployment

Fine-tuning Alpaca: Train Alpaca LoRa for Sentiment Analysis on a Custom Dataset

Venelin Valkov · 2 min read

Fine-tuning Llama 7B with LoRA on a custom Bitcoin-tweet sentiment dataset can produce a practical sentiment classifier that labels new tweets as...

LoRA Fine-Tuning · Llama 7B · Bitcoin Sentiment

Local RAG with Llama 3.1 for PDFs | Private Chat with Your Documents using LangChain & Streamlit

Venelin Valkov · 3 min read

A fully local “chat with your PDFs” system can be built using open models and self-hosted infrastructure, with responses grounded in retrieved...

Local RAG · PDF Ingestion · Vector Retrieval

Fine-tuning Tiny LLM on Your Data | Sentiment Analysis with TinyLlama and LoRA on a Single GPU

Venelin Valkov · 3 min read

Fine-tuning a “tiny” LLM on a custom dataset can deliver strong sentiment and topic predictions using a single GPU—provided the training setup is...

LoRA Fine-Tuning · TinyLlama · Sentiment Analysis

Build a Private Chatbot with Local LLM (Falcon 7B) and LangChain

Venelin Valkov · 2 min read

A practical recipe for running a private chatbot on a single GPU hinges on two engineering moves: loading Falcon 7B instruct in 8-bit to fit within...

Local LLM · 8-bit Quantization · Stopping Criteria

Fine-tuning Llama 3.2 on Your Data with a single GPU | Training LLM for Sentiment Analysis

Venelin Valkov · 3 min read

Fine-tuning Llama 3.2 (1B) for sentiment classification on a custom mental-health dataset can jump accuracy from roughly 30% to nearly 85% using a...

LoRA Fine-Tuning · Llama 3.2 1B · Sentiment Classification

Vectorless RAG - Local Financial RAG Without Vector Database | Tree-Based Indexing with Ollama

Venelin Valkov · 3 min read

Vectorless RAG can retrieve and answer questions from structured documents without any vector database by building a tree index from the document’s...

Vectorless RAG · Tree-Based Indexing · Local Financial RAG

LLM JSON Output - Get Valid JSON with Pydantic and LangChain Output Parsers

Venelin Valkov · 2 min read

Getting reliable JSON from large language models—especially ones that don’t natively support structured outputs—requires more than “please output...

JSON Output · Pydantic Schemas · LangChain Output Parsers

Deploy LayoutLMv3 for Document Classification using Streamlit, Transformers and HuggingFace Spaces

Venelin Valkov · 2 min read

A Streamlit web app is built to classify document images using a fine-tuned LayoutLMv3 model, then deployed to Hugging Face Spaces so anyone can...

Document Classification · LayoutLMv3 · Streamlit Deployment

Convert Any Document To LLM Knowledge with Docling & Ollama (100% Local) | PDF to Markdown Pipeline

Venelin Valkov · 2 min read

Building a reliable, fully local knowledge base from PDFs hinges on turning messy layouts—especially tables and charts—into structured Markdown that...

PDF to Markdown · Docling Pipeline · Table Extraction

Sentence Transformers (SBERT) with PyTorch: Similarity and Semantic Search

Venelin Valkov · 2 min read

Sentence Transformers (SBERT) turn sentences into fixed-length embeddings and then use cosine similarity to score semantic closeness—making it...

Sentence Embeddings · Siamese Networks · Semantic Search

Reproducible Machine Learning & Experiment Tracking Pipeline with Python and DVC

Venelin Valkov · 2 min read

Data and model reproducibility hinges on tracking not just code, but the exact datasets, derived features, trained artifacts, and evaluation outputs...

DVC Pipelines · Experiment Tracking · Reproducible ML

Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations

Venelin Valkov · 3 min read

LangChain memory turns a basic chatbot into a conversation that can remember what was said earlier—then choose how much to keep, how to compress it,...

LangChain Memory · Conversation Buffer Memory · Summary Buffer Memory

Local Gemma 4 with OpenCode & llama.cpp | Build a Local RAG with LangChain | 🔴 Live

Venelin Valkov · 3 min read

A local RAG app built around Gemma 4 can work surprisingly well on a single machine—but getting reliable retrieval depends less on the chat model and...

Local RAG · Gemma 4 · llama.cpp

Mamba vs. Transformers: The Future of LLMs? | Paper Overview & Google Colab Code & Mamba Chat

Venelin Valkov · 3 min read

Mamba’s core pitch is a way to make large language models handle much longer inputs without paying Transformers’ usual attention cost. Transformers...

Mamba Architecture · Selective State Spaces · Long-Context LLMs

100% Local CAG with Qwen3, Ollama and LangChain - AI Chatbot for Your Private Documents

Venelin Valkov · 3 min read

Cache-augmented generation (CAG) is presented as a simpler alternative to retrieval-augmented generation (RAG) for private-document chat: instead of...

Cache Augmented Generation · Prompt Caching · Long-Context Comprehension

100% Local PDF OCR with Docling and Ollama | PDF to Markdown with VLM (Nanonets-OCR-s)

Venelin Valkov · 2 min read

A local, fully self-hosted pipeline can convert PDFs into Markdown by swapping out traditional OCR for a visual language model—specifically Docling...

Docling · Ollama · PDF OCR

OpenLLaMA: Open-Source Reproduction of Meta AI's LLaMA for Commercial Use. Run in Google Colab.

Venelin Valkov · 2 min read

OpenLLaMA (a 7B-parameter, open-source LLaMA-style model) can be run in Google Colab using Hugging Face Transformers, but getting usable text depends...

OpenLLaMA · Hugging Face Transformers · Top-k Sampling

Pydantic AI Tutorial: Build Agents to Analyze Mobile App Reviews in Python

Venelin Valkov · 2 min read

A practical agent workflow can turn stored mobile app reviews into a structured product brief—complete with improvement themes, marketing-ready...

Agentic Applications · Pydantic AI · SQL Review Mining

Build 100% Local Chatbot with Gemma 3, Ollama and LangChain | AI Assistant with Memory and Tool Use

Venelin Valkov · 3 min read

A fully local chatbot can now keep both conversation history and long-term “memories” across separate chats—without sending data to a hosted service....

Local Chatbot · Memory Retrieval · Tool Calling

DeepSeek R1 Local Test with Ollama: Coding, Data Extraction, Data Labelling, Summarization, RAG

Venelin Valkov · 3 min read

DeepSeek R1 and R1-Zero are reasoning-focused large language models trained with a multi-stage process that aims to fix early problems like endless...

DeepSeek R1 Training · Ollama Local Testing · Reasoning Tags

LLM Function Calling (Tool Use) with Llama 3 | Tool Choice, Argument Mapping, Groq Llama 3 Tool Use

Venelin Valkov · 3 min read

Function calling with Llama 3 is no longer a niche capability: a Groq-tuned “Llama 3 tool use” model can reliably translate natural-language requests...

Function Calling · Tool Use · Llama 3

Loaders, Indexes & Vectorstores in LangChain: Question Answering on PDF files with ChatGPT

Venelin Valkov · 3 min read

A practical LangChain pipeline for turning PDFs, YouTube transcripts, and plain text into question-answering over embeddings is the core takeaway—and...

LangChain Loaders · Vector Stores · Embeddings

Generative Agents: Simulating Human Behavior with ChatGPT

Venelin Valkov · 2 min read

Generative agents built on ChatGPT can simulate believable, goal-driven human behavior inside a small virtual town—without hand-scripting every...

Generative Agents · ChatGPT · Agent Memory

DeepSeek-R1 0528 for 100% Local Chat with Your Files | Financial Document Analysis AI with Ollama

Venelin Valkov · 3 min read

DeepSeek-R1 (distilled) running locally through Ollama can extract and summarize complex financial statements from a 10-page Nvidia earnings PDF with...

Local Document Chat · DeepSeek-R1 · Ollama

Local Llama 3.2 (3B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling

Venelin Valkov · 2 min read

A 3B quantized Llama 3.2 model running locally through Ollama delivers fast, usable results for structured data extraction—especially when...

Local Llama 3.2 · Ollama Inference · Structured Data Extraction

DeepSeek Coder v2: First Open Coding Model that Beats GPT-4 Turbo?

Venelin Valkov · 3 min read

DeepSeek Coder V2 is pitched as an open coding model that can rival—or even beat—GPT-4 Turbo on programming benchmarks, and the practical tests in...

DeepSeek Coder V2 · Open Coding Models · Mixture of Experts

Why Your RAG Gives Wrong Answers (And 4 Chunking Strategies to Fix It) | LangChain Text Splitters

Venelin Valkov · 3 min read

RAG systems often fail for a surprisingly mundane reason: chunking breaks the information the model needs, even when embeddings, vector search, and...

RAG Chunking · LangChain Text Splitters · Semantic Chunking

DeepSeek v3 Tested - Coding, Data Extraction, Summarization, Data Labelling, RAG

Venelin Valkov · 3 min read

DeepSeek V3 is positioned as a top-tier open-weight mixture-of-experts (MoE) model—strong on benchmarks and notably effective at real-world...

Mixture of Experts · FP8 Training · Chain-of-Thought Post-Training

Gemma 3n: Open Multimodal Model by Google (Image, Audio, Video & Text) | Install and Test

Venelin Valkov · 3 min read

Google’s Gemma 3n (Geometry N in the transcript) is positioned as an open, mobile-targeted multimodal model that can take in text plus images, audio,...

Gemma 3n · Multimodal Inference · Hugging Face Transformers

Getting started with PyTorch Lightning for Deep Learning

Venelin Valkov · 3 min read

PyTorch Lightning is positioned as a way to train deep learning models with PyTorch while cutting out much of the repetitive “boilerplate” code. The...

PyTorch Lightning · GoEmotions · Multi-Label Classification

100% Local AI Agents with DeepSeek-R1, Ollama, Pydantic and LangGraph - Private Agentic Workflow

Venelin Valkov · 3 min read

A fully local “agentic” workflow can fetch Reddit posts, let a person steer which threads matter, then run semantic filtering and structured analysis...

Local AI Agents · Reddit Data Fetching · LangGraph Workflows

Local Qwen 2.5 (14B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling

Venelin Valkov · 2 min read

Qwen 2.5 14B running locally through Ollama (via an Ollama server) delivers a noticeable jump in text-heavy tasks—especially sentiment/topic labeling...

Local Ollama · Qwen 2.5 · Structured Text Extraction

GLM-OCR (9B) - Local OCR Test | OCR, Document Extraction, Table Recognition

Venelin Valkov · 2 min read

GLM-OCR is a two-stage OCR system that combines document layout analysis with character-level recognition, and it’s drawing attention because it...

GLM-OCR · Document Layout Analysis · Table Recognition

StableVicuna: The Best Open Source Local ChatGPT? LLM based on Vicuna and LLaMa.

Venelin Valkov · 2 min read

Stability AI’s open-source chatbot model, StableVicuna, is positioned as a strong “local ChatGPT” alternative—especially because it can be run in a...

StableVicuna · Local LLM · Model Quantization

Build Web Scraper with Llama 3.1 | Get Structured Data By Scraping Web Content With AI

Venelin Valkov · 3 min read

A practical pipeline for turning messy, JavaScript-heavy web pages into clean structured data is built by combining Playwright for rendering,...

Web Scraping · Playwright Rendering · HTML to Markdown

Gemini CLI + MCP Tools Deep Dive - Build a Completely Local RAG with Ollama | Context7, NextJS

Venelin Valkov · 3 min read

Gemini CLI can be paired with an MCP server (Context7) to generate and run a fully local RAG-style “chat with your files” web app—complete with...

Gemini CLI · Context7 MCP · Local RAG

OuteTTS 0.3 - Local TTS and Voice Cloning

Venelin Valkov · 2 min read

OuteTTS 0.3 is a local, Apache 2.0–licensed text-to-speech system that also supports voice cloning, letting users generate speech in multiple...

Local TTS · Voice Cloning · Multi-Language Speakers

Grok - LLM by Elon Musk & xAI | Overview, Tech Stack, PromptIDE and Sample Prompts

Venelin Valkov · 3 min read

Grok’s biggest differentiator is its claim of real-time knowledge drawn from the X platform, paired with a new “PromptIDE” tool aimed at making...

Grok Overview · xAI Mission · PromptIDE

Build AI Agent Application with Agent Development Kit (ADK) | Get Started with Google's Agent SDK

Venelin Valkov · 3 min read

Google’s Agent Development Kit (ADK) is positioned as a practical way to build agentic applications with a clear workflow structure, built-in...

Agent Development Kit · Tool Calling · Gemini 2.0 Flash

Automated Prompt Engineering with DSPy | Prompt Optimization for Financial News Semantic Analysis

Venelin Valkov · 2 min read

Prompt optimization can materially improve sentiment extraction from financial news without retraining a model—DSPy’s prompt optimizer boosted...

DSPy Prompt Optimization · Financial News Sentiment Analysis · MIPROv2

Linear Regression with TensorFlow.js

Venelin Valkov · 2 min read

Linear regression in TensorFlow.js is built to learn the parameters of a straight-line (or hyperplane) relationship between house features and...

Linear Regression · TensorFlow.js · Multiple Linear Regression

MemGPT - Unlimited Context Window (Memory) for LLMs | Paper review, Installation & Demo

Venelin Valkov · 3 min read

MemGPT targets a core bottleneck in today’s large language models: limited context windows that force earlier parts of a conversation or large...

Virtual Context Management · External Memory · Transformer Context Limits

Can GPT-4o's Memory Replace RAG Systems? Exploring Large Context Windows

Venelin Valkov · 3 min read

GPT-4o’s ability to retrieve information from extremely long prompts looks strong enough to challenge the usual need for retrieval-augmented...

RAG vs Long Context · GPT-4o Memory · Needle in a Needle Stack

Mixtral - Mixture of Experts (MoE) Free LLM that Rivals ChatGPT (3.5) by Mistral | Overview & Demo

Venelin Valkov · 2 min read

Mistral AI’s Mixtral 8×7B (an open-weight sparse Mixture of Experts model) is positioned as a practical alternative to much larger LLMs by routing...

Mixture of Experts · Sparse Routing · Instruction Tuning

GPT-4o API Deep Dive: Text Generation, Streaming, Vision, and Function Calling

Venelin Valkov · 3 min read

GPT-4o’s API is positioned as a drop-in upgrade for building faster, more capable AI apps—especially when you need streaming, structured JSON...

GPT-4o API · Streaming Responses · JSON Output

Evaluate LLM Systems & RAGs: Choose the Best LLM Using Automatic Metrics on Your Dataset

Venelin Valkov · 3 min read

Choosing an LLM for a real project often fails when teams rely on classical ML metrics like accuracy, F1, or regression error. Those metrics assume...

LLM Evaluation · RAG Metrics · AI-as-Judge

Use Any LLM Provider with LiteLLM | Use ChatGPT, Claude, Gemini, Ollama with One API

Venelin Valkov · 2 min read

Switching between large language model (LLM) providers can break production systems when code depends on a single vendor’s SDK. LiteLLM is presented...

LLM Provider Abstraction · Structured Outputs · Pydantic Validation

AI Agents with LangGraph & Llama 3 | Control the Execution Flow and State of Your Agent Apps

Venelin Valkov · 2 min read

LangGraph is positioned as a way to control both the execution order and the evolving state of agentic applications—down to loops, branching, and...

LangGraph · Agent State · Tool Schemas

Build an AI Social Media Content Generator in 20 Minutes | AI Agents with LangGraph and Llama 3.1

Venelin Valkov · 2 min read

A LangGraph-based agent loop can turn technical input into platform-ready social posts for both Twitter and LinkedIn—while iterating through multiple...

LangGraph Agents · Social Media Generation · Twitter vs LinkedIn Prompts

Build Private Chatbot with LangChain, Ollama and Qwen 2.5 | Local AI App with Private LLM

Venelin Valkov · 3 min read

A fully local “private chatbot” workflow can be built by combining LangChain’s message orchestration (via LangGraph), Ollama for on-device model...

Local AI Chatbot · LangChain · LangGraph

OCRFlux (3B) - Local OCR AI Model Test | Turn PDFs into Markdown

Venelin Valkov · 2 min read

OCRFlux (3B) is a 3B-parameter visual-language OCR fine-tune aimed at turning document images (including PDFs) into structured Markdown. In local...

OCR to Markdown · Table Extraction · Local Model Inference

Build Production-Ready Retrieval RAG Pipeline in LangChain | Hybrid Search (BM25), Re-ranking & HyDE

Venelin Valkov · 2 min read

A production-ready RAG pipeline needs more than embeddings: it must reliably fetch the right chunks, even when users ask for exact numbers. A simple...

RAG Pipelines · Hybrid Search · BM25

Advanced AI Agents with LangGraph and Llama 3.1 | Analyze Bitcoin, Ethereum and Solana Markets

Venelin Valkov · 3 min read

An AI agent workflow built with LangGraph can generate cryptocurrency market reports by combining three streams of evidence: cached historical price...

LangGraph Agents · Crypto Market Analysis · OpenBB Data

DeepSeek R1 0528 - Better Coding & Tool Calling | Is It Faster Now?

Venelin Valkov · 3 min read

DeepSeek R1 0528’s update centers on making the model more usable for real-world coding agents by adding support for JSON output and function...

DeepSeek R1 0528 · Tool Calling · JSON Output

Segment Anything by Meta Research: Image Segmentation with the Largest Dataset and Model Yet!

Venelin Valkov · 3 min read

Meta’s Segment Anything (SAM) is built to turn image segmentation into a “promptable” task: users can click, draw boxes, or provide text-like prompts...

Promptable Segmentation · Segment Anything · SA-1B Dataset

Build Smarter AI Apps: Memory, Tools, Retrieval & Structured Output with Python, Pydantic & Ollama

Venelin Valkov · 3 min read

AI apps become meaningfully more useful when they’re given four upgrades beyond plain text prompting: memory, structured outputs, tool use, and...

Memory · Structured Output · Tool Use

FLUX.1 Kontext [dev] Local Test - Image Generation and Edit with HuggingFace (Open Weights Model)

Venelin Valkov · 2 min read

Black Forest Labs’ FLUX.1 Kontext [dev] (open weights) is proving it can do more than image editing: it can also generate photorealistic images from...

FLUX.1 Kontext [dev] · Image Editing · Text-to-Image

Is RAG Dead in 2026? | Build Local RAG from First Principles

Venelin Valkov · 3 min read

Retrieval-Augmented Generation (RAG) is still considered necessary in 2026—not because large language models can’t answer, but because they often...

RAG Architecture · Local RAG · TF-IDF Retrieval

Getting Started with LangGraph | Build Local Agentic Workflows and AI Agents with Ollama

Venelin Valkov · 2 min read

LangGraph is presented as a practical way to turn brittle, demo-only AI prototypes into maintainable agentic systems by replacing nested if/else...

LangGraph Workflows · AI Agents · State Graphs

Gemma 4 Local OCR Test with llama.cpp | How Accurate It Is for PDF Document Understanding (🔴 Live)

Venelin Valkov · 3 min read

Gemma 4 can perform surprisingly strong document understanding for local OCR-style extraction—especially when the goal is to recover layout and...

Local OCR · Gemma 4 · llama.cpp