Venelin Valkov — Person Summaries
AI-powered summaries of 66 videos by Venelin Valkov.
Fine-Tuning Llama 3 on a Custom Dataset: Training LLM for a RAG Q&A Use Case on a Single GPU
Fine-tuning Meta’s Llama 3 8B Instruct on a domain-specific Q&A dataset can be done on a single GPU by combining 4-bit quantization with a LoRA-style...
Chat With Your Database! Build a Local SQL AI Agent to Query Databases (LangChain & Ollama)
A fully local “chat with your database” agent can translate natural-language questions into SQL, run the queries against a local SQLite database, and...
Private GPT4All : Chat with PDF with Local & Free LLM using GPT4All, LangChain & HuggingFace
Running a local, privacy-friendly “chat with your PDF” pipeline is practical with GPT4All—provided the workflow is built around retrieval (embeddings...
SQL AI Agents: Analyze Relational Databases with Natural Language using Llama 3 (LLM) and CrewAI
AI agents can turn natural-language questions into SQL queries, pull results from a relational database, and then generate a readable analysis and...
MCP Complete Tutorial - Connect Local AI Agent (Ollama) to Tools with MCP Server and Client
Model Context Protocol (MCP) is positioned as a standardized way to connect AI models to external tools and data without hand-building bespoke...
Use DeepSeek-R1 to Chat with Your Files Privately: 100% Local AI Assistant with Ollama
A fully local “chat with your files” assistant is now practical by combining DeepSeek-R1 running locally with a lightweight app that ingests PDFs and...
Fine-tuning Alpaca: Train Alpaca LoRa for Sentiment Analysis on a Custom Dataset
Fine-tuning Llama 7B with LoRA on a custom Bitcoin-tweet sentiment dataset can produce a practical sentiment classifier that labels new tweets as...
Local RAG with Llama 3.1 for PDFs | Private Chat with Your Documents using LangChain & Streamlit
A fully local “chat with your PDFs” system can be built using open models and self-hosted infrastructure, with responses grounded in retrieved...
Fine-tuning Tiny LLM on Your Data | Sentiment Analysis with TinyLlama and LoRA on a Single GPU
Fine-tuning a “tiny” LLM on a custom dataset can deliver strong sentiment and topic predictions using a single GPU—provided the training setup is...
Build a Private Chatbot with Local LLM (Falcon 7B) and LangChain
A practical recipe for running a private chatbot on a single GPU hinges on two engineering moves: loading Falcon 7B instruct in 8-bit to fit within...
Fine-tuning Llama 3.2 on Your Data with a single GPU | Training LLM for Sentiment Analysis
Fine-tuning Llama 3.2 (1B) for sentiment classification on a custom mental-health dataset can jump accuracy from roughly 30% to nearly 85% using a...
Vectorless RAG - Local Financial RAG Without Vector Database | Tree-Based Indexing with Ollama
Vectorless RAG can retrieve and answer questions from structured documents without any vector database by building a tree index from the document’s...
LLM JSON Output - Get Valid JSON with Pydantic and LangChain Output Parsers
Getting reliable JSON from large language models—especially ones that don’t natively support structured outputs—requires more than “please output...
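The video itself uses LangChain output parsers with Pydantic schemas; as a rough, standard-library-only sketch of the underlying problem, the snippet below pulls a JSON object out of a chatty model reply and checks required keys (the `sentiment`/`confidence` schema here is a made-up example, not one from the video):

```python
import json
import re

def extract_json(reply: str) -> dict:
    """Pull the first JSON object out of a model reply that may wrap it
    in prose or a Markdown code fence, then validate required keys."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # grab the outermost braces
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    for key in ("sentiment", "confidence"):  # schema expected by the caller
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    return data

reply = 'Sure! Here is the result:\n```json\n{"sentiment": "positive", "confidence": 0.9}\n```'
print(extract_json(reply))
```

Libraries like Pydantic add typed field validation and coercion on top of this basic extract-then-validate loop.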
Deploy LayoutLMv3 for Document Classification using Streamlit, Transformers and HuggingFace Spaces
A Streamlit web app is built to classify document images using a fine-tuned LayoutLMv3 model, then deployed to Hugging Face Spaces so anyone can...
Convert Any Document To LLM Knowledge with Docling & Ollama (100% Local) | PDF to Markdown Pipeline
Building a reliable, fully local knowledge base from PDFs hinges on turning messy layouts—especially tables and charts—into structured Markdown that...
Sentence Transformers (SBERT) with PyTorch: Similarity and Semantic Search
Sentence Transformers (SBERT) turn sentences into fixed-length embeddings and then use cosine similarity to score semantic closeness—making it...
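The scoring step SBERT relies on can be sketched without the library itself: cosine similarity over toy vectors standing in for real embeddings (real SBERT vectors have hundreds of dimensions; these 3-d values are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" standing in for real SBERT sentence vectors.
query = [0.1, 0.9, 0.2]
docs = {"refund policy": [0.1, 0.8, 0.3], "gpu pricing": [0.9, 0.1, 0.1]}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # the semantically closest document wins
```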
Reproducible Machine Learning & Experiment Tracking Pipeline with Python and DVC
Data and model reproducibility hinges on tracking not just code, but the exact datasets, derived features, trained artifacts, and evaluation outputs...
Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations
LangChain memory turns a basic chatbot into a conversation that can remember what was said earlier—then choose how much to keep, how to compress it,...
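LangChain ships several memory strategies; the simplest, a buffer window that keeps only recent turns, can be mimicked in a few lines of plain Python (this is an illustrative sketch, not LangChain's API):

```python
def trim_history(messages, max_turns=3):
    """Buffer-window memory: keep the first (system) message plus only the
    last `max_turns` user/assistant exchanges, dropping older turns."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-2 * max_turns:]

history = [("system", "You are helpful.")]
for i in range(5):
    history.append(("user", f"question {i}"))
    history.append(("assistant", f"answer {i}"))

window = trim_history(history, max_turns=2)
print(window)  # system message plus the last two exchanges
```

Summary-based memory replaces the dropped turns with an LLM-written recap instead of discarding them outright.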
Local Gemma 4 with OpenCode & llama.cpp | Build a Local RAG with LangChain | 🔴 Live
A local RAG app built around Gemma 4 can work surprisingly well on a single machine—but getting reliable retrieval depends less on the chat model and...
Mamba vs. Transformers: The Future of LLMs? | Paper Overview & Google Colab Code & Mamba Chat
Mamba’s core pitch is a way to make large language models handle much longer inputs without paying Transformers’ usual attention cost. Transformers...
100% Local CAG with Qwen3, Ollama and LangChain - AI Chatbot for Your Private Documents
Cache-augmented generation (CAG) is presented as a simpler alternative to retrieval-augmented generation (RAG) for private-document chat: instead of...
100% Local PDF OCR with Docling and Ollama | PDF to Markdown with VLM (Nanonets-OCR-s)
A local, fully self-hosted pipeline can convert PDFs into Markdown by swapping out traditional OCR for a visual language model—specifically Docling...
OpenLLaMA: Open-Source Reproduction of Meta AI's LLaMA for Commercial Use. Run in Google Colab.
OpenLLaMA (a 7B-parameter, open-source LLaMA-style model) can be run in Google Colab using Hugging Face Transformers, but getting usable text depends...
Pydantic AI Tutorial: Build Agents to Analyze Mobile App Reviews in Python
A practical agent workflow can turn stored mobile app reviews into a structured product brief—complete with improvement themes, marketing-ready...
Build 100% Local Chatbot with Gemma 3, Ollama and LangChain | AI Assistant with Memory and Tool Use
A fully local chatbot can now keep both conversation history and long-term “memories” across separate chats—without sending data to a hosted service....
DeepSeek R1 Local Test with Ollama: Coding, Data Extraction, Data Labelling, Summarization, RAG
DeepSeek R1 and R1-Zero are reasoning-focused large language models trained with a multi-stage process that aims to fix early problems like endless...
LLM Function Calling (Tool Use) with Llama 3 | Tool Choice, Argument Mapping, Groq Llama 3 Tool Use
Function calling with Llama 3 is no longer a niche capability: a Groq-tuned “Llama 3 tool use” model can reliably translate natural-language requests...
Loaders, Indexes & Vectorstores in LangChain: Question Answering on PDF files with ChatGPT
A practical LangChain pipeline for turning PDFs, YouTube transcripts, and plain text into question-answering over embeddings is the core takeaway—and...
Generative Agents: Simulating Human Behavior with ChatGPT
Generative agents built on ChatGPT can simulate believable, goal-driven human behavior inside a small virtual town—without hand-scripting every...
DeepSeek-R1 0528 for 100% Local Chat with Your Files | Financial Document Analysis AI with Ollama
DeepSeek-R1 (distilled) running locally through Ollama can extract and summarize complex financial statements from a 10-page Nvidia earnings PDF with...
Local Llama 3.2 (3B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling
A 3B quantized Llama 3.2 model running locally through Ollama delivers fast, usable results for structured data extraction—especially when...
DeepSeek Coder v2: First Open Coding Model that Beats GPT-4 Turbo?
DeepSeek Coder V2 is pitched as an open coding model that can rival—or even beat—GPT-4 Turbo on programming benchmarks, and the practical tests in...
Why Your RAG Gives Wrong Answers (And 4 Chunking Strategies to Fix It) | LangChain Text Splitters
RAG systems often fail for a surprisingly mundane reason: chunking breaks the information the model needs, even when embeddings, vector search, and...
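The video walks through LangChain's text splitters; as a minimal sketch of the most basic strategy it covers, fixed-size chunking with overlap ensures text split at a boundary still appears whole in at least one chunk (chunk sizes here are arbitrary for illustration):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking: successive windows advance by
    (chunk_size - overlap), so adjacent chunks share `overlap` characters."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(text)
print(len(chunks))  # three overlapping 200-character windows cover 500 chars
```

Recursive and semantic splitters improve on this by preferring paragraph, sentence, or meaning boundaries over raw character counts.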
DeepSeek v3 Tested - Coding, Data Extraction, Summarization, Data Labelling, RAG
DeepSeek V3 is positioned as a top-tier open-weight mixture-of-experts (MoE) model—strong on benchmarks and notably effective at real-world...
Gemma 3n: Open Multimodal Model by Google (Image, Audio, Video & Text) | Install and Test
Google’s Gemma 3n (transcribed as “Geometry N” in the video) is positioned as an open, mobile-targeted multimodal model that can take in text plus images, audio,...
Getting started with PyTorch Lightning for Deep Learning
PyTorch Lightning is positioned as a way to train deep learning models with PyTorch while cutting out much of the repetitive “boilerplate” code. The...
100% Local AI Agents with DeepSeek-R1, Ollama, Pydantic and LangGraph - Private Agentic Workflow
A fully local “agentic” workflow can fetch Reddit posts, let a person steer which threads matter, then run semantic filtering and structured analysis...
Local Qwen 2.5 (14B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling
Qwen 2.5 14B running locally through an Ollama server delivers a noticeable jump in text-heavy tasks—especially sentiment/topic labeling...
GLM-OCR (9B) - Local OCR Test | OCR, Document Extraction, Table Recognition
GLM-OCR is a two-stage OCR system that combines document layout analysis with character-level recognition, and it’s drawing attention because it...
StableVicuna: The Best Open Source Local ChatGPT? LLM based on Vicuna and LLaMa.
Stability AI’s open-source chatbot model, StableVicuna, is positioned as a strong “local ChatGPT” alternative—especially because it can be run in a...
Build Web Scraper with Llama 3.1 | Get Structured Data By Scraping Web Content With AI
A practical pipeline for turning messy, JavaScript-heavy web pages into clean structured data is built by combining Playwright for rendering,...
Gemini CLI + MCP Tools Deep Dive - Build a Completely Local RAG with Ollama | Context7, NextJS
Gemini CLI can be paired with an MCP server (Context7) to generate and run a fully local RAG-style “chat with your files” web app—complete with...
OuteTTS 0.3 - Local TTS and Voice Cloning
OuteTTS 0.3 is a local, Apache 2.0–licensed text-to-speech system that also supports voice cloning, letting users generate speech in multiple...
Grok - LLM by Elon Musk & xAI | Overview, Tech Stack, PromptIDE and Sample Prompts
Grok’s biggest differentiator is its claim of real-time knowledge drawn from the X platform, paired with a new “PromptIDE” tool aimed at making...
Build AI Agent Application with Agent Development Kit (ADK) | Get Started with Google's Agent SDK
Google’s Agent Development Kit (ADK) is positioned as a practical way to build agentic applications with a clear workflow structure, built-in...
Automated Prompt Engineering with DSPy | Prompt Optimization for Financial News Semantic Analysis
Prompt optimization can materially improve sentiment extraction from financial news without retraining a model—DSPy’s prompt optimizer boosted...
Linear Regression with TensorFlow.js
Linear regression in TensorFlow.js is built to learn the parameters of a straight-line (or hyperplane) relationship between house features and...
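The fitting loop the video builds in TensorFlow.js can be sketched in plain Python: gradient descent on the mean-squared error of y = w·x + b (the toy data below is invented, roughly following y = 2x + 1, and stands in for the video's house-price features):

```python
# Gradient-descent fit of y = w*x + b on toy (x, y) pairs.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
w, b, lr = 0.0, 0.0, 0.01

for _ in range(5000):
    # Mean-squared-error gradients with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # converges near the least-squares slope and intercept
```

TensorFlow.js automates exactly these gradients via `tf.train` optimizers and automatic differentiation.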
MemGPT - Unlimited Context Window (Memory) for LLMs | Paper review, Installation & Demo
MemGPT targets a core bottleneck in today’s large language models: limited context windows that force earlier parts of a conversation or large...
Can GPT-4o's Memory Replace RAG Systems? Exploring Large Context Windows
GPT-4o’s ability to retrieve information from extremely long prompts looks strong enough to challenge the usual need for retrieval-augmented...
Mixtral - Mixture of Experts (MoE) Free LLM that Rivals ChatGPT (3.5) by Mistral | Overview & Demo
Mistral AI’s Mixtral 8×7B (an open-weight sparse Mixture of Experts model) is positioned as a practical alternative to much larger LLMs by routing...
GPT-4o API Deep Dive: Text Generation, Streaming, Vision, and Function Calling
GPT-4o’s API is positioned as a drop-in upgrade for building faster, more capable AI apps—especially when you need streaming, structured JSON...
Evaluate LLM Systems & RAGs: Choose the Best LLM Using Automatic Metrics on Your Dataset
Choosing an LLM for a real project often fails when teams rely on classical ML metrics like accuracy, F1, or regression error. Those metrics assume...
Use Any LLM Provider with LiteLLM | Use ChatGPT, Claude, Gemini, Ollama with One API
Switching between large language model (LLM) providers can break production systems when code depends on a single vendor’s SDK. LiteLLM is presented...
AI Agents with LangGraph & Llama 3 | Control the Execution Flow and State of Your Agent Apps
LangGraph is positioned as a way to control both the execution order and the evolving state of agentic applications—down to loops, branching, and...
Build an AI Social Media Content Generator in 20 Minutes | AI Agents with LangGraph and Llama 3.1
A LangGraph-based agent loop can turn technical input into platform-ready social posts for both Twitter and LinkedIn—while iterating through multiple...
Build Private Chatbot with LangChain, Ollama and Qwen 2.5 | Local AI App with Private LLM
A fully local “private chatbot” workflow can be built by combining LangChain’s message orchestration (via LangGraph), Ollama for on-device model...
OCRFlux (3B) - Local OCR AI Model Test | Turn PDFs into Markdown
OCRFlux is a 3B-parameter visual-language OCR fine-tune aimed at turning document images (including PDFs) into structured Markdown. In local...
Build Production-Ready Retrieval RAG Pipeline in LangChain | Hybrid Search (BM25), Re-ranking & HyDE
A production-ready RAG pipeline needs more than embeddings: it must reliably fetch the right chunks, even when users ask for exact numbers. A simple...
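One common way to combine keyword (BM25) and vector rankings in a hybrid-search setup is reciprocal rank fusion; this sketch illustrates the idea with invented document names, and is not necessarily the exact fusion the video uses:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists (e.g. BM25 keyword hits and vector-search
    hits) into one: each document scores sum(1 / (k + rank)) over the lists,
    so documents ranked well by multiple retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["q3-revenue-table", "cover-page", "footnotes"]
vector = ["mda-summary", "q3-revenue-table", "cover-page"]
fused = reciprocal_rank_fusion([bm25, vector])
print(fused)  # documents appearing in both lists outrank single-list hits
```

A cross-encoder re-ranker is then typically applied to the fused top-k before generation.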
Advanced AI Agents with LangGraph and Llama 3.1 | Analyze Bitcoin, Ethereum and Solana Markets
An AI agent workflow built with LangGraph can generate cryptocurrency market reports by combining three streams of evidence: cached historical price...
DeepSeek R1 0528 - Better Coding & Tool Calling | Is It Faster Now?
DeepSeek R1 0528’s update centers on making the model more usable for real-world coding agents by adding support for JSON output and function...
Segment Anything by Meta Research: Image Segmentation with the Largest Dataset and Model Yet!
Meta’s Segment Anything (SAM) is built to turn image segmentation into a “promptable” task: users can click, draw boxes, or provide text-like prompts...
Build Smarter AI Apps: Memory, Tools, Retrieval & Structured Output with Python, Pydantic & Ollama
AI apps become meaningfully more useful when they’re given four upgrades beyond plain text prompting: memory, structured outputs, tool use, and...
FLUX.1 Kontext [dev] Local Test - Image Generation and Edit with HuggingFace (Open Weights Model)
Black Forest Labs’ FLUX.1 Kontext [dev] (open weights) is proving it can do more than image editing: it can also generate photorealistic images from...
Is RAG Dead in 2026? | Build Local RAG from First Principles
Retrieval-Augmented Generation (RAG) is still considered necessary in 2026—not because large language models can’t answer, but because they often...
Getting Started with LangGraph | Build Local Agentic Workflows and AI Agents with Ollama
LangGraph is presented as a practical way to turn brittle, demo-only AI prototypes into maintainable agentic systems by replacing nested if/else...
Gemma 4 Local OCR Test with llama.cpp | How Accurate It Is for PDF Document Understanding (🔴 Live)
Gemma 4 can perform surprisingly strong document understanding for local OCR-style extraction—especially when the goal is to recover layout and...