Venelin Valkov — Channel Summaries
AI-powered summaries of 131 videos from Venelin Valkov's channel.
Fine-tuning Llama 2 on Your Own Dataset | Train an LLM for Your Use Case with QLoRA on a Single GPU
Fine-tuning Llama 2 on a task-specific dataset can dramatically improve how well a small “base” model produces structured, useful outputs—especially...
Time Series Prediction with LSTMs using TensorFlow 2 and Keras in Python
Time series forecasting with LSTMs hinges on treating past observations as a sequence, not as independent data points—and the practical payoff is a...
Fine-tuning LLM with QLoRA on Single GPU: Training Falcon-7b on ChatBot Support FAQ Dataset
Fine-tuning Falcon 7B on a single GPU is practical even with a tiny, FAQ-style dataset—using QLoRA to train only a small slice of parameters. The...
Fine-Tuning Llama 3 on a Custom Dataset: Training LLM for a RAG Q&A Use Case on a Single GPU
Fine-tuning Meta’s Llama 3 8B Instruct on a domain-specific Q&A dataset can be done on a single GPU by combining 4-bit quantization with a LoRA-style...
Chat With Your Database! Build a Local SQL AI Agent to Query Databases (LangChain & Ollama)
A fully local “chat with your database” agent can translate natural-language questions into SQL, run the queries against a local SQLite database, and...
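The answer-from-SQL half of such an agent can be sketched in plain Python with the standard-library `sqlite3` module. This is a minimal illustration, not the video's LangChain/Ollama code: the schema, data, and query here are hypothetical, and the natural-language-to-SQL translation step (done by the model in the video) is simulated with a hard-coded string.

```python
import sqlite3

# Hypothetical schema for illustration; a real agent would inspect the live database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "bob", 45.5), (3, "alice", 12.5)],
)

def run_generated_sql(sql: str) -> list:
    """Execute SQL produced by the model and return rows for the answer step."""
    return conn.execute(sql).fetchall()

# Pretend the LLM translated "How much has alice spent?" into this query:
rows = run_generated_sql(
    "SELECT customer, SUM(total) FROM orders WHERE customer = 'alice' GROUP BY customer"
)
print(rows)  # [('alice', 42.5)]
```

In the full agent, the rows returned here would be handed back to the model so it can phrase a natural-language answer.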
Private GPT4All : Chat with PDF with Local & Free LLM using GPT4All, LangChain & HuggingFace
Running a local, privacy-friendly “chat with your PDF” pipeline is practical with GPT4All—provided the workflow is built around retrieval (embeddings...
Intent Recognition with BERT using Keras and TensorFlow 2 in Python | Text Classification Tutorial
Fine-tuning a pre-trained BERT model for intent recognition on a seven-class dataset can deliver near-saturating accuracy—about 97% on a held-out...
Build an AI Document (PDF, DOC, XML) Processing Pipeline for RAG | Docling, OCR, Chunking, Images
Turning messy PDFs into reliable knowledge for RAG hinges on more than OCR. The core takeaway is a three-stage, fully local pipeline that converts a...
SQL AI Agents: Analyze Relational Databases with Natural Language using Llama 3 (LLM) and CrewAI
AI agents can turn natural-language questions into SQL queries, pull results from a relational database, and then generate a readable analysis and...
MCP Complete Tutorial - Connect Local AI Agent (Ollama) to Tools with MCP Server and Client
Model Context Protocol (MCP) is positioned as a standardized way to connect AI models to external tools and data without hand-building bespoke...
Visual Debugger for Jupyter Lab/IPython Notebooks | Installation, Code Examples & Debugging
A new visual debugger for JupyterLab/IPython notebooks adds interactive, breakpoint-based debugging directly in the notebook UI—complete with...
Use DeepSeek-R1 to Chat with Your Files Privately: 100% Local AI Assistant with Ollama
A fully local “chat with your files” assistant is now practical by combining DeepSeek-R1 running locally with a lightweight app that ingests PDFs and...
100% Local RAG with DeepSeek-R1, Ollama and LangChain - Build Document AI for Your Private Files
A practical way to make local RAG work reliably on long documents is to retrieve the right text chunks—then feed only those chunks (plus chat...
Getting Started with LangChain and Llama 2 in 15 Minutes | Beginner's Guide to LangChain
LangChain’s core value is turning large language models like Llama 2 into systems that can pull in outside information and take actions—by chaining...
Create Custom Dataset for Question Answering with T5 using HuggingFace, Pytorch Lightning & PyTorch
Fine-tuning T5 for question answering starts with turning BioASQ biomedical QA files into a model-ready dataset: each training example becomes a...
Mastery List GPT: Chat with your ToDO List | Time Management and Habits with ChatGPT and LangChain
A Streamlit app can turn a simple habit spreadsheet into a working daily schedule—and then let users revise it through chat—by combining...
Customer Support Chatbot using Custom Knowledge Base with LangChain and Private LLM
A practical blueprint for building a customer-support chatbot from a custom knowledge base hinges on one design choice: retrieve the most relevant...
Advanced RAG with Llama 3 in Langchain | Chat with PDF using Free Embeddings, Reranker & LlamaParse
Building a high-quality “chat with your PDF” system hinges less on the language model and more on the pipeline around it: parsing complex documents...
Fine-tuning Alpaca: Train Alpaca LoRa for Sentiment Analysis on a Custom Dataset
Fine-tuning Llama 7B with LoRA on a custom Bitcoin-tweet sentiment dataset can produce a practical sentiment classifier that labels new tweets as...
Local RAG with Llama 3.1 for PDFs | Private Chat with Your Documents using LangChain & Streamlit
A fully local “chat with your PDFs” system can be built using open models and self-hosted infrastructure, with responses grounded in retrieved...
Deploy Your Private Llama 2 Model to Production with Text Generation Inference and RunPod
Deploying a private Llama 2–style model into production is practical on a single GPU when Text Generation Inference (TGI) is used as the serving...
Analyze Custom CSV Data with GPT-4 using Langchain
A LangChain “CSV agent” can turn a custom Bitcoin price spreadsheet into a question-answering system that writes and runs pandas code on the fly—then...
Fine-tuning Tiny LLM on Your Data | Sentiment Analysis with TinyLlama and LoRA on a Single GPU
Fine-tuning a “tiny” LLM on a custom dataset can deliver strong sentiment and topic predictions using a single GPU—provided the training setup is...
Build a Private Chatbot with Local LLM (Falcon 7B) and LangChain
A practical recipe for running a private chatbot on a single GPU hinges on two engineering moves: loading Falcon 7B instruct in 8-bit to fit within...
Fine-tuning Llama 3.2 on Your Data with a single GPU | Training LLM for Sentiment Analysis
Fine-tuning Llama 3.2 (1B) for sentiment classification on a custom mental-health dataset can jump accuracy from roughly 30% to nearly 85% using a...
100% Free Claude Code | Run Claude Code with Local LLM with Ollama and Qwen 3.5
Running Claude Code locally with an Ollama-backed Qwen model can deliver practical coding assistance—especially when the task is narrowly scoped to...
QLoRA: Efficient Finetuning of Large Language Models on a Single GPU? LoRA & QLoRA paper review
QLoRA (4-bit QLoRA) makes it practical to fine-tune very large language models on a single consumer-style GPU by combining three ideas: LoRA-style...
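The LoRA half of the idea can be shown with a few lines of NumPy: the frozen weight W is never updated; instead a low-rank product B @ A (with far fewer parameters) is trained and added on top. Shapes and the alpha/r scaling follow the LoRA formulation; all values below are illustrative only.

```python
import numpy as np

# Toy LoRA update: the frozen weight W is augmented by a low-rank product B @ A.
d_out, d_in, r, alpha = 4, 6, 2, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init => no change at start

def effective_weight(W, A, B, alpha, r):
    return W + (alpha / r) * (B @ A)

# Before training, B is zero, so the effective weight equals W exactly.
assert np.allclose(effective_weight(W, A, B, alpha, r), W)

# After training, B is non-zero, and only r*(d_in + d_out) parameters were
# updated instead of d_out*d_in -- the source of the memory savings.
B_trained = rng.normal(size=(d_out, r))
W_eff = effective_weight(W, A, B_trained, alpha, r)
print(W_eff.shape)  # (4, 6)
```

QLoRA then stores W itself in 4-bit precision while keeping the small A and B matrices in higher precision for training.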
Faster LLM Inference: Speeding up Falcon 7b (with QLoRA adapter) Prediction Time
Fine-tuned Falcon 7B inference speed can be cut dramatically by changing how the model is loaded—especially by running the quantized model in 8-bit....
Vectorless RAG - Local Financial RAG Without Vector Database | Tree-Based Indexing with Ollama
Vectorless RAG can retrieve and answer questions from structured documents without any vector database by building a tree index from the document’s...
Dolly 2.0: Free ChatGPT-like Model for Commercial Use
Dolly 2.0 is being released as a genuinely commercial-friendly, open instruction-tuned language model—complete with training code, dataset, and model...
Boost Your AI Predictions: Maximize Speed with vLLM Library for Large Language Model Inference
vLLM is positioned as a practical way to speed up large language model inference by boosting throughput—often by several multiples—without changing...
LLM JSON Output - Get Valid JSON with Pydantic and LangChain Output Parsers
Getting reliable JSON from large language models—especially ones that don’t natively support structured outputs—requires more than “please output...
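The validate-and-retry loop behind such parsers can be sketched with the standard-library `json` module alone. This is a conceptual illustration, not LangChain's `PydanticOutputParser`: the schema, prompt text, and stub model below are hypothetical.

```python
import json

REQUIRED = {"title": str, "rating": int}

def validate(raw: str):
    """Return the parsed object if it matches the expected schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(isinstance(obj.get(k), t) for k, t in REQUIRED.items()):
        return None
    return obj

def get_structured(prompt: str, llm_call, max_retries: int = 2) -> dict:
    """Ask, validate, and re-ask with format instructions until the JSON parses."""
    raw = llm_call(prompt)
    for _ in range(max_retries):
        parsed = validate(raw)
        if parsed is not None:
            return parsed
        raw = llm_call(prompt + "\nReturn ONLY valid JSON with keys: title, rating.")
    raise ValueError("model never produced valid JSON")

# Stub model: fails once with malformed output, then complies on the retry.
replies = iter(['Sure! {"title": "ok"', '{"title": "ok", "rating": 5}'])
result = get_structured("Summarize the review as JSON.", lambda p: next(replies))
print(result)  # {'title': 'ok', 'rating': 5}
```

Pydantic-based parsers do the same thing with richer type checking and auto-generated format instructions in the prompt.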
Deploy LayoutLMv3 for Document Classification using Streamlit, Transformers and HuggingFace Spaces
A Streamlit web app is built to classify document images using a fine-tuned LayoutLMv3 model, then deployed to Hugging Face Spaces so anyone can...
Convert Any Document To LLM Knowledge with Docling & Ollama (100% Local) | PDF to Markdown Pipeline
Building a reliable, fully local knowledge base from PDFs hinges on turning messy layouts—especially tables and charts—into structured Markdown that...
Sentence Transformers (SBERT) with PyTorch: Similarity and Semantic Search
Sentence Transformers (SBERT) turn sentences into fixed-length embeddings and then use cosine similarity to score semantic closeness—making it...
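The scoring step is just cosine similarity between embedding vectors, which can be written in pure Python. The three-dimensional "embeddings" below are toy stand-ins; real SBERT vectors have hundreds of dimensions and come from the model itself.

```python
from math import sqrt

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dim "embeddings"; real SBERT vectors have 384+ dimensions.
query = [1.0, 0.0, 1.0]
docs = {"cat video": [0.9, 0.1, 0.8], "tax form": [0.0, 1.0, 0.1]}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # cat video
```

Semantic search over a corpus is exactly this ranking, applied to every stored document embedding.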
Getting Started with TensorFlow.js | Deep Learning for JavaScript Hackers (Part 0)
TensorFlow.js is positioned as a way to run machine-learning workflows directly in JavaScript—either in the browser or in Node.js—by bridging...
Reproducible Machine Learning & Experiment Tracking Pipeline with Python and DVC
Data and model reproducibility hinges on tracking not just code, but the exact datasets, derived features, trained artifacts, and evaluation outputs...
Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations
LangChain memory turns a basic chatbot into a conversation that can remember what was said earlier—then choose how much to keep, how to compress it,...
DeepSeek Coder: AI Writes Code | Free LLM For Code Generation Beats ChatGPT, ChatDev & Code Llama
DeepSeek Coder is an open-source code-focused language model from DeepSeek AI that’s trained heavily on programming data and tuned to follow coding...
HuggingGPT & JARVIS: "Advanced Artificial Intelligence" with ChatGPT and HuggingFace
HuggingGPT reframes “advanced AI” as orchestration: a large language model like ChatGPT (or GPT-4) can act as a controller that plans which...
Local Gemma 4 with OpenCode & llama.cpp | Build a Local RAG with LangChain | 🔴 Live
A local RAG app built around Gemma 4 can work surprisingly well on a single machine—but getting reliable retrieval depends less on the chat model and...
Mamba vs. Transformers: The Future of LLMs? | Paper Overview & Google Colab Code & Mamba Chat
Mamba’s core pitch is a way to make large language models handle much longer inputs without paying Transformers’ usual attention cost. Transformers...
100% Local CAG with Qwen3, Ollama and LangChain - AI Chatbot for Your Private Documents
Cache-augmented generation (CAG) is presented as a simpler alternative to retrieval-augmented generation (RAG) for private-document chat: instead of...
100% Local PDF OCR with Docling and Ollama | PDF to Markdown with VLM (Nanonets-OCR-s)
A local, fully self-hosted pipeline can convert PDFs into Markdown by swapping out traditional OCR for a visual language model—specifically Docling...
OpenLLaMA: Open-Source Reproduction of Meta AI's LLaMA for Commercial Use. Run in Google Colab.
OpenLLaMA (a 7B-parameter, open-source LLaMA-style model) can be run in Google Colab using Hugging Face Transformers, but getting usable text depends...
Pydantic AI Tutorial: Build Agents to Analyze Mobile App Reviews in Python
A practical agent workflow can turn stored mobile app reviews into a structured product brief—complete with improvement themes, marketing-ready...
Build 100% Local Chatbot with Gemma 3, Ollama and LangChain | AI Assistant with Memory and Tool Use
A fully local chatbot can now keep both conversation history and long-term “memories” across separate chats—without sending data to a hosted service....
Grok-1 Open Source: 314B Mixture-of-Experts Model by xAI | Blog post, GitHub/Source Code
xAI has open-sourced Grok-1, a 314B-parameter mixture-of-experts (MoE) model, releasing not only weights but also the model architecture and training...
DeepSeek R1 Local Test with Ollama: Coding, Data Extraction, Data Labelling, Summarization, RAG
DeepSeek-R1 and R1-Zero are reasoning-focused large language models trained with a multi-stage process that aims to fix early problems like endless...
Build 100% Local AI Agent to Chat with Your Files | Private AI Knowledge Base with MCP & RAG
A fully local “private knowledge base” agent can chat with a user’s own files by combining a custom MCP tool server with retrieval-augmented...
Build a custom dataset with LightningDataModule in PyTorch Lightning
A practical path to text classification in PyTorch Lightning starts with turning the multi-annotator GoEmotions dataset into one clean label per...
LLM Function Calling (Tool Use) with Llama 3 | Tool Choice, Argument Mapping, Groq Llama 3 Tool Use
Function calling with Llama 3 is no longer a niche capability: a Groq-tuned “Llama 3 tool use” model can reliably translate natural-language requests...
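The application side of function calling is a dispatch step: the model emits a JSON tool call, and the app maps the tool name to a real function and the arguments to keyword parameters. A minimal sketch, with a hypothetical `add` tool standing in for real weather/calendar/database tools:

```python
import json

# Hypothetical tool registry; a real app would register its own tools here.
def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"add": add}

def dispatch(tool_call_json: str):
    """Map the model's tool-call message to a registered function and run it."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]          # tool choice made by the model
    return fn(**call["arguments"])    # argument mapping from model JSON to kwargs

# A function-calling model might emit something like this for "what is 19 + 23?":
result = dispatch('{"name": "add", "arguments": {"a": 19, "b": 23}}')
print(result)  # 42
```

The result is then sent back to the model as a tool message so it can compose the final reply.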
Loaders, Indexes & Vectorstores in LangChain: Question Answering on PDF files with ChatGPT
A practical LangChain pipeline for turning PDFs, YouTube transcripts, and plain text into question-answering over embeddings is the core takeaway—and...
Train Deep Learning Model with PyTorch Lightning - TensorBoard, Learning rate finder and Checkpoints
Fine-tuning an ELECTRA-based emotion classifier in PyTorch Lightning gets a major boost from two training “plumbing” upgrades: automatically finding...
Generative Agents: Simulating Human Behavior with ChatGPT
Generative agents built on ChatGPT can simulate believable, goal-driven human behavior inside a small virtual town—without hand-scripting every...
DeepSeek-R1 0528 for 100% Local Chat with Your Files | Financial Document Analysis AI with Ollama
DeepSeek-R1 (distilled) running locally through Ollama can extract and summarize complex financial statements from a 10-page Nvidia earnings PDF with...
Llama 3.3 70B Test - Coding, Data Extraction, Summarization, Data Labelling, RAG
Meta’s Llama 3.3 70B is landing as a strong all-around text model, with independent evaluations and hands-on tests pointing to performance that...
Local Llama 3.2 (3B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling
A 3B quantized Llama 3.2 model running locally through Ollama delivers fast, usable results for structured data extraction—especially when...
DeepSeek Coder v2: First Open Coding Model that Beats GPT-4 Turbo?
DeepSeek Coder V2 is pitched as an open coding model that can rival—or even beat—GPT-4 Turbo on programming benchmarks, and the practical tests in...
Why Your RAG Gives Wrong Answers (And 4 Chunking Strategies to Fix It) | LangChain Text Splitters
RAG systems often fail for a surprisingly mundane reason: chunking breaks the information the model needs, even when embeddings, vector search, and...
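The simplest of the strategies, fixed-size chunking with overlap, can be written in a few lines. This is a bare-bones sketch of the idea, not LangChain's text splitters, which additionally try to break on separators like paragraphs and sentences.

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list:
    """Fixed-size chunking with overlap, so content that straddles a chunk
    boundary still appears whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step) if text[i : i + size]]

doc = "RAG quality depends on chunking. Split too small and context is lost."
chunks = chunk_text(doc, size=40, overlap=10)
for c in chunks:
    print(repr(c))
```

The overlap is the fix for boundary breakage: the last `overlap` characters of each chunk are repeated at the start of the next, at the cost of some index redundancy.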
DeepSeek v3 Tested - Coding, Data Extraction, Summarization, Data Labelling, RAG
DeepSeek V3 is positioned as a top-tier open-weight mixture-of-experts (MoE) model—strong on benchmarks and notably effective at real-world...
Gemma 3n: Open Multimodal Model by Google (Image, Audio, Video & Text) | Install and Test
Google’s Gemma 3n (rendered as “Geometry N” in the transcript) is positioned as an open, mobile-targeted multimodal model that can take in text plus images, audio,...
Build a Neural Network for Classification from Scratch with PyTorch
A penguin-species classifier built from scratch in PyTorch hinges on three practical steps: turning a cleaned pandas dataset into numeric tensors,...
Phi 2: Small Language Model Better Than 7B LLMs? | Google Colab Tutorial with Python
Microsoft’s Phi-2 (2.7B parameters) is positioned as a test of whether “small” language models can match the useful behavior of much larger 7B–13B...
How To Extract ChatGPT Hidden Training Data | Making LLMs (e.g. Llama) Spill Out Their Training Data
A new line of research argues that large language models—despite safeguards meant to prevent memorized training data from leaking—can still be coaxed...
Gemma 3 Local Test with Ollama: Coding, Data Extraction, Data Labelling, Summarization, RAG
Gemma 3’s biggest practical win in local testing is its ability to deliver reliable, structured outputs—especially for coding, data extraction, and...
Getting started with PyTorch Lightning for Deep Learning
PyTorch Lightning is positioned as a way to train deep learning models with PyTorch while cutting out much of the repetitive “boilerplate” code. The...
100% Local AI Agents with DeepSeek-R1, Ollama, Pydantic and LangGraph - Private Agentic Workflow
A fully local “agentic” workflow can fetch Reddit posts, let a person steer which threads matter, then run semantic filtering and structured analysis...
Local Qwen 2.5 (14B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling
Qwen 2.5 14B running locally through an Ollama server delivers a noticeable jump in text-heavy tasks—especially sentiment/topic labeling...
GLM-OCR (9B) - Local OCR Test | OCR, Document Extraction, Table Recognition
GLM-OCR is a two-stage OCR system that combines document layout analysis with character-level recognition, and it’s drawing attention because it...
StableVicuna: The Best Open Source Local ChatGPT? LLM based on Vicuna and LLaMa.
Stability AI’s open-source chatbot model, StableVicuna, is positioned as a strong “local ChatGPT” alternative—especially because it can be run in a...
Build Web Scraper with Llama 3.1 | Get Structured Data By Scraping Web Content With AI
A practical pipeline for turning messy, JavaScript-heavy web pages into clean structured data is built by combining Playwright for rendering,...
Build Better RAGs with Contextual Retrieval
Contextual retrieval boosts retrieval-augmented generation (RAG) accuracy by enriching every text chunk with extra, chunk-specific context derived...
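The preprocessing step can be sketched in a few lines: each raw chunk gets a chunk-specific context prefix before it is embedded. In the full technique an LLM writes that prefix from the whole document; the template, titles, and chunks below are hypothetical stand-ins.

```python
# Sketch of contextual-retrieval preprocessing: each raw chunk is prefixed
# with document-level context before embedding, so an isolated sentence like
# "Margins compressed slightly." stays retrievable for queries about ACME.
def contextualize(chunks, doc_title: str, section: str) -> list:
    prefix = f"From '{doc_title}', section '{section}': "
    return [prefix + c for c in chunks]

chunks = ["Revenue grew 12% year over year.", "Margins compressed slightly."]
enriched = contextualize(chunks, "ACME Q3 Report", "Financial Highlights")
print(enriched[0])
# From 'ACME Q3 Report', section 'Financial Highlights': Revenue grew 12% year over year.
```

The enriched strings, not the raw chunks, are what get embedded and indexed; at answer time the original chunk text can still be shown to the model.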
Gemma 4 Local Test | New Open LLM King?
Gemma 4’s open, on-device push is starting to look practical: a 26B mixture-of-experts (MoE) instruction-tuned model running locally via llama.cpp...
Analyzing Cryptocurrency Sentiment on Twitter with LangChain and ChatGPT | CryptoGPT
CryptoGPT’s sentiment pipeline turns an author’s Twitter activity into daily sentiment scores by combining LangChain with ChatGPT and forcing...
Gemini CLI + MCP Tools Deep Dive - Build a Completely Local RAG with Ollama | Context7, NextJS
Gemini CLI can be paired with an MCP server (Context7) to generate and run a fully local RAG-style “chat with your files” web app—complete with...
Real-World PyTorch: From Zero to Hero in Deep Learning & LLMs | Tensors, Operations, Model Training
The core takeaway is that PyTorch training for real data comes down to three practical skills: building the right tensor shapes and dtypes, moving...
LiteParse - 100% Local PDF Parsing (No GPU) | Document Processing for RAG & AI Agents
LiteParse positions itself as a fully local alternative for extracting structured text from PDFs—without relying on GPUs or cloud document-parsing...
OuteTTS 0.3 - Local TTS and Voice Cloning
OuteTTS 0.3 is a local, Apache 2.0–licensed text-to-speech system that also supports voice cloning, letting users generate speech in multiple...
Grok - LLM by Elon Musk & xAI | Overview, Tech Stack, PromptIDE and Sample Prompts
Grok’s biggest differentiator is its claim of real-time knowledge drawn from the X platform, paired with a new “PromptIDE” tool aimed at making...
Build AI Agent Application with Agent Development Kit (ADK) | Get Started with Google's Agent SDK
Google’s Agent Development Kit (ADK) is positioned as a practical way to build agentic applications with a clear workflow structure, built-in...
MedGemma 27B (Local) Multimodal Health AI Advisor | Xrays and Text-Only Diagnosis Test
MedGemma 27B is a Google fine-tuned, multimodal health AI model that can take both text and medical images (like X-rays) and produce structured,...
Mistral 7B - better than Llama 2? | Getting started, Prompt template & Comparison with Llama 2
Mistral 7B Instruct is positioned as a smaller model that can outperform larger Llama 2–class competitors, and hands-on tests in a Google Colab...
Auto-GPT: Autonomous Investment Manager Powered by GPT-4?
Auto-GPT can run GPT-4 (or GPT-3.5) in an autonomous loop: it takes an initial goal, produces intermediate outputs, feeds those outputs back into...
Automated Prompt Engineering with DSPy | Prompt Optimization for Financial News Semantic Analysis
Prompt optimization can materially improve sentiment extraction from financial news without retraining a model—DSPy’s prompt optimizer boosted...
Linear Regression with TensorFlow.js
Linear regression in TensorFlow.js is built to learn the parameters of a straight-line (or hyperplane) relationship between house features and...
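The video works in TensorFlow.js, but the underlying fit is plain gradient descent on mean squared error, shown here in pure Python with a tiny synthetic dataset (y = 2x + 1) in place of real house data.

```python
# Gradient descent recovering slope w and intercept b of y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # Gradients of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # 2.0 1.0
```

TensorFlow.js packages exactly this loop into `model.fit`, with the same loss and an optimizer choosing the step sizes.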
MemGPT - Unlimited Context Window (Memory) for LLMs | Paper review, Installation & Demo
MemGPT targets a core bottleneck in today’s large language models: limited context windows that force earlier parts of a conversation or large...
How to Deploy LLMs | LLMOps Stack with vLLM, Docker, Grafana & MLflow
Running an LLM locally is only half the job; production needs concurrency, security, monitoring, and a way to detect failures. A practical LLMOps...
Can GPT-4o's Memory Replace RAG Systems? Exploring Large Context Windows
GPT-4o’s ability to retrieve information from extremely long prompts looks strong enough to challenge the usual need for retrieval-augmented...
Mixtral - Mixture of Experts (MoE) Free LLM that Rivals ChatGPT (3.5) by Mistral | Overview & Demo
Mistral AI’s Mixtral 8×7B (an open-weight sparse Mixture of Experts model) is positioned as a practical alternative to much larger LLMs by routing...
Grok 4.1 vs Gemini 3 Pro - Which Model is THE ONE? | Prompt & Coding First Look
Grok 4.1 and Gemini 3 Pro both land near the top of current AI leaderboards, but a quick side-by-side test suggests Gemini 3 Pro may have the edge...
Fine-Tuning LLM on Your Data using Single GPU | Sentiment Analysis for Cryptocurrency Tweets
Fine-tuning Qwen 3 on a small, sentiment-labeled cryptocurrency tweet dataset can deliver a sizable accuracy jump—even when training runs on a...
GPT-4o API Deep Dive: Text Generation, Streaming, Vision, and Function Calling
GPT-4o’s API is positioned as a drop-in upgrade for building faster, more capable AI apps—especially when you need streaming, structured JSON...
Evaluate LLM Systems & RAGs: Choose the Best LLM Using Automatic Metrics on Your Dataset
Choosing an LLM for a real project often fails when teams rely on classical ML metrics like accuracy, F1, or regression error. Those metrics assume...
Vicuna: An Open-Source Chatbot Comparable to ChatGPT and Google Bard
Vicuna is an open-source chatbot built to deliver ChatGPT-like quality without matching OpenAI’s closed model approach. The project centers on a 13...
LIMA: Can you Fine-Tune Large Language Models (LLMs) with Small Datasets? Less Is More for Alignment
Meta AI’s LIMA (“Less Is More for Alignment”) argues that strong alignment behavior in large language models can be achieved with surprisingly small...
LangChain Models: ChatGPT, Flan Alpaca, OpenAI Embeddings, Prompt Templates & Streaming
LangChain can unify three major building blocks—text generation models, embeddings, and chat interfaces—so the same workflow (prompting, formatting,...
The New Prompting Rules: How to Prompt Frontier LLM Models like Gemini 2.5, GPT 4.1 & Claude 3.7
Frontier LLMs are getting dramatically easier to use because context windows have ballooned to 200,000 tokens and beyond, letting models reliably...
Use Any LLM Provider with LiteLLM | Use ChatGPT, Claude, Gemini, Ollama with One API
Switching between large language model (LLM) providers can break production systems when code depends on a single vendor’s SDK. LiteLLM is presented...
XGen-7B: Long Sequence Modeling with (up to) 8K Tokens. Overview, Dataset & Google Colab Code.
Salesforce’s XGen-7B is positioned as an open 7-billion-parameter language model built for long-context work, with an input sequence length that...