
Venelin Valkov — Channel Summaries

AI-powered summaries of 131 videos from Venelin Valkov's channel.



Fine-tuning Llama 2 on Your Own Dataset | Train an LLM for Your Use Case with QLoRA on a Single GPU

Venelin Valkov · 3 min read

Fine-tuning Llama 2 on a task-specific dataset can dramatically improve how well a small “base” model produces structured, useful outputs—especially...

Fine-Tuning · QLoRA · Llama 2

Time Series Prediction with LSTMs using TensorFlow 2 and Keras in Python

Venelin Valkov · 3 min read

Time series forecasting with LSTMs hinges on treating past observations as a sequence, not as independent data points—and the practical payoff is a...

Time Series Forecasting · LSTM · Bidirectional LSTM
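The data-prep step behind LSTM forecasting — treating past observations as an input sequence and the next value as the target — can be sketched in plain Python (the tutorial itself uses NumPy/Keras; this is a minimal illustration, not its code):

```python
# Turn a univariate series into (window, next-value) training pairs.
def make_windows(series, window_size):
    """Split a sequence into overlapping input windows and their targets."""
    xs, ys = [], []
    for i in range(len(series) - window_size):
        xs.append(series[i:i + window_size])   # past observations (model input)
        ys.append(series[i + window_size])     # next value (prediction target)
    return xs, ys

xs, ys = make_windows([1, 2, 3, 4, 5, 6], window_size=3)
# xs = [[1, 2, 3], [2, 3, 4], [3, 4, 5]], ys = [4, 5, 6]
```

Each window/target pair is one supervised training example; an LSTM then consumes the window as an ordered sequence rather than independent features.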

Fine-tuning LLM with QLoRA on Single GPU: Training Falcon-7b on ChatBot Support FAQ Dataset

Venelin Valkov · 2 min read

Fine-tuning Falcon 7B on a single GPU is practical even with a tiny, FAQ-style dataset—using QLoRA to train only a small slice of parameters. The...

QLoRA Fine-Tuning · Falcon 7B · Single GPU Training

Fine-Tuning Llama 3 on a Custom Dataset: Training LLM for a RAG Q&A Use Case on a Single GPU

Venelin Valkov · 3 min read

Fine-tuning Meta’s Llama 3 8B Instruct on a domain-specific Q&A dataset can be done on a single GPU by combining 4-bit quantization with a LoRA-style...

LoRA Fine-Tuning · 4-Bit Quantization · Chat Template Formatting

Chat With Your Database! Build a Local SQL AI Agent to Query Databases (LangChain & Ollama)

Venelin Valkov · 3 min read

A fully local “chat with your database” agent can translate natural-language questions into SQL, run the queries against a local SQLite database, and...

Local SQL Agent · LangChain Tools · Ollama Tool Calling
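The agent's final step — executing model-generated SQL against a local database and handing rows back to the LLM — reduces to plain `sqlite3`. A minimal sketch with the "generated" query hard-coded for illustration (the table and values are invented):

```python
import sqlite3

# A tiny in-memory database standing in for the user's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# In the real agent this string comes from the LLM's tool call.
generated_sql = "SELECT COUNT(*), SUM(amount) FROM orders"
count, total = conn.execute(generated_sql).fetchone()
# count == 2, total == 29.5
```

The LLM never touches the database directly; it only emits SQL text, and the application runs it and returns the result for summarization.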

Private GPT4All : Chat with PDF with Local & Free LLM using GPT4All, LangChain & HuggingFace

Venelin Valkov · 2 min read

Running a local, privacy-friendly “chat with your PDF” pipeline is practical with GPT4All—provided the workflow is built around retrieval (embeddings...

Local PDF Q&A · Retrieval-Augmented Generation · GPT4All-J

Intent Recognition with BERT using Keras and TensorFlow 2 in Python | Text Classification Tutorial

Venelin Valkov · 3 min read

Fine-tuning a pre-trained BERT model for intent recognition on a seven-class dataset can deliver near-saturating accuracy—about 97% on a held-out...

Intent Recognition · BERT Fine-Tuning · Text Classification

Build an AI Document (PDF, DOC, XML) Processing Pipeline for RAG | Docling, OCR, Chunking, Images

Venelin Valkov · 3 min read

Turning messy PDFs into reliable knowledge for RAG hinges on more than OCR. The core takeaway is a three-stage, fully local pipeline that converts a...

Docling Pipeline · OCR to Markdown · Table Handling

SQL AI Agents: Analyze Relational Databases with Natural Language using Llama 3 (LLM) and CrewAI

Venelin Valkov · 3 min read

AI agents can turn natural-language questions into SQL queries, pull results from a relational database, and then generate a readable analysis and...

SQL Agent Teams · Natural Language to SQL · CrewAI Orchestration

MCP Complete Tutorial - Connect Local AI Agent (Ollama) to Tools with MCP Server and Client

Venelin Valkov · 3 min read

Model Context Protocol (MCP) is positioned as a standardized way to connect AI models to external tools and data without hand-building bespoke...

Model Context Protocol · MCP Tools · JSON-RPC
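MCP carries tool invocations as JSON-RPC 2.0 messages, so the wire format is easy to inspect. A minimal request asking a server to call a tool — the `tools/call` method is MCP's, while the tool name and arguments here are invented for illustration:

```python
import json

# JSON-RPC 2.0 envelope for an MCP tool invocation.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",   # MCP method for invoking a server-side tool
    "params": {
        "name": "search_files",              # hypothetical tool name
        "arguments": {"query": "invoice"},   # hypothetical arguments
    },
}

payload = json.dumps(request)   # what actually travels over the transport
decoded = json.loads(payload)
```

The same envelope shape (with a matching `id`) comes back in the server's response, which is what lets any MCP client talk to any MCP server.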

Visual Debugger for Jupyter Lab/IPython Notebooks | Installation, Code Examples & Debugging

Venelin Valkov · 2 min read

A new visual debugger for JupyterLab/IPython notebooks adds interactive, breakpoint-based debugging directly in the notebook UI—complete with...

JupyterLab Debugging · Conda Installation · JupyterLab Extension

Use DeepSeek-R1 to Chat with Your Files Privately: 100% Local AI Assistant with Ollama

Venelin Valkov · 2 min read

A fully local “chat with your files” assistant is now practical by combining DeepSeek-R1 running locally with a lightweight app that ingests PDFs and...

Local AI Assistant · DeepSeek-R1 · Ollama Deployment

100% Local RAG with DeepSeek-R1, Ollama and LangChain - Build Document AI for Your Private Files

Venelin Valkov · 2 min read

A practical way to make local RAG work reliably on long documents is to retrieve the right text chunks—then feed only those chunks (plus chat...

Local RAG · Hybrid Retrieval · Document Chunking

Getting Started with LangChain and Llama 2 in 15 Minutes | Beginner's Guide to LangChain

Venelin Valkov · 3 min read

LangChain’s core value is turning large language models like Llama 2 into systems that can pull in outside information and take actions—by chaining...

LangChain Basics · Llama 2 Setup · Retrieval QA

Create Custom Dataset for Question Answering with T5 using HuggingFace, Pytorch Lightning & PyTorch

Venelin Valkov · 2 min read

Fine-tuning T5 for question answering starts with turning BioASQ biomedical QA files into a model-ready dataset: each training example becomes a...

BioASQ Data Preparation · T5 Tokenization · Question Answering Dataset

Mastery List GPT: Chat with your ToDO List | Time Management and Habits with ChatGPT and LangChain

Venelin Valkov · 3 min read

A Streamlit app can turn a simple habit spreadsheet into a working daily schedule—and then let users revise it through chat—by combining...

Habit Scheduling · Streamlit App · LangChain Prompts

Customer Support Chatbot using Custom Knowledge Base with LangChain and Private LLM

Venelin Valkov · 3 min read

A practical blueprint for building a customer-support chatbot from a custom knowledge base hinges on one design choice: retrieve the most relevant...

Retrieval-Augmented Generation · LangChain QA Chain · Chroma Vector Database
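The "retrieve the most relevant chunks" step is just nearest-neighbor search over embeddings. A pure-Python sketch with toy 2-D vectors standing in for real model embeddings (the chunk texts and numbers are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Knowledge-base chunks with their (toy) embeddings.
chunks = {"refund policy": [1.0, 0.1], "shipping times": [0.1, 1.0]}
query_embedding = [0.9, 0.2]   # embedding of the user's question

best = max(chunks, key=lambda c: cosine(query_embedding, chunks[c]))
# best == "refund policy"
```

In the real pipeline a vector store such as Chroma does this search at scale, and only the winning chunks are placed in the LLM prompt.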

Advanced RAG with Llama 3 in Langchain | Chat with PDF using Free Embeddings, Reranker & LlamaParse

Venelin Valkov · 3 min read

Building a high-quality “chat with your PDF” system hinges less on the language model and more on the pipeline around it: parsing complex documents...

Advanced RAG · PDF Parsing · Embeddings

Fine-tuning Alpaca: Train Alpaca LoRa for Sentiment Analysis on a Custom Dataset

Venelin Valkov · 2 min read

Fine-tuning Llama 7B with LoRA on a custom Bitcoin-tweet sentiment dataset can produce a practical sentiment classifier that labels new tweets as...

LoRA Fine-Tuning · Llama 7B · Bitcoin Sentiment

Local RAG with Llama 3.1 for PDFs | Private Chat with Your Documents using LangChain & Streamlit

Venelin Valkov · 3 min read

A fully local “chat with your PDFs” system can be built using open models and self-hosted infrastructure, with responses grounded in retrieved...

Local RAG · PDF Ingestion · Vector Retrieval

Deploy Your Private Llama 2 Model to Production with Text Generation Inference and RunPod

Venelin Valkov · 3 min read

Deploying a private Llama 2–style model into production is practical on a single GPU when Text Generation Inference (TGI) is used as the serving...

Llama 2 Deployment · Text Generation Inference · RunPod GPU Hosting

Analyze Custom CSV Data with GPT-4 using Langchain

Venelin Valkov · 3 min read

A LangChain “CSV agent” can turn a custom Bitcoin price spreadsheet into a question-answering system that writes and runs pandas code on the fly—then...

LangChain CSV Agent · GPT-4 Data Analysis · Bitcoin Price CSV
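The step the CSV agent automates is "load the spreadsheet, write code, compute the answer." A stdlib stand-in for the pandas code the agent would generate, over a tiny inline price sample (the values are invented):

```python
import csv
import io

# A miniature stand-in for the user's Bitcoin price CSV.
data = "date,price\n2023-01-01,16500\n2023-01-02,16700\n"

rows = list(csv.DictReader(io.StringIO(data)))
max_price = max(float(r["price"]) for r in rows)   # the "analysis" step
# max_price == 16700.0
```

The agent's value is that the LLM writes this analysis code on the fly from a natural-language question, then reports the computed result.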

Fine-tuning Tiny LLM on Your Data | Sentiment Analysis with TinyLlama and LoRA on a Single GPU

Venelin Valkov · 3 min read

Fine-tuning a “tiny” LLM on a custom dataset can deliver strong sentiment and topic predictions using a single GPU—provided the training setup is...

LoRA Fine-Tuning · TinyLlama · Sentiment Analysis

Build a Private Chatbot with Local LLM (Falcon 7B) and LangChain

Venelin Valkov · 2 min read

A practical recipe for running a private chatbot on a single GPU hinges on two engineering moves: loading Falcon 7B instruct in 8-bit to fit within...

Local LLM · 8-bit Quantization · Stopping Criteria

Fine-tuning Llama 3.2 on Your Data with a single GPU | Training LLM for Sentiment Analysis

Venelin Valkov · 3 min read

Fine-tuning Llama 3.2 (1B) for sentiment classification on a custom mental-health dataset can jump accuracy from roughly 30% to nearly 85% using a...

LoRA Fine-Tuning · Llama 3.2 1B · Sentiment Classification

100% Free Claude Code | Run Claude Code with Local LLM with Ollama and Qwen 3.5

Venelin Valkov · 2 min read

Running Claude Code locally with an Ollama-backed Qwen model can deliver practical coding assistance—especially when the task is narrowly scoped to...

Claude Code · Ollama · Qwen 3.5

QLoRA: Efficient Finetuning of Large Language Models on a Single GPU? LoRA & QLoRA paper review

Venelin Valkov · 2 min read

QLoRA (4-bit QLoRA) makes it practical to fine-tune very large language models on a single consumer-style GPU by combining three ideas: LoRA-style...

LoRA · QLoRA · 4-Bit Quantization

Faster LLM Inference: Speeding up Falcon 7b (with QLoRA adapter) Prediction Time

Venelin Valkov · 2 min read

Fine-tuned Falcon 7B inference speed can be cut dramatically by changing how the model is loaded—especially by running the quantized model in 8-bit....

LLM Inference Speed · Falcon 7B · QLoRA Adapters

Vectorless RAG - Local Financial RAG Without Vector Database | Tree-Based Indexing with Ollama

Venelin Valkov · 3 min read

Vectorless RAG can retrieve and answer questions from structured documents without any vector database by building a tree index from the document’s...

Vectorless RAG · Tree-Based Indexing · Local Financial RAG
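The tree index replaces embeddings with the document's own heading hierarchy. A minimal sketch that nests markdown sections by heading level — the structure is the point; the sample lines are invented:

```python
def build_tree(lines):
    """Nest markdown sections into a tree keyed on heading level."""
    root = {"title": "root", "children": [], "body": []}
    stack = [(0, root)]   # (heading level, node) path to the current section
    for line in lines:
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            node = {"title": line.lstrip("# "), "children": [], "body": []}
            while stack[-1][0] >= level:   # close sections at same/deeper level
                stack.pop()
            stack[-1][1]["children"].append(node)
            stack.append((level, node))
        else:
            stack[-1][1]["body"].append(line)   # text belongs to current section
    return root

doc = ["# Revenue", "Q3 revenue grew 12%.", "## Segments", "Cloud led growth."]
tree = build_tree(doc)
```

At query time the model navigates this tree by section titles to find the passage to answer from — no vector database involved.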

Dolly 2.0: Free ChatGPT-like Model for Commercial Use

Venelin Valkov · 2 min read

Dolly 2.0 is being released as a genuinely commercial-friendly, open instruction-tuned language model—complete with training code, dataset, and model...

Dolly 2.0 · Instruction Tuning · Dolly 15K

Boost Your AI Predictions: Maximize Speed with vLLM Library for Large Language Model Inference

Venelin Valkov · 2 min read

vLLM is positioned as a practical way to speed up large language model inference by boosting throughput—often by several multiples—without changing...

Paged Attention · LLM Inference Speed · KV Tensor Memory

LLM JSON Output - Get Valid JSON with Pydantic and LangChain Output Parsers

Venelin Valkov · 2 min read

Getting reliable JSON from large language models—especially ones that don’t natively support structured outputs—requires more than “please output...

JSON Output · Pydantic Schemas · LangChain Output Parsers
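The core loop behind structured-output parsers is "validate the model's text against a schema, and report a specific error to retry on." A stdlib stand-in for the Pydantic/LangChain version — the schema fields here are invented:

```python
import json

# Hypothetical schema: the model must return these fields with these types.
REQUIRED = {"name": str, "score": int}

def parse_or_error(text):
    """Return (data, None) on success, or (None, error) to feed back as a repair prompt."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None, "not valid JSON"
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            return None, f"field {key!r} missing or wrong type"
    return data, None

data, err = parse_or_error('{"name": "llama", "score": 7}')
# data == {"name": "llama", "score": 7}, err is None
```

LangChain's output parsers do the same thing with a Pydantic model supplying both the validation and the format instructions injected into the prompt.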

Deploy LayoutLMv3 for Document Classification using Streamlit, Transformers and HuggingFace Spaces

Venelin Valkov · 2 min read

A Streamlit web app is built to classify document images using a fine-tuned LayoutLMv3 model, then deployed to Hugging Face Spaces so anyone can...

Document Classification · LayoutLMv3 · Streamlit Deployment

Convert Any Document To LLM Knowledge with Docling & Ollama (100% Local) | PDF to Markdown Pipeline

Venelin Valkov · 2 min read

Building a reliable, fully local knowledge base from PDFs hinges on turning messy layouts—especially tables and charts—into structured Markdown that...

PDF to Markdown · Docling Pipeline · Table Extraction

Sentence Transformers (SBERT) with PyTorch: Similarity and Semantic Search

Venelin Valkov · 2 min read

Sentence Transformers (SBERT) turn sentences into fixed-length embeddings and then use cosine similarity to score semantic closeness—making it...

Sentence Embeddings · Siamese Networks · Semantic Search

Getting Started with TensorFlow.js | Deep Learning for JavaScript Hackers (Part 0)

Venelin Valkov · 3 min read

TensorFlow.js is positioned as a way to run machine-learning workflows directly in JavaScript—either in the browser or in Node.js—by bridging...

TensorFlow.js Setup · Tensors and Operations · TFJS Vis Charts

Reproducible Machine Learning & Experiment Tracking Pipeline with Python and DVC

Venelin Valkov · 2 min read

Data and model reproducibility hinges on tracking not just code, but the exact datasets, derived features, trained artifacts, and evaluation outputs...

DVC Pipelines · Experiment Tracking · Reproducible ML

Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations

Venelin Valkov · 3 min read

LangChain memory turns a basic chatbot into a conversation that can remember what was said earlier—then choose how much to keep, how to compress it,...

LangChain Memory · Conversation Buffer Memory · Summary Buffer Memory
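Buffer-window memory in miniature: keep only the last k exchanges so the prompt stays inside the context budget. This mirrors the idea behind LangChain's windowed buffer memory, not its API:

```python
class WindowMemory:
    """Keep only the most recent k conversation turns."""

    def __init__(self, k):
        self.k = k
        self.turns = []

    def add(self, user, ai):
        self.turns.append((user, ai))
        self.turns = self.turns[-self.k:]   # drop the oldest beyond the window

    def context(self):
        """Render retained turns as prompt text."""
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

mem = WindowMemory(k=2)
for i in range(4):
    mem.add(f"q{i}", f"a{i}")
# only the q2/q3 exchanges survive in mem.context()
```

Summary-buffer memory differs only in what happens to the dropped turns: instead of being discarded, they are compressed into a running LLM-written summary that stays in the prompt.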

DeepSeek Coder: AI Writes Code | Free LLM For Code Generation Beats ChatGPT, ChatDev & Code Llama

Venelin Valkov · 3 min read

DeepSeek Coder is an open-source code-focused language model from DeepSeek AI that’s trained heavily on programming data and tuned to follow coding...

DeepSeek Coder · Code Generation · LeetCode

HuggingGPT & JARVIS: "Advanced Artificial Intelligence" with ChatGPT and HuggingFace

Venelin Valkov · 3 min read

HuggingGPT reframes “advanced AI” as orchestration: a large language model like ChatGPT (or GPT-4) can act as a controller that plans which...

HuggingGPT · Model Orchestration · Multimodal AI

Local Gemma 4 with OpenCode & llama.cpp | Build a Local RAG with LangChain | 🔴 Live

Venelin Valkov · 3 min read

A local RAG app built around Gemma 4 can work surprisingly well on a single machine—but getting reliable retrieval depends less on the chat model and...

Local RAG · Gemma 4 · llama.cpp

Mamba vs. Transformers: The Future of LLMs? | Paper Overview & Google Colab Code & Mamba Chat

Venelin Valkov · 3 min read

Mamba’s core pitch is a way to make large language models handle much longer inputs without paying Transformers’ usual attention cost. Transformers...

Mamba Architecture · Selective State Spaces · Long-Context LLMs

100% Local CAG with Qwen3, Ollama and LangChain - AI Chatbot for Your Private Documents

Venelin Valkov · 3 min read

Cache-augmented generation (CAG) is presented as a simpler alternative to retrieval-augmented generation (RAG) for private-document chat: instead of...

Cache Augmented Generation · Prompt Caching · Long-Context Comprehension

100% Local PDF OCR with Docling and Ollama | PDF to Markdown with VLM (Nanonets-OCR-s)

Venelin Valkov · 2 min read

A local, fully self-hosted pipeline can convert PDFs into Markdown by swapping out traditional OCR for a visual language model—specifically Docling...

Docling · Ollama · PDF OCR

OpenLLaMA: Open-Source Reproduction of Meta AI's LLaMA for Commercial Use. Run in Google Colab.

Venelin Valkov · 2 min read

OpenLLaMA (a 7B-parameter, open-source LLaMA-style model) can be run in Google Colab using Hugging Face Transformers, but getting usable text depends...

OpenLLaMA · Hugging Face Transformers · Top-k Sampling

Pydantic AI Tutorial: Build Agents to Analyze Mobile App Reviews in Python

Venelin Valkov · 2 min read

A practical agent workflow can turn stored mobile app reviews into a structured product brief—complete with improvement themes, marketing-ready...

Agentic Applications · Pydantic AI · SQL Review Mining

Build 100% Local Chatbot with Gemma 3, Ollama and LangChain | AI Assistant with Memory and Tool Use

Venelin Valkov · 3 min read

A fully local chatbot can now keep both conversation history and long-term “memories” across separate chats—without sending data to a hosted service....

Local Chatbot · Memory Retrieval · Tool Calling

Grok-1 Open Source: 314B Mixture-of-Experts Model by xAI | Blog post, GitHub/Source Code

Venelin Valkov · 2 min read

xAI has open-sourced Grok-1, a 314B-parameter mixture-of-experts (MoE) model, releasing not only weights but also the model architecture and training...

Grok-1 Open Source · Mixture of Experts · JAX Implementation

DeepSeek R1 Local Test with Ollama: Coding, Data Extraction, Data Labelling, Summarization, RAG

Venelin Valkov · 3 min read

DeepSeek R1 and R1-Zero are reasoning-focused large language models trained with a multi-stage process that aims to fix early problems like endless...

DeepSeek R1 Training · Ollama Local Testing · Reasoning Tags

Build 100% Local AI Agent to Chat with Your Files | Private AI Knowledge Base with MCP & RAG

Venelin Valkov · 3 min read

A fully local “private knowledge base” agent can chat with a user’s own files by combining a custom MCP tool server with retrieval-augmented...

Local AI Agent · MCP Tools · RAG Retrieval

Build a custom dataset with LightningDataModule in PyTorch Lightning

Venelin Valkov · 2 min read

A practical path to text classification in PyTorch Lightning starts with turning the multi-annotator GoEmotions dataset into one clean label per...

GoEmotions Labeling · Electra Tokenization · PyTorch Dataset

LLM Function Calling (Tool Use) with Llama 3 | Tool Choice, Argument Mapping, Groq Llama 3 Tool Use

Venelin Valkov · 3 min read

Function calling with Llama 3 is no longer a niche capability: a Groq-tuned “Llama 3 tool use” model can reliably translate natural-language requests...

Function Calling · Tool Use · Llama 3
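What the application does after the model emits a tool call: map the requested name to a Python function and pass the JSON arguments through. The tool-call dict shape below is illustrative, not any specific provider's schema, and `get_weather` is a stub:

```python
def get_weather(city: str) -> str:
    """Stub tool; a real one would hit a weather API."""
    return f"22C and sunny in {city}"

# Registry mapping tool names the model may request to real functions.
TOOLS = {"get_weather": get_weather}

# In production this dict is parsed from the model's tool-call output.
tool_call = {"name": "get_weather", "arguments": {"city": "Sofia"}}

result = TOOLS[tool_call["name"]](**tool_call["arguments"])
# result == "22C and sunny in Sofia"
```

The result string is then appended to the conversation as a tool message, and the model writes its final answer from it.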

Loaders, Indexes & Vectorstores in LangChain: Question Answering on PDF files with ChatGPT

Venelin Valkov · 3 min read

A practical LangChain pipeline for turning PDFs, YouTube transcripts, and plain text into question-answering over embeddings is the core takeaway—and...

LangChain Loaders · Vector Stores · Embeddings

Train Deep Learning Model with PyTorch Lightning - TensorBoard, Learning rate finder and Checkpoints

Venelin Valkov · 2 min read

Fine-tuning an ELECTRA-based emotion classifier in PyTorch Lightning gets a major boost from two training “plumbing” upgrades: automatically finding...

Learning Rate Finder · TensorBoard Logging · Model Checkpointing

Generative Agents: Simulating Human Behavior with ChatGPT

Venelin Valkov · 2 min read

Generative agents built on ChatGPT can simulate believable, goal-driven human behavior inside a small virtual town—without hand-scripting every...

Generative Agents · ChatGPT · Agent Memory

DeepSeek-R1 0528 for 100% Local Chat with Your Files | Financial Document Analysis AI with Ollama

Venelin Valkov · 3 min read

DeepSeek-R1 (distilled) running locally through Ollama can extract and summarize complex financial statements from a 10-page Nvidia earnings PDF with...

Local Document Chat · DeepSeek-R1 · Ollama

Llama 3.3 70B Test - Coding, Data Extraction, Summarization, Data Labelling, RAG

Venelin Valkov · 3 min read

Meta’s Llama 3.3 70B is landing as a strong all-around text model, with independent evaluations and hands-on tests pointing to performance that...

Llama 3.3 70B · Groq API · Coding

Local Llama 3.2 (3B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling

Venelin Valkov · 2 min read

A 3B quantized Llama 3.2 model running locally through Ollama delivers fast, usable results for structured data extraction—especially when...

Local Llama 3.2 · Ollama Inference · Structured Data Extraction

DeepSeek Coder v2: First Open Coding Model that Beats GPT-4 Turbo?

Venelin Valkov · 3 min read

DeepSeek Coder V2 is pitched as an open coding model that can rival—or even beat—GPT-4 Turbo on programming benchmarks, and the practical tests in...

DeepSeek Coder V2 · Open Coding Models · Mixture of Experts

Why Your RAG Gives Wrong Answers (And 4 Chunking Strategies to Fix It) | LangChain Text Splitters

Venelin Valkov · 3 min read

RAG systems often fail for a surprisingly mundane reason: chunking breaks the information the model needs, even when embeddings, vector search, and...

RAG Chunking · LangChain Text Splitters · Semantic Chunking
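The baseline strategy the video compares against — fixed-size chunks with overlap — fits in a few lines. Overlap keeps sentences that straddle a boundary visible in two chunks, which is exactly the failure mode naive splitting causes (a sketch of the idea, not LangChain's implementation):

```python
def chunk(text, size, overlap):
    """Split text into fixed-size chunks, each sharing `overlap` chars with the previous."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

pieces = chunk("abcdefghij", size=4, overlap=2)
# pieces == ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The smarter splitters in the video (recursive, semantic) replace the blind `i:i+size` slice with boundaries chosen at paragraph, sentence, or topic breaks.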

DeepSeek v3 Tested - Coding, Data Extraction, Summarization, Data Labelling, RAG

Venelin Valkov · 3 min read

DeepSeek V3 is positioned as a top-tier open-weight mixture-of-experts (MoE) model—strong on benchmarks and notably effective at real-world...

Mixture of Experts · FP8 Training · Chain-of-Thought Post-Training

Gemma 3n: Open Multimodal Model by Google (Image, Audio, Video & Text) | Install and Test

Venelin Valkov · 3 min read

Google’s Gemma 3n (Geometry N in the transcript) is positioned as an open, mobile-targeted multimodal model that can take in text plus images, audio,...

Gemma 3n · Multimodal Inference · Hugging Face Transformers

Build a Neural Network for Classification from Scratch with PyTorch

Venelin Valkov · 2 min read

A penguin-species classifier built from scratch in PyTorch hinges on three practical steps: turning a cleaned pandas dataset into numeric tensors,...

Penguin Classification · PyTorch Tensors · Train/Test Split

Phi 2: Small Language Model Better Than 7B LLMs? | Google Colab Tutorial with Python

Venelin Valkov · 3 min read

Microsoft’s Phi-2 (2.7B parameters) is positioned as a test of whether “small” language models can match the useful behavior of much larger 7B–13B...

Phi-2 Model · Small Language Models · Synthetic Training Data

How To Extract ChatGPT Hidden Training Data | Making LLMs (e.g. Llama) Spill Out Their Training Data

Venelin Valkov · 2 min read

A new line of research argues that large language models—despite safeguards meant to prevent memorized training data from leaking—can still be coaxed...

Training Data Extraction · Memorization Risk · Suffix Array Matching

Gemma 3 Local Test with Ollama: Coding, Data Extraction, Data Labelling, Summarization, RAG

Venelin Valkov · 3 min read

Gemma 3’s biggest practical win in local testing is its ability to deliver reliable, structured outputs—especially for coding, data extraction, and...

Gemma 3 · Ollama · Quantized Models

Getting started with PyTorch Lightning for Deep Learning

Venelin Valkov · 3 min read

PyTorch Lightning is positioned as a way to train deep learning models with PyTorch while cutting out much of the repetitive “boilerplate” code. The...

PyTorch Lightning · GoEmotions · Multi-Label Classification

100% Local AI Agents with DeepSeek-R1, Ollama, Pydantic and LangGraph - Private Agentic Workflow

Venelin Valkov · 3 min read

A fully local “agentic” workflow can fetch Reddit posts, let a person steer which threads matter, then run semantic filtering and structured analysis...

Local AI Agents · Reddit Data Fetching · LangGraph Workflows

Local Qwen 2.5 (14B) Test using Ollama - Summarization, Structured Text Extraction, Data Labelling

Venelin Valkov · 2 min read

Qwen 2.5 14B running locally through Ollama (via an Ollama server) delivers a noticeable jump in text-heavy tasks—especially sentiment/topic labeling...

Local Ollama · Qwen 2.5 · Structured Text Extraction

GLM-OCR (9B) - Local OCR Test | OCR, Document Extraction, Table Recognition

Venelin Valkov · 2 min read

GLM-OCR is a two-stage OCR system that combines document layout analysis with character-level recognition, and it's drawing attention because it...

GLM-OCR · Document Layout Analysis · Table Recognition

StableVicuna: The Best Open Source Local ChatGPT? LLM based on Vicuna and LLaMa.

Venelin Valkov · 2 min read

Stability AI’s open-source chatbot model, StableVicuna, is positioned as a strong “local ChatGPT” alternative—especially because it can be run in a...

StableVicuna · Local LLM · Model Quantization

Build Web Scraper with Llama 3.1 | Get Structured Data By Scraping Web Content With AI

Venelin Valkov · 3 min read

A practical pipeline for turning messy, JavaScript-heavy web pages into clean structured data is built by combining Playwright for rendering,...

Web Scraping · Playwright Rendering · HTML to Markdown

Build Better RAGs with Contextual Retrieval

Venelin Valkov · 3 min read

Contextual retrieval boosts retrieval-augmented generation (RAG) accuracy by enriching every text chunk with extra, chunk-specific context derived...

Contextual Retrieval · RAG Accuracy · Vector Databases
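Mechanically, contextual retrieval is one string operation: prefix each chunk with a short, document-derived context line before embedding it, so the chunk stays interpretable in isolation. In the sketch below the context line is hand-written; in the actual technique an LLM generates it from the full document (the sample text is invented):

```python
# A chunk that is ambiguous on its own ("revenue of what? when?").
chunk = "Revenue grew 12% quarter over quarter."

# Context an LLM would derive from the surrounding document.
context = "From ACME Corp's Q3 2023 earnings report, revenue discussion section."

# The contextualized string is what gets embedded and indexed.
contextualized = f"{context}\n{chunk}"
```

Queries like "ACME Q3 revenue" now match this chunk even though the original chunk text never mentions the company or quarter.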

Gemma 4 Local Test | New Open LLM King?

Venelin Valkov · 3 min read

Gemma 4's open, on-device push is starting to look practical: a 26B mixture-of-experts (MoE) instruction-tuned model running locally via llama.cpp...

Gemma 4 · Local LLM · Multimodal Inference

Analyzing Cryptocurrency Sentiment on Twitter with LangChain and ChatGPT | CryptoGPT

Venelin Valkov · 2 min read

CryptoGPT’s sentiment pipeline turns an author’s Twitter activity into daily sentiment scores by combining LangChain with ChatGPT and forcing...

Crypto Sentiment · LangChain · ChatGPT Prompting

Gemini CLI + MCP Tools Deep Dive - Build a Completely Local RAG with Ollama | Context7, NextJS

Venelin Valkov · 3 min read

Gemini CLI can be paired with an MCP server (Context7) to generate and run a fully local RAG-style “chat with your files” web app—complete with...

Gemini CLI · Context7 MCP · Local RAG

Real-World PyTorch: From Zero to Hero in Deep Learning & LLMs | Tensors, Operations, Model Training

Venelin Valkov · 3 min read

The core takeaway is that PyTorch training for real data comes down to three practical skills: building the right tensor shapes and dtypes, moving...

PyTorch Tensors · CUDA Device Management · Custom Dataset

LiteParse - 100% Local PDF Parsing (No GPU) | Document Processing for RAG & AI Agents

Venelin Valkov · 2 min read

LiteParse positions itself as a fully local alternative for extracting structured text from PDFs—without relying on GPUs or cloud document-parsing...

Local PDF Parsing · Bounding Boxes · Table Extraction

OuteTTS 0.3 - Local TTS and Voice Cloning

Venelin Valkov · 2 min read

OuteTTS 0.3 is a local, Apache 2.0–licensed text-to-speech system that also supports voice cloning, letting users generate speech in multiple...

Local TTS · Voice Cloning · Multi-Language Speakers

Grok - LLM by Elon Musk & xAI | Overview, Tech Stack, PromptIDE and Sample Prompts

Venelin Valkov · 3 min read

Grok’s biggest differentiator is its claim of real-time knowledge drawn from the X platform, paired with a new “PromptIDE” tool aimed at making...

Grok Overview · xAI Mission · PromptIDE

Build AI Agent Application with Agent Development Kit (ADK) | Get Started with Google's Agent SDK

Venelin Valkov · 3 min read

Google’s Agent Development Kit (ADK) is positioned as a practical way to build agentic applications with a clear workflow structure, built-in...

Agent Development Kit · Tool Calling · Gemini 2.0 Flash

MedGemma 27B (Local) Multimodal Health AI Advisor | Xrays and Text-Only Diagnosis Test

Venelin Valkov · 2 min read

MedGemma 27B is a Google fine-tuned, multimodal health AI model that can take both text and medical images (like X-rays) and produce structured,...

MedGemma 27B · Multimodal Health AI · X-ray Diagnosis

Mistral 7B - better than Llama 2? | Getting started, Prompt template & Comparison with Llama 2

Venelin Valkov · 2 min read

Mistral 7B Instruct is positioned as a smaller model that can outperform larger Llama 2–class competitors, and hands-on tests in a Google Colab...

Mistral 7B Instruct · Llama 2 Comparison · Grouped Query Attention

Auto-GPT: Autonomous Investment Manager Powered by GPT-4?

Venelin Valkov · 2 min read

Auto-GPT can run GPT-4 (or GPT-3.5) in an autonomous loop: it takes an initial goal, produces intermediate outputs, feeds those outputs back into...

Auto-GPT · Autonomous Agents · Crypto Portfolio

Automated Prompt Engineering with DSPy | Prompt Optimization for Financial News Semantic Analysis

Venelin Valkov · 2 min read

Prompt optimization can materially improve sentiment extraction from financial news without retraining a model—DSPy’s prompt optimizer boosted...

DSPy Prompt Optimization · Financial News Sentiment Analysis · MIPROv2

Linear Regression with TensorFlow.js

Venelin Valkov · 2 min read

Linear regression in TensorFlow.js is built to learn the parameters of a straight-line (or hyperplane) relationship between house features and...

Linear Regression · TensorFlow.js · Multiple Linear Regression

MemGPT - Unlimited Context Window (Memory) for LLMs | Paper review, Installation & Demo

Venelin Valkov · 3 min read

MemGPT targets a core bottleneck in today’s large language models: limited context windows that force earlier parts of a conversation or large...

Virtual Context Management · External Memory · Transformer Context Limits
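MemGPT's core move in miniature: when the in-context message list exceeds its budget, evict the oldest turns into an external store that can be searched back in later. A sketch of the eviction step only — no paging policy, retrieval, or LLM involved:

```python
# "Main context" (what fits in the model's window) vs. external storage.
context_window, archive = [], []
BUDGET = 3   # max messages kept in context; tiny for illustration

def remember(msg):
    """Append a message, spilling the oldest ones to the archive when over budget."""
    context_window.append(msg)
    while len(context_window) > BUDGET:
        archive.append(context_window.pop(0))   # evict oldest to external memory

for m in ["m1", "m2", "m3", "m4", "m5"]:
    remember(m)
# context_window == ["m3", "m4", "m5"], archive == ["m1", "m2"]
```

What makes MemGPT more than a sliding window is that the model itself can issue function calls to search the archive and page relevant memories back into context.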

How to Deploy LLMs | LLMOps Stack with vLLM, Docker, Grafana & MLflow

Venelin Valkov · 3 min read

Running an LLM locally is only half the job; production needs concurrency, security, monitoring, and a way to detect failures. A practical LLMOps...

LLMOps Stack · vLLM Serving · Docker Compose Deployment

Can GPT-4o's Memory Replace RAG Systems? Exploring Large Context Windows

Venelin Valkov · 3 min read

GPT-4o’s ability to retrieve information from extremely long prompts looks strong enough to challenge the usual need for retrieval-augmented...

RAG vs Long Context · GPT-4o Memory · Needle in a Needle Stack

Mixtral - Mixture of Experts (MoE) Free LLM that Rivals ChatGPT (3.5) by Mistral | Overview & Demo

Venelin Valkov · 2 min read

Mistral AI’s Mixtral 8×7B (an open-weight sparse Mixture of Experts model) is positioned as a practical alternative to much larger LLMs by routing...

Mixture of Experts · Sparse Routing · Instruction Tuning

Grok 4.1 vs Gemini 3 Pro - Which Model is THE ONE? | Prompt & Coding First Look

Venelin Valkov · 3 min read

Grok 4.1 and Gemini 3 Pro both land near the top of current AI leaderboards, but a quick side-by-side test suggests Gemini 3 Pro may have the edge...

Model Comparison · Prompting · Coding Output

Fine-Tuning LLM on Your Data using Single GPU | Sentiment Analysis for Cryptocurrency Tweets

Venelin Valkov · 3 min read

Fine-tuning Qwen 3 on a small, sentiment-labeled cryptocurrency tweet dataset can deliver a sizable accuracy jump—even when training runs on a...

LLM Fine-Tuning · Sentiment Analysis · Crypto Tweets

GPT-4o API Deep Dive: Text Generation, Streaming, Vision, and Function Calling

Venelin Valkov · 3 min read

GPT-4o’s API is positioned as a drop-in upgrade for building faster, more capable AI apps—especially when you need streaming, structured JSON...

GPT-4o API · Streaming Responses · JSON Output

Evaluate LLM Systems & RAGs: Choose the Best LLM Using Automatic Metrics on Your Dataset

Venelin Valkov · 3 min read

Choosing an LLM for a real project often fails when teams rely on classical ML metrics like accuracy, F1, or regression error. Those metrics assume...

LLM Evaluation · RAG Metrics · AI-as-Judge

Vicuna: An Open-Source Chatbot Comparable to ChatGPT and Google Bard

Venelin Valkov · 2 min read

Vicuna is an open-source chatbot built to deliver ChatGPT-like quality without matching OpenAI’s closed model approach. The project centers on a 13...

Vicuna · LLaMA Fine-Tuning · ShareGPT Data

LIMA: Can you Fine-Tune Large Language Models (LLMs) with Small Datasets? Less Is More for Alignment

Venelin Valkov · 2 min read

Meta AI’s LIMA (“Less Is More for Alignment”) argues that strong alignment behavior in large language models can be achieved with surprisingly small...

LIMA Fine-Tuning · Supervised Alignment · Small Dataset

LangChain Models: ChatGPT, Flan Alpaca, OpenAI Embeddings, Prompt Templates & Streaming

Venelin Valkov · 2 min read

LangChain can unify three major building blocks—text generation models, embeddings, and chat interfaces—so the same workflow (prompting, formatting,...

LangChain Model Comparison · Prompt Templates · Embeddings

The New Prompting Rules: How to Prompt Frontier LLM Models like Gemini 2.5, GPT 4.1 & Claude 3.7

Venelin Valkov · 3 min read

Frontier LLMs are getting dramatically easier to use because context windows have ballooned to 200,000 tokens and beyond, letting models reliably...

Long Context · Instruction Following · Prompt Delimiters

Use Any LLM Provider with LiteLLM | Use ChatGPT, Claude, Gemini, Ollama with One API

Venelin Valkov · 2 min read

Switching between large language model (LLM) providers can break production systems when code depends on a single vendor’s SDK. LiteLLM is presented...

LLM Provider Abstraction · Structured Outputs · Pydantic Validation

XGen-7B: Long Sequence Modeling with (up to) 8K Tokens. Overview, Dataset & Google Colab Code.

Venelin Valkov · 3 min read

Salesforce’s XGen-7B is positioned as an open 7-billion-parameter language model built for long-context work, with an input sequence length that...

Long Context · Model Training · Multilingual Data