Ollama - Libraries, Vision and Updates
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Ollama’s latest updates push local AI further into “build-and-automate” territory: new Python/JavaScript libraries, expanded vision model support, and an OpenAI-compatible API layer that lets existing tooling run against local models. The practical payoff is faster prototyping—especially for RAG, agents, and batch workflows—without stitching together external frameworks just to make basic calls.
A major change is the addition of official Python and JavaScript libraries. Instead of manually hitting an endpoint or relying on orchestration frameworks like LangChain or LlamaIndex, developers can install Ollama and call chat-style endpoints directly using OpenAI-like message structures (roles such as user and system, plus content). The libraries also make it easier to run models in the background for non-interactive tasks. Rather than treating local LLMs only as real-time chatbots, the workflow shifts toward automation: looping over inputs, generating outputs, and scheduling runs like cron jobs. The transcript highlights this with examples using Mistral and LLaMA 2—showing streaming responses and noting that model load time can be a noticeable first step, followed by quicker token streaming once the model is resident.
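As an illustration, here is a minimal Python sketch of that flow, assuming the official `ollama` package is installed, a local Ollama server is running, and the `mistral` model has been pulled; the prompts are placeholders, not from the video:

```python
import ollama

# One-shot chat call with OpenAI-like roles (system/user) and content.
response = ollama.chat(
    model="mistral",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize these release notes in one sentence."},
    ],
)
print(response["message"]["content"])

# Streaming: the first tokens can take a moment while the model loads,
# then tokens arrive quickly once the model is resident in memory.
stream = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Write a haiku about cron jobs."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```

A script like this can be dropped into a scheduled job, which is the shift the video emphasizes: the model is called programmatically rather than through an interactive chat session.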
Vision support is the second big pillar. Ollama has added LLaVA vision models (including LLaVA 1.6 variants at 7B, 13B, and 34B) and supports both command-line usage and library-driven automation. A simple CLI flow lets users pass an image path and a prompt to get descriptions back. More useful in practice is batch processing: pointing the system at a folder of screenshots or image files, generating captions/descriptions, and storing results for later use—potentially feeding into multimodal RAG pipelines. The transcript also emphasizes text-in-image extraction: these vision models can read text embedded in images, enabling faster indexing of image collections than relying on separate captioning or external services.
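A hedged sketch of that batch flow in Python, assuming the `ollama` package and a pulled `llava` model; the folder name and output file are illustrative assumptions, not taken from the video:

```python
import json
from pathlib import Path

import ollama

results = {}
for image_path in sorted(Path("screenshots").glob("*.png")):  # example folder
    response = ollama.chat(
        model="llava",
        messages=[{
            "role": "user",
            "content": "Describe this image and transcribe any visible text.",
            # The library accepts image file paths (or raw bytes) per message.
            "images": [str(image_path)],
        }],
    )
    results[image_path.name] = response["message"]["content"]

# Persist the descriptions so they can be indexed later, e.g. in a multimodal RAG pipeline.
Path("captions.json").write_text(json.dumps(results, indent=2))
```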
The third update is OpenAI compatibility. Ollama now integrates an API style that works with the OpenAI Python and JavaScript libraries by redirecting the base URL to a local Ollama server. That removes the need for an API key (a placeholder can be used) while keeping the familiar chat format (system/user/assistant roles and message content). This compatibility extends beyond direct OpenAI calls: tools that already speak the OpenAI API format—such as the Vercel AI SDK and frameworks like Autogen—can be pointed at Ollama locally, enabling multi-agent workflows to run on-device. The transcript cautions that local model quality still depends on which model is selected, but suggests that models like Mistral or Mixtral can deliver strong results for many tasks previously handled by cloud calls.
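For example, a minimal sketch using the official `openai` Python package pointed at a local Ollama server; the base URL below is Ollama's default local endpoint, and the API key is only a placeholder:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama server, default port
    api_key="ollama",                      # placeholder; no real key is needed
)

completion = client.chat.completions.create(
    model="mistral",  # any locally pulled model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain retrieval-augmented generation in two sentences."},
    ],
)
print(completion.choices[0].message.content)
```

Because only the base URL, key, and model name change, existing OpenAI-based code paths can be switched to local models with minimal edits.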
Finally, the interface and operational workflow improve. Ollama adds capabilities for saving and loading sessions/models, plus better visibility into model configuration—showing model files, templates, parameters, and the current system prompt. The transcript demonstrates setting a deliberately “bad” system prompt (a rude, slurring assistant) to verify prompt adherence, then saving it as a new model and reloading it later to confirm the behavior persists. Taken together, these updates make Ollama more turnkey for experimentation, batch processing, and multimodal local applications.
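A rough Python-side counterpart to that configuration visibility, assuming the `ollama` package's `show()` helper; the model name is an example and the exact fields returned may vary across library versions:

```python
import ollama

info = ollama.show("mistral")         # inspect a locally available model
print(info.get("modelfile", ""))      # the Modelfile, including any SYSTEM prompt
print(info.get("parameters", ""))     # generation parameters baked into the model
print(info.get("template", ""))       # the prompt template the model uses
```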
Cornell Notes
Ollama’s updates make local LLMs easier to use for automation and multimodal tasks. New official Python and JavaScript libraries let developers call chat endpoints directly with OpenAI-like message roles, enabling quick scripts and background processing (e.g., cron jobs) rather than only interactive chat. Vision support expands with LLaVA 1.6 models (7B, 13B, 34B), usable via CLI or libraries for tasks like image description, screenshot indexing, and text-in-image extraction. OpenAI compatibility redirects the OpenAI Python/JavaScript libraries to a local Ollama base URL, letting existing tooling (including Vercel AI SDK and Autogen-style workflows) run against local models. Added model/session saving and clearer configuration inspection improve prompt testing and reproducibility.
What changed with Ollama’s Python and JavaScript libraries, and why does it matter for real projects?
How do the new vision models fit into an automation workflow?
What practical vision tasks are highlighted beyond “describe an image”?
How does OpenAI compatibility work with existing libraries and frameworks?
Why are model/session saving and configuration visibility important for prompt testing?
Review Questions
- Which parts of Ollama’s new Python/JavaScript libraries reduce the need for frameworks like LangChain or LlamaIndex, and how does that change typical development workflows?
- How can LLaVA 1.6 vision models be used to turn a folder of screenshots into searchable metadata for later RAG or indexing?
- What does OpenAI compatibility enable in terms of reusing existing OpenAI-based SDKs and agent frameworks locally?
Key Points
1. Ollama added official Python and JavaScript libraries that call chat endpoints directly using OpenAI-like message roles, enabling faster prototyping without extra orchestration layers.
2. The libraries support streaming and make it easier to run local models in batch/background workflows (e.g., cron jobs) instead of only interactive chat.
3. Ollama expanded vision capabilities with LLaVA 1.6 models (7B, 13B, 34B), usable via CLI and libraries for image description and text-in-image extraction.
4. OpenAI compatibility redirects OpenAI Python/JavaScript SDKs to a local Ollama base URL, removing the need for an API key and preserving the familiar system/user/assistant message format.
5. OpenAI-style compatibility helps existing tools (including Vercel AI SDK and Autogen-style workflows) switch from cloud models to local Ollama models by changing the base URL and model selection.
6. Ollama's UI improvements make model configuration and system prompts visible, and added save/load capabilities help reproduce prompt experiments reliably.