Dolly 2.0: Free ChatGPT-like Model for Commercial Use
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Dolly 2.0 is being released as a genuinely commercial-friendly, open instruction-tuned language model—complete with training code, dataset, and model weights—aimed at giving developers a ChatGPT-like option without paying for a closed API. Databricks positions it as an “open instruction following” model fine-tuned on human-generated instruction data, built for research and commercial use rather than as a purely academic artifact.
At the core is a 12 billion parameter model based on a Pythia-style foundation, trained on The Pile (described as 800GB+ of diverse text). The instruction tuning relies on the Dolly 15K dataset, created by Databricks employees through a structured labeling contest. Labelers tackled seven task types: open question answering, closed question answering, information extraction from provided text, summarization, brainstorming, classification, and creative writing such as poems and roleplay-style outputs. The dataset is described as containing long, high-quality answers, with examples provided in the release materials.
Databricks also sets expectations: Dolly 2.0 is not presented as state-of-the-art compared with top closed models like GPT-3, GPT-4, or similar systems. Instead, the release is framed as a “seed” for future work—an open dataset and model that can bootstrap follow-on instruction-tuned systems.
To make the model usable, the release points to a Hugging Face repository that includes the model and an instruction text generation pipeline. The pipeline wraps each query in a prompt template with an instruction field and a response field, then post-processes the generated text with regular expressions and an end marker to extract just the model's response. The practical setup in the walkthrough uses a Google Colab notebook, but it requires Google Colab Pro because the model's footprint is roughly 24GB, too large for typical free tiers. The setup installs Accelerate and Transformers, loads the tokenizer and model from the Hugging Face repo, and uses bfloat16 and an automatic device map to run on GPU when available.
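The template-plus-regex approach can be sketched in plain Python. The exact marker strings and template wording below are assumptions paraphrased from the release's description of the instruction pipeline, not a copy of the repository's code:

```python
import re

# Markers delimiting the model's answer in Dolly-style prompts
# (assumed names, following the pipeline described in the release).
RESPONSE_KEY = "### Response:"
END_KEY = "### End"

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    f"{RESPONSE_KEY}\n"
)

def build_prompt(instruction: str) -> str:
    """Fill the instruction field of the prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

def extract_response(generated: str) -> str:
    """Pull out the text between the response marker and the end marker."""
    match = re.search(
        rf"{re.escape(RESPONSE_KEY)}\s*(.+?)\s*{re.escape(END_KEY)}",
        generated,
        flags=re.DOTALL,  # the answer may span multiple lines
    )
    return match.group(1).strip() if match else ""

# Simulate a raw model output: the prompt followed by the model's completion.
raw = build_prompt("What is Dolly 2.0?") + "An open instruction-tuned model.\n### End"
print(extract_response(raw))  # An open instruction-tuned model.
```

The end marker matters because causal language models keep generating after the answer; without it, the regex would have no right boundary to cut at.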
In live prompt tests, Dolly 2.0 produces coherent, sometimes quirky answers, with mixed success at constraints like "no more than three sentences" or "single sentence." For example, when asked "What is the meaning of life," Dolly returns a philosophical response that can exceed the requested sentence limit, while the free ChatGPT response stays closer to the format. In a pop-culture prompt ("Dwight Schrute… from The Office"), Dolly delivers a single-sentence style answer, whereas ChatGPT's output is more elaborate and more directly tied to the prompt's framing.
The comparison also highlights safety and refusal behavior: when asked to pick the “sexiest” person between Andrew and Pam (a prompt that veers into sexual content), Dolly provides a name, while ChatGPT declines and later offers a template-like refusal. The overall takeaway is that Dolly 2.0 offers a workable, commercial-usable open alternative with strong instruction-following potential, but with format adherence and capability gaps versus top closed models—and with real deployment constraints driven by its large memory requirements.
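The ~24GB figure mentioned above follows directly from the parameter count. This back-of-the-envelope check counts weight storage only, so it is a lower bound; activations, the KV cache, and generation buffers add to it at inference time:

```python
# Rough memory estimate for a 12B-parameter model stored in bfloat16.
params = 12_000_000_000
bytes_per_param = 2  # bfloat16 is 2 bytes per parameter (vs 4 for float32)
weight_bytes = params * bytes_per_param
print(f"{weight_bytes / 1e9:.0f} GB")  # 24 GB for weights alone
```

This is also why the walkthrough loads the model in bfloat16 rather than float32: full precision would double the footprint to roughly 48GB, beyond even most paid single-GPU tiers.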
Cornell Notes
Dolly 2.0 is a 12B-parameter, open instruction-tuned language model released with the full package—training code, Dolly 15K dataset, and model weights—explicitly suitable for commercial use. It is fine-tuned on human-generated instruction data spanning question answering, extraction, summarization, brainstorming, classification, and creative writing, and is built on a Pythia-style foundation trained on The Pile (800GB+). Databricks positions Dolly 2.0 as a starting point rather than state-of-the-art versus GPT-3/GPT-4-class systems. Running it requires substantial hardware (about 24GB of GPU memory), so the walkthrough uses Google Colab Pro and loads the model with bfloat16 plus an automatic device map. Prompt tests show Dolly can follow instructions and produce creative outputs, though it may miss strict sentence limits and can differ sharply from ChatGPT on sensitive prompts.
- What makes Dolly 2.0 “commercially usable,” and what exactly is released?
- How was the Dolly 15K instruction dataset created, and what task types were included?
- What hardware and software setup is needed to run Dolly 2.0 in the walkthrough?
- How does the instruction pipeline format prompts and extract responses?
- Where does Dolly 2.0 match or diverge from ChatGPT in the prompt comparisons?
- How do the models behave on a sensitive prompt about choosing between Andrew and Pam?
Review Questions
- What role does the Dolly 15K dataset play in Dolly 2.0’s instruction-following behavior, and which task categories were used to build it?
- Why does running Dolly 2.0 require Google Colab Pro (or equivalent hardware), and how do bfloat16 and device mapping help?
- In the prompt tests, what specific instruction-following differences appear between Dolly 2.0 and ChatGPT (e.g., sentence limits and sensitive-content handling)?
Key Points
1. Dolly 2.0 is released as an open instruction-tuned 12B model with weights, training code, and the Dolly 15K dataset explicitly positioned for commercial use.
2. Dolly 15K is built from human-generated instruction data across seven task types, including extraction, summarization, classification, and creative writing.
3. Databricks frames Dolly 2.0 as a foundation/seed for future work rather than a direct replacement for GPT-3/GPT-4-level systems.
4. Running Dolly 2.0 in practice requires large memory (about 24GB), making GPU-backed environments like Google Colab Pro a common path.
5. The Hugging Face instruction pipeline uses a structured prompt template and regex-based post-processing to extract the response field.
6. Prompt comparisons show Dolly can be creative and instruction-aware, but it may miss strict formatting constraints and can differ on sensitive prompts where ChatGPT refuses.