$5 MILLION AI for FREE
Based on sentdex's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
BLOOM up to 176B parameters is available for free download and free hosted inference, lowering the barrier to frontier-scale language model experimentation.
Briefing
A 176-billion-parameter large language model called BLOOM is now available for free download and free hosted inference, putting an AI with a multi-million-dollar training budget within reach of anyone with a laptop, or at least an internet connection. The model, built by a large international effort and trained on a nuclear-powered supercomputer, is designed to handle multilingual text and programming tasks, and it’s accessible through Hugging Face both as downloadable weights and as an API that runs on A100 GPUs.
The practical barrier used to be cost and compute. Training a model of this scale would typically require hundreds of NVIDIA A100 GPUs and months of work, along with a specialized research team. BLOOM’s release flips that equation: multiple size variants are available up to 176B parameters, and the largest model is described as requiring roughly 680GB of memory in full precision or about 350GB at half precision. For people without that hardware, smaller BLOOM checkpoints—down to 350M, 1B, 2B, and 6B parameters—can cover many real-world tasks.
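The quoted memory figures can be sanity-checked with simple arithmetic: parameter count times bytes per parameter, ignoring activation and optimizer overhead. This sketch uses 4 bytes for full (fp32) and 2 bytes for half (fp16) precision; the result is in the same ballpark as the ~680GB and ~350GB figures above, with small differences down to unit conventions and the exact parameter count.

```python
# Back-of-the-envelope estimate of model weight memory:
# parameters * bytes-per-parameter, ignoring activations and optimizer state.
def model_memory_gb(n_params, bytes_per_param):
    """Rough memory footprint of the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

full = model_memory_gb(176e9, 4)   # fp32: 4 bytes per parameter
half = model_memory_gb(176e9, 2)   # fp16: 2 bytes per parameter
print(f"fp32: ~{full:.0f} GB, fp16: ~{half:.0f} GB")
```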
Hugging Face also hosts inference for free, using A100s to deliver much faster responses than running the full model locally, though a queue may apply. That combination—downloadable models plus hosted API access—turns a previously closed ecosystem into something developers can experiment with immediately, whether they’re building tools, testing prompts, or studying how large language models behave.
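A call to the hosted API can be sketched as below. The endpoint URL and JSON payload shape follow Hugging Face's public Inference API conventions; the `hf_XXXX` token is a placeholder for your own access token, and the request is assembled but not actually sent here.

```python
# Sketch: building a request to Hugging Face's hosted Inference API for BLOOM.
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"

def build_request(prompt, token, max_new_tokens=64):
    """Assemble (but do not send) an HTTP request for a text-generation call."""
    payload = {"inputs": prompt,
               "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

req = build_request("Translate to French: Hello, world.", "hf_XXXX")
# urllib.request.urlopen(req) would send it; queued requests may take a while.
print(req.full_url)
```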
Under the hood, BLOOM functions as a next-token generator rather than a “true” chatbot. It’s trained on 1.6TB of text spanning 46 natural languages and 13 programming languages, so it learns patterns of language continuation. That matters because prompt engineering becomes the steering wheel: to get chatbot-like behavior, prompts must be structured like a dialogue transcript (e.g., “Person:” and “Bot:” lines), since the model will otherwise just continue whatever text style it sees.
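The dialogue-transcript trick can be sketched as below. The "Person:"/"Bot:" labels and the cut-at-next-speaker logic are conventions from the transcript, not anything built into the model; since BLOOM just keeps generating text, the caller has to stop the "reply" at the point where the model starts writing the next speaker's turn.

```python
# Sketch: steering a next-token generator toward chatbot-like behavior
# with a dialogue-shaped prompt and explicit stopping logic.
def build_chat_prompt(history, user_message):
    """Format prior (person, bot) turns plus a new message as a transcript."""
    lines = []
    for person, bot in history:
        lines.append(f"Person: {person}")
        lines.append(f"Bot: {bot}")
    lines.append(f"Person: {user_message}")
    lines.append("Bot:")  # the model continues from here
    return "\n".join(lines)

def extract_reply(generated, prompt):
    """Cut the continuation at the next 'Person:' line the model invents."""
    continuation = generated[len(prompt):]
    return continuation.split("Person:")[0].strip()

prompt = build_chat_prompt([("Hi!", "Hello, how can I help?")], "What is BLOOM?")
print(prompt)
```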
The transcript also highlights how subtle prompt details can change outcomes. A straightforward prompt about coding in OpenCV may yield narrative text that resembles a developer’s musings, but adding cues that match code formatting can elicit actual commented code. Similarly, “argumentative chatbot” behavior may fail without examples, then improve when the prompt includes a sample response. The same next-token mechanism can be harnessed for multi-step tasks by embedding structured question lists, enabling the model to summarize and categorize a long product review in one pass.
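The review-summarization scaffolding might look like the sketch below: a numbered question list embedded in the prompt, ending with "1." so the model's natural continuation is a numbered answer list. The exact question wording is illustrative, not taken from the transcript.

```python
# Sketch: a structured prompt that asks for summary and categorization of a
# product review in a single generation pass.
REVIEW_QUESTIONS = [
    "What product is being reviewed?",
    "Summarize the review in one sentence.",
    "Is the sentiment positive, negative, or mixed?",
]

def build_review_prompt(review_text):
    """Embed the review plus a numbered question list, primed with '1.'."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(REVIEW_QUESTIONS, 1))
    return (
        "Read the product review below, then answer each question.\n\n"
        f"Review:\n{review_text}\n\n"
        f"Questions:\n{numbered}\n\n"
        "Answers:\n1."
    )

print(build_review_prompt("The blender is loud but crushes ice perfectly."))
```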
Beyond chat and coding, the model can support tasks like error diagnosis (asking what an error means and how to fix it) and even cross-language dialogue, where one speaker responds in Spanish while the other speaks English—an outcome presented as surprisingly coherent for an abstract, translation-like interaction.
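An error-diagnosis prompt in that style might look like this minimal sketch; the traceback line is a made-up example, and the Q/A framing nudges the model to continue with an explanation rather than more error text.

```python
# Sketch: a Q&A-style prompt asking the model to explain and fix an error.
error = 'TypeError: can only concatenate str (not "int") to str'
prompt = (
    f"I got this Python error:\n{error}\n\n"
    "Q: What does this error mean, and how do I fix it?\n"
    "A:"
)
print(prompt)
```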
Overall, BLOOM’s release reframes what “access” to frontier-scale language models means: not just paying for API calls, but downloading weights, running smaller variants, and learning how to shape model behavior through structured prompts and generation settings like temperature and top-p. The result is a new playground for developers and non-developers alike, with the expectation that more applications will emerge as these models become genuinely open and widely testable.
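How temperature and top-p shape generation can be illustrated with a toy next-token sampler over made-up logits (real model logits would span a vocabulary of thousands of tokens): temperature rescales the distribution before sampling, and top-p discards the low-probability tail entirely.

```python
# Toy illustration of temperature and top-p (nucleus) sampling.
import math
import random

def sample_next(logits, temperature=1.0, top_p=0.9, rng=random.Random(0)):
    """Sample one token from {token: logit}, applying temperature then top-p."""
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it.
    tokens = list(logits.keys())
    scaled = [logits[t] / temperature for t in tokens]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(zip(tokens, (e / total for e in exps)),
                    key=lambda kv: kv[1], reverse=True)
    # Top-p keeps only the smallest set of tokens whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    renorm = sum(p for _, p in kept)
    r, acc = rng.random() * renorm, 0.0
    for tok, p in kept:
        acc += p
        if acc >= r:
            return tok
    return kept[-1][0]

toy_logits = {"the": 3.0, "a": 2.0, "banana": -1.0}
print(sample_next(toy_logits, temperature=0.7, top_p=0.9))
```

With a low temperature and tight top-p, the unlikely "banana" token is pruned before sampling, which is the stability-versus-variety tradeoff the transcript describes.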
Cornell Notes
BLOOM is a free, open-access large language model with up to 176 billion parameters, made available for download and for hosted inference via Hugging Face. Its scale used to imply multi-million-dollar training costs and massive compute, but the release lowers the barrier to experimentation through smaller checkpoints and a free A100-backed API. BLOOM works primarily as a next-token text generator, so “chatbot” behavior depends heavily on prompt structure—dialogue formatting, examples, and task-specific scaffolding. With the right prompts, it can produce code-like outputs, summarize and categorize reviews, diagnose errors, and even sustain cross-language conversations. This matters because it turns prompt engineering into a practical interface for extracting reliable behavior from a general language model.
- Why does BLOOM’s “chatbot” performance depend on prompt structure rather than built-in conversation skills?
- What compute and memory requirements are described for running the largest BLOOM model locally?
- How does Hugging Face’s hosted inference change the practical access problem?
- How can BLOOM be steered to produce code instead of narrative text about code?
- What’s the role of temperature and top-p in controlling outputs?
- How can BLOOM perform multi-step tasks like review summarization in one go?
Review Questions
- What evidence in the transcript supports the claim that BLOOM is primarily a next-token generator rather than a native chatbot?
- How do prompt formatting choices (speaker labels, examples, code markers) change BLOOM’s behavior?
- What tradeoffs are implied between running the 176B model locally versus using Hugging Face’s hosted API?
Key Points
1. BLOOM up to 176B parameters is available for free download and free hosted inference, lowering the barrier to frontier-scale language model experimentation.
2. Running the full 176B model locally is memory-intensive (about 680GB full precision or about 350GB half precision), so smaller variants often make more sense.
3. Hugging Face’s free A100-backed API can deliver much faster inference than local runs, though users may face a queue.
4. BLOOM’s “chatbot” feel comes from prompt engineering: dialogue-style formatting and stopping logic are needed because the model continues text rather than managing conversation state.
5. Structured prompts (like numbered question lists) can elicit multi-step behaviors such as summarizing and categorizing reviews in one generation.
6. Generation controls like temperature and top-p affect creativity and token selection, changing output stability versus variety.
7. BLOOM’s training data spans many natural languages and programming languages, which influences how it responds to prompts that resemble code or community Q&A styles.