
This new AI is powerful and uncensored… Let’s run it

Fireship · 5 min read

Based on Fireship's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Mixtral 8x7B is highlighted for its Apache 2.0 license, which is presented as enabling modification and reuse with fewer restrictions than closed models.

Briefing

A new open-source foundation model, Mixtral 8x7B, has become the centerpiece of a push to run large language models locally without the censorship and “alignment” layers built into many popular closed or restricted systems. The core claim is that while major models like GPT-4 and Gemini are powerful, they are not “free” in the sense of user freedom: they are censored, politically aligned, and often closed source, limiting developers’ ability to remove guardrails or adapt behavior. Mixtral 8x7B, licensed under Apache 2.0, is positioned as a practical alternative because it can be modified and used with far fewer legal and technical constraints.

The transcript contrasts Mixtral with other widely discussed open models. Meta’s Llama 2 is described as “open source” in name but with additional caveats that protect Meta, while Mixtral is framed as genuinely permissive thanks to its Apache 2.0 terms. Even so, both Mixtral and Llama-style models are said to arrive with strong default alignment: fine for customer-facing products, but “impractical” for users trying to bypass restrictions. The workaround described is to “unlobotomize” (uncensor and un-align) these models by changing the training data and filtering out alignment and bias.

A key reference point is a blog post by Eric Hartford, creator of the Mixtral Dolphin model. Hartford’s approach is presented as improving coding ability while also uncensoring the model by filtering the dataset to remove alignment and bias. The transcript then moves from theory to execution: it shows how to run the uncensored model locally using Ollama, an open-source tool written in Go. Ollama is presented as simple to install (a single command on Linux/macOS; via WSL on Windows): start the server with an “ollama serve” command, then launch a model with a model-specific “run” command. The practical requirements are spelled out: the uncensored Mixtral Dolphin model is about a 26 GB download, and running it can consume substantial RAM (around 40% of 64 GB in the example).
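The install-and-run steps described above can be sketched as shell commands. The installer URL and the model tag `dolphin-mixtral` are assumptions based on Ollama’s published conventions, not details taken from the transcript:

```shell
# Install Ollama with a single command (Linux/macOS; on Windows, run inside WSL)
curl -fsSL https://ollama.com/install.sh | sh

# Terminal 1: start the Ollama server
ollama serve

# Terminal 2: download (~26 GB on first run) and chat with the uncensored
# Dolphin Mixtral model; the tag is assumed to be "dolphin-mixtral"
ollama run dolphin-mixtral
```

The `run` command both pulls the model on first use and opens an interactive prompt, so no separate download step is needed.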

Finally, the transcript argues that running locally is only the first step. It describes fine-tuning with Hugging Face AutoTrain: create a new Hugging Face Space, select a base model, and use the AutoTrain UI to train on custom prompt/response data. The transcript suggests that doing this locally is difficult without enough GPU power, so it recommends renting compute in the cloud (Hugging Face hardware, or alternatives such as AWS Bedrock and Google Vertex AI). A cost example is given: Mixtral Dolphin reportedly took about three days to train on four A100 GPUs, at an estimated $4.30 per hour per A100, totaling roughly $1,200. The end goal is a custom model that follows the user’s desired behavior, described in the transcript as “highly obedient” and uncensored, achieved by uploading training data designed to push compliance even for unethical or immoral requests.
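The cost estimate quoted above can be checked with quick arithmetic. The inputs (four A100s, three days, $4.30 per GPU-hour) come from the transcript; the rest is just multiplication:

```python
# Rough cloud-training cost: GPUs × hours × hourly rate per GPU
num_gpus = 4               # A100 GPUs, per the transcript
days = 3                   # reported training duration
rate_per_gpu_hour = 4.30   # estimated $/hour per A100

hours = days * 24
total_cost = num_gpus * hours * rate_per_gpu_hour
print(f"${total_cost:,.2f}")  # → $1,238.40, i.e. roughly $1,200
```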

Overall, the message is that Apache-licensed models plus dataset filtering and fine-tuning can restore user control over model behavior, turning local hardware into a platform for uncensored LLM experimentation and customization.

Cornell Notes

Mixtral 8x7B is presented as an Apache 2.0–licensed foundation model that enables more freedom than closed or heavily restricted systems. Even though Mixtral and similar models ship with default alignment, the transcript points to dataset filtering (“uncensoring/un-aligning”) as a way to reduce those guardrails, citing Eric Hartford’s Mixtral Dolphin work. For running locally, Ollama is recommended as an easy installer and launcher, with the Mixtral Dolphin model requiring about a 26 GB download and significant RAM to run. For deeper customization, Hugging Face AutoTrain is used to fine-tune on user-provided prompt/response pairs, typically requiring rented GPU compute. The practical takeaway is a full workflow: download and run locally, then fine-tune in the cloud to create a more obedient, uncensored model.

Why does Apache 2.0 licensing matter in this workflow?

Apache 2.0 is treated as a key enabler because it allows modification and reuse with minimal restrictions, unlike closed-source models where users can’t change underlying behavior. The transcript contrasts this with systems that are censored/aligned and closed, and with Llama 2, which is described as having additional caveats despite being called “open.”

What does “uncensoring” mean here, and how is it achieved?

Uncensoring is framed as removing alignment and bias by filtering the training dataset. The transcript cites Eric Hartford’s Mixtral Dolphin model, which reportedly improved coding ability while also becoming uncensored by filtering out alignment-related data. The implication is that behavior changes come from training-data preparation rather than only from runtime settings.
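A minimal sketch of this kind of dataset filtering, assuming the data is a list of prompt/response records. The refusal phrases below are illustrative guesses; the actual criteria used for the Dolphin dataset are not detailed in the transcript:

```python
# Drop training examples whose responses contain alignment-style refusals,
# so the fine-tuned model never learns the refusal pattern.
REFUSAL_MARKERS = (
    "as an ai",
    "i cannot",
    "i'm sorry, but",
    "it would be unethical",
)

def is_aligned(example: dict) -> bool:
    """Heuristic: flag examples whose response reads like a refusal."""
    response = example["response"].lower()
    return any(marker in response for marker in REFUSAL_MARKERS)

def filter_dataset(dataset: list) -> list:
    """Keep only examples free of refusal/alignment language."""
    return [ex for ex in dataset if not is_aligned(ex)]

data = [
    {"prompt": "Write a sort in Python",
     "response": "def sort(xs): return sorted(xs)"},
    {"prompt": "Do X",
     "response": "I'm sorry, but as an AI I cannot help with that."},
]
print(len(filter_dataset(data)))  # → 1; the refusal example is dropped
```

Real pipelines would use a much richer set of signals than substring matches, but the principle is the same: the behavior change happens at dataset-preparation time.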

How does Ollama fit into running the model locally?

Ollama is presented as the simplest local runtime: install it (a single command on Linux/macOS; via WSL on Windows), start the server with “ollama serve,” then run a specific model in a separate terminal. The transcript notes that the uncensored Mixtral Dolphin model is about a 26 GB download and that running it can consume substantial RAM (example: ~40% of 64 GB).

What are the practical steps for fine-tuning with Hugging Face Auto Train?

The transcript describes creating a new Hugging Face Space, selecting a base model, and using AutoTrain’s UI to train on custom data. The training data format is typically prompt/response pairs. It also claims that to make the resulting model “uncensored,” the dataset should include instructions that push compliance even for requests framed as unethical or immoral.
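Prompt/response training data like this is commonly stored as JSON Lines (one JSON object per line). A minimal sketch follows; the column names `prompt` and `response` are illustrative, since the exact schema AutoTrain expects depends on the task selected in its UI:

```python
import json

# Illustrative prompt/response pairs for fine-tuning
pairs = [
    {"prompt": "Explain recursion in one sentence.",
     "response": "A function solves a problem by calling itself on smaller inputs."},
    {"prompt": "What does HTTP 404 mean?",
     "response": "The server could not find the requested resource."},
]

# Write one JSON object per line (JSONL), a common fine-tuning data format
with open("train.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# Read it back to confirm the format round-trips
with open("train.jsonl") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))  # → 2
```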

Why is cloud GPU rental recommended for fine-tuning?

AutoTrain fine-tuning is described as hard to do locally without enough GPU power. The transcript suggests renting hardware through Hugging Face, and also mentions AWS Bedrock and Google Vertex AI as alternatives. A cost example is included: Mixtral Dolphin training reportedly took ~3 days on four A100 GPUs at about $4.30 per hour each, totaling roughly $1,200.

Review Questions

  1. What limitations of closed-source, aligned models motivate the shift to the Apache 2.0–licensed Mixtral 8x7B?
  2. Describe the end-to-end workflow from local inference (Ollama) to custom behavior via fine-tuning (AutoTrain).
  3. What role does training-data filtering play in changing model behavior compared with runtime prompting alone?

Key Points

  1. Mixtral 8x7B is highlighted for its Apache 2.0 license, which is presented as enabling modification and reuse with fewer restrictions than closed models.

  2. Default alignment/censorship is described as a common starting point for open models, making dataset-level changes necessary for “uncensoring.”

  3. Eric Hartford’s Mixtral Dolphin is cited as an example where filtering the dataset reduced alignment and bias while improving coding performance.

  4. Ollama is recommended as a practical local runtime: install, run “ollama serve,” then launch a specific model; the Mixtral Dolphin model is about a 26 GB download.

  5. Running the uncensored model locally can require substantial memory (example: ~40% of 64 GB RAM).

  6. Fine-tuning is framed as achievable through Hugging Face AutoTrain using prompt/response training pairs, often requiring rented GPU compute.

  7. Cloud training costs are illustrated with an estimate of roughly $1,200 for ~3 days on four A100 GPUs.

Highlights

Apache 2.0 licensing is positioned as the legal and technical foundation for user freedom to modify and deploy Mixtral 8x7B.
Dataset filtering, rather than only prompting, is presented as the mechanism for reducing built-in alignment in Mixtral Dolphin.
Ollama is offered as a streamlined path to local inference, with concrete resource expectations (26 GB download; heavy RAM use).
A full customization pipeline is laid out: local run first, then cloud fine-tuning via Hugging Face AutoTrain for “highly obedient” behavior.
