
Fine-tuning Alpaca: Train Alpaca LoRa for Sentiment Analysis on a Custom Dataset

Venelin Valkov · 5 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Convert tweet sentiment CSV data into Alpaca LoRA JSON with fields for instruction, input (tweet text), and output (positive/neutral/negative).

Briefing

Fine-tuning Llama 7B with LoRA on a custom Bitcoin-tweet sentiment dataset can produce a practical sentiment classifier that labels new tweets as positive, neutral, or negative. The workflow hinges on converting Kaggle tweet data into Alpaca-style instruction records, then training only a small fraction of parameters via low-rank adapters—making the process feasible on a single GPU using 8-bit model loading.

The dataset starts from “BTC tweets sentiment” on Kaggle, with roughly 50k scraped tweets and sentiment labels. Before training, the pipeline removes retweeted tweets and drops any tweets containing links to reduce noisy or non-original text. Sentiment labels are then normalized into three classes: the script maps numeric sentiment scores to strings (“positive” when the score is above 1, “negative” when below 0, otherwise “neutral”). Each training example is reformatted into the Alpaca LoRA schema: a constant instruction (“detect the sentiment of the tweet”), the tweet text as the input, and the sentiment class as the output. The resulting JSON file becomes the direct training input for the fine-tuning step.

On the modeling side, the base is Llama 7B loaded from Hugging Face, using 8-bit weights to save memory and speed up training. The tokenizer is initialized from the same base model, with padding configured by setting the pad token ID to 0 and using left padding. The training dataset is loaded from the JSON and wrapped with the Alpaca prompt template (“Write a response that appropriately completes the request…”), then tokenized with a hard cutoff length of 256 tokens. Labels are constructed so the model learns to generate the correct response portion of each prompt.

Instead of updating all model weights, the process uses PEFT’s LoRA (low-rank adaptation). LoRA is configured to target the query and value projection layers, with rank r=8, alpha scaling, and a 5% dropout, and it’s applied for causal language modeling. Training updates about 0.06% of parameters—small enough to be efficient while still steering the model toward the sentiment task. The run uses micro-batch size 4, mixed precision (float16), Adam with weight decay, and trains for 300 steps with evaluation and checkpointing every 50 steps. After training, the fine-tuned model is saved and published to Hugging Face under the name “Alpaca Bitcoin Tweets Sentiment.”

To validate performance, the workflow switches from training to inference: it clones the Alpaca LoRA repo at a fixed commit, then runs generate.py with the base Llama model plus the custom LoRA weights. A Gradio interface lets users paste tweets and receive sentiment predictions. Example inputs include bullish phrasing (“A project with great prospects and opportunities”) yielding “positive,” neutral market commentary (“Get ready to take short positions”) yielding “neutral,” and bearish signals (“If you think the run of BTC is over…”) yielding “negative.” The end result is a reproducible recipe for turning a general LLM into a domain-specific sentiment tool using instruction tuning plus LoRA.

Cornell Notes

The project fine-tunes Llama 7B for Bitcoin tweet sentiment by converting Kaggle-labeled tweets into Alpaca-style instruction data. Tweets are cleaned by removing retweets and link-containing posts, then sentiment scores are mapped into three labels: positive, neutral, and negative. Training loads the base model in 8-bit and uses PEFT LoRA to update only a small slice of parameters (about 0.06%), targeting the query and value projections for causal language modeling. Data is wrapped in the Alpaca prompt template, tokenized with a 256-token cutoff, and trained for 300 steps with periodic evaluation. A Gradio-based generate.py script then uses the base model plus the LoRA weights to classify new tweets.

How does the pipeline turn raw tweet sentiment data into something Alpaca LoRA can train on?

It converts the Kaggle CSV into a JSON where each record has three fields: an instruction (constant: “detect the sentiment of the tweet”), an input (the tweet text), and an output (the sentiment label). Numeric sentiment scores are mapped to strings via a helper: scores above 1 become “positive,” below 0 become “negative,” and everything else becomes “neutral.” The script also filters out retweeted tweets and any tweet containing links before building the JSON.
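The conversion step can be sketched in plain Python. This is a minimal illustration, not the video's exact script: the retweet/link heuristics and the instruction wording are assumptions based on the description above.

```python
import json

def score_to_label(score: float) -> str:
    """Map a numeric sentiment score to a class label (thresholds from the summary)."""
    if score > 1:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def to_alpaca_record(tweet: str, score: float) -> dict:
    """Build one Alpaca-style instruction record."""
    return {
        "instruction": "Detect the sentiment of the tweet.",
        "input": tweet,
        "output": score_to_label(score),
    }

def build_dataset(rows):
    """Drop retweets and link-containing tweets, then convert the rest."""
    records = []
    for tweet, score in rows:
        if tweet.startswith("RT") or "http" in tweet:
            continue
        records.append(to_alpaca_record(tweet, score))
    return records

# Hypothetical rows standing in for the Kaggle CSV (tweet text, sentiment score).
rows = [
    ("Bitcoin to the moon!", 2.0),
    ("RT Bitcoin to the moon!", 2.0),            # retweet: filtered out
    ("Read more at https://example.com", 1.5),   # contains a link: filtered out
    ("BTC is crashing hard", -1.0),
    ("BTC trading sideways today", 0.5),
]
records = build_dataset(rows)
print(json.dumps(records, indent=2))
```

The resulting list, dumped to a JSON file, is the direct training input for the fine-tuning step.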

Why load Llama 7B in 8-bit, and what does that enable in practice?

8-bit loading reduces GPU memory usage and can speed up training. The tokenizer and model are initialized from the Hugging Face base model, then the model is loaded with 8-bit tensors (with float16 used for mixed precision). This makes it possible to fine-tune a large model on a single GPU rather than requiring full-precision weights and large multi-GPU setups.
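A setup sketch under stated assumptions: the Hugging Face repo id is a placeholder, and `load_in_8bit` reflects the older `transformers` + `bitsandbytes` API current when Alpaca LoRA was published (newer releases prefer a `BitsAndBytesConfig`). Running this requires a GPU and access to the Llama weights.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

BASE_MODEL = "decapoda-research/llama-7b-hf"  # assumed repo id; the video's exact id may differ

# Load the base model with 8-bit weights to fit a single GPU,
# with float16 for the remaining mixed-precision computation.
model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Tokenizer from the same base model: pad token id 0, left padding
# so prompts stay right-aligned for generation.
tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"
```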

What role does LoRA play, and which parts of the model get trained?

LoRA (low-rank adaptation) adds trainable low-rank matrices to selected layers while freezing most original weights. In this setup, the LoRA config targets the query and value projection modules, uses rank r=8, alpha scaling, and 5% dropout, and is configured for causal language modeling. The result is that only about 0.06% of parameters are trainable, even though the base model is ~7B parameters.
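The LoRA setup described above maps onto a PEFT configuration roughly like the following. The `lora_alpha` value is an assumption (the summary only says "alpha scaling"); `prepare_model_for_int8_training` is the PEFT helper from that era for 8-bit fine-tuning.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension from the summary
    lora_alpha=16,                        # assumed scaling factor; not stated in the source
    target_modules=["q_proj", "v_proj"],  # query and value projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Applied to the 8-bit base model loaded earlier:
# model = prepare_model_for_int8_training(model)
# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()  # reports roughly 0.06% trainable
```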

How is the prompt and label construction handled during training?

Each example is rendered with the Alpaca prompt template: boilerplate framing that ends with "Write a response that appropriately completes the request," followed by the instruction, the tweet as input, and a response section. The code tokenizes the full prompt and constructs labels so the loss focuses on the response tokens. Tokenization uses truncation with a cutoff length of 256 tokens and applies no padding at this stage.
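The prompt construction can be shown with the standard Alpaca template; the exact wording of the video's template may differ slightly, and the label-masking detail is noted in a comment rather than implemented here.

```python
# Standard Alpaca prompt template for examples that have a non-empty input.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_prompt(example: dict) -> str:
    """Render one training example: template + instruction + input + response."""
    return PROMPT_TEMPLATE.format(**example) + example["output"]

example = {
    "instruction": "Detect the sentiment of the tweet.",
    "input": "BTC is unstoppable this week",
    "output": "positive",
}
prompt = build_prompt(example)
print(prompt)

# During training, the full string is tokenized with truncation at 256 tokens;
# label ids for the prompt portion are typically set to -100 so the loss
# only covers the response tokens.
```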

What training settings determine how the model learns and how progress is monitored?

Training uses a micro-batch size of 4, float16 mixed precision, and AdamW (Adam with weight decay), and runs for 300 steps. It evaluates and saves checkpoints every 50 steps, logs every 10 steps, and loads the best checkpoint at the end. The dataset is shuffled with seed 42 and split into a training set and a small validation set of 200 examples.
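These settings correspond to a `transformers.TrainingArguments` fragment along these lines. The gradient-accumulation value and output directory are assumptions; only the figures named in the summary are taken from the source.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=4,   # micro-batch size from the summary
    gradient_accumulation_steps=32,  # assumed; not stated in the source
    max_steps=300,
    fp16=True,                       # float16 mixed precision
    optim="adamw_torch",             # AdamW: Adam with weight decay
    evaluation_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    logging_steps=10,
    load_best_model_at_end=True,
    output_dir="alpaca-bitcoin-tweets-sentiment",  # hypothetical name
)
```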

How does inference work after fine-tuning?

Inference clones the Alpaca LoRA repo at a specific commit and runs generate.py with the base model and the saved LoRA weights. A Gradio interface is launched to accept a tweet and an instruction (“detect the sentiment of the tweet”), then the model returns a sentiment label. The examples demonstrate positive, neutral, and negative outputs on both hand-picked and live tweets.

Review Questions

  1. What preprocessing steps are applied to the Kaggle tweets before converting them into Alpaca-style JSON records?
  2. Which model components are targeted by LoRA in this setup, and roughly what fraction of parameters becomes trainable?
  3. How do prompt formatting and the 256-token cutoff affect what the model is trained to generate?

Key Points

  1. Convert tweet sentiment CSV data into Alpaca LoRA JSON with fields for instruction, input (tweet text), and output (positive/neutral/negative).
  2. Filter training data by removing retweets and tweets containing links to reduce noise in sentiment learning.
  3. Load Llama 7B in 8-bit and use float16 mixed precision to make fine-tuning practical on a single GPU.
  4. Use PEFT LoRA to train only low-rank adapters, targeting query and value projection layers for causal language modeling.
  5. Wrap each example in the Alpaca prompt template and tokenize with truncation at a 256-token cutoff to control context length.
  6. Train for a limited number of steps (300) with periodic evaluation/checkpointing, then save and reuse the LoRA weights for inference.
  7. Run generate.py with the base model plus the custom LoRA weights and serve predictions through a Gradio interface for real tweet inputs.

Highlights

The sentiment task is implemented as instruction tuning: every training record pairs a fixed instruction with a tweet and a three-class sentiment label.
LoRA makes the fine-tuning efficient by updating only ~0.06% of parameters while leaving the rest of Llama 7B frozen.
8-bit model loading plus float16 mixed precision is used to fit and train a 7B model on a single GPU.
Inference is validated through generate.py + Gradio, producing positive/neutral/negative predictions on new tweets.
The training pipeline relies on Alpaca prompt formatting and a strict 256-token truncation to shape what the model learns to output.
