
Analyzing Cryptocurrency Sentiment on Twitter with LangChain and ChatGPT | CryptoGPT

Venelin Valkov · 4 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Tweets are grouped by date per Twitter handle before any sentiment scoring happens.

Briefing

CryptoGPT’s sentiment pipeline turns an author’s Twitter activity into daily sentiment scores by combining LangChain with ChatGPT and forcing structured JSON output. The core move is to group a single account’s tweets by date, enrich each tweet with engagement signals (like view counts), and then ask ChatGPT to return an aggregate bullish/bearish score on a 0–100 scale for each date—without any extra commentary.

The workflow starts after tweets have already been collected and cleaned. For sentiment analysis, the system takes one Twitter handle at a time and groups that author’s tweets by their posted date. For a chosen date range, it passes ChatGPT a list of tweets in a custom text format: each tweet entry includes the tweet’s view count and the cleaned tweet text. The design also caps the number of tweets sent to ChatGPT—if an author has more than 100 tweets, the pipeline samples 100 to keep prompts within practical limits.
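A minimal sketch of this grouping-and-sampling step. The tweet schema here (dicts with `author`, `date`, `views`, and `text` keys) is an assumption for illustration, not the video's exact data model:

```python
import random
from collections import defaultdict

MAX_TWEETS = 100  # cap on tweets sent to ChatGPT per author

def group_tweets_by_date(tweets, handle):
    """Group one author's cleaned tweets by posted date,
    sampling down to MAX_TWEETS when there are too many."""
    author_tweets = [t for t in tweets if t["author"] == handle]
    if len(author_tweets) > MAX_TWEETS:
        author_tweets = random.sample(author_tweets, MAX_TWEETS)
    grouped = defaultdict(list)
    for tweet in author_tweets:
        grouped[tweet["date"]].append(tweet)
    return dict(grouped)
```

Sampling (rather than truncating) keeps the selection roughly representative of the whole period while staying inside the prompt budget.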

On the LangChain side, the implementation uses a ChatOpenAI model configured with temperature set to 0 to reduce randomness and improve repeatability. A PromptTemplate is built with two key inputs: the Twitter handle and the formatted tweet list. The chain is created with LLMChain, then executed by feeding in the handle and the generated tweet text. The output is then parsed as JSON, relying on the prompt to keep responses machine-readable.

The prompt itself is tailored to steer ChatGPT toward consistent scoring. It frames the model as an experienced crypto trader who pays attention to historical predictions made by that specific Twitter account, then weighs those patterns against the new tweets provided for each date. The scoring instruction is explicit: return only a JSON object where each record contains a date and a sentiment value from 0 to 100, with the mapping defined as bearish at the low end and bullish at the high end. It also includes a “no explanations” constraint—“return just the raw JSON response… do not explain”—to prevent the model from adding prose that would break downstream parsing.
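A paraphrased reconstruction of such a prompt is shown below; the wording is illustrative, not a verbatim copy of the video's template:

```python
# Doubled braces survive .format() as literal JSON braces.
SENTIMENT_PROMPT = """\
You are an experienced crypto trader. You pay close attention to the
historical predictions made by {twitter_handle} and weigh them against
the new tweets below, grouped by date. Each tweet is listed as
"views - tweet text".

{tweets}

For each date, return an aggregate sentiment score from 0 (very bearish)
to 100 (very bullish). Return just the raw JSON response in the form
{{"YYYY-MM-DD": score, ...}} - do not explain.
"""
```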

A helper function prepares the tweet list for the prompt by iterating over tweets grouped by date and concatenating entries in the format “views - tweet text.” The pipeline then runs the sentiment analyzer for each handle stored in session state, storing results as a dictionary keyed by author and date.
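Such a helper could be sketched as follows, reusing the hypothetical `views`/`text` tweet fields from above:

```python
def create_tweet_list_text(grouped_tweets):
    """Build the prompt-friendly tweet list: each date followed by its
    tweets in the "views - tweet text" format described above."""
    lines = []
    for date, tweets in sorted(grouped_tweets.items()):
        lines.append(str(date))
        for tweet in tweets:
            lines.append(f"{tweet['views']} - {tweet['text']}")
    return "\n".join(lines)
```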

In testing, the system produces a JSON-like response with multiple dates and sentiment scores for accounts such as Michael Saylor (shown as “michael sa” in the example) and Elon Musk (“L musk”). The results demonstrate that ChatGPT can follow the structured-output requirement and generate daily aggregate sentiment values from engagement-weighted tweet text. The next step, teased for a follow-up, is visualization—turning the returned sentiment dictionary into a DataFrame and chart using Streamlit.
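Storing the parsed response per handle and converting it for charting might look like this sketch; the function name, handle, and scores are made up for illustration:

```python
import json
import pandas as pd

def store_sentiment(raw_response, handle, results):
    """Parse ChatGPT's JSON-only reply and store it keyed by author;
    each value maps dates to 0-100 sentiment scores."""
    results[handle] = json.loads(raw_response)
    return results

# Accumulate results across handles, then build a DataFrame
# (e.g. for st.line_chart in Streamlit): rows = dates, columns = authors.
results = store_sentiment('{"2023-04-01": 78, "2023-04-02": 64}',
                          "some_handle", {})
df = pd.DataFrame(results)
```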

Cornell Notes

The sentiment analyzer builds daily crypto sentiment scores for a single Twitter account by combining LangChain with ChatGPT. Tweets are cleaned, grouped by date, and converted into a prompt-friendly text list that includes each tweet’s view count plus the tweet text. A PromptTemplate instructs ChatGPT to act like an experienced crypto trader, consider the account’s historical predictions, and score each date’s tweets on a 0–100 scale (bearish to bullish). The system forces structured output by requiring ChatGPT to return only JSON with date and sentiment fields, and it uses temperature=0 to keep results more deterministic. This makes the output easy to store, parse, and later visualize.

How does the pipeline convert raw tweets into something ChatGPT can score per day?

It groups tweets by their posted date for one Twitter handle, then builds a single text block where each date maps to multiple tweet entries. Each entry includes engagement context—specifically the tweet’s view count—followed by a dash and the cleaned tweet text. If an author has more than 100 tweets, the pipeline samples 100 to avoid overly large prompts.

What does the prompt require ChatGPT to output, and why is that important?

The prompt demands a JSON-only response: each record must contain a date and an aggregate sentiment score on a 0–100 scale. It explicitly says not to explain or add prose. This matters because the code expects machine-readable JSON so it can parse results into a dictionary and feed later steps like visualization.
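The fragility this constraint guards against can be seen with a tiny `json.loads` check (illustrative strings, not actual model output):

```python
import json

clean = '{"2023-04-01": 72, "2023-04-02": 55}'
noisy = 'Sure! Here are the scores: {"2023-04-01": 72}'

scores = json.loads(clean)  # parses into a date -> score dict

try:
    json.loads(noisy)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False  # any surrounding prose breaks naive parsing
```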

Why set temperature to 0 when calling ChatGPT through LangChain?

Temperature controls randomness in the model’s output. Setting temperature=0 aims to make outputs more deterministic, which improves consistency when repeatedly scoring the same account’s tweets across dates.

What role do tweet view counts play in the sentiment scoring?

View counts are included alongside tweet text in the prompt. The intent is to let ChatGPT treat more widely seen tweets as more influential when aggregating sentiment for a given date, effectively weighting the signal by engagement.

How does LangChain fit into the implementation?

LangChain constructs a ChatOpenAI model (gpt-3.5-turbo) and wraps it with a PromptTemplate. An LLMChain then ties the model and prompt together so the code can pass in the Twitter handle and the formatted tweet list, execute the chain, and capture the JSON response.

What is the sentiment scale used for, and how is it interpreted?

Scores range from 0 to 100. The low end corresponds to bearish sentiment, while the high end corresponds to bullish sentiment. The output is aggregated per date, producing a time series of sentiment values for each author.

Review Questions

  1. What exact inputs does the PromptTemplate use, and how is the tweet list text constructed for each date?
  2. How does the code ensure ChatGPT returns data that can be parsed reliably downstream?
  3. Why might sampling to 100 tweets per author change the sentiment time series?

Key Points

  1. Tweets are grouped by date per Twitter handle before any sentiment scoring happens.
  2. Each tweet entry sent to ChatGPT includes both view count and cleaned tweet text to add engagement context.
  3. LangChain uses ChatOpenAI with temperature=0 and gpt-3.5-turbo to reduce output randomness.
  4. A PromptTemplate feeds ChatGPT the Twitter handle plus a formatted, date-grouped tweet list.
  5. The prompt forces JSON-only output with fields for date and a 0–100 sentiment score (bearish to bullish).
  6. To keep prompts manageable, the pipeline samples up to 100 tweets per author when necessary.
  7. Results are stored as a dictionary keyed by author and date, enabling later DataFrame conversion and visualization.

Highlights

Daily sentiment scores come from feeding ChatGPT a date-grouped list of tweets enriched with view counts, then requiring a JSON-only response.
Temperature=0 is used to make sentiment outputs more consistent across repeated runs.
The prompt explicitly instructs “do not explain” to prevent extra text that would break JSON parsing.
