Analyzing Cryptocurrency Sentiment on Twitter with LangChain and ChatGPT | CryptoGPT
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
CryptoGPT’s sentiment pipeline turns an author’s Twitter activity into daily sentiment scores by combining LangChain with ChatGPT and forcing structured JSON output. The core move is to group a single account’s tweets by date, enrich each tweet with engagement signals (like view counts), and then ask ChatGPT to return an aggregate bullish/bearish score on a 0–100 scale for each date—without any extra commentary.
The workflow starts after tweets have already been collected and cleaned. For sentiment analysis, the system takes one Twitter handle at a time and groups that author’s tweets by their posted date. For a chosen date range, it passes ChatGPT a list of tweets in a custom text format: each tweet entry includes the tweet’s view count and the cleaned tweet text. The design also caps the number of tweets sent to ChatGPT—if an author has more than 100 tweets, the pipeline samples 100 to keep prompts within practical limits.
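The grouping and sampling step can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names (`date`, `views`, `text`) and the use of pandas are assumptions, and the sample data is invented.

```python
import pandas as pd

# Hypothetical cleaned tweets for a single handle; column names are assumed.
tweets = pd.DataFrame(
    {
        "date": ["2023-04-01", "2023-04-01", "2023-04-02"],
        "views": [1200, 340, 5600],
        "text": ["BTC looks strong", "Buying the dip", "Macro headwinds ahead"],
    }
)

MAX_TWEETS = 100  # cap described in the pipeline

# If the author has more than the cap, sample down to keep the prompt small.
if len(tweets) > MAX_TWEETS:
    tweets = tweets.sample(MAX_TWEETS, random_state=42)

# Group the author's tweets by posted date for per-day scoring.
by_date = {date: group for date, group in tweets.groupby("date")}
```

Sampling before grouping keeps every prompt within practical token limits, at the cost of possibly dropping tweets from some days (see the review question below on how this can distort the time series).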
On the LangChain side, the implementation uses a ChatOpenAI model configured with temperature set to 0 to reduce randomness and improve repeatability. A PromptTemplate is built with two key inputs: the Twitter handle and the formatted tweet list. The chain is created with LLMChain, then executed by feeding in the handle and the generated tweet text. The output is then parsed as JSON, relying on the prompt to keep responses machine-readable.
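In outline, the chain wiring and parsing look like this. The real LangChain calls (ChatOpenAI, PromptTemplate, LLMChain) need an installed library and an OpenAI API key, so this sketch shows them in comments and stubs the model call; the stub function and its canned response are invented for illustration.

```python
import json

# Real wiring would look roughly like this (requires langchain + an API key):
#   from langchain.chat_models import ChatOpenAI
#   from langchain.prompts import PromptTemplate
#   from langchain.chains import LLMChain
#   llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
#   prompt = PromptTemplate(
#       input_variables=["twitter_handle", "tweets"], template=TEMPLATE)
#   chain = LLMChain(llm=llm, prompt=prompt)
#   response = chain.run(twitter_handle=handle, tweets=tweet_text)

# Stubbed model call so the parsing path can be exercised offline.
def run_chain_stub(twitter_handle: str, tweets: str) -> str:
    return '[{"date": "2023-04-01", "sentiment": 72}]'

response = run_chain_stub("michael_saylor", "1200 - BTC looks strong")

# Because the prompt demands JSON-only output, parsing is a single json.loads.
scores = json.loads(response)
```

If the model ever adds prose around the JSON, `json.loads` raises `JSONDecodeError`, which is exactly why the prompt's "do not explain" constraint matters.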
The prompt itself is tailored to steer ChatGPT toward consistent scoring. It frames the model as an experienced crypto trader who pays attention to historical predictions made by that specific Twitter account, then weighs those patterns against the new tweets provided for each date. The scoring instruction is explicit: return only a JSON object where each record contains a date and a sentiment value from 0 to 100, with the mapping defined as bearish at the low end and bullish at the high end. It also includes a “no explanations” constraint—“return just the raw JSON response… do not explain”—to prevent the model from adding prose that would break downstream parsing.
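A prompt along these lines could be rendered as the template below. The wording here is a paraphrase of the description above, not the project's exact prompt, and the placeholder names `twitter_handle` and `tweets` match the two PromptTemplate inputs mentioned earlier.

```python
# Illustrative prompt template; exact wording in the original project differs.
TEMPLATE = """You are an experienced crypto trader. You pay close attention
to the historical predictions made by {twitter_handle} and weigh them
against the tweets below, which are grouped by date.

Tweets (format: views - tweet text):
{tweets}

For each date, return a sentiment score from 0 (very bearish) to 100
(very bullish). Return just the raw JSON response as a list of objects
with "date" and "sentiment" fields. Do not explain.
"""

prompt_text = TEMPLATE.format(
    twitter_handle="michael_saylor",
    tweets="1200 - BTC looks strong\n340 - Buying the dip",
)
```

Putting the output contract ("raw JSON", "do not explain") at the end of the prompt keeps the instruction close to where the model generates, which tends to improve compliance.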
A helper function prepares the tweet list for the prompt by iterating over tweets grouped by date and concatenating entries in the format “views - tweet text.” The pipeline then runs the sentiment analyzer for each handle stored in session state, storing results as a dictionary keyed by author and date.
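That helper could look roughly like this; the function name `create_tweet_list_text` and the `(views, text)` tuple layout are assumptions for the sketch, but the "views - tweet text" entry format matches the description.

```python
def create_tweet_list_text(tweets_by_date):
    """Build the prompt's tweet list: each entry is 'views - tweet text',
    concatenated under its date heading."""
    lines = []
    for date, day_tweets in tweets_by_date.items():
        lines.append(str(date))
        for views, text in day_tweets:
            lines.append(f"{views} - {text}")
    return "\n".join(lines)

tweet_text = create_tweet_list_text(
    {"2023-04-01": [(1200, "BTC looks strong"), (340, "Buying the dip")]}
)
```

The resulting string is what gets substituted into the prompt's `tweets` slot, one run per handle, with the returned scores accumulated in the author-and-date-keyed results dictionary.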
In testing, the system produces a JSON-like response with multiple dates and sentiment scores for accounts such as Michael Saylor (shown as “michael sa” in the example) and Elon Musk (“L musk”). The results demonstrate that ChatGPT can follow the structured-output requirement and generate daily aggregate sentiment values from engagement-weighted tweet text. The next step, teased for a follow-up, is visualization—turning the returned sentiment dictionary into a DataFrame and chart using Streamlit.
Cornell Notes
The sentiment analyzer builds daily crypto sentiment scores for a single Twitter account by combining LangChain with ChatGPT. Tweets are cleaned, grouped by date, and converted into a prompt-friendly text list that includes each tweet’s view count plus the tweet text. A PromptTemplate instructs ChatGPT to act like an experienced crypto trader, consider the account’s historical predictions, and score each date’s tweets on a 0–100 scale (bearish to bullish). The system forces structured output by requiring ChatGPT to return only JSON with date and sentiment fields, and it uses temperature=0 to keep results more deterministic. This makes the output easy to store, parse, and later visualize.
How does the pipeline convert raw tweets into something ChatGPT can score per day?
What does the prompt require ChatGPT to output, and why is that important?
Why set temperature to 0 when calling ChatGPT through LangChain?
What role do tweet view counts play in the sentiment scoring?
How does LangChain fit into the implementation?
What is the sentiment scale used for, and how is it interpreted?
Review Questions
- What exact inputs does the PromptTemplate use, and how is the tweet list text constructed for each date?
- How does the code ensure ChatGPT returns data that can be parsed reliably downstream?
- Why might sampling to 100 tweets per author change the sentiment time series?
Key Points
1. Tweets are grouped by date per Twitter handle before any sentiment scoring happens.
2. Each tweet entry sent to ChatGPT includes both view count and cleaned tweet text to add engagement context.
3. LangChain uses ChatOpenAI with temperature=0 and gpt-3.5-turbo to reduce output randomness.
4. A PromptTemplate feeds ChatGPT the Twitter handle plus a formatted, date-grouped tweet list.
5. The prompt forces JSON-only output with fields for date and a 0–100 sentiment score (bearish to bullish).
6. To keep prompts manageable, the pipeline samples up to 100 tweets per author when necessary.
7. Results are stored as a dictionary keyed by author and date, enabling later DataFrame conversion and visualization.