Build a Next.js App with Streaming Large Language Responses (Gemini Pro API Tutorial)
Based on AI Arcade's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Streaming large language model output in a Next.js app becomes practical once the backend returns a streaming response and the frontend renders tokens as they arrive. The build centers on a Gemini Pro API integration that streams text in real time, shows a “generation in progress” cue, and lets users stop generation mid-response—turning a slow, all-at-once reply into a responsive chat experience.
The project starts from a fresh Next.js setup and focuses first on UI structure: a prompt form with an input and a send button, plus a message list area. Tailwind CSS handles layout and styling, while Lucide React supplies icons (send, user, bot, and a spinner for loading). The form is wired so submitting via the button or pressing Enter triggers the chat request.
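A minimal sketch of that shell might look like the following; the file path (app/page.tsx), Tailwind classes, and placeholder copy are illustrative rather than taken from the video:

```tsx
// app/page.tsx — static chat shell before any streaming is wired up.
// Class names and copy are illustrative, not the exact ones from the video.
'use client';

import { Send } from 'lucide-react';

export default function Chat() {
  return (
    <main className="mx-auto flex h-screen max-w-2xl flex-col p-4">
      {/* Message list area, filled in once streaming is wired up */}
      <div id="chatbox" className="flex-1 overflow-y-auto" />

      {/* Prompt form: the send button or the Enter key submits the request */}
      <form className="flex gap-2">
        <input
          className="flex-1 rounded border px-3 py-2"
          placeholder="Ask Gemini something"
        />
        <button type="submit" className="rounded bg-blue-600 p-2 text-white">
          <Send className="h-5 w-5" />
        </button>
      </form>
    </main>
  );
}
```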
For streaming, the workflow splits into two parts. On the API side, the Next.js route handler must return a streaming text response rather than a normal JSON response. The implementation uses Vercel’s AI SDK (installed with npm i ai) to simplify the streaming mechanics. On the UI side, the same SDK provides React hooks, primarily useChat, which manages internal state for the conversation messages array and exposes handlers for input changes and submission. When the user submits the form, the hook automatically sends the full messages history as the payload, and the tutorial also demonstrates adding extra fields (like the latest prompt) via the handleSubmit call.
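A hedged sketch of that wiring, assuming the ai/react entry point and an /api/chat route; the exact shape of handleSubmit’s second argument varies between AI SDK versions:

```tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  // useChat owns the messages array and the input state internally
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } =
    useChat({ api: '/api/chat' });

  return (
    <form
      // Extra fields (here, the latest prompt) ride along in the request body
      // next to the messages history; options shape depends on the SDK version.
      onSubmit={(e) => handleSubmit(e, { options: { body: { prompt: input } } })}
      className="flex gap-2"
    >
      <input value={input} onChange={handleInputChange} disabled={isLoading} />
      {isLoading ? (
        <button type="button" onClick={stop}>Stop</button>
      ) : (
        <button type="submit">Send</button>
      )}
    </form>
  );
}
```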
A dedicated API route (app/api/…/route.ts) receives the request body, logs the incoming payload, extracts the prompt, and calls Gemini Pro through Google’s Generative AI library. The model is instantiated with a Google AI Studio API key stored in a .env file (kept out of GitHub via .gitignore). The backend uses generateContentStream to obtain a token stream from Gemini Pro, then wraps it into a readable stream compatible with the AI SDK’s streaming text response. Once wired, the frontend’s messages array fills incrementally: the assistant message grows token by token, confirming that streaming is working.
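A minimal sketch of that route, assuming it lives at app/api/chat/route.ts (the exact path is elided above), that the key is exposed as GOOGLE_API_KEY, and that the latest prompt arrives alongside the messages history:

```ts
// app/api/chat/route.ts — hypothetical path; the payload shape mirrors what
// the frontend sends (messages plus an extra prompt field).
import { GoogleGenerativeAI } from '@google/generative-ai';
import { StreamingTextResponse } from 'ai';

// GOOGLE_API_KEY is an assumed env var name, read from .env (kept out of Git)
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

export async function POST(req: Request) {
  const body = await req.json();
  console.log(body); // inspect the incoming payload while wiring things up

  const prompt: string = body.prompt ?? body.messages.at(-1)?.content;

  const model = genAI.getGenerativeModel({ model: 'gemini-pro' });
  const result = await model.generateContentStream(prompt);

  // Wrap Gemini's async chunk iterator in a ReadableStream of encoded text
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of result.stream) {
        controller.enqueue(encoder.encode(chunk.text()));
      }
      controller.close();
    },
  });

  // StreamingTextResponse (from the ai package) serves the stream to useChat
  return new StreamingTextResponse(stream);
}
```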
Rendering turns raw message objects into a chat UI. Messages are displayed in a flex column-reverse layout so the newest content appears at the top, with whitespace preserved for formatting. Each message gets a role-based style (user vs assistant), plus a left-side icon. While generation runs, the input disables and the send button swaps to a spinning loader; a stop handler halts the stream when clicked. To add personality without distraction, the bot icon bounces only for the most recently streaming message.
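A sketch of that rendering, assuming messages and isLoading come from useChat and the Message type comes from the ai package; the Tailwind classes are illustrative:

```tsx
import { Bot, User } from 'lucide-react';
import type { Message } from 'ai';

function MessageList({ messages, isLoading }: { messages: Message[]; isLoading: boolean }) {
  return (
    // column-reverse places the last (newest) child at the top;
    // whitespace-pre-wrap keeps the model's line breaks intact
    <div id="chatbox" className="flex flex-col-reverse gap-3 overflow-y-auto whitespace-pre-wrap">
      {messages.map((m, i) => {
        const isNewest = i === messages.length - 1;
        return (
          <div
            key={m.id}
            className={`flex gap-2 rounded p-3 ${m.role === 'user' ? 'bg-slate-100' : 'bg-white'}`}
          >
            {m.role === 'user' ? (
              <User className="h-5 w-5 shrink-0" />
            ) : (
              // Bounce only while the most recent assistant reply is still streaming
              <Bot className={`h-5 w-5 shrink-0 ${isNewest && isLoading ? 'animate-bounce' : ''}`} />
            )}
            <p>{m.content}</p>
          </div>
        );
      })}
    </div>
  );
}
```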
Finally, Gemini’s markdown output is made readable and interactive. A markdown rendering component uses marked to convert markdown to HTML, then sanitizes it with DOMPurify before injecting it via dangerouslySetInnerHTML to reduce XSS risk. Links produced by markdown become real <a> elements, and global CSS styles within a #chatbox scope make those links visually consistent and clickable. The result is a Next.js chat app that streams Gemini Pro responses, supports stop-in-progress, and renders markdown cleanly.
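A minimal sketch of that markdown component, assuming marked and dompurify are installed and the component runs on the client (DOMPurify needs a DOM); the component name is illustrative:

```tsx
'use client';

import { marked } from 'marked';
import DOMPurify from 'dompurify';

// Convert the model's markdown to HTML, strip anything unsafe, then inject it.
export function MarkdownMessage({ text }: { text: string }) {
  const html = DOMPurify.sanitize(marked.parse(text) as string);
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}
```

Because the rendered links live inside the #chatbox container, a global stylesheet rule scoped to #chatbox a can give them consistent color, underline, and hover styles without affecting the rest of the app.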
Cornell Notes
A Next.js chat app can stream Gemini Pro responses by combining a streaming-capable API route with a frontend hook that renders tokens as they arrive. The backend returns a streaming text response and uses Google’s generateContentStream to pull incremental output from the Gemini Pro model. The frontend uses Vercel’s AI SDK useChat hook to manage the messages array, bind the input, submit requests, and expose isLoading and stop handlers. While generation is active, the UI disables the input, shows a spinner, and lets users stop mid-response. To display Gemini’s markdown properly, the app converts markdown to HTML with marked and sanitizes it with DOMPurify before rendering it safely in React.
- What two changes are required to stream LLM output in a Next.js chat app?
- How does the frontend know what to send when the user submits a prompt?
- Where does Gemini Pro streaming happen, and what makes it compatible with Next.js streaming responses?
- How is “stop generation” implemented for an in-progress assistant reply?
- Why is markdown rendering done with sanitization, and how are links made clickable?
Review Questions
- How does the messages array grow during streaming, and what role tags does it contain?
- What is the difference between returning a normal response and returning a streaming text response in a Next.js route handler?
- What security risk comes with dangerouslySetInnerHTML, and what package is used here to mitigate it?
Key Points
1. Streaming requires the backend route handler to return a streaming text response, not a standard JSON response.
2. Vercel’s AI SDK useChat hook manages conversation state (the messages array) and provides handlers for input, submit, loading status, and stopping generation.
3. Gemini Pro streaming is implemented with generateContentStream and then adapted into a readable stream compatible with the AI SDK’s streaming response.
4. The UI disables input and swaps the send icon for a spinner while generation is in progress, and a stop handler lets users interrupt mid-response.
5. Chat messages are rendered with role-based styling and icons, with flex column-reverse so the newest content appears first.
6. Markdown output is converted to HTML with marked and sanitized with DOMPurify before rendering via dangerouslySetInnerHTML to keep links clickable and formatting correct.