Build a Next.js App with Streaming Large Language Responses (Gemini Pro API Tutorial)
Based on AI Arcade's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Streaming large language model output in a Next.js app becomes practical once the backend returns a streaming response and the frontend renders tokens as they arrive. The build centers on a Gemini Pro API integration that streams text in real time, shows a “generation in progress” cue, and lets users stop generation mid-response—turning a slow, all-at-once reply into a responsive chat experience.
The project starts from a fresh Next.js setup and focuses first on UI structure: a prompt form with an input and a send button, plus a message list area. Tailwind CSS handles layout and styling, while Lucide React supplies icons (send, user, bot, and a spinner for loading). The form is wired so submitting via the button or pressing Enter triggers the chat request.
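A minimal sketch of that shell might look like the following; the file path (app/page.tsx), Tailwind classes, and placeholder copy are illustrative rather than taken from the video:

```tsx
// app/page.tsx — static chat shell before any streaming is wired up.
// Class names and copy are illustrative, not the exact ones from the video.
'use client';

import { Send } from 'lucide-react';

export default function Chat() {
  return (
    <main className="mx-auto flex h-screen max-w-2xl flex-col p-4">
      {/* Message list area, filled in once streaming is wired up */}
      <div id="chatbox" className="flex-1 overflow-y-auto" />

      {/* Prompt form: the send button or the Enter key submits the request */}
      <form className="flex gap-2">
        <input
          className="flex-1 rounded border px-3 py-2"
          placeholder="Ask Gemini something"
        />
        <button type="submit" className="rounded bg-blue-600 p-2 text-white">
          <Send className="h-5 w-5" />
        </button>
      </form>
    </main>
  );
}
```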
For streaming, the workflow splits into two parts. On the API side, the Next.js route handler must return a streaming text response rather than a normal JSON response. The implementation uses Vercel’s AI SDK (installed with npm i ai) to simplify the streaming mechanics. On the UI side, the same SDK provides React hooks, primarily useChat, which manages internal state for the conversation messages array and exposes handlers for input changes and submission. When the user submits the form, the hook automatically sends the full messages history as the payload, and the tutorial also demonstrates adding extra fields (like the latest prompt) via the handleSubmit call.
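A hedged sketch of that wiring, assuming the ai/react entry point and an /api/chat route; the exact shape of handleSubmit’s second argument varies between AI SDK versions:

```tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  // useChat owns the messages array and the input state internally
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } =
    useChat({ api: '/api/chat' });

  return (
    <form
      // Extra fields (here, the latest prompt) ride along in the request body
      // next to the messages history; options shape depends on the SDK version.
      onSubmit={(e) => handleSubmit(e, { options: { body: { prompt: input } } })}
      className="flex gap-2"
    >
      <input value={input} onChange={handleInputChange} disabled={isLoading} />
      {isLoading ? (
        <button type="button" onClick={stop}>Stop</button>
      ) : (
        <button type="submit">Send</button>
      )}
    </form>
  );
}
```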
A dedicated API route (app/api/…/route.ts) receives the request body, logs the incoming payload, extracts the prompt, and calls Gemini Pro through Google’s Generative AI library. The model is instantiated with a Google AI Studio API key stored in a .env file (kept out of GitHub via .gitignore). The backend uses generateContentStream to obtain a token stream from Gemini Pro, then wraps it into a readable stream compatible with the AI SDK’s streaming text response. Once wired, the frontend’s messages array fills incrementally: the assistant message grows token by token, confirming that streaming is working.
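A minimal sketch of that route, assuming it lives at app/api/chat/route.ts (the exact path is elided above), that the key is exposed as GOOGLE_API_KEY, and that the latest prompt arrives alongside the messages history:

```ts
// app/api/chat/route.ts — hypothetical path; the payload shape mirrors what
// the frontend sends (messages plus an extra prompt field).
import { GoogleGenerativeAI } from '@google/generative-ai';
import { StreamingTextResponse } from 'ai';

// GOOGLE_API_KEY is an assumed env var name, read from .env (kept out of Git)
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

export async function POST(req: Request) {
  const body = await req.json();
  console.log(body); // inspect the incoming payload while wiring things up

  const prompt: string = body.prompt ?? body.messages.at(-1)?.content;

  const model = genAI.getGenerativeModel({ model: 'gemini-pro' });
  const result = await model.generateContentStream(prompt);

  // Wrap Gemini's async chunk iterator in a ReadableStream of encoded text
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of result.stream) {
        controller.enqueue(encoder.encode(chunk.text()));
      }
      controller.close();
    },
  });

  // StreamingTextResponse (from the ai package) serves the stream to useChat
  return new StreamingTextResponse(stream);
}
```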
Rendering turns raw message objects into a chat UI. Messages are displayed in a flex column-reverse layout so the newest content appears at the top, with whitespace preserved for formatting. Each message gets a role-based style (user vs assistant), plus a left-side icon. While generation runs, the input disables and the send button swaps to a spinning loader; a stop handler halts the stream when clicked. To add personality without distraction, the bot icon bounces only for the most recently streaming message.
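A sketch of that rendering, assuming messages and isLoading come from useChat and the Message type comes from the ai package; the Tailwind classes are illustrative:

```tsx
import { Bot, User } from 'lucide-react';
import type { Message } from 'ai';

function MessageList({ messages, isLoading }: { messages: Message[]; isLoading: boolean }) {
  return (
    // column-reverse places the last (newest) child at the top;
    // whitespace-pre-wrap keeps the model's line breaks intact
    <div id="chatbox" className="flex flex-col-reverse gap-3 overflow-y-auto whitespace-pre-wrap">
      {messages.map((m, i) => {
        const isNewest = i === messages.length - 1;
        return (
          <div
            key={m.id}
            className={`flex gap-2 rounded p-3 ${m.role === 'user' ? 'bg-slate-100' : 'bg-white'}`}
          >
            {m.role === 'user' ? (
              <User className="h-5 w-5 shrink-0" />
            ) : (
              // Bounce only while the most recent assistant reply is still streaming
              <Bot className={`h-5 w-5 shrink-0 ${isNewest && isLoading ? 'animate-bounce' : ''}`} />
            )}
            <p>{m.content}</p>
          </div>
        );
      })}
    </div>
  );
}
```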
Finally, Gemini’s markdown output is made readable and interactive. A markdown rendering component uses marked to convert markdown to HTML, then sanitizes it with DOMPurify before injecting it via dangerouslySetInnerHTML to reduce XSS risk. Links produced by markdown become real <a> elements, and global CSS styles within a #chatbox scope make those links visually consistent and clickable. The result is a Next.js chat app that streams Gemini Pro responses, supports stop-in-progress, and renders markdown cleanly.
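A minimal sketch of that markdown component, assuming marked and dompurify are installed and the component runs on the client (DOMPurify needs a DOM); the component name is illustrative:

```tsx
'use client';

import { marked } from 'marked';
import DOMPurify from 'dompurify';

// Convert the model's markdown to HTML, strip anything unsafe, then inject it.
export function MarkdownMessage({ text }: { text: string }) {
  const html = DOMPurify.sanitize(marked.parse(text) as string);
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}
```

Because the rendered links live inside the #chatbox container, a global stylesheet rule scoped to #chatbox a can give them consistent color, underline, and hover styles without affecting the rest of the app.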
Cornell Notes
A Next.js chat app can stream Gemini Pro responses by combining a streaming-capable API route with a frontend hook that renders tokens as they arrive. The backend returns a streaming text response and uses Google’s generateContentStream to pull incremental output from the Gemini Pro model. The frontend uses Vercel’s AI SDK useChat hook to manage the messages array, bind the input, submit requests, and expose isLoading and stop handlers. While generation is active, the UI disables the input, shows a spinner, and lets users stop mid-response. To display Gemini’s markdown properly, the app converts markdown to HTML with marked and sanitizes it with DOMPurify before rendering it safely in React.
- What two changes are required to stream LLM output in a Next.js chat app?
- How does the frontend know what to send when the user submits a prompt?
- Where does Gemini Pro streaming happen, and what makes it compatible with Next.js streaming responses?
- How is “stop generation” implemented for an in-progress assistant reply?
- Why is markdown rendering done with sanitization, and how are links made clickable?
Review Questions
- How does the messages array grow during streaming, and what role tags does it contain?
- What is the difference between returning a normal response and returning a streaming text response in a Next.js route handler?
- What security risk comes with dangerouslySetInnerHTML, and what package is used here to mitigate it?
Key Points
1. Streaming requires the backend route handler to return a streaming text response, not a standard JSON response.
2. Vercel’s AI SDK useChat hook manages conversation state (the messages array) and provides handlers for input, submit, loading status, and stopping generation.
3. Gemini Pro streaming is implemented with generateContentStream and then adapted into a readable stream compatible with the AI SDK’s streaming response.
4. The UI disables input and swaps the send icon for a spinner while generation is in progress, and a stop handler lets users interrupt mid-response.
5. Chat messages are rendered with role-based styling and icons, with flex column-reverse so the newest content appears first.
6. Markdown output is converted to HTML with marked and sanitized with DOMPurify before rendering via dangerouslySetInnerHTML to keep links clickable and formatting correct.