GPT-4 First Impression - A New Era Begins?
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
OpenAI’s GPT-4 arrives with a major jump in both capacity and capability—especially longer context and multimodal (text + image) understanding—while also emphasizing months of safety work to keep outputs aligned with how people want to use it. Early access through ChatGPT Plus lets users try GPT-4 directly, with a message cap and model options that trade speed for deeper reasoning.
A headline feature is GPT-4’s ability to handle far more text at once. The announced context length is 8,192 tokens, roughly twice that of GPT-3.5-era offerings like text-davinci-003. OpenAI also mentions limited access to a 32k-token context size, framed as about 50 pages of text. That scale matters because it changes what kinds of tasks fit in a single run: longer story drafts, bigger codebases, and more extensive document analysis without constantly trimming or summarizing.
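To make the scale concrete, here is a minimal sketch of checking whether a document fits in either announced window. The ~4 characters-per-token ratio is a rough heuristic for English prose, not a real tokenizer, and the model names and reply budget are illustrative assumptions.

```python
# Rough check of whether a document fits GPT-4's context windows.
# The ~4 characters/token ratio is a heuristic for English text, not
# a real tokenizer; exact counts require the model's own tokenizer.

CONTEXT_WINDOWS = {"gpt-4": 8_192, "gpt-4-32k": 32_768}

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, model: str, reply_budget: int = 1_000) -> bool:
    """True if the prompt plus a reserved reply budget fits the window."""
    return estimate_tokens(text) + reply_budget <= CONTEXT_WINDOWS[model]

doc = "word " * 6_000  # ~30k characters, ~7,500 estimated tokens
print(fits_in_context(doc, "gpt-4"))      # too big once a reply is reserved
print(fits_in_context(doc, "gpt-4-32k"))  # comfortably fits the 32k window
```

The reserved reply budget reflects that the context window covers both prompt and completion, so a document that "fits" must still leave room for the answer.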
In hands-on tests, GPT-4 performs well on critique-and-rewrite workflows. Using an “AI critic” style prompt, it reviews a short story about “the last dragon rider” in a world where dragons are hunted to extinction. The critique highlights concrete weaknesses—predictability, lack of character development, pacing problems, overused clichés, and inconsistencies in the protagonist’s thought process. When asked to rewrite the story based on that feedback, the output shifts into more elevated, descriptive prose, with imagery and mood-setting that feels closer to a polished literary style than the original.
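The critique-and-rewrite workflow above can be sketched as a two-step loop. The prompts and the `ask` callable are illustrative assumptions; in practice `ask` would forward each prompt to GPT-4, while here a stub keeps the sketch runnable without an API key.

```python
# A minimal critique-and-rewrite loop. `ask` stands in for any chat-model
# call (the real workflow would send these prompts to GPT-4); the stub
# below makes the structure runnable offline.
from typing import Callable

CRITIC_PROMPT = (
    "You are an AI critic. List concrete weaknesses of this story: "
    "predictability, character development, pacing, cliches, consistency.\n\n"
)
REWRITE_PROMPT = "Rewrite the story below, addressing every issue in the critique.\n\n"

def critique_and_rewrite(story: str, ask: Callable[[str], str]) -> tuple[str, str]:
    """Step 1: critique the story. Step 2: rewrite it using that critique."""
    critique = ask(CRITIC_PROMPT + story)
    rewrite = ask(REWRITE_PROMPT + "Critique:\n" + critique + "\n\nStory:\n" + story)
    return critique, rewrite

# Stub model: returns canned text so the loop can be exercised end to end.
def stub_model(prompt: str) -> str:
    if prompt.startswith(CRITIC_PROMPT):
        return "critique: pacing issues"
    return "revised story"

critique, rewrite = critique_and_rewrite("The last dragon rider...", stub_model)
```

Feeding the critique back into the rewrite prompt is what turns two independent generations into the loop described above.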
GPT-4 also handles absurd, constraint-heavy instructions with structured, step-by-step planning. A prompt about moving snow from Norway to the Sahara is met with a logistics plan: obtain permits, plan routes and infrastructure, use refrigerated transport (trucks, trains, planes), hire logistics support, prepare insulated containers, monitor transit, deposit and distribute the snow at a chosen desert site, and then document and publicize the effort. The plan explicitly flags environmental, social, and economic impacts—turning a joke premise into a checklist-like response.
Beyond text, GPT-4’s multimodal ability is presented as a practical expansion of what users can ask. It can accept images alongside prompts, enabling tasks like describing a multi-panel image “panel by panel,” interpreting diagrams or graphs to answer questions, and spotting unusual details in everyday scenes. Examples include identifying components in an image of a lightning-to-VGA adapter package, reasoning over a chart about average daily meat consumption across regions, and describing an odd situation in a photo of a man ironing clothes on an ironing board attached to a moving taxi.
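A request like "describe this image panel by panel" has to carry both the text and the image in one message. The "content parts" payload shape below follows later OpenAI-style chat conventions and is an assumption here; GPT-4's image input was not yet publicly available at the time of these first impressions.

```python
# Sketch of packaging an image + text prompt as one multimodal chat message.
# The content-parts shape is an assumption modeled on later OpenAI-style
# chat payloads, not a confirmed API from the video.
import base64

def image_message(question: str, image_bytes: bytes,
                  mime: str = "image/png") -> dict:
    """Build one user message with a text part and an inline image part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("Describe this image panel by panel.", b"\x89PNG...")
```

Inlining the image as a base64 data URL keeps the message self-contained; a hosted image URL would work the same way in the `image_url` field.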
Access details include a ChatGPT Plus cap of 100 messages every four hours for GPT-4, and the interface offers model choices that include a slower “reasoning” option. The overall takeaway from these first impressions is clear: GPT-4’s longer context, stronger critique-and-generation loop, and image understanding broaden what can be attempted in a single conversation—while safety and alignment work aims to keep those capabilities usable rather than chaotic.
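The "100 messages every four hours" cap behaves like a sliding window, which a client could track locally as sketched below. The cap value comes from the video; the tracker itself is an illustrative sketch, not anything ChatGPT exposes.

```python
# Client-side tracker for a "100 messages per 4 hours" cap, modeled as a
# sliding window of send timestamps. Illustrative sketch only; the cap
# value is taken from the video.
from collections import deque

class MessageCap:
    def __init__(self, limit: int = 100, window_s: float = 4 * 3600):
        self.limit, self.window_s = limit, window_s
        self.sent: deque = deque()  # timestamps of recent sends

    def try_send(self, now: float) -> bool:
        """Record a send at `now` if under the cap; return whether it went out."""
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()  # drop sends older than the window
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True

cap = MessageCap()
sent = sum(cap.try_send(now=float(i)) for i in range(150))
print(sent)  # only 100 of 150 rapid messages are allowed
```

A sliding window (rather than a fixed four-hour block) means capacity frees up continuously as old sends age out.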
Cornell Notes
GPT-4 brings a step-change in what can fit into one prompt—8,192 tokens by default and an announced 32k context mode (framed as ~50 pages). That expanded context supports longer writing, deeper analysis, and fewer interruptions for summarization. Early tests through ChatGPT Plus show GPT-4 can critique a story by pointing to specific issues (predictability, weak character development, pacing, clichés, and inconsistencies) and then rewrite it with more developed language. It also follows complex, multi-step instructions, even for unrealistic scenarios like transporting snow from Norway to the Sahara, while addressing practical and impact considerations. A key differentiator is multimodal input: GPT-4 can interpret images and answer questions about diagrams, products, and unusual scenes.
- What does GPT-4’s context window change for real tasks?
- How does GPT-4 perform in a critique-and-rewrite workflow?
- Why is the “snow from Norway to the Sahara” prompt a useful demonstration?
- What does multimodal capability add compared with text-only prompting?
- What access and usage constraints matter when trying GPT-4 in ChatGPT Plus?
Review Questions
- How do the 8,192-token and 32k-token context claims affect what kinds of prompts can be completed in one pass?
- Describe the specific categories of weaknesses GPT-4 identified in the dragon-rider story, and explain how those categories influenced the rewrite.
- Give two examples from the transcript of image-based tasks GPT-4 can handle, and explain what the model had to infer from the visuals.
Key Points
1. GPT-4’s announced context length is 8,192 tokens, with an additional 32k context mode framed as about 50 pages of text.
2. ChatGPT Plus provides GPT-4 access with a cap of 100 messages every four hours, and model choices that trade speed for reasoning.
3. A critique-and-rewrite loop can work effectively: GPT-4 can identify story problems (predictability, character development, pacing, clichés, inconsistencies) and then revise the prose accordingly.
4. Complex, multi-step instructions—even unrealistic ones—can be converted into structured plans with practical steps and impact considerations.
5. GPT-4’s multimodal capability allows it to interpret images and answer questions about diagrams, product images, and unusual real-world scenes.
6. OpenAI highlights months of safety and alignment work to make GPT-4 outputs more usable and aligned with user intent.