23 Ways ChatGPT Still Sucks After 3 years (And How to Fix Them)

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Chatbots need to shift from idea generation to workbench behavior: shareable, shippable outputs that teams can collaborate on and maintain.

Briefing

Chatbots are still strong at generating ideas but weak at turning those ideas into shippable work—and the gap shows up across sharing, intent, exports, trust, retrieval, and day-to-day usability. After roughly three years of rapid progress, the biggest frustration isn’t raw model quality; it’s product design. People can’t easily collaborate on the same thread, can’t branch experiments without breaking the original, and can’t reliably carry outputs into the tools where work actually gets done.

Collaboration and sharing are the first fault line. Users want to share a relevant slice of a conversation with someone else, start a new thread from that point, and keep editing together—rather than being locked into a single, linear chat. The current workflow often becomes “copy paste ping pong,” with tasks and decisions buried in scrollback. The same problem appears when trying to point others to a single “banger” answer: instead of a simple permalink to one message, users must share the entire back-and-forth to preserve context.

Intent handling is another bottleneck. Instead of typing a gist and letting the system convert it into a clean, structured request, users often have to prompt extensively to get reliable results. The desired future is a chat experience that dynamically generates interfaces—checkboxes, variable controls, and guided inputs—so people can confirm what matters without writing perfect prompts. Alongside that, users want control over “agenticness,” meaning the ability to dial how autonomous the assistant should be for a given task: short loops for quick back-and-forth, or longer “work for hours” behavior when appropriate.

Even when answers look right, turning them into real work remains fragile. Exports to tools like Docs and Notion frequently break formatting, especially code blocks and headings, forcing users to spend more time repairing output than producing it. Action items also need better surfacing—highlighted, calendar-aware, and visually tracked—rather than disappearing into chat. For stakeholders, there’s a demand for publishable, read-only summaries that show the best parts of a messy thread without exposing every intermediate step.

Keeping outputs updated without babysitting is still underdeveloped. Users want scheduled or source-aware updates that refresh automatically when underlying information changes, plus “diffs” that show what changed since the last run rather than dumping everything again. Trust and control also lag: people want clear receipts for claims (verifiable sources and the ability to validate URLs), transparent memory behavior (seeing which memories are being pulled, and editing them), and better controls for sensitive work such as thread sharing and regional access.

Finally, retrieval and quality-of-life features are missing. Finished work gets buried because chats aren’t automatically organized into projects, pinned at the top, or searchable with context-aware “global search” inside a conversation. Users also want structured input modes—like dynamically generated forms when uploading PDFs—better tone controls with saved “fingerprints,” and quick restore/version rollback for canvases and artifacts. The overall prescription is consistent: move from clever chat toward a real workbench that’s shareable, shippable, self-maintaining, and easier to reuse—so the assistant doesn’t just talk, but helps teams build and maintain outcomes over time.
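To make the version-rollback piece of that workbench concrete, here is a minimal sketch: every edit to a canvas checkpoints the previous state, so a bad generation is one rollback away. The Canvas class and its in-memory storage are illustrative assumptions, not an existing product API.

```python
# Minimal sketch: checkpoint every edit so an artifact can roll back.
# Canvas and its storage scheme are hypothetical, for illustration only.
class Canvas:
    def __init__(self, text: str = ""):
        self.text = text
        self._history: list[str] = []

    def edit(self, new_text: str) -> None:
        self._history.append(self.text)  # snapshot before overwriting
        self.text = new_text

    def rollback(self, steps: int = 1) -> None:
        for _ in range(min(steps, len(self._history))):
            self.text = self._history.pop()

c = Canvas("v1 draft")
c.edit("v2 tightened")
c.edit("v3 the model mangled it")
c.rollback()   # restore v2
print(c.text)  # -> "v2 tightened"
```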

Cornell Notes

The core complaint is that modern chatbots generate ideas well but still struggle to convert them into durable, collaborative work. The most repeated pain points are product-level: sharing only works as whole-thread copy/paste, intent requires long prompting instead of structured, dynamically generated inputs, and exports often break formatting—especially code. Users also want better control loops: adjustable agenticness, source-aware updates with diffs, transparent citations (“receipts”), and editable memory. Finally, retrieval and usability lag, with weak project organization, limited in-chat search, clunky tone controls, and missing version rollback. The result is a workflow that feels like babysitting and repairing outputs rather than shipping work.

Why does collaboration feel broken in today’s chatbot workflows, and what would “slice sharing” change?

Collaboration currently treats a chat as a single locked thread: users can’t share just the relevant portion, branch off from that point, and continue editing together. The desired behavior is like editable social posts—select a message range, share a link for view/comment/edit, then start a new thread tied back to the main conversation. That would reduce “copy paste ping pong” and make it easier to iterate on specific decisions without losing the original context.
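As a rough sketch of what slice sharing could look like under the hood, the snippet below models a share link that points at a message range and can spawn an editable branch tied back to the source thread. All names here (Thread, SliceShare, branch_from_slice) are hypothetical, not any vendor's actual API.

```python
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class Thread:
    id: str
    messages: list[str]
    parent: tuple[str, int] | None = None  # (source thread id, branch point)

@dataclass
class SliceShare:
    thread_id: str
    start: int          # first shared message index
    end: int            # last shared message index (inclusive)
    mode: str = "view"  # "view", "comment", or "edit"
    token: str = field(default_factory=lambda: uuid4().hex)

    def url(self) -> str:
        return f"https://chat.example/share/{self.token}"

def branch_from_slice(threads: dict[str, Thread], share: SliceShare) -> Thread:
    """Start a new thread from a shared slice, keeping a link back."""
    src = threads[share.thread_id]
    branch = Thread(
        id=uuid4().hex,
        messages=src.messages[share.start : share.end + 1],
        parent=(src.id, share.end),
    )
    threads[branch.id] = branch
    return branch

threads = {"t1": Thread("t1", ["q1", "a1", "q2", "banger answer"])}
share = SliceShare("t1", start=3, end=3, mode="edit")  # permalink to one message
print(share.url())
branch = branch_from_slice(threads, share)
print(branch.messages, branch.parent)
```

A single-message permalink is just the degenerate case where start equals end; the parent pointer is what keeps a branch from breaking the original conversation.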

What does “intent” mean here, and why is long prompting still a problem?

Intent is the structured goal behind a request—what the user actually wants the assistant to do and under what constraints. The complaint is that LLMs don’t reliably infer intent from a gist, so users must write pages of prompting to get consistent results. The proposed fix is dynamic interfaces inside chat: checkboxes and variable controls that users can confirm, similar to how Comet generates a dynamic email-sending UI.
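A minimal sketch of that idea, assuming a toy keyword heuristic: the system turns a gist into a small form of confirmable controls instead of demanding a perfect prompt. The FormField shape and build_form logic are invented for illustration, not how Comet or any product actually does it.

```python
from dataclasses import dataclass

@dataclass
class FormField:
    name: str
    kind: str                       # "checkbox", "choice", or "text"
    options: list[str] | None = None
    value: object = None

def build_form(gist: str) -> list[FormField]:
    """Turn a rough gist into confirmable controls (toy heuristic)."""
    fields = [FormField("goal", "text", value=gist)]
    if "email" in gist.lower():
        fields += [
            FormField("tone", "choice", options=["formal", "casual"]),
            FormField("include_signature", "checkbox", value=True),
        ]
    return fields

for f in build_form("email the team a gist of the launch delay"):
    print(f.name, f.kind, f.options, f.value)
```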

How should users control how autonomous an assistant is for a task?

Instead of one fixed assistant personality, users want a dial for “agenticness” per request. For example, a high setting would let the assistant work for hours; a low setting would keep interactions in short loops. The goal is productivity control (how much the system acts) rather than temperament control (how eager or cautious it feels).
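One way to picture the dial is as a single number that budgets autonomy: how many steps the assistant may take, and how long it may run, before checking back in. The thresholds below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    agenticness: float  # 0.0 = confirm every step, 1.0 = "work for hours"

    @property
    def max_steps(self) -> int:
        return 1 + int(self.agenticness * 99)   # 1..100 autonomous steps

    @property
    def check_in_minutes(self) -> int:
        return 1 + int(self.agenticness * 119)  # 1 minute .. 2 hours

for level in (0.0, 0.5, 1.0):
    p = AgentPolicy(level)
    print(level, p.max_steps, p.check_in_minutes)
```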

What’s wrong with turning answers into work, beyond model accuracy?

Even correct drafts can fail at the handoff stage. Exports to Docs/Notion often break formatting—code blocks, headings, and structure—so users spend time repairing output. Action items also get lost in chat scroll instead of being surfaced as tasks with calendar context. Stakeholders face another issue: they don’t want the messy thread, so the assistant should generate a publishable, read-only summary of the best messages.
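One plausible fix at the handoff stage is to split markdown into typed blocks before export, so the target editor receives headings and code fences as structured elements rather than flattened text. This is a sketch of that approach; the block dictionary format is an assumption, not a real Docs or Notion payload.

```python
import re

FENCE = re.compile(r"^``" + r"`(\w*)$")  # matches an opening code fence

def to_blocks(markdown: str) -> list[dict]:
    """Split markdown into typed blocks, preserving code and headings."""
    blocks, code, lang = [], None, ""
    for line in markdown.splitlines():
        m = FENCE.match(line)
        if m and code is None:
            code, lang = [], m.group(1)          # entering a code fence
        elif line.strip() == "``" + "`" and code is not None:
            blocks.append({"type": "code", "lang": lang, "text": "\n".join(code)})
            code = None                          # leaving the code fence
        elif code is not None:
            code.append(line)
        elif line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            blocks.append({"type": f"h{level}", "text": line.lstrip("# ")})
        elif line.strip():
            blocks.append({"type": "p", "text": line})
    return blocks

doc = "# Plan\nShip the fix.\n" + "``" + "`python\nprint('hi')\n" + "``" + "`"
for b in to_blocks(doc):
    print(b)
```

Because each block carries its own type, the exporter can map headings to heading styles and code to code elements instead of letting everything collapse into plain paragraphs.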

What would “smart updates” look like without constant reruns?

Users want scheduled tasks that update dynamically as sources change, not static outputs. If an agent promises to act, it should detect when its inputs shift (e.g., a news development in the vertical it covers) and proactively refresh results. When updates happen, users want diffs—what changed since the last version—rather than a full dump that forces rereading.
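The diff half of this is already cheap to build: Python's standard difflib can compare the last run to the current run and surface only the changed lines. The example data below is made up.

```python
import difflib

# Compare the previous run's output to the new one and show only changes.
previous = ["Competitor A raised prices.", "No news on Competitor B."]
current  = ["Competitor A raised prices.", "Competitor B launched a rival product."]

diff = difflib.unified_diff(previous, current,
                            fromfile="last run", tofile="this run", lineterm="")
print("\n".join(diff))
```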

What trust and control features are missing for safe, verifiable use?

Trust requires receipts: claims should list sources in a way users can copy as markdown and validate by running separate checks on real URLs. Memory needs transparency and editing—users want to see which memories are being pulled and override them. For sensitive work, users also want better thread controls (auto-deleting, non-shareable threads, region-locked access) and clearer cost/time visibility so usage limits don’t become surprises.
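A minimal sketch of the receipts idea, using only the standard library: take each cited URL and run an independent liveness check before trusting it. A HEAD request only proves the URL resolves, not that the page supports the claim; the citations dict and helper name are illustrative.

```python
import urllib.request

def validate_citation(url: str, timeout: float = 5.0) -> bool:
    """Return True if the cited URL resolves (status < 400)."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "receipt-checker/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False  # DNS failure, timeout, or HTTP error

citations = {"claim about pricing": "https://example.com"}  # made-up example
for claim, url in citations.items():
    mark = "OK" if validate_citation(url) else "BROKEN"
    print(f"- [{claim}]({url})  <-- {mark}")  # copyable as markdown
```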

Review Questions

  1. Which chatbot capability is most emphasized as still weak: idea generation or converting ideas into shippable work—and what product features drive that conclusion?
  2. How would dynamic, checkbox-based interfaces reduce the need for long prompting? Give an example from the transcript.
  3. What combination of “receipts,” editable memory, and diff-based updates would most improve trust and maintenance over time?

Key Points

  1. Chatbots need to shift from idea generation to workbench behavior: shareable, shippable outputs that teams can collaborate on and maintain.

  2. Enable collaboration via message-level slice sharing, branching, and live co-editing instead of forcing whole-thread copy/paste workflows.

  3. Replace long prompting with dynamically generated structured inputs (checkboxes/variable controls) so users can confirm intent quickly.

  4. Make outputs operational: preserve formatting in exports, surface action items as tasks, and generate stakeholder-friendly publishable summaries.

  5. Support source-aware updates with proactive refresh and diff views so users don’t have to rerun everything and reread the entire result.

  6. Build trust through verifiable citations (“receipts”), transparent editable memory, and stronger controls for sensitive threads.

  7. Improve retrieval and quality-of-life with project auto-grouping, smarter in-chat search, tone “fingerprints,” and quick version rollback/checkpoints.

Highlights

  • The central frustration is not model intelligence—it’s the missing product layer that turns chat into shippable work.
  • Users want message-level permalinks and branching collaboration, not whole-thread sharing and endless copy/paste.
  • Exports that break code blocks and headings force users to fix formatting instead of shipping outcomes.
  • Trust features should include receipts you can validate and editable memory that shows exactly what’s being reused.
  • A real workbench needs diffs, proactive source-aware updates, and version rollback—not just new chat responses.

Topics

  • Chatbot Collaboration
  • Intent and Structured Inputs
  • Exports and Workflows
  • Trust and Citations
  • Retrieval and Versioning
