
Claude vs. GPT: Which is best for note-taking?

Reflect Notes · 5 min read

Based on Reflect Notes's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Reflect Notes lets users switch between Claude 3.5 Sonnet (Anthropic) and GPT-4o (OpenAI), and the default can change as providers update models.

Briefing

Note-taking workflows in Reflect Notes can switch between Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o, and the practical takeaway from side-by-side tests is that Claude 3.5 Sonnet tends to produce better-quality text—especially when the task demands nuance, counterarguments, or clearer rewriting—while GPT-4o more consistently improves formatting and sometimes compresses content more aggressively.

The comparison starts with Reflect Notes’ model controls: users can leave the AI provider on the default (Anthropic), or manually set it to Anthropic (Claude 3.5 Sonnet) or OpenAI (GPT-4o). Reflect Notes also updates its default model over time, so the “best” choice can shift as providers roll out new versions. Still, when forced to pick one model for note-taking tasks, the guidance is to stick with Claude 3.5 Sonnet.

In a short summarization test, both models deliver usable summaries, but Claude 3.5 Sonnet retains more information while GPT-4o condenses more. The narrator credits Claude with producing the more informative summary, even if it could stand to compress further.

When asked to generate writing—specifically drafting a corporate-style email—the results diverge. Claude’s output includes more “fluff” and a more generic opening, which can slow down real-world use if the goal is a clean, ready-to-send message. GPT-4o, by contrast, produces a cleaner email with better formatting and more directly usable structure, even though it may be shorter.

The biggest quality gap appears in logic-focused work. Given a claim that AI will fully automate 30% of jobs by 2030, Claude responds with a structured set of counterarguments, framing the prediction as likely overstated and listing concrete reasons. GPT-4o answers in a more paragraph-like form and leans toward softer phrasing. For tasks that require argumentation and persuasive clarity, Claude’s list-style counterpoints come across as more compelling.

For “simplify and condense” tasks, GPT-4o again shows strength in presentation: it turns complex material into a step-by-step process with clear formatting. Claude also simplifies well, but GPT-4o’s reformatting makes the content easier to follow. In the CNC manufacturing example, both models help a non-expert understand the topic, yet Claude is still favored overall because it produces a more useful distilled explanation.

Finally, in a rephrasing test using the narrator's own paragraph, the models land close, though Claude's version is judged better than GPT-4o's. The critique notes that the evaluation is limited because the prompt was a simple rephrase rather than a targeted rewrite.

Overall, the recommendation is to keep Reflect Notes on the default model (Claude 3.5 Sonnet) for note-taking, while recognizing that GPT-4o can be preferable when formatting and step-by-step structure matter most. A follow-up comparison is promised for “chatting with your notes,” where the model choice may affect retrieval and conversational behavior differently.

Cornell Notes

Reflect Notes lets users choose between Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o for AI-assisted note-taking. Side-by-side examples suggest Claude 3.5 Sonnet usually wins on content quality—especially for logic-heavy tasks like generating counterarguments—while GPT-4o often wins on formatting and sometimes produces more concise outputs. Summaries: Claude keeps more information; GPT-4o compresses more. Emails: GPT-4o’s formatting is cleaner, while Claude can add generic “fluff.” Simplification: both help, but GPT-4o’s step-by-step structure is particularly readable. Overall guidance: keep the default (Claude 3.5 Sonnet) unless formatting/structure is the top priority.

How does Reflect Notes’ model switching work, and why does it matter for note-taking quality?

Reflect Notes includes an AI provider setting where users can toggle between the default provider and manual selection. The default is tied to Anthropic, and Reflect Notes updates the underlying model over time, so the “best” model can change as providers improve. Manually selecting Anthropic uses Claude 3.5 Sonnet, while selecting OpenAI uses GPT-4o. Because the transcript’s tests show consistent differences in output style (Claude for nuance, GPT-4o for formatting), the chosen model directly affects what ends up in notes.

In a short summarization task, which model preserves more meaning and which compresses more?

For a short summary prompt, both outputs are considered "pretty decent," but Claude 3.5 Sonnet retains more information, while GPT-4o condenses more aggressively. The evaluation credits Claude with producing the more informative summary, even though it does not compress as much as the user might like.

When generating an email from bullet points, what differences show up?

Claude’s email draft includes more generic corporate phrasing (for example, a greeting and a conventional opening like “I hope this email finds you well”), which adds filler. GPT-4o produces a shorter email with better formatting and a more directly usable structure. The transcript’s verdict is that GPT-4o is objectively better for this specific writing task.

Why does the logic/counterargument example favor Claude?

Given a claim about AI fully automating 30% of jobs by 2030, Claude responds by challenging the prediction and listing counterarguments in a structured way. GPT-4o keeps the response more in paragraph form and is judged to be less compelling. The key difference isn’t just correctness; it’s the clarity and persuasive structure of the counterpoints.

For simplifying complex material, how do Claude and GPT-4o differ in usability?

Both models simplify and make the content easier to understand. GPT-4o stands out by reformatting into a step-by-step process with clear structure, which improves readability for non-experts. Claude also simplifies effectively, but the transcript favors Claude overall in the CNC manufacturing example—suggesting that while GPT-4o formats well, Claude’s distilled explanation can be more useful.

How do the models perform on rephrasing someone’s own paragraph?

The rephrasing test is harder to judge because it uses a simple rephrase prompt rather than a targeted rewrite. Still, Claude’s rephrasing is rated better than GPT-4o’s. The critique notes Claude’s wording choices (e.g., “daunting”) aren’t perfect, but the overall rewrite is considered closer to what the user would want.

Review Questions

  1. When summarizing, what trade-off did the transcript identify between Claude 3.5 Sonnet and GPT-4o (information retention vs compression)?
  2. Which task types in the transcript favored Claude 3.5 Sonnet, and which favored GPT-4o? Provide one example each.
  3. What formatting-related advantage did GPT-4o show in the simplification task, and how did that affect perceived usefulness?

Key Points

  1. Reflect Notes lets users switch between Claude 3.5 Sonnet (Anthropic) and GPT-4o (OpenAI), and the default can change as providers update models.
  2. Claude 3.5 Sonnet generally produces higher-quality summaries by retaining more information, even if it compresses less than GPT-4o.
  3. GPT-4o is often better for drafting emails because its formatting is cleaner and less cluttered with generic filler.
  4. Claude 3.5 Sonnet performs strongly on logic and counterarguments, using structured lists that read as more persuasive and concrete.
  5. GPT-4o's simplification outputs are especially readable when they can be reformatted into step-by-step processes.
  6. For note-taking, the practical recommendation is to keep Reflect Notes on the default (Claude 3.5 Sonnet) unless formatting/structure is the main priority.

Highlights

Claude 3.5 Sonnet kept more meaning in summaries, while GPT-4o compressed more aggressively.
GPT-4o produced a cleaner, more usable email draft with better formatting than Claude’s more generic version.
In the job-automation counterargument example, Claude’s list-style reasoning was judged more compelling than GPT-4o’s paragraph framing.
GPT-4o excelled at turning complex material into step-by-step instructions, improving readability for non-experts.
