I Stopped Drowning in AI Slop—Prompts That Saved Me 100+ Hours (Demo Inside)
Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Treat AI slop as a quality-gating problem: scale generation, but enforce review criteria that humans can’t manually apply to every draft.
Briefing
AI slop isn’t a detection problem; it’s a quality-gating problem. With AI churning out more PRDs, blog posts, emails, and drafts than any team can realistically review, organizations need a reliable way to decide what deserves scarce human attention. The core fix offered here is to treat large language model (LLM) attention as the primary review channel: use AI to do the bulk of the reading, and let humans spend their limited time on the small fraction of outputs that truly matter.
The argument starts from a practical shift: AI has moved many workflows from “can I write one good thing?” to “can I generate fifty things fast?” That creates a new bottleneck—quality assessment. People either skim with their eyes or skip review entirely because there’s no consistent quality gate. But simply asking an AI “is this good?” leads to inconsistent results because quality judgments vary by model (e.g., ChatGPT vs. Claude) and by what the model already “believes” is good based on training and prior context. The proposed solution is therefore not generic prompting; it’s robust, use-case-specific prompting designed to function as a filter.
A key mindset is borrowed from Andrej Karpathy’s suggestion that LLMs should handle roughly 98% of attention, leaving humans with the 2% that matters. In this framework, the goal becomes: surface the highest-quality pieces—such as the two best blog posts out of a hundred, or the PRD that is truly promotable—while filtering out the rest. That requires prompts tailored to each artifact type and job family, because the criteria for a strong PRD differ from the criteria for a customer announcement email or a sales follow-up.
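To make the 98/2 split concrete, here is a minimal triage sketch in Python. The `grade_artifact` function is a hypothetical stand-in for whatever model call runs the artifact-specific grading prompt; the video does not prescribe an implementation, only the pattern of letting the LLM read everything and routing a thin top slice to humans.

```python
def triage(drafts, grade_artifact, human_budget=0.02):
    """Grade every draft with an LLM, then surface only the top
    fraction for scarce human attention.

    grade_artifact: hypothetical callable returning a numeric score
    for one draft (e.g., the result of a rubric-based grading prompt).
    """
    scored = [(grade_artifact(d), d) for d in drafts]    # LLM reads 100%
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best first
    keep = max(1, int(len(scored) * human_budget))       # e.g., 2 of 100
    return [draft for _, draft in scored[:keep]]         # humans read ~2%
```

With `human_budget=0.02`, a batch of one hundred blog posts yields the two best candidates for human review, matching the example above.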
To make the approach concrete, the transcript walks through a sample prompt for evaluating a product requirements document (PRD). The prompt sets a clear “role” and stakes: determine whether an engineering team can build the PRD without needing three clarifying meetings. It then defines evaluation axes such as completeness, acceptance criteria, edge cases, and explicit non-goals. A scoring rubric maps quality to measurable signals—for example, a higher score corresponds to testable, well-documented edge cases and non-goals, while a low score corresponds to untestable vagueness.
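The transcript describes this prompt rather than quoting it, so the skeleton below is an illustrative reconstruction: the role, stakes, axes, and rubric anchors follow the description above but are not the video's verbatim wording.

```python
# Illustrative PRD-grading prompt skeleton; {prd_text} is a format placeholder.
PRD_GRADING_PROMPT = """\
You are a principal engineer reviewing a product requirements document.
Stakes: decide whether an engineering team could build from this PRD
without scheduling three clarifying meetings.

Evaluate on these axes:
1. Completeness: are all user-facing behaviors specified?
2. Acceptance criteria: are they testable as written?
3. Edge cases: are failure modes and boundary conditions documented?
4. Non-goals: does the PRD explicitly state what is out of scope?

Scoring rubric (per axis, 1-5):
5 = testable and well documented, with explicit examples
3 = present but partially vague or untestable
1 = missing, or so vague that no test could be written

PRD to evaluate:
{prd_text}
"""

# Usage: PRD_GRADING_PROMPT.format(prd_text=draft)
```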
The prompt also checks whether the document is testable and readable (with examples like readability targets for sales emails), whether scope is clear, and whether key elements are present. It includes dependency mapping and an “elements check” to ensure nothing critical is missing. Rather than returning a vague verdict, the prompt requests structured output (JSON): a grading score plus plain-English feedback, including actionable revision guidance. The emphasis is on feedback a writer can act on immediately, such as being explicit about the Stripe API version, so the system supports an ongoing improvement loop.
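A structured JSON contract keeps the verdict machine-routable while the feedback stays human-readable. The sketch below assumes illustrative field names (`score`, `verdict`, `feedback`) and an arbitrary accept threshold; the video does not specify an exact schema.

```python
import json

# Example of what a grading response might look like if the prompt asks
# the model to reply in JSON. Field names and threshold are illustrative.
EXAMPLE_RESPONSE = """\
{
  "score": 82,
  "verdict": "accept",
  "feedback": [
    "Acceptance criteria for checkout are testable as written.",
    "Be explicit about which Stripe API version the integration targets.",
    "Add a non-goals section: mobile support is implied but never ruled in or out."
  ]
}
"""

ACCEPT_THRESHOLD = 75  # illustrative cutoff for routing to human review

result = json.loads(EXAMPLE_RESPONSE)
if result["score"] >= ACCEPT_THRESHOLD:
    print("Route to human review:")
    for note in result["feedback"]:
        print(f"  - {note}")
else:
    print("Return to author with revision guidance.")
```

The feedback list is what closes the improvement loop: each item is a concrete revision the writer can apply before resubmitting.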
Finally, the transcript argues that “AI slop” is partly overhyped: sloppy work existed long before AI. The real opportunity is raising accountability and quality standards using AI as a scalable quality filter. There’s no single magic prompt, but a prompt pack—built for marketing, customer success, sales, product, and engineering—can help organizations stop drowning in low-quality output and start placing human attention where it has the most impact.
Cornell Notes
AI slop is framed as a quality-control failure, not an AI-detection failure. Because AI can generate far more drafts than humans can review, the solution is to use LLMs as an “attention filter”: let AI read and grade most outputs, then route only the top candidates to the 2% of human attention that matters. The transcript demonstrates a PRD-specific grading prompt that scores completeness, testability, scope clarity, and other criteria, using a rubric tied to concrete signals (e.g., measurable edge cases and explicit non-goals). It also uses structured output (JSON) to produce actionable, plain-English feedback and accept/reject thresholds. The approach is meant to be adapted per artifact type and job family, since quality criteria differ across PRDs, emails, blog posts, and more.
- Why does “AI slop” become a management problem once generation scales up?
- What does it mean to use LLMs as an “attention filter” rather than a writing helper?
- Why is a generic prompt like “is this good?” unreliable?
- How does the sample PRD prompt define stakes and evaluation criteria?
- What makes the feedback actionable instead of just a score?
- How should the filter differ across job families and artifact types?
Review Questions
- What workflow bottleneck emerges when AI generation increases output volume, and how does the proposed filter address it?
- In the PRD grading example, which rubric dimensions are used to distinguish testable, promotable work from vague work?
- Why does the transcript argue that “AI slop” can’t be solved by a single magic prompt or by AI detectors alone?
Key Points
1. Treat AI slop as a quality-gating problem: scale generation, but enforce review criteria that humans can’t manually apply to every draft.
2. Use LLMs as an attention filter: aim for LLMs to handle most reading and triage while humans review only the top candidates.
3. Avoid generic judgments like “is this good?”; quality scoring should be tied to artifact-specific rubrics and measurable signals.
4. Build prompts per job family and per artifact type (PRDs vs. blog posts vs. emails) because the definition of “good” changes with context.
5. Use structured outputs (e.g., JSON) to produce scores plus accept/reject decisions and plain-English, actionable revision feedback.
6. Design the rubric around concrete outcomes (e.g., whether engineering can build without multiple clarifying meetings) rather than vague impressions.
7. Focus on raising accountability and quality standards regardless of who wrote the work, since sloppy output predates AI.