Agent Skills: Code Beats Markdown (Here's Why)
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Agent Skills, an open standard for packaging model capabilities used by models and coding harnesses, are gaining momentum because they let systems perform tasks through code execution instead of relying on bulky, error-prone markdown instructions. The practical shift is toward structured skills that load lightweight metadata first, then pull in the full skill definition only when a model decides it needs a specific capability. That conditional, token-efficient workflow is why multiple companies have adopted the approach and why ecosystems like skills.sh and skillsmpp.com are emerging to publish and distribute reusable skills.
At the core of a skill is a skill.md file that acts as both metadata and instruction set. Around it, skills can include references, templates, examples, and assets, but the decisive upgrade comes from scripts: code files that run inside a sandbox. With sandbox access, a model can reuse scripts as-is or rewrite them, execute them, and then either feed the results back into its context or return final outputs directly to users. This turns skills into structured instruction sets backed by real tooling: bash and command-line interfaces, Python execution, and API calls.
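As a concrete illustration, here is a minimal sketch of what such a skill package might look like; the skill name, description, and script path are hypothetical, and the YAML-frontmatter layout is one common convention rather than a fixed requirement:

```markdown
---
name: deal-scraper
description: Scrapes product listings from known retail sites and returns
  normalized JSON. Use when the user asks for price or discount comparisons.
---

# Deal Scraper

1. Run `scripts/scrape.py <site>` in the sandbox; do not fetch raw HTML yourself.
2. The script prints filtered JSON (product name, URL, price, discount) to stdout.
3. Pass `--max-pages N` to cap pagination.
```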
Where things often go wrong is efficiency. Many scraping skills rely on overly generic scripts that either fail unpredictably or waste tokens. One costly mistake is using a web fetch tool to pull entire HTML pages and then returning that raw content to the model. Even a single page can balloon into tens of thousands of characters and thousands of tokens. A simple optimization—skipping script/style/nav/footer tags and similar boilerplate—can cut token usage by nearly 90%, which matters dramatically when scraping dozens or hundreds of pages.
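A minimal sketch of that optimization, assuming `requests` and `beautifulsoup4` are available in the sandbox (the tag list and function name are illustrative):

```python
# Sketch only: strip boilerplate before any text reaches the model.
import requests
from bs4 import BeautifulSoup

BOILERPLATE_TAGS = ["script", "style", "nav", "footer", "header", "svg", "noscript"]

def fetch_lean_text(url: str) -> str:
    """Fetch a page and return only its visible text, minus boilerplate."""
    html = requests.get(url, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(BOILERPLATE_TAGS):
        tag.decompose()  # drop the element and all of its children
    # Collapse whitespace so blank lines don't cost tokens either
    return " ".join(soup.get_text(separator=" ").split())
```

Because decompose() removes nodes before get_text() runs, the savings happen inside the script, outside the context window.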
Another common failure mode is forcing the model to rediscover page structure every run. If the target site’s CSS selectors and fields are known, the script should filter directly and return only the needed data. If selectors aren’t known yet, the model can be used once to extract the relevant CSS class selectors and field mappings (titles, URLs, scores, comment counts, etc.), after which the scraper can run deterministically with far less repeated context.
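A hedged sketch of that two-phase approach: the selector map below stands in for whatever the model extracted on its one discovery pass, and every selector value is illustrative:

```python
# Sketch: the model extracts a selector map once; afterwards the scraper
# filters deterministically. Selector values below are placeholders.
from bs4 import BeautifulSoup

SELECTORS = {
    "item": "div.story",    # one entry per listing
    "title": "a.title",     # link text plus href
    "score": "span.score",  # points / votes; same idea extends to comment counts
}

def parse_items(html: str) -> list[dict]:
    """Return only the needed fields, never the raw page."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for row in soup.select(SELECTORS["item"]):
        link = row.select_one(SELECTORS["title"])
        score = row.select_one(SELECTORS["score"])
        if link is None:
            continue
        items.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
            "score": score.get_text(strip=True) if score else None,
        })
    return items
```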
The best-performing pattern treats skill.md as an orchestrator while scripts do the heavy lifting. Scripts should output in model-friendly formats, typically JSON (or markdown when appropriate), and ideally enforce a strict output schema in code. For multi-site scraping, the schema should at least normalize comparable fields like product name, URL, price, and discount.
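One way to enforce such a schema in code, sketched with a dataclass; the class name is an assumption, and only the four comparable fields come from the pattern above:

```python
# Sketch: a strict schema enforced in code, so a malformed record fails
# inside the script instead of confusing the model downstream.
import json
from dataclasses import dataclass, asdict

@dataclass
class ProductRecord:
    product_name: str
    url: str
    price: float
    discount: float  # e.g. 0.25 for 25% off

def to_report(records: list[ProductRecord]) -> str:
    """Serialize to JSON so the model receives one normalized shape
    regardless of which site a record came from."""
    return json.dumps([asdict(r) for r in records], indent=2)
```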
Efficiency also depends on operational controls: run tasks in parallel using threads or batching to avoid sequential round trips, set explicit limits and stop conditions to prevent runaway pagination, and support incremental runs by checking prior saved reports and scraping only what’s new. Finally, hardcode stable parameters in scripts—such as proxy settings and known categories or usernames—so the model isn’t spending tokens on decisions that code can handle. The throughline is deliberate token management: decide what enters the context window, what leaves it, and what should never touch it—especially when token costs are a recurring business expense.
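A sketch combining those controls, with the report filename, page cap, and worker count as placeholder assumptions:

```python
# Sketch: parallel fetching with a hard pagination cap and an incremental
# check against the prior run's saved report.
import json
import pathlib
from concurrent.futures import ThreadPoolExecutor

MAX_PAGES = 20                        # stop condition: never exceed this
REPORT = pathlib.Path("report.json")  # output saved by the previous run

def already_seen() -> set[str]:
    """URLs covered by an earlier run, so we only scrape what's new."""
    if REPORT.exists():
        return {row["url"] for row in json.loads(REPORT.read_text())}
    return set()

def scrape_all(urls: list[str], fetch) -> list[dict]:
    seen = already_seen()
    todo = [u for u in urls if u not in seen][:MAX_PAGES]  # new work only
    with ThreadPoolExecutor(max_workers=8) as pool:  # parallel, not serial
        return list(pool.map(fetch, todo))
```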
Cornell Notes
Agent Skills are an open standard for giving models task-specific capabilities through structured skill packages. A skill starts with skill.md (metadata plus core instructions), and only when needed does the system load additional context like references and scripts. The biggest efficiency gains come from running task logic in sandboxed code and returning compact, structured outputs (often JSON), rather than pushing full HTML or vague instructions into the model. For scraping, common token-wasting mistakes include returning entire pages via web fetch and repeatedly rediscovering page structure. Better results come from filtering with known CSS selectors, batching/parallelizing work, enforcing pagination limits, and supporting incremental runs so each execution does only the new work.
- Why do agent skills work better than “markdown-only” instructions?
- What is the most expensive scraping mistake mentioned, and how can it be fixed?
- How should a scraper avoid wasting tokens on repeated page-structure discovery?
- What output format and schema strategy makes skills more efficient?
- What operational tactics prevent scraping skills from flooding the context window?
Review Questions
- When should a skill load full scripts into context, and what role does skill.md play before that happens?
- Why does returning full HTML via web fetch often cause token blowups, and what kinds of tags should be excluded?
- What combination of batching, stop conditions, and incremental mode best prevents runaway scraping and repeated work?
Key Points
1. Agent Skills use conditional context loading: skill.md provides initial capability metadata, and scripts are brought in only when the model selects the skill.
2. Sandboxed scripts (bash/Python/CLI/API calls) should do the heavy lifting, while the model receives compact results rather than raw pages.
3. Avoid sending entire HTML back to the model; filter out boilerplate tags (script/style/nav/footer) to cut token usage dramatically.
4. Don’t make the model rediscover page structure every run: extract CSS selectors once, then run deterministic filtering using those selectors.
5. Return structured outputs (preferably JSON) and enforce a strict schema in code or specify a normalized schema in skill.md.
6. Use batching/threads for parallel searches to reduce sequential round trips and repeated context growth.
7. Add guardrails: pagination limits, stop conditions, and incremental scraping so each run processes only new data.