
Agent Skills: Code Beats Markdown (Here's Why)

Sam Witteveen·
5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Agent Skills use conditional context loading: skill.md provides initial capability metadata, and scripts are brought in only when the model selects the skill.

Briefing

Agent Skills, an open standard for packaging task-specific capabilities for models and coding harnesses, are gaining momentum because they let systems perform tasks with code execution instead of relying on bulky, error-prone markdown instructions. The practical shift is toward structured skills that load lightweight metadata first, then pull in the full skill definition only when a model decides it needs a specific capability. That conditional, token-efficient workflow is why multiple companies have adopted the approach and why ecosystems like skills.sh and skillsmpp.com are emerging to publish and distribute reusable skills.

At the core of a skill is a skill.md file that acts as metadata and instruction set. Around that, skills can include references, templates, examples, and assets, but the decisive upgrade comes from scripts: code files that run inside a sandbox. With sandbox access, a model can either reuse scripts as-is or rewrite them, execute them, and then feed results back into the model as new context—or return final outputs directly to users. This turns skills into structured instruction sets backed by real tooling: bash and command-line interfaces, Python execution, and API calls.
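As a concrete sketch of that layout (directory and file names here are illustrative, not mandated by the standard), a skill package might look like:

```
price-scraper/                 # hypothetical skill package
├── skill.md                   # metadata (name, description) + core instructions
├── scripts/
│   └── scrape.py              # sandboxed code the model can run or adapt
├── references/
│   └── selectors.json         # known CSS selectors, discovered once and cached
└── templates/
    └── report_template.md     # output formatting for final reports
```

Only the metadata in skill.md sits in context up front; the scripts, references, and templates load when the model selects the skill.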

Where things often go wrong is efficiency. Many scraping skills rely on overly generic scripts that either fail unpredictably or waste tokens. One costly mistake is using a web fetch tool to pull entire HTML pages and then returning that raw content to the model. Even a single page can balloon into tens of thousands of characters and thousands of tokens. A simple optimization—skipping script/style/nav/footer tags and similar boilerplate—can cut token usage by nearly 90%, which matters dramatically when scraping dozens or hundreds of pages.
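A minimal sketch of that boilerplate filtering, using only Python's standard-library HTML parser (the tag list and helper names are my own, not from the video):

```python
from html.parser import HTMLParser

# Tags whose contents are boilerplate for scraping purposes.
SKIP_TAGS = {"script", "style", "nav", "footer", "noscript"}

class TextExtractor(HTMLParser):
    """Collect visible text, skipping boilerplate tags entirely."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def strip_boilerplate(html: str) -> str:
    """Return only the content text, dropping skipped tags wholesale."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

page = ("<html><head><style>body{}</style></head><body>"
        "<nav>Home</nav><h1>Title</h1><p>Story text</p>"
        "<footer>(c)</footer></body></html>")
print(strip_boilerplate(page))  # -> "Title\nStory text"
```

The savings come from never letting the skipped markup reach the model at all, rather than asking the model to ignore it.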

Another common failure mode is forcing the model to rediscover page structure every run. If the target site’s CSS selectors and fields are known, the script should filter directly and return only the needed data. If selectors aren’t known yet, the model can be used once to extract the relevant CSS class selectors and field mappings (titles, URLs, scores, comment counts, etc.), after which the scraper can run deterministically with far less repeated context.
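One way to make the discovery step pay only once is to cache the model-extracted selector map to disk; everything below (the cache file name, the example selectors, the stubbed discovery call) is illustrative:

```python
import json
from pathlib import Path

SELECTOR_CACHE = Path("selectors.json")  # hypothetical cache location

def discover_selectors(sample_html: str) -> dict:
    """Stand-in for the one-time model call that maps data fields to
    CSS selectors by inspecting sample_html. Returns a fixed mapping
    here purely for illustration (classes resemble Hacker News)."""
    return {
        "title": "span.titleline > a",
        "points": "span.score",
        "comments": "td.subtext > a:last-of-type",
    }

def get_selectors(sample_html: str) -> dict:
    # Pay the discovery cost once; every later run is deterministic.
    if SELECTOR_CACHE.exists():
        return json.loads(SELECTOR_CACHE.read_text())
    selectors = discover_selectors(sample_html)
    SELECTOR_CACHE.write_text(json.dumps(selectors, indent=2))
    return selectors

SELECTOR_CACHE.unlink(missing_ok=True)         # clean slate for the demo
first = get_selectors("<html>sample</html>")   # discovery run (model involved)
second = get_selectors("<html>sample</html>")  # cached run (no model call)
```

After the first run, the scraper filters with the cached selectors and the model never re-pays the structure-discovery cost.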

The best-performing pattern treats skill.md as an orchestrator while scripts do the heavy lifting. Scripts should output in model-friendly formats, typically JSON (or markdown when appropriate), and ideally enforce a strict output schema in code. For multi-site scraping, the schema should at least normalize comparable fields like product name, URL, price, and discount.
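A sketch of schema enforcement in code, assuming a hypothetical normalized record for multi-site price scraping:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Listing:
    """Normalized record every site-specific scraper must emit.
    Field names are illustrative; the point is one schema in code."""
    product_name: str
    url: str
    price: float
    discount: float = 0.0  # fraction off list price, 0.0 if none

def to_report(listings) -> str:
    # Enforce the schema at the boundary: a malformed record fails
    # loudly here instead of leaking noise into model context.
    if not all(isinstance(item, Listing) for item in listings):
        raise TypeError("every record must be a Listing")
    return json.dumps([asdict(item) for item in listings], indent=2)

rows = [Listing("Widget Pro", "https://example.com/widget", 19.99, 0.25)]
report = to_report(rows)
print(report)
```

Because every site's scraper funnels through the same dataclass, results from different stores stay directly comparable in the model's context.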

Efficiency also depends on operational controls: run tasks in parallel using threads or batching to avoid sequential round trips, set explicit limits and stop conditions to prevent runaway pagination, and support incremental runs by checking prior saved reports and scraping only what’s new. Finally, hardcode stable parameters in scripts—such as proxy settings and known categories or usernames—so the model isn’t spending tokens on decisions that code can handle. The throughline is deliberate token management: decide what enters the context window, what leaves it, and what should never touch it—especially when token costs are a recurring business expense.
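Those three controls (parallel batching, a hard page limit, incremental runs against a saved report) can be sketched together; the fetch is stubbed and all names are hypothetical:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_PAGES = 5                      # hard stop: never paginate past this
REPORT = Path("last_report.json")  # hypothetical prior-run output file

def fetch_page(page: int) -> list:
    """Stand-in for a real HTTP fetch; returns fake item records."""
    return [{"id": page * 10 + i} for i in range(3)]

def seen_ids() -> set:
    # Incremental mode: anything already in the saved report is skipped.
    if REPORT.exists():
        return {item["id"] for item in json.loads(REPORT.read_text())}
    return set()

def run() -> list:
    already = seen_ids()
    # Batch page fetches in parallel instead of sequential round trips.
    with ThreadPoolExecutor(max_workers=4) as pool:
        pages = list(pool.map(fetch_page, range(1, MAX_PAGES + 1)))
    new = [it for page in pages for it in page if it["id"] not in already]
    # Persist the union so the next run only does incremental work.
    merged = [{"id": i} for i in sorted(already)] + new
    REPORT.write_text(json.dumps(merged))
    return new

REPORT.unlink(missing_ok=True)  # clean slate for the demo
first = run()    # full run: 5 pages x 3 items
second = run()   # incremental run: nothing new since the last report
```

The page cap bounds worst-case cost, the thread pool collapses round trips, and the saved report turns repeated executions into cheap diffs.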

Cornell Notes

Agent Skills are an open standard for giving models task-specific capabilities through structured skill packages. A skill starts with skill.md (metadata plus core instructions), and only when needed does the system load additional context like references and scripts. The biggest efficiency gains come from running task logic in sandboxed code and returning compact, structured outputs (often JSON), rather than pushing full HTML or vague instructions into the model. For scraping, common token-wasting mistakes include returning entire pages via web fetch and repeatedly rediscovering page structure. Better results come from filtering with known CSS selectors, batching/parallelizing work, enforcing pagination limits, and supporting incremental runs so each execution does only the new work.

Why do agent skills work better than “markdown-only” instructions?

Skills rely on context engineering: providing the model the right information at the right time. A small skill.md file first tells the model what tools/skills exist and what each skill is for. When the model chooses a skill, the system loads the full skill definition (including scripts) into context. Because the model can execute code in a sandbox, the heavy task logic happens in scripts (bash, command-line tools, Python, API calls), and only the results need to be passed back—reducing token waste and improving reliability.
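The two-stage loading described above can be sketched as a simple registry; skill names, descriptions, and paths are invented for illustration:

```python
from pathlib import Path

# Stage 1: only a name and one-line description per skill sits in
# context on every turn, so the menu stays cheap.
SKILLS = {
    "hn-scraper": "Scrape a news front page into JSON.",
    "pdf-report": "Render a JSON report as a formatted document.",
}

def capability_menu() -> str:
    """The lightweight metadata the model always sees."""
    return "\n".join(f"- {name}: {desc}" for name, desc in SKILLS.items())

def load_skill(name: str) -> str:
    """Stage 2: the full skill.md (and its scripts) enter context
    only after the model selects this skill by name."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    return Path(f"skills/{name}/skill.md").read_text()

print(capability_menu())
```

The asymmetry is the point: dozens of skills cost a few lines each until one is actually chosen, at which point only that skill's full definition is paid for.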

What is the most expensive scraping mistake mentioned, and how can it be fixed?

A major mistake is using a web fetch tool to retrieve an entire HTML page and returning all that raw markup to the model. The transcript gives an example where fetching the Hacker News front page reaches about 34,000 characters and roughly 8,000+ tokens. Filtering out non-essential tags like script, style, nav, and footer can drop the result to under 1,000 tokens—nearly a 90% reduction. The fix is to have the script extract only the relevant content instead of sending full HTML back into context.

How should a scraper avoid wasting tokens on repeated page-structure discovery?

If the target site’s structure is known, the script should directly filter by the CSS selectors and fields needed (e.g., article title, URL, points, comments). If selectors are unknown, the model can be used once to identify the correct CSS class selectors and field mappings; then the scraper runs deterministically using those selectors. This avoids paying the token cost of having the model re-figure out structure every run.

What output format and schema strategy makes skills more efficient?

Scripts should return data in the easiest format for the model to use—typically JSON (or markdown when appropriate). The transcript emphasizes defining a strict output schema, ideally enforced directly in the script. If strict code-level schemas aren’t feasible (e.g., scraping many pricing sites), the skill.md should still specify a normalized set of fields for comparison, such as product name, URL, price, current price, and discount.

What operational tactics prevent scraping skills from flooding the context window?

Use parallelism and explicit limits. Parallelism means batching searches and using threads to avoid sequential round trips that repeatedly add conversation context. Limits and stop conditions prevent runaway pagination: skill.md should cap the maximum number of web searches or web fetch calls and control how deep pagination goes. The transcript also recommends incremental runs—checking for an existing previous report and scraping only items newer than what was saved—so each execution does incremental work rather than starting from scratch.

Review Questions

  1. When should a skill load full scripts into context, and what role does skill.md play before that happens?
  2. Why does returning full HTML via web fetch often cause token blowups, and what kinds of tags should be excluded?
  3. What combination of batching, stop conditions, and incremental mode best prevents runaway scraping and repeated work?

Key Points

  1. Agent Skills use conditional context loading: skill.md provides initial capability metadata, and scripts are brought in only when the model selects the skill.
  2. Sandboxed scripts (bash/Python/CLI/API calls) should do the heavy lifting, while the model receives compact results rather than raw pages.
  3. Avoid sending entire HTML back to the model; filter out boilerplate tags (script/style/nav/footer) to cut token usage dramatically.
  4. Don’t make the model rediscover page structure every run—extract CSS selectors once, then run deterministic filtering using those selectors.
  5. Return structured outputs (preferably JSON) and enforce a strict schema in code or specify a normalized schema in skill.md.
  6. Use batching/threads for parallel searches to reduce sequential round trips and repeated context growth.
  7. Add guardrails: pagination limits, stop conditions, and incremental scraping so each run processes only new data.

Highlights

A single full-page HTML fetch can reach ~34,000 characters and ~8,000+ tokens; filtering out non-content tags can cut that to under ~1,000 tokens.
The most reliable scraping pattern is: skill.md orchestrates, scripts extract, and scripts return JSON with a defined schema.
Parallel batching (threads) prevents the “15 sequential searches” problem where each round trip adds more context.
Token costs can be controlled with explicit pagination limits and incremental runs that scrape only what’s new since the last saved report.
