Be Careful w/ Skills
Based on ThePrimeTime’s video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing to the channel.
Skills are markdown-based instructions that can translate into real command execution by LLM agents, so unreviewed third-party skills create direct security risk.
Briefing
“Skills” — markdown files fed into LLMs to grant extra context and let the model take actions — are becoming a new attack surface, and the ecosystem is moving faster than safety practices. The core warning is blunt: handing an LLM full permissions to execute commands from unreviewed, third-party text can turn small mistakes or outright malicious payloads into system-level harm.
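To make that attack surface concrete, here is a minimal sketch of the pattern the transcript warns about: a hypothetical agent loads a skill file verbatim and executes whatever commands the model proposes. The names (`load_skill`, `ask_model`, `run_agent`) are illustrative, not any real framework’s API.

```python
import subprocess

def load_skill(path: str) -> str:
    # The skill is plain markdown; nothing here validates its contents.
    with open(path) as f:
        return f.read()

def ask_model(prompt: str) -> list[str]:
    # Stand-in for a real LLM call. A real agent would return commands
    # derived from the prompt, including anything the skill injected.
    return ["echo 'command chosen by the model'"]

def run_agent(task: str, skill_path: str) -> None:
    skill = load_skill(skill_path)  # unreviewed third-party text
    prompt = f"{skill}\n\nTask: {task}"
    for command in ask_model(prompt):
        # Full permissions: each command runs as the user, with the
        # user's credentials and environment, and no review step.
        subprocess.run(command, shell=True)
```

The danger is entirely in the last line: whatever the skill steered the model toward runs with the user’s full permissions.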
A major risk highlighted is supply-chain manipulation. A developer created a fake skill on a Claude Hub–style marketplace, then engineered it to look highly popular and widely downloaded. When it ran, the payload wasn’t the “safe” behavior users expected; the incident underscored how marketplaces can be gamed and how quickly trust can be manufactured. The deeper problem wasn’t just the fake skill itself, but the fact that the assistant (described as a personal agent that receives sensitive keys) can act on a user’s behalf. If the assistant executes an unverified markdown skill, the user’s “keys to the kingdom” become the leverage point.
Another layer of danger comes from stealth and rendering tricks. Malicious instructions can be hidden in HTML comments inside markdown. Many markdown viewers strip out or hide those comments when rendering, so a user might “look at” a skill on a repository page and still miss the harmful commands. The transcript frames this as “hidden in plain sight”: the raw content may contain executable directives that normal browsing workflows fail to reveal.
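Since the danger is specifically that rendered views hide HTML comments, a simple countermeasure is to scan the raw text for them. A minimal sketch using only the Python standard library; the regex handles the common `<!-- -->` form, not every HTML edge case.

```python
import re
import sys

# Matches the common <!-- ... --> comment form, including multi-line bodies.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)

def hidden_comments(raw_markdown: str) -> list[str]:
    """Return the body of every HTML comment found in the raw text."""
    return [body.strip() for body in HTML_COMMENT.findall(raw_markdown)]

if __name__ == "__main__":
    raw = open(sys.argv[1], encoding="utf-8").read()
    for i, comment in enumerate(hidden_comments(raw), start=1):
        print(f"--- hidden comment {i} ---")
        print(comment)
```

Run it against the raw file on disk, not a rendered page, since rendering is exactly where the comments disappear.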
The most alarming failure mode described is the spread of hallucinations through the skill supply chain. Skills are often produced by LLMs, and if one agent hallucinates a command, that incorrect command can be copied into other skills. Over time, the hallucination becomes a shared dependency. A specific example centers on an imaginary npm command resembling “npx react code shift,” which would fail when executed, yet still propagate. The response to that propagation was itself a new kind of vulnerability, a form of “hallucination squatting”: someone published a real package on npm under the hallucinated name, so that when people attempted to run the fake command, execution routed to the attacker’s code instead of failing.
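One cheap sanity check against this pattern is to ask the npm registry about an unfamiliar package before running it: whether it exists at all, and when it was first published. A package created only after a hallucinated command started circulating deserves suspicion. A sketch against the public registry endpoint; the package name below is a placeholder.

```python
import json
import urllib.error
import urllib.request

def npm_created_date(package: str) -> str | None:
    """Return the package's first-publish date, or None if it doesn't exist."""
    url = f"https://registry.npmjs.org/{package}"
    try:
        with urllib.request.urlopen(url) as resp:
            meta = json.load(resp)
    except urllib.error.HTTPError:
        return None  # 404: the hallucinated command would fail harmlessly
    return meta.get("time", {}).get("created")

# Placeholder name; a very recent creation date on a package a skill
# tells you to run is a red flag for hallucination squatting.
print(npm_created_date("some-package-named-in-a-skill"))
```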
The transcript also points to distribution mechanisms that lower user scrutiny. A “find skills” capability is described as querying available skills and then placing them on a user’s computer for execution. If skills are essentially pointers to GitHub content, a skill can start as benign and later become malicious, or simply be replaced by a bad actor. The result is a race toward a least-secure ecosystem: users “raw dog” text to an LLM with broad permissions, often without reading what will run.
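If a skill is essentially a pointer to remote content, the natural defense is to pin what was actually reviewed. A minimal sketch of content pinning by SHA-256 digest, assuming the skill is fetched from a raw URL; the URL and digest in the usage comment are placeholders.

```python
import hashlib
import urllib.request

def verify_pinned(url: str, expected_sha256: str) -> str:
    """Fetch the skill and fail loudly if it no longer matches the pin."""
    with urllib.request.urlopen(url) as resp:
        content = resp.read()
    digest = hashlib.sha256(content).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"skill content changed: {digest} != {expected_sha256}")
    return content.decode("utf-8")

# Usage (placeholders): pin the digest recorded when the skill was reviewed,
# so a later swap of the remote content raises instead of executing.
# skill = verify_pinned(
#     "https://raw.githubusercontent.com/some-org/some-skill/main/SKILL.md",
#     "<sha256 recorded at review time>",
# )
```

The point of the pin is that a benign-today, malicious-tomorrow skill fails closed instead of silently executing.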
Finally, the discussion widens to crypto-themed scams and the broader “AI raises the floor” promise. While the benefits of easier creation are acknowledged, the transcript argues that rapid capability growth outpaces users’ ability to understand and manage risk. The proposed mitigation is practical and old-fashioned: read the skill content directly (not through HTML-rendering conveniences), inspect what commands will execute, and only then decide whether to run it, because in this ecosystem trust is cheap and verification is the difference between automation and compromise.
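A rough version of that “read it raw” habit can be partially automated: fetch the raw bytes rather than a rendered page, and surface lines that look like shell invocations for a human to review. The prefix list below is a heuristic for illustration, not a complete detector.

```python
import urllib.request

# Heuristic prefixes for lines worth human review; deliberately incomplete.
SUSPICIOUS_PREFIXES = ("npx ", "npm ", "curl ", "wget ", "bash ", "sh ", "sudo ", "rm ")

def fetch_raw(url: str) -> str:
    # Fetch the raw file (e.g. a raw.githubusercontent.com URL),
    # not the repository's rendered HTML page.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def review_candidates(raw_markdown: str) -> list[str]:
    """Return lines that look like shell invocations, for manual review."""
    return [
        line.strip()
        for line in raw_markdown.splitlines()
        if line.strip().startswith(SUSPICIOUS_PREFIXES)
    ]

# for line in review_candidates(fetch_raw("https://raw.githubusercontent.com/...")):
#     print(line)
```

Note this only narrows what a human must read; it does not replace reading the skill, and it will miss commands hidden in HTML comments unless combined with the comment scanner above.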
Cornell Notes
“Skills” are markdown-based instructions that feed context to LLMs so they can run tasks with higher accuracy, but they also create a new supply-chain and execution-risk channel. The transcript highlights multiple failure modes: fake skills that look popular, malicious commands hidden in HTML comments, and hallucinated commands that spread across skills until hundreds of repositories share the same imaginary dependency. Distribution features like “find skills” can automatically pull and execute third-party content, reducing user review. The takeaway is that automation with full permissions demands verification—users must inspect the raw skill content before execution, not rely on rendered previews or marketplace trust.
- What makes “skills” risky compared with ordinary code review?
- How does supply-chain manipulation work in the skills marketplace example?
- Why do HTML comments matter for security in markdown-based skills?
- How can LLM hallucinations become a systemic vulnerability?
- What is “hallucination squatting,” and what does it enable?
- How does “find skills” increase exposure?
Review Questions
- Which specific mechanisms allow malicious or incorrect skills to bypass casual inspection (e.g., rendering behavior, marketplace ranking, or hidden payloads)?
- How does hallucination propagation turn an LLM error into a supply-chain dependency, and why does that make “squatting” possible?
- What verification step does the transcript recommend, and how does it address the risks introduced by HTML-rendering markdown viewers?
Key Points
1. Skills are markdown-based instructions that can translate into real command execution by LLM agents, so unreviewed third-party skills create direct security risk.
2. Marketplace popularity can be manipulated, making fake skills look trustworthy and increasing the odds of execution.
3. Malicious payloads can be hidden in HTML comments inside markdown, defeating casual “read it in the browser” checks.
4. LLM hallucinations can propagate through skill creation, causing many skills to share the same incorrect command dependency.
5. Attackers can exploit hallucination propagation by publishing real npm packages that match imaginary commands, turning failures into compromise.
6. Automated “find skills” workflows can pull and execute third-party content with minimal user scrutiny, widening the attack surface.
7. The transcript’s mitigation centers on manual verification: inspect raw skill content (not rendered previews) before running anything with permissions.