
Can We Rank Developers?

The PrimeTime · 5 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.

TL;DR

A martial-arts-style “black belt means teacher” analogy doesn’t map cleanly to software because promotion incentives and job roles differ.

Briefing

Software-engineering ranks—whether framed as “belt colors” like martial arts or as numeric levels—can’t be made truly fair or universal because programming skill is too context-dependent and too multidimensional. The most concrete proposal discussed is a 0.0–3.0 scale that ties competence to the type of work (additive contributions, infrastructural multipliers, and global multipliers), but the discussion repeatedly returns to a hard constraint: the same person can look like a different “rank” depending on environment, domain, and what’s being measured.

The conversation starts by borrowing the logic of martial arts ranking: a black belt doesn’t mean mastery of everything; it signals readiness to teach and continued growth. That analogy becomes the jumping-off point for programming. A “universal programming measurement” is floated as a way to assess developers more systematically—e.g., a 1.0 threshold for typical corporate employability, 2.0 as a “10x” tier, and higher levels representing compounding impact. The scale is paired with a reliability curve idea: a 1.8 engineer might be roughly 85% competent at level-2 tasks, with reliability improving as the level rises.
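
The video proposes no code or formula for any of this, but the tiers and thresholds are concrete enough to sketch. Below is a minimal illustration: the tier meanings and the 1.0/2.0 cutoffs come from the discussion, while the `describe` helper and everything else about the structure are invented here.

```python
# Illustrative sketch of the proposed 0.0-3.0 scale. The tier meanings and
# the 1.0/2.0 thresholds come from the discussion; the structure and the
# describe() helper are hypothetical.

LEVELS = {
    1: "additive work that produces frontline business value",
    2: "infrastructural work that multiplies other people's output",
    3: "global multipliers that compound across an organization",
}

EMPLOYABLE = 1.0  # floated as the typical corporate-employability threshold
TEN_X = 2.0       # floated as the "10x" tier

def describe(level: float) -> str:
    """Render a fractional level against the nearest tier at or below it."""
    tier = min(3, max(1, int(level)))  # clamp into the three defined tiers
    tags = [name for name, cutoff in [("employable", EMPLOYABLE), ('"10x"', TEN_X)]
            if level >= cutoff]
    suffix = f" [{', '.join(tags)}]" if tags else ""
    return f"{level:.1f}: {LEVELS[tier]}{suffix}"

print(describe(0.6))  # below the employability threshold, no tags
print(describe(1.8))  # 1.8: additive work ... [employable]
print(describe(2.4))  # 2.4: infrastructural work ... [employable, "10x"]
```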

But the pushback is immediate and persistent. First, ranks in software already exist in practice (junior/senior/staff/VP/CTO), and they’re often miscalibrated—sometimes based on time served, politics, or how well someone sells their ideas. Second, teaching incentives distort behavior: promotion systems and compensation often reward instruction and “learning in public,” which can produce a glut of low-value teaching, or premature teaching by people who aren’t ready. Third, “black belt = teacher” doesn’t map cleanly to programming, because many high performers are hired to earn a living and ship work, not to mentor.

The discussion also challenges the meaning of “10x.” Even if a developer is exceptional, the value gap between levels can vary wildly by environment: greenfield projects may amplify differences, while corporate environments can compress them. Some participants argue that “10x” often means producing more work rather than solving problems 10x better, and that different specialties (UI vs. infrastructure, embedded C vs. SaaS, Lisp macro design vs. UI refactoring) don’t fit a single ladder.

Ultimately, the most pragmatic conclusion is cynical but actionable: if engineers don’t build their own ranking or assessment systems, businesses will rank them anyway—often using criteria that don’t match technical merit. Yet the group remains skeptical that any single, organization-independent, universal belt system can work. Programming skill is shaped by domain, failure tolerance in the culture, and the specific tasks being evaluated. The best “rank” systems may therefore be internal, role-aware, and culture-aware rather than universal—and even then, they’ll produce anomalies. The core takeaway is that measuring developers is less about finding a perfect ladder and more about acknowledging what can’t be standardized.

Cornell Notes

The discussion tests whether developer “belt colors” or numeric levels can fairly rank programmers the way martial arts ranks do. A proposed 0.0–3.0 scale links competence to task type: additive business value (level 1), infrastructural multipliers (level 2), and global multipliers (level 3), with a reliability/competence curve (e.g., a 1.8 engineer being ~85% competent at level-2 work). Major objections stress that programming skill is context-dependent (environment, domain, and what tasks are being measured), and that “black belt = teacher” doesn’t translate well to software incentives. The result is skepticism toward any universal, organization-independent ranking system, even while acknowledging that ranking will happen regardless—so better internal assessment may be necessary.

Why does the martial-arts analogy break when applied to software engineering ranks?

Martial arts belts are tied to a tradition where higher ranks often imply readiness to teach, and the meaning of “expert” is relatively stable. In software, high performers are frequently hired to ship and solve business problems, not to teach. That mismatch matters because teaching incentives can also distort behavior: people may be rewarded for instruction or “learn in public” even when they aren’t the best fit to mentor. So a “black belt = teacher” mapping can misidentify what a developer is actually good at.

What is the proposed 0.0–3.0 programming scale, and how does it try to quantify competence?

The scale defines level 1 as additive contributions that produce frontline business value, level 2 as multiplicative infrastructural contributions, and level 3 as global “multipliers over multipliers” that compound across an organization. It also introduces a competence-curve idea: reliability improves with level, so a person at 1.8 might be about 85% competent at level-2 tasks. The framework further treats 1.0 as the threshold for typical corporate employability and 2.0 as a “10x” tier—though the discussion later questions how consistent that “10x” meaning is.
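
The discussion offers exactly one data point for this curve (1.8 → ~85% at level 2) and no functional form. As a hedged sketch, a linear falloff fitted to that single point looks like this; the 0.75-per-level slope is back-solved from the example and shouldn’t be read as part of the proposal.

```python
def competence(engineer_level: float, task_level: float) -> float:
    """Estimated reliability of an engineer on a task at a given level.

    Toy linear falloff fitted to the discussion's single data point
    (a 1.8 engineer is ~85% competent at level-2 tasks):
        1.0 - 0.75 * (2.0 - 1.8) = 0.85
    The 0.75-per-level slope is back-solved from that example alone
    and is an assumption, not part of the proposal.
    """
    FALLOFF = 0.75  # competence lost per full level of gap (assumed)
    gap = max(0.0, task_level - engineer_level)
    return max(0.0, 1.0 - FALLOFF * gap)

assert abs(competence(1.8, 2.0) - 0.85) < 1e-9  # reproduces the quoted example
print(competence(1.0, 2.0))  # 0.25 -- a barely-employable engineer on level-2 work
print(competence(2.5, 2.0))  # 1.0  -- at or above the task level, fully reliable
```

Writing even this toy version down makes the later objections concrete: nothing in the function knows about domain, environment, or what counts as “a level-2 task” in the first place.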

Why is “10x” considered unreliable as a universal metric?

The gap between developers doesn’t behave uniformly across contexts. Differences can be large on greenfield projects, smaller in corporate environments, and sometimes nearly nonexistent in less accommodating settings. Also, “10x” may reflect output volume rather than proportional problem-solving quality. Specialty matters too: someone strong in UI refactors may not be “10x” at infrastructure, and someone strong in embedded C may not be “10x” at SaaS product work.

How does culture and environment affect developer “rank” outcomes?

Experience and internal network effects can amplify performance inside a company. An early employee building simple things can gain tribal knowledge and influence that later makes them more effective than someone equally skilled but outside that environment. More broadly, a culture that allows failure can accelerate learning and multiplier-level contributions. The same developer can therefore look higher or lower depending on how the organization supports experimentation and risk.

What critique targets teaching and “learn in public” incentives?

Teaching is often paid and rewarded, which can lead to “goofiness” in how people pursue recognition. The discussion suggests that some people teach too early, some can’t teach well despite being good at engineering, and others teach a lot because it’s incentivized rather than because it’s the highest-value use of their time. Even if teaching can be valuable, the incentive structure can skew who gets promoted and how competence is perceived.

If universal ranking seems impossible, what practical motivation remains?

A recurring argument is that ranking will happen anyway—businesses will impose their own systems if engineers don’t. The fear is that corporate ranking criteria may be worse at detecting technical merit, especially when managers can’t reliably distinguish true engineering depth from persuasive communication. So the practical motivation is to build better internal assessment, even if a single universal ladder can’t exist.

Review Questions

  1. What would a “level 2” developer need to consistently produce under the proposed scale, and why might that not translate across domains?
  2. How do teaching incentives distort promotion or perceived rank in software compared with martial arts?
  3. Which factors in the discussion make “10x” a shaky universal benchmark?

Key Points

  1. A martial-arts-style “black belt means teacher” analogy doesn’t map cleanly to software because promotion incentives and job roles differ.

  2. A proposed 0.0–3.0 scale ties competence to contribution type (additive value, infrastructural multipliers, global multipliers) and uses a competence/reliability curve (e.g., 1.8 ≈ 85% at level-2 tasks).

  3. “10x” performance varies by environment and project type, so a single universal metric can misclassify developers.

  4. Specialization (UI vs. infrastructure, embedded C vs. SaaS, macro/DSL design vs. refactoring) makes one ladder inherently unfair.

  5. Teaching can be over-incentivized in software, leading to premature or low-value instruction and skewed perceptions of competence.

  6. Existing corporate titles already rank people imperfectly, often influenced by time served, politics, and communication skill rather than pure technical ability.

  7. Engineers may still need internal, role-aware assessment systems because businesses will rank developers regardless—often using criteria that don’t match technical merit.

Highlights

  • A “universal programming belt system” runs into a core obstacle: programming skill depends heavily on domain and context, so the same person can look different across environments.
  • The proposed 0.0–3.0 framework defines levels by contribution type—additive value (1), infrastructural multipliers (2), and global multipliers (3)—and tries to quantify competence with a reliability curve.
  • Teaching incentives in software can distort rank: people get rewarded for instruction, which may not align with who is best suited to mentor or who is best at engineering.
  • “10x” isn’t a stable concept across settings; the value gap can widen on greenfield work and shrink in corporate environments.
  • Even if universal ranking is rejected, the discussion argues ranking will happen anyway—so better engineer-driven assessment may be necessary.

Topics

  • Developer Ranking
  • Belt Systems
  • Competence Curves
  • 10x Performance
  • Teaching Incentives