
A brief history of text markup languages - Tony Ibbs


Based on Write the Docs's video on YouTube.

TL;DR

Markup languages evolved from physical typewriter conventions (like underlining for italics) into software systems that encode formatting intent.

Briefing

Text markup languages evolved from manual typewriter conventions into systems that separate meaning from appearance, then into formats designed for different workflows—device-independent output, machine-readable structure, and human-friendly authoring. The through-line is that markup keeps getting pulled toward two goals at once: making documents easier to write and making them easier for computers (and people) to interpret.

Early on, people already used “markup” in the physical sense: double-spacing for proofreading, underlining to signal italics, and uppercase-underlined text to indicate headings. Once computers entered the picture, the same impulse became software commands and conventions. A key early milestone was runoff (starting around 1964), created by Jerome H. Saltzer for the Compatible Time Sharing System. Runoff used line-based commands (notably lines starting with a dot) to control pagination and alignment—centering, left/right justification, and list constructs—so authors didn’t have to do tedious formatting by hand. It also introduced ideas that later formats reused: abbreviations for commands, optional start/end tags, and the ability to switch formatting modes within text.
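The dot-command style can be sketched roughly like this (an illustrative reconstruction modeled on later Runoff dialects, not verbatim CTSS syntax):

```text
.center
A Short Report
.space 2
.adjust
This paragraph is filled and justified by the formatter,
so the author never has to align it by hand.
.nojust
From here on, lines are left as typed.
```

Each line beginning with a dot is an instruction to the formatter; everything else is ordinary text to be laid out.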

By the late 1960s, markup shifted toward formal definitions. Charles Goldfarb, Edward Mosher, and Raymond Lorie at IBM developed GML (Generalized Markup Language), aiming not just to define a markup language but to define a way to define markup languages. Their approach used a document-type-definition style description (with elements, optional tags, and rules for nesting) that resembles how HTML’s structure is often taught. HTML itself later benefited from this “starter set” idea: early HTML versions shipped with predefined structures so authors could focus on content rather than building a full schema from scratch.
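The "define a way to define" idea survives most visibly in DTDs. A small document-type fragment in DTD syntax (illustrative, not GML's original notation) can state that a list contains one or more items and that items may mix text with nested lists:

```dtd
<!-- A list holds one or more items; an item mixes text with nested lists. -->
<!ELEMENT ul (li+)>
<!ELEMENT li (#PCDATA | ul)*>
```

A parser armed with such a definition can reject malformed documents instead of guessing at the author's intent.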

Another major thread was device independence. In the Unix world, roff (and its relatives) let the same source produce output for terminals or typesetters. Macros and escape sequences enabled authors to write once and render appropriately across devices, with different formatters handling terminal versus typesetter capabilities. This line also influenced later documentation ecosystems, including groff (GNU troff), which retains a steep command learning curve but has comparatively strong modern documentation.
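A minimal sketch of this style, using the man macro package that groff still ships (the text is a hypothetical example, but the requests are standard):

```troff
.\" The same source renders as monospace on a terminal (nroff)
.\" or as proportional type on a typesetter (troff).
.TH DEMO 1
.SH NAME
demo \- show device-independent markup
.SH DESCRIPTION
Text is filled by the formatter; the
.I emphasized
word maps to underlining on terminals and italics in print.
```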

For scientific and mathematical writing, TeX and LaTeX represented a different kind of breakthrough. Donald E. Knuth built TeX after copy editors repeatedly broke his equations while typesetting his work; the result was a system that could place mathematical symbols precisely and handle page layout algorithmically to avoid common formatting failures. Leslie Lamport's LaTeX then wrapped TeX in macros so authors could write emphasis, quotes, and other typographic conventions without dealing with low-level commands.
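A small LaTeX document illustrates the division of labor: the author writes high-level commands, and TeX's algorithms handle the precise placement:

```latex
\documentclass{article}
\begin{document}
Euler's identity, placed by TeX's math engine:
\[ e^{i\pi} + 1 = 0 \]
\emph{Emphasis} and ``curly quotes'' are written as high-level
LaTeX conventions rather than low-level TeX primitives.
\end{document}
```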

As the web and large-scale publishing grew, semantic markup became more prominent. TEI (Text Encoding Initiative) focused on marking up the meaning of literary and linguistic text—capturing rhyme schemes, identifiers, and structural relationships so information could be extracted. DocBook emphasized structured document metadata (authors, sections, article info) designed for searchability, including in industrial and military contexts. XML arrived as a simplifying standard: it removed the optional-tag ambiguity that SGML-era markup (including early HTML) allowed, making parsing and tooling more reliable.
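A sketch of the DocBook side (element names follow DocBook 5; the title and author are hypothetical):

```xml
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>Maintenance Manual</title>
    <author><personname>Jane Doe</personname></author>
  </info>
  <section>
    <title>Overview</title>
    <para>Every element is explicitly opened and closed, so XML
    tooling can index authors, titles, and sections without
    guessing where they end.</para>
  </section>
</article>
```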

Finally, human-friendly “lightweight” markup emerged as a counterweight to heavy syntax. Wiki-style plain-text conventions (Ward Cunningham’s WikiWikiWeb) used minimal rules—indentation for structure, simple quote counts for emphasis/bold, and deliberate restrictions—to keep discussions readable and writable. reStructuredText (from David G. J. Goodger) prioritized readability and output agnosticism, while Markdown (by John Gruber) aimed to make HTML authoring easier—though its ecosystem fragmented into flavors like GitHub Flavored Markdown and GitLab Flavored Markdown.
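The fragmentation is easy to see in features the original Markdown never specified. Tables, for instance, are a GitHub Flavored Markdown extension, while raw HTML passes through untouched:

```markdown
<!-- Tables are a GFM extension, absent from Gruber's original spec. -->
| Format   | First released |
|----------|----------------|
| Markdown | 2004           |

Raw HTML like <abbr title="GitHub Flavored Markdown">GFM</abbr>
is passed through, which is one reason flavors diverge.
```

Whether a given renderer accepts the table depends entirely on which flavor it implements.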

Taken together, the history shows markup languages repeatedly trading off three things: expressiveness, machine interpretability, and author ergonomics. The “best” choice depends on whether the priority is precise layout, semantic extraction, large-document search, or fast human writing with minimal friction.

Cornell Notes

Markup languages started as conventions for communicating formatting intent—like underlining for italics—and evolved into computer systems that separate meaning from appearance. Early tools such as runoff automated pagination and alignment using command lines, while GML formalized the idea of defining markup languages through structured definitions. Unix roff and related systems pushed device-independent output, letting the same source render well on terminals and typesetters. TeX and LaTeX focused on high-precision typesetting and reliable page layout, especially for mathematics. Later semantic and structured standards (TEI, DocBook, XML) targeted machine-readable meaning and searchability, while lightweight formats (Wiki syntax, reStructuredText, Markdown) optimized for human readability and quick authoring.

How did runoff (1964) turn formatting into something authors could delegate to software?

Runoff introduced line-based commands (notably lines starting with a dot) so authors could control pagination and alignment—centering, left/right justification—without doing it manually on a teletype or typewriter. It also included list constructs and supported abbreviations (e.g., LS for list and LE for list element). The system could switch formatting modes within text (using commands to shift between modes such as upper and lower case), and it supported optional command parts so authors didn't have to write everything out verbosely.

What was the conceptual leap behind GML (Generalized Markup Language)?

GML’s key idea was meta-definition: instead of only defining one markup language, it defined a method for defining markup languages. Goldfarb, Mosher, and Lorie used a DTD-like structure to describe elements, optional start/end tags, required content counts, and nesting rules (including list items and nested lists). The result was a more formal way to specify document structure—similar in spirit to how HTML’s structure would later be described.

Why did roff-style systems matter for documentation workflows?

roff-style systems made documentation source more reusable by supporting device-independent output. The same input could be processed to produce terminal-friendly monospace output (via the nroff formatter) or typesetter/proportional output (via troff), ignoring device-specific commands that didn’t apply. Escape sequences and macro packages let authors write higher-level markup and have it expanded into complex formatting commands automatically.

What problem pushed Donald E. Knuth toward TeX, and what did TeX solve beyond math?

Knuth needed to typeset complex equations correctly, but copy editors repeatedly broke them when his work was typeset without access to his original tooling. TeX addressed precise placement of mathematical symbols and also introduced serious paragraph and pagination algorithms to avoid layout failures like widows/orphans and badly broken page structures. It also enabled literate programming, where code and documentation are interleaved and later reordered for presentation.

How do TEI and DocBook differ in what they emphasize?

TEI is primarily semantic: it marks up the meaning of literary and linguistic text so information can be extracted—such as rhyme schemes, labeled rhyme words, and identifiers tied to structural roles. DocBook is structured and metadata-heavy: it marks document components like article info and sections to enable searchability, and it was used for large, complex documents (including in industrial and military settings).

Why did lightweight markup formats like Wiki syntax, reStructuredText, and Markdown gain traction?

They reduce friction for authors by keeping the source readable and the rules simple. Wiki syntax (Ward Cunningham’s WikiWikiWeb) used minimal conventions—indentation for structure, repeated quote characters for emphasis and bold, and deliberate limits (no tables, headings, or explicit link syntax)—to keep discussions easy to write and read. reStructuredText (David G. J. Goodger) prioritized readability and output agnosticism, deliberately stopping short of overly complex features unless a more specialized system is used. Markdown (John Gruber) aimed to make HTML authoring easier, but its loose specification and its allowance for raw HTML led to many variants (e.g., GitHub Flavored Markdown, GitLab Flavored Markdown).

Review Questions

  1. Which early markup system introduced command lines for pagination and alignment, and what specific command style did it use?
  2. What does it mean for a markup approach to be “semantic” rather than “presentational,” and give one example from TEI or DocBook.
  3. Compare TeX/LaTeX with roff: what did each prioritize (precise layout versus device-independent output), and why did that matter for authors?

Key Points

  1. Markup languages evolved from physical typewriter conventions (like underlining for italics) into software systems that encode formatting intent.
  2. Runoff (Jerome H. Saltzer) helped automate pagination and alignment using command lines, including list constructs and optional command parts.
  3. GML (Goldfarb, Mosher, Lorie) pushed the idea of defining markup languages through structured definitions, not just inventing one fixed syntax.
  4. roff-style tooling made documentation more reusable by supporting device-independent output via different formatters and macro expansion.
  5. TeX and LaTeX centered on precise mathematical typesetting and reliable page layout, with LaTeX providing higher-level macros for authors.
  6. TEI and DocBook represent semantic/structured markup priorities—meaning extraction for TEI and search-friendly document structure for DocBook.
  7. Lightweight formats (Wiki syntax, reStructuredText, Markdown) optimized for human readability and fast authoring, often trading away some expressiveness or standardization.

Highlights

Runoff turned boring manual alignment and pagination into delegated commands, using a distinctive dot-command style to separate instructions from text.
GML’s “define a way to define” approach treated markup as something you can formally specify—anticipating how later document types and schemas work.
TeX emerged from a real typesetting failure: equations kept getting mangled, leading to a system built for exact placement and robust pagination.
TEI marks up literary meaning (like rhyme schemes and labeled rhyme words), while DocBook marks up document structure and metadata for searchability.
Markdown’s popularity came with ecosystem fragmentation: limited documentation and the ability to embed HTML helped produce many “flavors.”

Topics

  • History of Markup Languages
  • Device-Independent Typesetting
  • Semantic vs Presentational Markup
  • TeX and LaTeX
  • Lightweight Markup
  • Document Type Definitions

Mentioned

  • Jerome H. Saltzer
  • Charles Goldfarb
  • Edward Mosher
  • Raymond Lorie
  • Donald E. Knuth
  • Leslie Lamport
  • David G. J. Goodger
  • Ward Cunningham
  • John Gruber
  • HTML
  • DTD
  • GML
  • IBM
  • XML
  • TEI
  • RFC
  • ASCII
  • CV
  • LaTeX
  • TeX
  • roff
  • groff
  • PDF