Insane Vulnerability In OpenSSH Discovered

TL;DR

OpenSSH sshd can become exploitable when SIGALRM runs asynchronously after the login grace time and invokes async-signal-unsafe code, enabling heap inconsistency.

Briefing

OpenSSH’s sshd has a remote-code-execution path tied to a signal-handler race: if an unauthenticated client fails to authenticate within the login grace time (120 seconds by default), sshd’s SIGALRM handler can run asynchronously and call functions that are not async-signal-safe. That mismatch—signal context interrupting unsafe heap/logging code—creates an inconsistent memory state that attackers can steer into arbitrary code execution, including remote root shells on affected Linux systems.

Researchers traced the issue to a regression of an older OpenSSH signal-handler bug, CVE-2006-5051. The regression was introduced by an October 2020 change to OpenSSH’s logging infrastructure, first released in OpenSSH 8.5p1, which accidentally removed an #ifdef that had ensured “log safe” behavior inside the SIGALRM handler. As a result, OpenSSH versions from 8.5p1 up to, but not including, 9.8p1 are vulnerable in the default configuration. Versions from 4.4p1 up to, but not including, 8.5p1 are not, because the “log safe” guard was still present in them.

Exploitation hinges on timing and heap manipulation. The core strategy is to repeatedly trigger the SIGALRM handler while sshd is inside specific malloc()/free() code paths, especially those reached during public-key parsing. In older, easier-to-exploit OpenSSH builds, the work focused on interrupting free() calls inside parsing logic, leaving the heap in a state that can be exploited during a subsequent free() inside the SIGALRM handler. Reported success rates are low but real: experiments averaged about 10,000 attempts per successful race. Because the attempt rate is capped by MaxStartups (how many unauthenticated connections sshd accepts at once) and by the grace time itself (120 seconds by default), that translated to anywhere from days to roughly a week to obtain a remote root shell, depending on the target version and configuration.

The research then shows the attack is not limited to one code path or one OpenSSH generation. For Debian and Ubuntu builds that remain vulnerable to the earlier regression, attackers used different parsing targets (e.g., DSA public key parsing) and improved odds by expanding the “race window” into many smaller windows, turning one interrupt opportunity into dozens of free() opportunities within the same grace-time interval. Later, the team adapted to newer mitigations and different libc behavior by chaining heap exploitation techniques (including “unlink” style manipulation and “House of Mind” style arena corruption) to gain control over function pointers.

On modern Linux, the SIGALRM handler’s behavior becomes the linchpin. The team found that glibc’s malloc-family functions can be reached from within the SIGALRM path via syslog(), and that the relevant libc code paths can be interrupted to create exploitable heap inconsistencies. For i386, they describe a method that overwrites the single-byte vtable offset in a glibc FILE structure (the one used for the TZ file read), redirecting execution through controlled function pointers during the SIGALRM-triggered cleanup.

Mitigations are practical but nuanced. A June 6, 2024 fix, part of a change that penalizes problematic client behavior, moves the async-signal-unsafe work out of the SIGALRM handler and into sshd’s listener process, where it can run synchronously. If updating isn’t possible, setting LoginGraceTime to 0 prevents the remote-code-execution race, but it can turn the issue into a denial-of-service risk, because connections that never authenticate can exhaust the MaxStartups slots. The write-up also notes that some platforms (notably OpenBSD) avoid the issue because their SIGALRM handler uses an async-signal-safe logging variant.
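The idea behind the fix, keeping the signal handler trivial and doing the risky work in ordinary code, can be sketched in a few lines. This is only an illustration of the general pattern, not the actual sshd change: the names `on_alarm` and `wait_for_timeout` are hypothetical, the example is Unix-only, and Python already defers its handlers to a safe point between bytecodes, so the code mirrors the shape of the C pattern rather than reproducing the hazard.

```python
import signal
import time

timed_out = False  # the only state the handler is allowed to touch


def on_alarm(signum, frame):
    """Record that the grace period expired; no logging or cleanup here."""
    global timed_out
    timed_out = True


def wait_for_timeout(grace_seconds, poll_seconds=0.01):
    """Arm a timer, then do logging and cleanup in ordinary (synchronous) code."""
    global timed_out
    timed_out = False
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, grace_seconds)
    try:
        while not timed_out:
            time.sleep(poll_seconds)  # stand-in for the main loop's real work
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # disarm the timer
        signal.signal(signal.SIGALRM, previous)   # restore the old handler
    # Anything that allocates or logs belongs here, outside the handler.
    return "grace period expired; cleanup ran in ordinary code"


print(wait_for_timeout(0.05))
```

The design choice is that the handler does nothing except set a flag, so it cannot re-enter the allocator or the logging code no matter when the signal arrives.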

Overall, the core finding is that a small regression in “signal-safe logging” reintroduced an async-signal-unsafe execution path in sshd. With enough timing precision and heap shaping, that turns an authentication grace-time timeout into a pathway to remote root on affected Linux distributions.

Cornell Notes

The vulnerability is a regression in OpenSSH’s sshd where the SIGALRM handler (triggered when unauthenticated clients miss the login grace time) calls functions that are not async-signal-safe. That lets attackers interrupt malloc()/free() and related heap/logging code mid-operation, leaving the heap in an inconsistent state that can be exploited during later SIGALRM-triggered cleanup. Researchers linked the regression to CVE-2006-5051 and traced it to a 2020 OpenSSH logging change (first released in OpenSSH 8.5p1) that removed a “log safe” guard for the signal handler. Exploitation required heavy timing and heap grooming, but experiments achieved remote root shells on affected Linux systems with on the order of 10,000 attempts per successful race. Fixes include moving the unsafe work out of the signal handler and, as a fallback, setting LoginGraceTime to 0 to prevent the RCE race (at the cost of potential DoS via connection exhaustion).

Why does an authentication timeout (login grace time) become a remote-code-execution trigger?

When a client fails to authenticate within the login grace time (120 seconds by default), sshd’s SIGALRM handler runs. In affected versions, that handler calls code that is not async-signal-safe. Because the signal can arrive while sshd is in the middle of malloc()/free()-related logic, the handler may re-enter the allocator while heap structures are only half updated. The SIGALRM-driven cleanup path then operates on attacker-influenced heap state, enabling control-flow hijacking.
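A toy model makes the "half updated" state concrete. The sketch below is not glibc's allocator. It is a hypothetical doubly linked list whose `unlink` takes two pointer writes, with an `interrupt` hook standing in for a signal that lands between them. All class and function names are invented for illustration.

```python
class Chunk:
    """A toy stand-in for an allocator chunk on a doubly linked free list."""

    def __init__(self, name):
        self.name = name
        self.fd = None  # forward pointer
        self.bk = None  # backward pointer


def build_list(names):
    """Link chunks into a circular doubly linked list and return them."""
    chunks = [Chunk(n) for n in names]
    for i, chunk in enumerate(chunks):
        chunk.fd = chunks[(i + 1) % len(chunks)]
        chunk.bk = chunks[i - 1]
    return chunks


def is_consistent(start):
    """Check that node.fd.bk is node for every node reachable from start."""
    node = start
    while True:
        if node.fd.bk is not node:
            return False
        node = node.fd
        if node is start:
            return True


def unlink(chunk, interrupt=None):
    """Remove chunk from its list using two separate pointer writes."""
    chunk.bk.fd = chunk.fd      # step 1: predecessor now skips chunk
    if interrupt is not None:
        interrupt()             # the "signal" lands mid-update
    chunk.fd.bk = chunk.bk      # step 2: successor now skips chunk


a, b, c = build_list(["a", "b", "c"])
seen = []
unlink(b, interrupt=lambda: seen.append(is_consistent(a)))
print(seen, is_consistent(a))  # [False] True
```

Code that runs at the interruption point sees a list that violates its own invariant, even though the list is fine again once `unlink` finishes. A handler that allocates or frees memory at that moment is working with exactly this kind of state.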

What changed in OpenSSH that reintroduced the bug?

The issue is described as a regression of CVE-2006-5051. A logging-infrastructure revision introduced in October 2020 (first released in OpenSSH 8.5p1) accidentally removed an #ifdef that ensured “log safe” behavior inside the SIGALRM handler. Without that guard, versions from 8.5p1 up to, but not including, 9.8p1 are vulnerable in the default configuration; versions from 4.4p1 up to, but not including, 8.5p1 are not, because the guard was still present.
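The version ranges can be written down as a small lookup. This is a rough sketch with hypothetical function names. The 9.8 upper bound and the pre-4.4 case come from the public advisory rather than from this summary, and a version string alone is not conclusive, because distributions often backport fixes without changing it.

```python
import re


def parse_version(text):
    """Turn a string such as '8.5p1' or 'OpenSSH_9.2p1' into (major, minor)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def signal_race_status(text):
    """Classify an upstream OpenSSH version against the ranges described above."""
    version = parse_version(text)
    if version < (4, 4):
        # Predates the original fix for the older bug.
        return "vulnerable unless patched against CVE-2006-5051"
    if version < (8, 5):
        return "not vulnerable"   # the guard is still in place
    if version < (9, 8):
        return "vulnerable"       # the guard was removed by the regression
    return "not vulnerable"       # fixed upstream


for banner in ["4.3p2", "7.4p1", "OpenSSH_9.2p1", "9.8p1"]:
    print(banner, "->", signal_race_status(banner))
```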

How do attackers make the race condition practical rather than purely theoretical?

They repeatedly trigger SIGALRM while sshd is in specific heap-manipulating code paths (notably public-key parsing). By shaping the heap and using precise timing, they aim to interrupt a free() (or related allocator step) during the grace-time window so that a later free() in the SIGALRM handler acts on corrupted metadata. Reported experiments averaged about 10,000 tries to win a race, with success times ranging from days to about a week depending on the target version and configuration.
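The relationship between tries and wall-clock time is simple arithmetic, sketched below. The figure of 10 parallel connections and the longer 600-second grace time are assumptions chosen for illustration. Only the 10,000 tries and the 120-second default come from the text above.

```python
def expected_days(tries, grace_seconds, parallel_connections):
    """Expected days if each connection yields one try per grace period."""
    rounds = tries / parallel_connections   # batches of simultaneous tries
    return rounds * grace_seconds / 86_400  # 86,400 seconds per day


# Assumed: 10 unauthenticated connections in flight at once.
print(round(expected_days(10_000, 120, 10), 1))  # 1.4
print(round(expected_days(10_000, 600, 10), 1))  # 6.9
```

Under these assumptions the 120-second default gives a bit over a day, while a 600-second grace time stretches the same number of tries to about a week. Both are consistent with the range quoted above.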

How did the attack adapt to newer versions with stronger mitigations?

As straightforward “interrupt free and win” approaches became harder, the work expanded the effective race window into many smaller windows (e.g., leveraging multiple free() opportunities within one grace interval). It also used different exploitation primitives depending on libc behavior, such as unlink-style heap manipulation and “House of Mind”-style arena corruption, to gain control over function pointers. On i386, one described method overwrote the single-byte vtable offset in a glibc FILE structure to redirect execution during the SIGALRM-triggered TZ file read path.
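Why many small windows help can be seen with a deliberately simple model. Assume the signal lands uniformly at random within some timing jitter and that the windows do not overlap. All numbers below are hypothetical, as is the function name.

```python
def hit_probability(window_us, windows, jitter_us):
    """Chance that a signal landing uniformly within `jitter_us` microseconds
    falls inside one of `windows` non-overlapping windows of `window_us` each."""
    covered = window_us * windows
    return min(1.0, covered / jitter_us)


# Hypothetical figures: 1 microsecond windows, 10 ms of timing jitter.
single = hit_probability(1, 1, 10_000)
many = hit_probability(1, 24, 10_000)
print(single, many)  # 0.0001 0.0024
```

In this model the odds per connection scale linearly with the number of windows, so turning one free() opportunity into a couple of dozen cuts the expected number of connections by roughly the same factor.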

What mitigations reduce risk, and what trade-offs remain?

The June 6, 2024 fix moves the problematic behavior out of the asynchronous signal handler into a synchronous listener process, preventing the unsafe interruption pattern. As a fallback, setting LoginGraceTime to 0 prevents the SIGALRM-triggered RCE race, but it can enable denial-of-service, because connections that never authenticate can exhaust the MaxStartups slots. Some platforms (e.g., OpenBSD) are described as not vulnerable because their SIGALRM logging path uses an async-signal-safe variant.
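To check which grace time a configuration would give, a small parser is enough for a first look. The sketch below is simplified and its function names are hypothetical. It ignores Include directives and quoting, takes the first LoginGraceTime it finds, and falls back to the 120-second default mentioned above.

```python
import re

DEFAULT_GRACE_SECONDS = 120  # default stated in the text above
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86_400, "w": 604_800}


def parse_time(value):
    """Parse time values such as '120', '2m' or '1h30m' into seconds."""
    lowered = value.lower()
    parts = re.findall(r"(\d+)([smhdw]?)", lowered)
    if not parts or "".join(n + u for n, u in parts) != lowered:
        raise ValueError(f"unrecognised time value: {value!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def login_grace_seconds(config_text):
    """Return the first LoginGraceTime in the text, or the default."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        fields = line.split()
        if len(fields) >= 2 and fields[0].lower() == "logingracetime":
            return parse_time(fields[1])
    return DEFAULT_GRACE_SECONDS


sample = "#LoginGraceTime 2m\nMaxStartups 10:30:100\nLoginGraceTime 0\n"
print(login_grace_seconds(sample))  # 0
```

A result of 0 means the timeout is disabled, which removes the SIGALRM race but leaves the connection-exhaustion trade-off described above.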

Review Questions

Which specific condition triggers the vulnerable SIGALRM handler path in sshd, and what property of the handler makes it exploitable?
How did the 2020 logging change (first released in OpenSSH 8.5p1) affect the presence or absence of the “log safe” guard in the signal handler?
Why does expanding a “large race window” into many “small race windows” increase the attacker’s odds of winning the race?

Key Points

1. OpenSSH sshd can become exploitable when SIGALRM runs asynchronously after the login grace time and invokes async-signal-unsafe code, enabling heap inconsistency.
2. The vulnerability is a regression tied to CVE-2006-5051 and was reintroduced by a logging-infrastructure change, first released in OpenSSH 8.5p1, that removed a “log safe” guard for the signal handler.
3. Affected OpenSSH versions run from 8.5p1 up to, but not including, 9.8p1 under the default configuration, while versions from 4.4p1 up to, but not including, 8.5p1 are described as not vulnerable.
4. Exploitation relies on interrupting malloc()/free()-related operations during public-key parsing, then leveraging SIGALRM-triggered cleanup to act on corrupted heap metadata.
5. Attack reliability improves by turning one interrupt opportunity into many smaller race windows within the same grace-time interval.
6. Mitigations include the June 6, 2024 fix that routes unsafe work away from the signal handler into a synchronous process, and a fallback of setting LoginGraceTime to 0 (which trades RCE risk for potential DoS).

Highlights

A timeout-based SIGALRM handler in sshd can call non-async-signal-safe functions, letting attackers interrupt heap operations and later exploit the resulting inconsistent state.

The regression was introduced by a 2020 OpenSSH logging change (first released in OpenSSH 8.5p1) that accidentally removed the “log safe” #ifdef for the SIGALRM path.

Experiments report real-world feasibility: roughly 10,000 attempts per successful race, with remote root shells achievable on affected Linux targets.

One technique described for modern i386 targets overwrites the single-byte vtable offset in a glibc FILE structure to redirect execution during the SIGALRM-triggered TZ file read cleanup.

Topics

OpenSSH RCE
SIGALRM Race
Heap Exploitation
Async-Signal-Safety
Mitigations

Mentioned

RCE
CVE
syslog
SIGALRM
ASLR
NX
DoS
GSSAPI
DSA
TZ
IO
_IO_wfile
Glibc
PAM
sigdie