GitHub - You Can View Deleted Private Fork Data

TL;DR

GitHub can keep commit objects reachable through repository-network relationships even after a fork or upstream repository is deleted.


Briefing

GitHub’s fork and repository-network design can leave commit data accessible even after a repository or fork is deleted. Secrets embedded in those commits may therefore remain retrievable by anyone who knows (or can guess) the commit hash. The practical risk is a cross-fork object reference (CFOR) issue: one fork can access sensitive data from another fork’s commit history, including commits that were supposed to disappear when the fork or upstream repository was removed.

A core workflow illustrates the problem. A user forks a public repository, adds a file like secrets.py containing “secret” content, commits it, and then deletes the fork. Even after deletion, the commit’s content can still be pulled from the original repository network by using the commit SHA. In the demo, reloading the deleted fork’s page returns a 404, yet the commit remains reachable when the secret commit’s SHA is inserted into a commit URL under the original repository. The underlying reason is that Git stores objects (commits, trees, blobs) in a content-addressed way: deleting a ref doesn’t necessarily erase the underlying objects immediately, and GitHub’s repository-network relationships can keep those objects reachable.
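Git’s content addressing can be shown in a few lines of Python. This is a minimal sketch of how Git derives a blob’s object ID in a SHA-1 repository; it touches neither Git nor GitHub, and the function name git_blob_id is ours. What it illustrates: the ID depends only on the stored bytes, not on which branch or fork refers to them.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Git hashes a header ("blob", a space, the byte length, a NUL byte)
    followed by the raw bytes. This matches git hash-object in SHA-1
    repositories; SHA-256 repositories use a different hash.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has the same well-known ID in every SHA-1 repository.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    # Same bytes always give the same ID, whichever fork points at them.
    print(git_blob_id(b'API_KEY = "secret"\n'))
```

Commit and tree IDs are computed the same way with a "commit" or "tree" header, which is why a commit SHA keeps identifying the same data after the ref that pointed at it is gone.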

The transcript also ties this behavior to Git internals and GitHub’s UI and URL mechanics. Git’s “plumbing” commands such as git cat-file, along with the reflog, help explain why data can persist locally and why refs can be misleading: a ref is only a name pointing at an object, and removing the name does not remove the object. On GitHub, the key enabler is that commits can be referenced directly by hash. Users often need only a short SHA in the interface, in some cases as few as four characters, because GitHub, like Git itself, resolves an unambiguous short prefix to the full commit. That creates a brute-force angle: short SHAs shrink the search space enough that attackers can try guesses through the UI until a valid commit is found.
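The gap between refs and objects can be sketched as a toy model. It assumes a much simpler store than real Git (no packfiles, reflog, or garbage collection), and ToyRepo with its methods is a hypothetical illustration, not any real API. Deleting a ref removes only the name; a lookup by ID, roughly what git cat-file -p does, still succeeds.

```python
import hashlib


class ToyRepo:
    """Toy model of Git storage: a content-addressed object store plus refs.

    Hypothetical illustration only; real Git also has packfiles, reflogs,
    and garbage collection (git gc), all omitted here.
    """

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # object ID -> content
        self.refs: dict[str, str] = {}       # ref name -> object ID

    def commit(self, ref: str, content: bytes) -> str:
        # Simplified ID: SHA-1 of the raw content, without Git's header.
        oid = hashlib.sha1(content).hexdigest()
        self.objects[oid] = content
        self.refs[ref] = oid
        return oid

    def delete_ref(self, ref: str) -> None:
        # Only the name disappears; the object stays in the store.
        del self.refs[ref]

    def cat_file(self, oid: str) -> bytes:
        # Rough analogue of git cat-file -p: lookup by ID, not by ref.
        return self.objects[oid]


if __name__ == "__main__":
    repo = ToyRepo()
    sha = repo.commit("refs/heads/leak", b'API_KEY = "secret"\n')
    repo.delete_ref("refs/heads/leak")
    print("refs/heads/leak" in repo.refs)  # False
    print(repo.cat_file(sha))              # b'API_KEY = "secret"\n'
```

In a local clone, unreachable objects are eventually pruned by git gc after a grace period. The transcript’s point is that on GitHub the repository network can keep such objects reachable, so deleting a ref or a fork does not reliably lead to that cleanup.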

The discussion goes further with scale and methodology claims. One example cited: researchers reportedly surveyed forked public repositories from a large AI company and found a large number of valid API keys originating from deleted forks. Another scenario: when an upstream repository is deleted, GitHub can reassign the “root node” of the repository network to a downstream fork, while commits from the original upstream remain accessible via any fork in that network. That means “deleting the repo” may not be equivalent to “removing the commit history from reach.”

For private-to-public release pipelines, the transcript describes a related visibility split. Commits made in a private fork before a project becomes public can become accessible through the public repository network if the commit objects remain reachable. The takeaway is blunt: remediation for leaked secrets on GitHub is effectively key rotation, not relying on deletion, squashing, or UI removal.

Legal and compliance questions also surface, such as how “deleted” data interacts with privacy requests. The security framing stays the same: the only reliable fix is to treat leaked credentials as compromised and rotate them immediately.

Cornell Notes

GitHub’s fork and repository-network structure can keep commit objects accessible after a fork or upstream repository is deleted. In practice, a commit that added “secret” content can still be retrieved later if someone has the commit SHA and navigates directly to the commit URL, even when the repository page shows a 404. The risk is amplified by GitHub’s support for short SHAs in the UI, which can be brute-forced to discover valid commits. When upstream repos are deleted, GitHub may reassign the repository-network root to a downstream fork, leaving the original commits reachable. Because deletion doesn’t reliably remove underlying commit objects, leaked credentials should be remediated through key rotation rather than cleanup alone.

How can deleted fork content still be accessible on GitHub?

The transcript’s demo shows a fork where a commit adds secrets.py, then the fork is deleted. After deletion, the fork’s repository page can return a 404, but the commit’s content remains retrievable when the commit SHA from the secret commit is used in the commit URL. This works because GitHub’s repository-network relationships keep commit objects reachable even when refs and repository pages disappear.
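The direct-access step in the demo amounts to building a commit URL by hand. A minimal sketch, assuming GitHub’s usual https://github.com/<owner>/<repo>/commit/<sha> path; the helper commit_url and the owner and repository names are placeholders, and the four-character minimum simply mirrors the transcript’s claim.

```python
import re
from urllib.parse import quote

# 4 to 40 lowercase hex characters: a short or full SHA-1 commit ID.
HEX_PREFIX = re.compile(r"[0-9a-f]{4,40}")


def commit_url(owner: str, repo: str, sha: str) -> str:
    """Build a GitHub commit URL for sha, viewed through owner/repo.

    Per the transcript, owner/repo can be any repository still in the
    same fork network, not necessarily the fork that made the commit.
    """
    sha = sha.lower()
    if not HEX_PREFIX.fullmatch(sha):
        raise ValueError("expected 4 to 40 hexadecimal characters")
    return f"https://github.com/{quote(owner)}/{quote(repo)}/commit/{sha}"


if __name__ == "__main__":
    # Placeholder names: "upstream-org/project" stands in for the original repo.
    print(commit_url("upstream-org", "project", "abcd1234"))
    # https://github.com/upstream-org/project/commit/abcd1234
```

The function only builds the string; whether the page resolves depends on GitHub keeping the commit reachable in that repository’s network.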

What is the security mechanism being highlighted (CFOR / IDOR-style access)?

The discussion frames the issue as a cross-fork object reference: one repository network path can expose sensitive commit data from another fork’s history. It’s likened to an insecure direct object reference (IDOR) because direct access hinges on knowing an identifier (the commit hash). With the right SHA, an attacker can navigate to commit data that would otherwise be hidden by normal UI permissions.
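The IDOR shape can be made concrete with a toy model. Everything here is hypothetical and is not GitHub’s implementation: ForkNetwork, its methods, and the SHA placeholder are illustrative. The commit view validates only the repository named in the URL and then resolves the SHA against a store shared by the whole network, so a commit pushed by a since-deleted fork stays visible through the upstream.

```python
from dataclasses import dataclass, field


@dataclass
class ForkNetwork:
    """Hypothetical fork network whose repositories share one object store."""

    live_repos: set[str] = field(default_factory=set)
    objects: dict[str, str] = field(default_factory=dict)  # SHA -> content

    def push(self, repo: str, sha: str, content: str) -> None:
        # Every repo in the network writes into the same shared store.
        self.objects[sha] = content

    def delete_repo(self, repo: str) -> None:
        # Deleting a repo removes its page, not its objects.
        self.live_repos.discard(repo)

    def view_commit(self, repo: str, sha: str) -> str:
        # IDOR-shaped check: only the repo named in the URL is validated.
        # The SHA is then resolved network-wide, with no check on which
        # repo (or deleted fork) the commit actually came from.
        if repo not in self.live_repos or sha not in self.objects:
            raise LookupError("404 Not Found")
        return self.objects[sha]


if __name__ == "__main__":
    net = ForkNetwork(live_repos={"upstream/project", "alice/project"})
    net.push("alice/project", "c0ffee12", 'secrets.py: API_KEY = "secret"')
    net.delete_repo("alice/project")
    try:
        net.view_commit("alice/project", "c0ffee12")
    except LookupError as err:
        print("fork page:", err)  # fork page: 404 Not Found
    print(net.view_commit("upstream/project", "c0ffee12"))
    # secrets.py: API_KEY = "secret"
```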

Why do short SHAs matter for exploitation?

GitHub’s UI resolves short SHA values to full commits, much as Git does locally. The transcript notes that a few characters (e.g., four or five) can be enough to reach a valid commit, as long as the prefix is unique within the repository network. That lowers the barrier to discovery: instead of guessing a full 40-character SHA-1 hash, an attacker can brute-force short prefixes until the UI resolves one to a real commit.
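The arithmetic behind the short-SHA concern is simple: each hex character has 16 values, so a four-character prefix has 16^4 = 65,536 candidates, while a full 40-character SHA-1 has 16^40 = 2^160. The sketch below resolves prefixes the way Git-style tools do (a prefix resolves only if exactly one object matches) and enumerates every four-character prefix against a local, hypothetical list of IDs. It runs offline, does not query GitHub, and the function names are illustrative.

```python
from typing import Iterator, Optional


def search_space(prefix_len: int) -> int:
    """Number of distinct hexadecimal prefixes of the given length."""
    return 16 ** prefix_len


def resolve_prefix(prefix: str, known_ids: list[str]) -> Optional[str]:
    """Return the one full ID starting with prefix, or None if 0 or 2+ match."""
    prefix = prefix.lower()
    matches = [oid for oid in known_ids if oid.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


def brute_force(known_ids: list[str], prefix_len: int = 4) -> Iterator[tuple[str, str]]:
    """Try every prefix of prefix_len hex digits offline; yield unique hits."""
    for i in range(search_space(prefix_len)):
        prefix = f"{i:0{prefix_len}x}"  # zero-padded lowercase hex
        full = resolve_prefix(prefix, known_ids)
        if full is not None:
            yield prefix, full


if __name__ == "__main__":
    print(search_space(4))               # 65536
    print(search_space(40) == 2 ** 160)  # True
    # Hypothetical object IDs standing in for commits in a fork network.
    known = ["abcd" + "0" * 36, "abce" + "1" * 36, "abcd" + "f" * 36]
    print(resolve_prefix("abcd", known))  # None (ambiguous)
    print(list(brute_force(known)))       # one unique hit, for prefix "abce"
```

Each enumerated prefix corresponds to one guessed URL; the transcript’s claim is that this volume is small enough to attempt through the UI.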

What happens when an upstream repository is deleted?

A described scenario: a public upstream repo is forked, the fork falls behind (no syncing), and then the upstream is deleted. GitHub can reassign the repository-network root node to a downstream fork, while commits from the original upstream still exist and remain accessible through any fork in the network. The result is that “deleting the upstream repo” doesn’t necessarily remove the commit history from reach.
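The re-rooting scenario can be modeled the same way. This is a toy sketch under stated assumptions: RepoNetwork is hypothetical, and the rule that the oldest remaining fork becomes the new root is an assumption for illustration, since the transcript only says GitHub reassigns the root to a downstream fork. Deletion never touches the shared commit pool, so an upstream commit the fork never synced stays reachable through the fork.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RepoNetwork:
    """Hypothetical fork network: one root, several forks, shared commits."""

    root: Optional[str]
    members: list[str] = field(default_factory=list)  # oldest first
    commits: set[str] = field(default_factory=set)    # shared SHA pool

    def delete_repo(self, name: str) -> None:
        self.members.remove(name)
        if name == self.root:
            # Assumed re-rooting rule for illustration: promote the oldest
            # remaining fork. GitHub's actual selection rule may differ.
            self.root = self.members[0] if self.members else None
        # The shared commit pool is deliberately left untouched.

    def reachable(self, via: str, sha: str) -> bool:
        """Can sha be viewed through repo via? Any live member will do."""
        return via in self.members and sha in self.commits


if __name__ == "__main__":
    net = RepoNetwork(
        root="upstream/project",
        members=["upstream/project", "bob/project"],
        commits={"aaaa1111"},
    )
    # Upstream keeps committing after bob's fork stops syncing.
    net.commits.add("bbbb2222")
    net.delete_repo("upstream/project")
    print(net.root)                                  # bob/project
    print(net.reachable("bob/project", "bbbb2222"))  # True
```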

How does private-to-public release change the risk?

The transcript describes a pipeline where a private repository is forked to build internal features, and the upstream is later made public while the fork stays private. Commits made in the private fork during the internal phase can become viewable through the public repository, because the commit objects remain reachable in the shared repository network even though the fork itself is still private.

What remediation does the transcript recommend for leaked secrets?

Key rotation. The argument is that deletion, squashing, or UI removal doesn’t reliably eliminate underlying commit objects from access paths. If an API key or credential may have been exposed via commit history, the only dependable fix is to rotate the credential (and assume it’s compromised), rather than trying to “delete it away.”

Review Questions

What specific role does the commit SHA play in retrieving data that appears deleted or 404’d?
How does GitHub’s handling of short SHAs change the difficulty of discovering commit objects?
Why might deleting an upstream repository fail to remove commit history from downstream forks?

Key Points

1. GitHub can keep commit objects reachable through repository-network relationships even after a fork or upstream repository is deleted.
2. Direct commit access using the commit SHA can bypass what normal repository pages and standard Git operations hide.
3. Short SHA support in GitHub’s UI can enable discovery by brute-forcing short prefixes that resolve to real commits.
4. When an upstream repo is deleted, GitHub may reassign the repository-network root to a downstream fork, leaving upstream commits accessible.
5. Private commits made in internal forks can become accessible after public visibility changes, depending on repository-network reachability.
6. The practical security remediation for leaked credentials on GitHub is key rotation, not relying on deletion or squashing to remove exposure.
7. Assume that “deleted” on GitHub may not mean “unrecoverable” if commit identifiers are known or guessable.

Highlights

A fork can be deleted and still have its secret commit content retrievable later by using the commit SHA, even when the repository page returns a 404.

GitHub’s short SHA behavior lowers the barrier to finding commit objects, turning commit-hash guessing into a plausible attack path.

Deleting an upstream repository can leave its commit history accessible because GitHub can re-root the repository network to a downstream fork.

For leaked secrets, deletion isn’t a reliable fix—rotation is presented as the only dependable remediation.

Topics

GitHub Forks
Deleted Repositories
Commit Hashes
API Key Exposure
Cross-Fork Access

Mentioned

CFOR
IDOR