the FrankeNAS - (Raspberry Pi, Zima Board, Dell Server, Ugreen) // a CEPH Tutorial
Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Ceph-based storage turns many computers into one NAS-like system by treating each drive as an independent OSD rather than building one RAID array per box.
Briefing
A DIY “FrankenNAS” built from mismatched hardware—Raspberry Pi boards, laptops, a Zima board, and an old Dell server—can act like one unified network storage system thanks to Ceph, an open-source software-defined storage platform. The core payoff is scale without vendor lock-in: instead of managing one appliance that tops out, storage capacity and performance can grow by adding more machines and drives, while the cluster keeps data replicated, balanced, and resilient.
The build starts with the motivation: a previous Synology NAS delivered collaboration and speed but hit a storage ceiling. Buying another appliance would create a second, separate system that can’t be seamlessly combined, and future expansion would turn into a management tangle across brands and boxes. The FrankenNAS approach replaces that model with a storage cluster where multiple computers contribute storage and work together as a single pool.
At the heart of the design is Ceph’s decentralized architecture. A cluster is organized around a “manager” node that runs the dashboard and coordinates the system, plus monitor nodes that form quorum and keep the cluster healthy. Storage itself is not treated as one giant RAID array. Instead, each physical drive becomes an Object Storage Daemon (OSD)—a service that can be started, stopped, and scaled independently. Data is written as objects, distributed across OSDs, replicated for fault tolerance, and rebalanced as the cluster changes.
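A few cluster-inspection commands illustrate this layout once everything is running (a minimal sketch; output and daemon names depend on the cluster):

```bash
# Run on the manager node after deployment; these are standard Ceph CLI queries.
ceph status       # overall health, monitor quorum, manager, and OSD counts
ceph mon stat     # which monitor daemons are currently in quorum
ceph osd tree     # hosts and the OSDs (one per physical drive) beneath them
ceph orch ps      # every daemon container the orchestrator is managing
```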
Ceph’s data placement is managed through a layered model: storage pools define rules (for example, “current projects” on SSDs with triple replication, and “archives” on HDDs using erasure coding for space efficiency). Placement groups (PGs) sit between pools and OSDs, acting as the unit that determines which OSDs hold replicas of a given object. When objects are stored, Ceph uses the CRUSH algorithm (Controlled Replication Under Scalable Hashing) and a CRUSH map to decide where replicas go and how to find them later. Adding drives triggers intelligent rebalancing with minimal data movement; losing drives triggers recovery and redistribution.
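As a rough sketch of how such policies translate into commands (the pool names, PG counts, and SSD device-class rule below are illustrative assumptions, not taken from the video):

```bash
# Replicated pool for hot data: three copies of every object, pinned to SSDs.
ceph osd crush rule create-replicated ssd_rule default host ssd
ceph osd pool create projects 128 128 replicated ssd_rule
ceph osd pool set projects size 3

# Erasure-coded pool for archives: 4 data chunks + 2 coding chunks per object.
ceph osd erasure-code-profile set archive_ec k=4 m=2
ceph osd pool create archives 64 64 erasure archive_ec
```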
After explaining the theory, the tutorial walks through deployment on “junk pile” hardware. Hosts run Ubuntu (22.04 for most machines, with 20.04 on the Raspberry Pis due to container compatibility). The setup includes preparing drives (wiping them so Ceph can claim them), installing Docker, ensuring LVM2 is available, and synchronizing time via NTP. Root SSH access is configured with keys so the manager can orchestrate new nodes without password prompts.
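A host-preparation sketch under those assumptions (package names are for Ubuntu; /dev/sdb and node01 are placeholders, and the wipe commands destroy data):

```bash
sudo apt update
sudo apt install -y docker.io lvm2 chrony gdisk   # container runtime, LVM2, NTP sync, sgdisk
sudo systemctl enable --now docker chrony

# Wipe a drive so Ceph can claim it as an OSD (destroys everything on /dev/sdb).
sudo wipefs --all /dev/sdb
sudo sgdisk --zap-all /dev/sdb

# After bootstrapping (next step), push the cluster's SSH public key to each node
# so the manager can reach it without password prompts:
#   ssh-copy-id -f -i /etc/ceph/ceph.pub root@node01
```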
Ceph is bootstrapped with cephadm on the manager node, then additional nodes are adopted through orchestration commands. Once hosts join, the remaining step is turning available devices into OSDs. The cluster health moves from warnings (no OSDs yet) to a clean state once dozens of OSD containers come online. From there, a CephFS file system is created, which automatically provisions the metadata and data pools (with their placement groups) and the MDS role.
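A minimal sketch of that sequence, assuming cephadm is already installed on the manager node; the IP addresses, hostname, and file system name are placeholders:

```bash
# Bootstrap the first monitor, manager, and dashboard on this machine.
sudo cephadm bootstrap --mon-ip 192.168.1.10

# Adopt another host, then turn every eligible blank drive into an OSD.
sudo ceph orch host add node02 192.168.1.11
sudo ceph orch apply osd --all-available-devices

# Create a CephFS file system (provisions data/metadata pools and an MDS).
sudo ceph fs volume create frankenfs
```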
Finally, the storage is put to work in two ways: mounting CephFS on Linux for native kernel performance, and exposing it over SMB for Windows access. File transfers and reads are shown spreading across different OSDs rather than funneling through a single server, demonstrating the practical benefit of a decentralized storage fabric. The result is a scalable, mixed-hardware NAS that can expand by adding more servers and drives, while Ceph handles replication, recovery, and balancing in the background.
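Both access paths can be sketched roughly as follows (the monitor address, mount point, secret file, and share name are assumptions for illustration):

```bash
# Kernel-mount CephFS on a Linux client.
sudo mkdir -p /mnt/frankenfs
sudo mount -t ceph 192.168.1.10:6789:/ /mnt/frankenfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# Re-export that mount to Windows clients via Samba (/etc/samba/smb.conf):
# [frankenfs]
#    path = /mnt/frankenfs
#    browseable = yes
#    read only = no
```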
Cornell Notes
The FrankenNAS concept uses Ceph to turn many mismatched machines into one storage cluster. Instead of one RAID array inside a single NAS, each drive becomes an Object Storage Daemon (OSD), and files are stored as distributed objects across OSDs with replication or erasure coding. Ceph’s CRUSH algorithm and placement groups decide where objects live and how they’re retrieved, enabling fault tolerance and automatic rebalancing when hardware changes. The tutorial then shows how to deploy a cluster: prepare hosts (Ubuntu, Docker, time sync), wipe and claim drives as OSDs, bootstrap the manager with cephadm, adopt additional nodes, and create a CephFS file system. The storage is validated by mounting on Linux and sharing via SMB to Windows, with traffic spread across the cluster.
Why does the FrankenNAS use a storage cluster instead of adding another single-vendor NAS box?
What replaces “one big RAID array” in Ceph’s architecture?
How do pools, placement groups, and OSDs work together when data is stored?
What role does CRUSH play in Ceph?
What does the tutorial require before bootstrapping the cluster?
How is the storage validated for real-world use?
Review Questions
- In Ceph, what is the difference between a storage pool and a placement group, and why does that matter for data placement?
- How does CRUSH influence both object placement and recovery/rebalancing when the cluster membership changes?
- During deployment, what prerequisites must be satisfied on each host before drives can be converted into OSDs?
Key Points
1. Ceph-based storage turns many computers into one NAS-like system by treating each drive as an independent OSD rather than building one RAID array per box.
2. Software-defined storage is hardware-agnostic in practice: mixed hardware can join the same cluster as long as the platform requirements are met.
3. CephFS file storage relies on a metadata server (MDS) role and data placement across OSDs, enabling parallelism and high availability.
4. Storage pools define performance/fault-tolerance policies (e.g., SSD pools with triple replication and HDD pools using erasure coding).
5. Placement groups (PGs) are the intermediate mapping layer that connects pools to specific OSD sets for each object.
6. CRUSH (Controlled Replication Under Scalable Hashing) and the CRUSH map determine where objects go and how the system rebalances when OSDs are added or lost (one way to inspect the CRUSH map is sketched after this list).
7. Operational validation can be done by mounting CephFS on Linux for native performance and by exposing it via SMB for Windows clients.
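For reference, one way to inspect the CRUSH map itself (file names are arbitrary placeholders):

```bash
# Dump the binary CRUSH map and decompile it into readable text.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Or view the CRUSH hierarchy (hosts, device classes, OSDs) without decompiling.
ceph osd crush tree
```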