the FrankeNAS - (Raspberry Pi, Zima Board, Dell Server, Ugreen) // a CEPH Tutorial
Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Ceph-based storage turns many computers into one NAS-like system by treating each drive as an independent OSD rather than building one RAID array per box.
Briefing
A DIY “FrankenNAS” built from mismatched hardware—Raspberry Pi boards, laptops, a Zima board, and an old Dell server—can act like one unified network storage system thanks to Ceph, an open-source software-defined storage platform. The core payoff is scale without vendor lock-in: instead of managing one appliance that tops out, storage capacity and performance can grow by adding more machines and drives, while the cluster keeps data replicated, balanced, and resilient.
The build starts with the motivation: a previous Synology NAS delivered collaboration and speed but hit a storage ceiling. Buying another appliance would create a second, separate system that can’t be seamlessly combined, and future expansion would turn into a management tangle across brands and boxes. The FrankenNAS approach replaces that model with a storage cluster where multiple computers contribute storage and work together as a single pool.
At the heart of the design is Ceph’s decentralized architecture. A cluster is organized around a “manager” node that runs the dashboard and coordinates the system, plus monitor nodes that form quorum and keep the cluster healthy. Storage itself is not treated as one giant RAID array. Instead, each physical drive becomes an Object Storage Daemon (OSD)—a service that can be started, stopped, and scaled independently. Data is written as objects, distributed across OSDs, replicated for fault tolerance, and rebalanced as the cluster changes.
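A few cluster-inspection commands illustrate this layout once everything is running (a minimal sketch; output and daemon names depend on the cluster):

```bash
# Run on the manager node after deployment; these are standard Ceph CLI queries.
ceph status       # overall health, monitor quorum, manager, and OSD counts
ceph mon stat     # which monitor daemons are currently in quorum
ceph osd tree     # hosts and the OSDs (one per physical drive) beneath them
ceph orch ps      # every daemon container the orchestrator is managing
```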
Ceph’s data placement is managed through a layered model: storage pools define rules (for example, “current projects” on SSDs with triple replication, and “archives” on HDDs using erasure coding for space efficiency). Placement groups (PGs) sit between pools and OSDs, acting as the unit that determines which OSDs hold replicas of a given object. When objects are stored, Ceph uses the CRUSH algorithm (Controlled Replication Under Scalable Hashing) and a CRUSH map to decide where replicas go and how to find them later. Adding drives triggers intelligent rebalancing with minimal data movement; losing drives triggers recovery and redistribution.
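As a rough sketch of how such policies translate into commands (the pool names, PG counts, and SSD device-class rule below are illustrative assumptions, not taken from the video):

```bash
# Replicated pool for hot data: three copies of every object, pinned to SSDs.
ceph osd crush rule create-replicated ssd_rule default host ssd
ceph osd pool create projects 128 128 replicated ssd_rule
ceph osd pool set projects size 3

# Erasure-coded pool for archives: 4 data chunks + 2 coding chunks per object.
ceph osd erasure-code-profile set archive_ec k=4 m=2
ceph osd pool create archives 64 64 erasure archive_ec
```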
After explaining the theory, the tutorial walks through deployment on “junk pile” hardware. Hosts run Ubuntu (22.04 for most machines, with 20.04 on the Raspberry Pis due to container compatibility). The setup includes preparing drives (wiping them so Ceph can claim them), installing Docker, ensuring LVM2 is available, and synchronizing time via NTP. Root SSH access is configured with keys so the manager can orchestrate new nodes without password prompts.
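A host-preparation sketch under those assumptions (package names are for Ubuntu; /dev/sdb and node01 are placeholders, and the wipe commands destroy data):

```bash
sudo apt update
sudo apt install -y docker.io lvm2 chrony gdisk   # container runtime, LVM2, NTP sync, sgdisk
sudo systemctl enable --now docker chrony

# Wipe a drive so Ceph can claim it as an OSD (destroys everything on /dev/sdb).
sudo wipefs --all /dev/sdb
sudo sgdisk --zap-all /dev/sdb

# After bootstrapping (next step), push the cluster's SSH public key to each node
# so the manager can reach it without password prompts:
#   ssh-copy-id -f -i /etc/ceph/ceph.pub root@node01
```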
Ceph is bootstrapped with cephadm on the manager node, then additional nodes are adopted through orchestration commands. Once hosts join, the remaining step is turning available devices into OSDs. The cluster health moves from warnings (no OSDs yet) to a clean state once dozens of OSD containers come online. From there, a CephFS file system is created, which automatically provisions the metadata and data pools (with their placement groups) and the MDS role.
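A minimal sketch of that sequence, assuming cephadm is already installed on the manager node; the IP addresses, hostname, and file system name are placeholders:

```bash
# Bootstrap the first monitor, manager, and dashboard on this machine.
sudo cephadm bootstrap --mon-ip 192.168.1.10

# Adopt another host, then turn every eligible blank drive into an OSD.
sudo ceph orch host add node02 192.168.1.11
sudo ceph orch apply osd --all-available-devices

# Create a CephFS file system (provisions data/metadata pools and an MDS).
sudo ceph fs volume create frankenfs
```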
Finally, the storage is put to work in two ways: mounting CephFS on Linux for native kernel performance, and exposing it over SMB for Windows access. File transfers and reads are shown spreading across different OSDs rather than funneling through a single server, demonstrating the practical benefit of a decentralized storage fabric. The result is a scalable, mixed-hardware NAS that can expand by adding more servers and drives, while Ceph handles replication, recovery, and balancing in the background.
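Both access paths can be sketched roughly as follows (the monitor address, mount point, secret file, and share name are assumptions for illustration):

```bash
# Kernel-mount CephFS on a Linux client.
sudo mkdir -p /mnt/frankenfs
sudo mount -t ceph 192.168.1.10:6789:/ /mnt/frankenfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# Re-export that mount to Windows clients via Samba (/etc/samba/smb.conf):
# [frankenfs]
#    path = /mnt/frankenfs
#    browseable = yes
#    read only = no
```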
Cornell Notes
The FrankenNAS concept uses Ceph to turn many mismatched machines into one storage cluster. Instead of one RAID array inside a single NAS, each drive becomes an Object Storage Daemon (OSD), and files are stored as distributed objects across OSDs with replication or erasure coding. Ceph’s CRUSH algorithm and placement groups decide where objects live and how they’re retrieved, enabling fault tolerance and automatic rebalancing when hardware changes. The tutorial then shows how to deploy a cluster: prepare hosts (Ubuntu, Docker, time sync), wipe and claim drives as OSDs, bootstrap the manager with cephadm, adopt additional nodes, and create a CephFS file system. The storage is validated by mounting on Linux and sharing via SMB to Windows, with traffic spread across the cluster.
Why does the FrankenNAS use a storage cluster instead of adding another single-vendor NAS box?
What replaces “one big RAID array” in Ceph’s architecture?
How do pools, placement groups, and OSDs work together when data is stored?
What role does CRUSH play in Ceph?
What does the tutorial require before bootstrapping the cluster?
How is the storage validated for real-world use?
Review Questions
- In Ceph, what is the difference between a storage pool and a placement group, and why does that matter for data placement?
- How does CRUSH influence both object placement and recovery/rebalancing when the cluster membership changes?
- During deployment, what prerequisites must be satisfied on each host before drives can be converted into OSDs?
Key Points
1. Ceph-based storage turns many computers into one NAS-like system by treating each drive as an independent OSD rather than building one RAID array per box.
2. Software-defined storage is hardware-agnostic in practice: mixed hardware can join the same cluster as long as the platform requirements are met.
3. CephFS file storage relies on a metadata server (MDS) role and data placement across OSDs, enabling parallelism and high availability.
4. Storage pools define performance/fault-tolerance policies (e.g., SSD pools with triple replication and HDD pools using erasure coding).
5. Placement groups (PGs) are the intermediate mapping layer that connects pools to specific OSD sets for each object.
6. CRUSH (Controlled Replication Under Scalable Hashing) and the CRUSH map determine where objects go and how the system rebalances when OSDs are added or lost (one way to inspect the CRUSH map is sketched after this list).
7. Operational validation can be done by mounting CephFS on Linux for native performance and by exposing it via SMB for Windows clients.
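For reference, one way to inspect the CRUSH map itself (file names are arbitrary placeholders):

```bash
# Dump the binary CRUSH map and decompile it into readable text.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Or view the CRUSH hierarchy (hosts, device classes, OSDs) without decompiling.
ceph osd crush tree
```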