Proxmox + Ceph + PBS: Complete Guide for a Production-Ready Homelab
Why Proxmox + Ceph for a homelab
Proxmox VE is the serious open source alternative to VMware/ESXi. Enterprise-grade hypervisor, no license cost, with a web interface that doesn’t embarrass you. But where it really shines is when you add Ceph.
Ceph integrated into Proxmox gives you distributed storage without needing an external NAS or SAN. Your local disks become a unified storage pool that VMs consume as if it were an enterprise SAN. And the best part: you can separate fast disks (SSD) from capacity disks (HDD) into different pools, each with their own replication rules.
Add Proxmox Backup Server and you have the complete triangle: virtualization, tiered storage, and backups with deduplication — all on one platform, all open source.
This setup works the same with 1 node or 3+. The architecture scales without redesigning anything. Same CRUSH rules, same pools, same backup jobs. When you add nodes, Ceph distributes data automatically.
Hardware and planning
What you need:
- CPU with VT-x/VT-d — any modern server or desktop processor has it. VT-d is important if you plan to do PCI passthrough (GPUs, network controllers, etc.)
- RAM: 32GB minimum, 64GB recommended — Proxmox itself uses little, but VMs and Ceph OSDs add up. Each BlueStore OSD targets ~4GB of RAM by default (osd_memory_target). With 4 OSDs that’s already 16GB just for Ceph.
- Dedicated disks for Ceph — this is the most important decision
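Before buying RAM, it’s worth sketching the budget. A minimal shell calculation, assuming Ceph’s default of ~4GB per BlueStore OSD (osd_memory_target, tunable downward on small boxes) and example VM sizes:

```shell
# Rough RAM budget for a single node — all figures in GB.
# 4GB per OSD is the BlueStore default osd_memory_target;
# host overhead and VM sizes below are illustrative assumptions.
OSDS=4
OSD_RAM_EACH=4
OSD_RAM=$(( OSDS * OSD_RAM_EACH ))   # RAM reserved for Ceph
HOST_RAM=2                            # Proxmox host + system services
VM_RAM=$(( 8 + 4 + 4 ))               # e.g. one 8GB VM and two 4GB VMs
TOTAL=$(( OSD_RAM + HOST_RAM + VM_RAM ))
echo "Ceph: ${OSD_RAM}GB, host: ${HOST_RAM}GB, VMs: ${VM_RAM}GB"
echo "Total budget: ${TOTAL}GB"       # already tight on a 32GB machine
```

With these example numbers the total lands at 34GB — which is exactly why 32GB is the floor, not the comfort zone.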
Disk planning:
| Disk | Use | Example |
|---|---|---|
| Small SSD/NVMe (128-256GB) | Proxmox OS | Boot disk |
| Dedicated SSD/NVMe | OSDs for fast pool | VMs, databases |
| Dedicated HDDs | OSDs for bulk pool | Backups, ISOs, templates |
For networking: on single-node, standard 1GbE is enough because Ceph traffic is local. On multi-node, a dedicated network for Ceph is practically mandatory — ideally 10GbE, or at minimum a separate VLAN.
Don’t use the system disk as a Ceph OSD. It seems tempting to make use of the space, but when Ceph fills up (and it will), the operating system runs out of space and you lose access to the node. The system disk is sacred — only for Proxmox OS and configuration.
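Before assigning anything to Ceph, inventory what the node actually sees. A quick lsblk shows each device’s size and rotational flag — the same signal Ceph later uses to assign device classes:

```shell
# List physical block devices only (-d), no headings (-n):
# name, size, type, and ROTA (1 = spinning HDD, 0 = SSD/NVMe).
lsblk -dno NAME,SIZE,TYPE,ROTA
```

Anything that shows ROTA 0 is a candidate for the fast pool, ROTA 1 for the bulk pool — and whichever device carries the Proxmox OS stays off the list entirely.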
Proxmox VE installation
Download the Proxmox VE ISO from the official website and install on the system SSD/NVMe. The installation is a standard wizard — select disk, configure network, root password, and done.
Post-installation
First step: adjust the repositories. If you don’t have a subscription (and for a homelab you don’t need one), switch to the no-subscription repo:
# Remove enterprise repo (requires subscription)
rm /etc/apt/sources.list.d/pve-enterprise.list
# Add no-subscription repo
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > \
/etc/apt/sources.list.d/pve-no-subscription.list
# Add Ceph no-subscription repo
echo "deb http://download.proxmox.com/debian/ceph-squid bookworm no-subscription" > \
/etc/apt/sources.list.d/ceph.list
# Update
apt update && apt full-upgrade -y
Proxmox works perfectly without a subscription. The no-subscription repository is stable and receives regular updates. The subscription gives you access to the enterprise repo (same versions but with additional testing before release) and technical support. For a homelab, the no-subscription repo is more than enough.
Verify that the network bridge vmbr0 is correctly configured. Proxmox creates it during installation, but confirm it has the correct static IP and gateway:
cat /etc/network/interfaces
You should see something like:
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
Ceph installation and configuration
Install Ceph
From the Proxmox node CLI:
pveceph install
This installs the version of Ceph that Proxmox natively supports (Squid on PVE 8.x). The integration is complete — Proxmox manages Ceph’s configuration, services, and monitoring.
Initialize the Ceph cluster
# Initialize Ceph with the cluster network
pveceph init --network 192.168.1.0/24
# Create the monitor (required for Ceph to function)
pveceph mon create
# Create the manager (required for dashboard and metrics)
pveceph mgr create
On multi-node you’d create monitors and managers on multiple nodes for high availability. On single-node, one of each is enough.
Single-node configuration
Ceph by default expects 3 replicas of each object, which requires 3 nodes. On single-node you need to adjust this:
# Configure Ceph for single-node
ceph config set global osd_pool_default_size 1
ceph config set global osd_pool_default_min_size 1
With size 1, there is NO data redundancy. If a disk dies, you lose whatever was on that OSD. That’s why PBS is critical in this setup — your backups are your redundancy. On multi-node with size 2 or 3, Ceph replicates data between nodes automatically and you can survive the loss of a disk or even an entire node.
Creating OSDs — SSD and HDD separately
An OSD (Object Storage Daemon) is a Ceph process that manages a physical disk. Each disk dedicated to Ceph becomes an OSD.
Ceph automatically detects the disk type and assigns a device class: ssd, hdd, or nvme. This classification is the foundation for separating storage into pools by performance.
Create the OSDs
From the CLI:
# Create OSD for each dedicated disk
pveceph osd create /dev/sdb # SSD
pveceph osd create /dev/sdc # SSD
pveceph osd create /dev/sdd # HDD
pveceph osd create /dev/sde # HDD
Or from the GUI: Ceph → OSD → Create OSD, select the disk and Proxmox handles the rest.
Verify device classes
ceph osd tree
$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         3.63689  root default
-3         3.63689      host pve-node
 0    ssd  0.46500          osd.0          up   1.00000  1.00000
 1    ssd  0.46500          osd.1          up   1.00000  1.00000
 2    hdd  1.81940          osd.2          up   1.00000  1.00000
 3    hdd  0.88249          osd.3          up   1.00000  1.00000
Each OSD should show its correct class. If Ceph doesn’t detect the class properly (rare, but happens with some RAID controllers or disks behind an HBA), you can force it manually:
# Force device class (must remove current one first)
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class ssd osd.0
ceph osd crush rm-device-class osd.2
ceph osd crush set-device-class hdd osd.2
CRUSH Rules — Separating pools by disk type
This is the most important part of the setup and where many people get lost.
CRUSH (Controlled Replication Under Scalable Hashing) is the algorithm Ceph uses to decide where to store each object. CRUSH rules tell Ceph which disks it can use for each pool.
By default, Ceph has a single CRUSH rule (replicated_rule) that distributes data across all OSDs without distinguishing type. That means your production VM data could end up on a slow HDD, and your backups on an expensive SSD. We don’t want that.
Create CRUSH rules by device class
# Rule for SSDs — only uses OSDs with class 'ssd'
ceph osd crush rule create-replicated ssd-rule default host ssd
# Rule for HDDs — only uses OSDs with class 'hdd'
ceph osd crush rule create-replicated hdd-rule default host hdd
The syntax is: ceph osd crush rule create-replicated <name> <root> <failure-domain> <device-class>
- root: default — the root node of the CRUSH map
- failure-domain: host — Ceph distributes replicas across different hosts
- device-class: ssd or hdd — only uses disks of this type
The host parameter as failure domain means Ceph tries to place each replica on a different host. On single-node this doesn’t apply (there’s only one host), but the rule works anyway because we have size 1. When you add nodes to the cluster and increase size to 2 or 3, cross-host replication activates automatically without changing anything — the rule is already prepared.
Verify the created rules:
ceph osd crush rule ls
$ ceph osd crush rule ls
replicated_rule
ssd-rule
hdd-rule
Creating pools — SSD pool and HDD pool
With the CRUSH rules in place, now we create the pools that actually use those rules.
SSD pool — fast storage
For VM disks, databases, and any I/O-intensive workload:
# Create SSD pool with the SSD CRUSH rule
ceph osd pool create ssd-pool 64 replicated ssd-rule
# On single-node: no replication
ceph osd pool set ssd-pool size 1
ceph osd pool set ssd-pool min_size 1
# Enable as RBD (required for Proxmox to use it for VM disks)
ceph osd pool application enable ssd-pool rbd
HDD pool — bulk storage
For backups (PBS VM disk), ISOs, templates, and general storage:
# Create HDD pool with the HDD CRUSH rule
ceph osd pool create hdd-pool 64 replicated hdd-rule
# On single-node: no replication
ceph osd pool set hdd-pool size 1
ceph osd pool set hdd-pool min_size 1
# Enable as RBD
ceph osd pool application enable hdd-pool rbd
Calculating PG numbers
The number of Placement Groups (PGs) affects performance and data distribution. The general formula is:
PGs = (OSDs × 100) / replica size
For a small setup (2-4 OSDs per pool, size 1): 64 PGs per pool is enough. For larger setups, 128 or 256. Proxmox has a PG autoscaler that adjusts this automatically, but a reasonable initial value avoids warnings.
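The formula can be wrapped in a small helper that also rounds down to a power of two, which is the shape Ceph expects for PG counts. A sketch with illustrative numbers:

```shell
# Compute a PG count: (OSDs * 100) / replica size,
# rounded down to the nearest power of two.
pg_count() {
    osds=$1; size=$2
    target=$(( osds * 100 / size ))
    pgs=1
    while [ $(( pgs * 2 )) -le "$target" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "$pgs"
}

pg_count 4 1   # 4 OSDs, size 1 -> 256
pg_count 2 1   # 2 OSDs per pool, size 1 -> 128
pg_count 4 3   # 4 OSDs, size 3 -> 128
```

Note these are totals across all pools sharing the OSDs — split between two pools, the small-setup result lands near the 64 per pool suggested above, and the autoscaler will refine it from there.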
Add pools as storage in Proxmox
From the GUI: Datacenter → Storage → Add → RBD
For the SSD pool:
- ID: ceph-ssd
- Pool: ssd-pool
- Content: VM Disks, Containers

For the HDD pool:
- ID: ceph-hdd
- Pool: hdd-pool
- Content: VM Disks, Containers, ISO Images, Container Templates
Or via CLI by editing /etc/pve/storage.cfg:
rbd: ceph-ssd
pool ssd-pool
content images,rootdir
krbd 0
rbd: ceph-hdd
pool hdd-pool
content images,rootdir,iso,vztmpl
krbd 0
Proxmox Backup Server in a VM
PBS as a VM inside Proxmox itself is a pragmatic compromise. It’s not ideal — the backup lives on the same hardware as the data — but it’s infinitely better than having no backups. And for a homelab, it works.
Create the PBS VM
Download the PBS ISO from the Proxmox website and create a VM with these specs:
- CPU: 2 vCPUs
- RAM: 2-4 GB (PBS is very efficient)
- Disk: 50-100GB on the HDD pool (ceph-hdd) — backups don’t need fast storage
- Network: bridge vmbr0, static IP
Install PBS normally. It’s a wizard similar to Proxmox — disk, network, password, and done.
PBS post-installation
Access the PBS web interface (port 8007) and configure:
- Datastore: create a datastore on the VM’s virtual disk. In the PBS GUI, Administration → Storage / Disks lets you initialize the disk with a filesystem and add it as a datastore in one step.
- Verification: enable automatic backup verification — PBS can verify the integrity of each backup after creation.
Integrate PBS with Proxmox
On the Proxmox node, add PBS as storage:
From the GUI: Datacenter → Storage → Add → Proxmox Backup Server
- ID: pbs
- Server: PBS VM IP
- Username: root@pam
- Password: the one you configured in PBS
- Datastore: the datastore name you created
- Fingerprint: PBS shows it at Dashboard → Show Fingerprint
Or generate an API token in PBS for passwordless authentication, which is recommended for automation.
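The same storage can also be declared via CLI in /etc/pve/storage.cfg. A sketch — the ID, server IP, and datastore name are placeholders for your own values, and the password lives separately under /etc/pve/priv/ (which is why adding it via the GUI or pvesm is simpler):

```
pbs: pbs
        server 192.168.1.50
        datastore main
        username root@pam
        fingerprint <paste from PBS Dashboard → Show Fingerprint>
        content backup
```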
Configure backup jobs
In Proxmox: Datacenter → Backup → Add
- Storage: pbs
- Schedule: daily for critical VMs, weekly for the rest
- Selection: select the VMs/CTs to include
- Mode: Snapshot (doesn’t require shutting down the VM)
- Retention (pruning): configure how many backups to keep
A reasonable retention scheme:
keep-daily: 7
keep-weekly: 4
keep-monthly: 3
This keeps the last 7 daily backups, 4 weekly, and 3 monthly. PBS applies pruning automatically after each backup.
PBS uses block-level deduplication — incremental backups are extremely efficient. After the first full backup, subsequent ones only transfer the blocks that changed. A 100GB VM that changes 2GB per day takes ~2GB per incremental backup, not 100GB. The space savings are massive.
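The space math is worth running once for your own retention scheme. A simplified sketch — it assumes the 100GB VM and ~2GB daily churn from the example above, the 7/4/3 retention, and models deduplication as "one full copy plus the changed blocks per extra snapshot" (real churn between weekly/monthly snapshots is larger):

```shell
# Naive full backups vs deduplicated storage, in GB.
VM_SIZE=100      # full VM disk
DAILY_CHURN=2    # blocks that change per day (assumption)
SNAPSHOTS=14     # keep-daily 7 + keep-weekly 4 + keep-monthly 3

naive=$(( VM_SIZE * SNAPSHOTS ))
# PBS stores each unique chunk once: roughly one full copy
# plus the changed blocks for every additional snapshot.
dedup=$(( VM_SIZE + DAILY_CHURN * (SNAPSHOTS - 1) ))
echo "naive: ${naive}GB, deduplicated: ~${dedup}GB"
```

Fourteen naive full backups would cost 1.4TB; deduplicated, the same retention fits in roughly 126GB — which is why a modest disk on the HDD pool goes a long way.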
Verification and monitoring
With everything set up, verify that the cluster is healthy.
Ceph status
ceph status
$ ceph status
  cluster:
    id:     a1b2c3d4-e5f6-7890-abcd-ef1234567890
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum pve-node (age 4d)
    mgr: pve-node(active, since 4d)
    osd: 4 osds: 4 up (since 4d), 4 in (since 2w)

  data:
    pools:   2 pools, 128 pgs
    objects: 1.24k objects, 48 GiB
    usage:   52 GiB used, 3.58 TiB / 3.64 TiB avail
    pgs:     128 active+clean
What matters: HEALTH_OK, all OSDs up and in, and all PGs active+clean.
Verify pools use the correct disks
ceph osd pool ls detail
Each pool should show the corresponding CRUSH rule. Look for crush_rule in the output — it should match ssd-rule or hdd-rule.
Space usage by pool
ceph df
$ ceph df
--- RAW STORAGE ---
CLASS    SIZE      AVAIL     USED     RAW USED  %RAW USED
hdd      2.70 TiB  2.58 TiB  123 GiB  126 GiB   4.56
ssd      930 GiB   882 GiB   47 GiB   49 GiB    5.24
TOTAL    3.64 TiB  3.46 TiB  170 GiB  175 GiB   4.72

--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
ssd-pool   1   64  45 GiB       612  45 GiB    4.88    864 GiB
hdd-pool   2   64  123 GiB      634  123 GiB   4.58   2.44 TiB
Here you can clearly see that ssd-pool uses only SSD disks and hdd-pool only HDDs. If you see a pool using disks of the wrong type, the CRUSH rule is not configured correctly.
Continuous monitoring
Proxmox integrates Ceph monitoring into its GUI: Ceph → Status shows IOPS, throughput, and latency per OSD in real time.
Configure notifications: Datacenter → Notifications — add an endpoint (email, Gotify, webhook) and activate alerts for:
- Ceph health warnings
- OSD down
- Pool usage > 80%
- Backup job failures
Multi-node: scaling the setup
The same design scales without changes. Here’s what you gain:
Adding a node to the cluster
From the new node’s GUI: Datacenter → Cluster → Join Cluster. Proxmox handles corosync configuration, certificates, and configuration replication.
Adding disks from the new node
Create OSDs on the new node’s disks just like before. Ceph detects them, assigns device classes, and integrates them into the CRUSH map automatically.
Enabling real replication
Now that you have more than one node, increase pool size:
# Replication across 2 nodes
ceph osd pool set ssd-pool size 2
ceph osd pool set ssd-pool min_size 1
ceph osd pool set hdd-pool size 2
ceph osd pool set hdd-pool min_size 1
Ceph starts rebalancing data automatically — each object gets replicated to an OSD on another node. The CRUSH rules we created with host as failure domain ensure replicas are on different nodes.
Moving PBS to another node
On multi-node, you can migrate the PBS VM to a different node from the one hosting production VMs. This way, a hardware failure on the production node doesn’t affect backups.
Tips and gotchas
Don’t fill Ceph above 80%. Performance degrades drastically as space runs out. At 85%, Ceph starts issuing warnings. At 95%, it blocks writes (nearfull/full flags) and recovering from that state is painful. Monitor usage and add capacity before you get there.
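A quick way to keep yourself honest: turn the raw numbers from ceph df into a headroom figure. A sketch using the example cluster’s values (in GiB — substitute your own):

```shell
# How much usable capacity remains before the 80% comfort line.
# RAW and USED in GiB, taken from `ceph df` (example values).
RAW=3726       # ~3.64 TiB total raw capacity
USED=175
ceiling=$(( RAW * 80 / 100 ))
headroom=$(( ceiling - USED ))
echo "80% ceiling: ${ceiling}GiB, headroom left: ${headroom}GiB"
```

When headroom approaches zero, it’s time to add OSDs — not time to start deleting things in a panic while Ceph throws nearfull warnings.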
Scrubs and performance. Ceph performs periodic scrubs — data integrity checks. On single-node with HDDs, a deep scrub can cause noticeable latency on VMs. Schedule scrubs during low-activity hours:
# Scrubs only between 2:00 and 6:00
ceph config set osd osd_scrub_begin_hour 2
ceph config set osd osd_scrub_end_hour 6
Recovery after OSD failure. If an HDD dies, the HDD pool becomes degraded but the SSD pool remains intact — thanks to separate CRUSH rules. Each pool is independent. Replace the disk, create a new OSD, and Ceph rebuilds the data automatically (if you have replication > 1).
Snapshots before risky operations. Ceph RBD supports native snapshots that are instantaneous and take no space until the original data changes. Use them before upgrading a VM or changing critical configurations:
# Snapshot a VM disk
rbd snap create ssd-pool/vm-100-disk-0@pre-upgrade
Don’t mix workloads. Production VMs on the SSD pool, everything else on the HDD pool. The temptation of “just putting this small VM on the SSD” ends with the pool full and all your production VMs affected. Be disciplined about the separation.
Conclusion
With Proxmox + Ceph + PBS you have a complete virtualization platform: compute, performance-tiered storage, and backups with deduplication. All open source. The cost is just the hardware.
The setup scales from 1 node to a cluster without redesigning anything. CRUSH rules, pools, backup jobs — everything keeps working when you add nodes. You only change pool sizes to activate real replication.
PBS in a VM is a pragmatic compromise. For a homelab it’s perfect. For serious production, PBS should be on separate hardware, or at minimum backups should be replicated to a second destination — another remote PBS, an S3 bucket, or a rotated external USB drive (yes, seriously — an offline copy is worth more than a thousand online replicas if ransomware gets in).
If you want to run Kubernetes on top of this Proxmox, check out the first post on this blog — it covers the full K8s stack on a single server, which is exactly what you can do with a VM in this setup.