Proxmox + Ceph + PBS: Complete Guide for a Production-Ready Homelab
Why Proxmox + Ceph for a homelab
Proxmox VE is the serious open source alternative to VMware/ESXi. Enterprise-grade hypervisor, no license cost, with a web interface that doesn’t embarrass you. But where it really shines is when you add Ceph.
Ceph integrated into Proxmox gives you distributed storage without needing an external NAS or SAN. Your local disks become a unified storage pool that VMs consume as if it were an enterprise SAN. And the best part: you can separate fast disks (SSD) from capacity disks (HDD) into different pools, each with their own replication rules.
Add Proxmox Backup Server and you have the complete triangle: virtualization, tiered storage, and backups with deduplication — all on one platform, all open source.
This setup works the same with 1 node or 3+. The architecture scales without redesigning anything. Same CRUSH rules, same pools, same backup jobs. When you add nodes, Ceph distributes data automatically.
Hardware and planning
What you need:
- CPU with VT-x/VT-d — any modern server or desktop processor has it. VT-d is important if you plan to do PCI passthrough (GPUs, network controllers, etc.)
- RAM: 32GB minimum, 64GB recommended — Proxmox itself uses little, but VMs and Ceph OSDs add up. Each BlueStore OSD targets ~4GB of RAM by default (osd_memory_target). With 4 OSDs that’s already 16GB just for Ceph.
- Dedicated disks for Ceph — this is the most important decision
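Before buying RAM, it’s worth sketching the budget. A minimal shell calculation, assuming Ceph’s default of ~4GB per BlueStore OSD (osd_memory_target, tunable downward on small boxes) and example VM sizes:

```shell
# Rough RAM budget for a single node — all figures in GB.
# 4GB per OSD is the BlueStore default osd_memory_target;
# host overhead and VM sizes below are illustrative assumptions.
OSDS=4
OSD_RAM_EACH=4
OSD_RAM=$(( OSDS * OSD_RAM_EACH ))   # RAM reserved for Ceph
HOST_RAM=2                            # Proxmox host + system services
VM_RAM=$(( 8 + 4 + 4 ))               # e.g. one 8GB VM and two 4GB VMs
TOTAL=$(( OSD_RAM + HOST_RAM + VM_RAM ))
echo "Ceph: ${OSD_RAM}GB, host: ${HOST_RAM}GB, VMs: ${VM_RAM}GB"
echo "Total budget: ${TOTAL}GB"       # already tight on a 32GB machine
```

With these example numbers the total lands at 34GB — which is exactly why 32GB is the floor, not the comfort zone.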
Disk planning:
| Disk | Use | Example |
|---|---|---|
| Small SSD/NVMe (128-256GB) | Proxmox OS | Boot disk |
| Dedicated SSD/NVMe | OSDs for fast pool | VMs, databases |
| Dedicated HDDs | OSDs for bulk pool | Backups, ISOs, templates |
For networking: on single-node, standard 1GbE is enough because Ceph traffic is local. On multi-node, a dedicated network for Ceph is practically mandatory — ideally 10GbE, or at minimum a separate VLAN.
Don’t use the system disk as a Ceph OSD. It seems tempting to make use of the space, but when Ceph fills up (and it will), the operating system runs out of space and you lose access to the node. The system disk is sacred — only for Proxmox OS and configuration.
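Before assigning anything to Ceph, inventory what the node actually sees. A quick lsblk shows each device’s size and rotational flag — the same signal Ceph later uses to assign device classes:

```shell
# List physical block devices only (-d), no headings (-n):
# name, size, type, and ROTA (1 = spinning HDD, 0 = SSD/NVMe).
lsblk -dno NAME,SIZE,TYPE,ROTA
```

Anything that shows ROTA 0 is a candidate for the fast pool, ROTA 1 for the bulk pool — and whichever device carries the Proxmox OS stays off the list entirely.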
Proxmox VE installation
Download the Proxmox VE ISO from the official website and install on the system SSD/NVMe. The installation is a standard wizard — select disk, configure network, root password, and done.
Post-installation
First step: adjust the repositories. If you don’t have a subscription (and for a homelab you don’t need one), switch to the no-subscription repo:
# Remove enterprise repo (requires subscription)
rm /etc/apt/sources.list.d/pve-enterprise.list
# Add no-subscription repo
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > \
/etc/apt/sources.list.d/pve-no-subscription.list
# Add Ceph no-subscription repo
echo "deb http://download.proxmox.com/debian/ceph-squid bookworm no-subscription" > \
/etc/apt/sources.list.d/ceph.list
# Update
apt update && apt full-upgrade -y
Proxmox works perfectly without a subscription. The no-subscription repository is stable and receives regular updates. The subscription gives you access to the enterprise repo (same versions but with additional testing before release) and technical support. For a homelab, the no-subscription repo is more than enough.
Verify that the network bridge vmbr0 is correctly configured. Proxmox creates it during installation, but confirm it has the correct static IP and gateway:
cat /etc/network/interfaces
You should see something like:
auto vmbr0
iface vmbr0 inet static
address 192.168.1.100/24
gateway 192.168.1.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
Ceph installation and configuration
Install Ceph
From the Proxmox node CLI:
pveceph install
This installs the version of Ceph that Proxmox natively supports (Squid on PVE 8.x). The integration is complete — Proxmox manages Ceph’s configuration, services, and monitoring.
Initialize the Ceph cluster
# Initialize Ceph with the cluster network
pveceph init --network 192.168.1.0/24
# Create the monitor (required for Ceph to function)
pveceph mon create
# Create the manager (required for dashboard and metrics)
pveceph mgr create
On multi-node you’d create monitors and managers on multiple nodes for high availability. On single-node, one of each is enough.
Single-node configuration
Ceph by default expects 3 replicas of each object, which requires 3 nodes. On single-node you need to adjust this:
# Configure Ceph for single-node
ceph config set global osd_pool_default_size 1
ceph config set global osd_pool_default_min_size 1
With size 1, there is NO data redundancy. If a disk dies, you lose whatever was on that OSD. That’s why PBS is critical in this setup — your backups are your redundancy. On multi-node with size 2 or 3, Ceph replicates data between nodes automatically and you can survive the loss of a disk or even an entire node.
Creating OSDs — SSD and HDD separately
An OSD (Object Storage Daemon) is a Ceph process that manages a physical disk. Each disk dedicated to Ceph becomes an OSD.
Ceph automatically detects the disk type and assigns a device class: ssd, hdd, or nvme. This classification is the foundation for separating storage into pools by performance.
Create the OSDs
From the CLI:
# Create OSD for each dedicated disk
pveceph osd create /dev/sdb # SSD
pveceph osd create /dev/sdc # SSD
pveceph osd create /dev/sdd # HDD
pveceph osd create /dev/sde # HDD
Or from the GUI: Ceph → OSD → Create OSD, select the disk and Proxmox handles the rest.
Verify device classes
ceph osd tree
$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         3.63689  root default
-3         3.63689      host pve-node
 0    ssd  0.46500          osd.0          up   1.00000  1.00000
 1    ssd  0.46500          osd.1          up   1.00000  1.00000
 2    hdd  1.81940          osd.2          up   1.00000  1.00000
 3    hdd  0.88249          osd.3          up   1.00000  1.00000
Each OSD should show its correct class. If Ceph doesn’t detect the class properly (rare, but happens with some RAID controllers or disks behind an HBA), you can force it manually:
# Force device class (must remove current one first)
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class ssd osd.0
ceph osd crush rm-device-class osd.2
ceph osd crush set-device-class hdd osd.2
CRUSH Rules — Separating pools by disk type
This is the most important part of the setup and where many people get lost.
CRUSH (Controlled Replication Under Scalable Hashing) is the algorithm Ceph uses to decide where to store each object. CRUSH rules tell Ceph which disks it can use for each pool.
By default, Ceph has a single CRUSH rule (replicated_rule) that distributes data across all OSDs without distinguishing type. That means your production VM data could end up on a slow HDD, and your backups on an expensive SSD. We don’t want that.
Create CRUSH rules by device class
# Rule for SSDs — only uses OSDs with class 'ssd'
ceph osd crush rule create-replicated ssd-rule default host ssd
# Rule for HDDs — only uses OSDs with class 'hdd'
ceph osd crush rule create-replicated hdd-rule default host hdd
The syntax is: ceph osd crush rule create-replicated <name> <root> <failure-domain> <device-class>
- root: default — the root node of the CRUSH map
- failure-domain: host — Ceph distributes replicas across different hosts
- device-class: ssd or hdd — only uses disks of this type
The host parameter as failure domain means Ceph tries to place each replica on a different host. On single-node this doesn’t apply (there’s only one host), but the rule works anyway because we have size 1. When you add nodes to the cluster and increase size to 2 or 3, cross-host replication activates automatically without changing anything — the rule is already prepared.
Verify the created rules:
ceph osd crush rule ls
$ ceph osd crush rule ls
replicated_rule
ssd-rule
hdd-rule
Creating pools — SSD pool and HDD pool
With the CRUSH rules in place, now we create the pools that actually use those rules.
SSD pool — fast storage
For VM disks, databases, and any I/O-intensive workload:
# Create SSD pool with the SSD CRUSH rule
ceph osd pool create ssd-pool 64 replicated ssd-rule
# On single-node: no replication
ceph osd pool set ssd-pool size 1
ceph osd pool set ssd-pool min_size 1
# Enable as RBD (required for Proxmox to use it for VM disks)
ceph osd pool application enable ssd-pool rbd
HDD pool — bulk storage
For backups (PBS VM disk), ISOs, templates, and general storage:
# Create HDD pool with the HDD CRUSH rule
ceph osd pool create hdd-pool 64 replicated hdd-rule
# On single-node: no replication
ceph osd pool set hdd-pool size 1
ceph osd pool set hdd-pool min_size 1
# Enable as RBD
ceph osd pool application enable hdd-pool rbd
Calculating PG numbers
The number of Placement Groups (PGs) affects performance and data distribution. The general formula is:
PGs = (OSDs × 100) / replica size
For a small setup (2-4 OSDs per pool, size 1): 64 PGs per pool is enough. For larger setups, 128 or 256. Proxmox has a PG autoscaler that adjusts this automatically, but a reasonable initial value avoids warnings.
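The formula can be wrapped in a small helper that also rounds down to a power of two, which is the shape Ceph expects for PG counts. A sketch with illustrative numbers:

```shell
# Compute a PG count: (OSDs * 100) / replica size,
# rounded down to the nearest power of two.
pg_count() {
    osds=$1; size=$2
    target=$(( osds * 100 / size ))
    pgs=1
    while [ $(( pgs * 2 )) -le "$target" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "$pgs"
}

pg_count 4 1   # 4 OSDs, size 1 -> 256
pg_count 2 1   # 2 OSDs per pool, size 1 -> 128
pg_count 4 3   # 4 OSDs, size 3 -> 128
```

Note these are totals across all pools sharing the OSDs — split between two pools, the small-setup result lands near the 64 per pool suggested above, and the autoscaler will refine it from there.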
Add pools as storage in Proxmox
From the GUI: Datacenter → Storage → Add → RBD
For the SSD pool:
- ID: ceph-ssd
- Pool: ssd-pool
- Content: VM Disks, Containers

For the HDD pool:
- ID: ceph-hdd
- Pool: hdd-pool
- Content: VM Disks, Containers, ISO Images, Container Templates
Or via CLI by editing /etc/pve/storage.cfg:
rbd: ceph-ssd
pool ssd-pool
content images,rootdir
krbd 0
rbd: ceph-hdd
pool hdd-pool
content images,rootdir,iso,vztmpl
krbd 0
Proxmox Backup Server in a VM
PBS as a VM inside Proxmox itself is a pragmatic compromise. It’s not ideal — the backup lives on the same hardware as the data — but it’s infinitely better than having no backups. And for a homelab, it works.
Create the PBS VM
Download the PBS ISO from the Proxmox website and create a VM with these specs:
- CPU: 2 vCPUs
- RAM: 2-4 GB (PBS is very efficient)
- Disk: 50-100GB on the HDD pool (ceph-hdd) — backups don’t need fast storage
- Network: bridge vmbr0, static IP
Install PBS normally. It’s a wizard similar to Proxmox — disk, network, password, and done.
PBS post-installation
Access the PBS web interface (port 8007) and configure:
- Datastore: create a datastore on the VM’s virtual disk. In the PBS GUI, Administration → Storage / Disks lets you initialize the disk with a filesystem and add it as a datastore in one step.
- Verification: enable automatic backup verification — PBS can verify the integrity of each backup after creation.
Integrate PBS with Proxmox
On the Proxmox node, add PBS as storage:
From the GUI: Datacenter → Storage → Add → Proxmox Backup Server
- ID: pbs
- Server: PBS VM IP
- Username: root@pam
- Password: the one you configured in PBS
- Datastore: the datastore name you created
- Fingerprint: PBS shows it at Dashboard → Show Fingerprint
Or generate an API token in PBS for passwordless authentication, which is recommended for automation.
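The same storage can also be declared via CLI in /etc/pve/storage.cfg. A sketch — the ID, server IP, and datastore name are placeholders for your own values, and the password lives separately under /etc/pve/priv/ (which is why adding it via the GUI or pvesm is simpler):

```
pbs: pbs
        server 192.168.1.50
        datastore main
        username root@pam
        fingerprint <paste from PBS Dashboard → Show Fingerprint>
        content backup
```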
Configure backup jobs
In Proxmox: Datacenter → Backup → Add
- Storage: pbs
- Schedule: daily for critical VMs, weekly for the rest
- Selection: select the VMs/CTs to include
- Mode: Snapshot (doesn’t require shutting down the VM)
- Retention (pruning): configure how many backups to keep
A reasonable retention scheme:
keep-daily: 7
keep-weekly: 4
keep-monthly: 3
This keeps the last 7 daily backups, 4 weekly, and 3 monthly. PBS applies pruning automatically after each backup.
PBS uses block-level deduplication — incremental backups are extremely efficient. After the first full backup, subsequent ones only transfer the blocks that changed. A 100GB VM that changes 2GB per day takes ~2GB per incremental backup, not 100GB. The space savings are massive.
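The space math is worth running once for your own retention scheme. A simplified sketch — it assumes the 100GB VM and ~2GB daily churn from the example above, the 7/4/3 retention, and models deduplication as "one full copy plus the changed blocks per extra snapshot" (real churn between weekly/monthly snapshots is larger):

```shell
# Naive full backups vs deduplicated storage, in GB.
VM_SIZE=100      # full VM disk
DAILY_CHURN=2    # blocks that change per day (assumption)
SNAPSHOTS=14     # keep-daily 7 + keep-weekly 4 + keep-monthly 3

naive=$(( VM_SIZE * SNAPSHOTS ))
# PBS stores each unique chunk once: roughly one full copy
# plus the changed blocks for every additional snapshot.
dedup=$(( VM_SIZE + DAILY_CHURN * (SNAPSHOTS - 1) ))
echo "naive: ${naive}GB, deduplicated: ~${dedup}GB"
```

Fourteen naive full backups would cost 1.4TB; deduplicated, the same retention fits in roughly 126GB — which is why a modest disk on the HDD pool goes a long way.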
Verification and monitoring
With everything set up, verify that the cluster is healthy.
Ceph status
ceph status
$ ceph status
  cluster:
    id:     a1b2c3d4-e5f6-7890-abcd-ef1234567890
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum pve-node (age 4d)
    mgr: pve-node(active, since 4d)
    osd: 4 osds: 4 up (since 4d), 4 in (since 2w)

  data:
    pools:   2 pools, 128 pgs
    objects: 1.24k objects, 48 GiB
    usage:   52 GiB used, 3.58 TiB / 3.64 TiB avail
    pgs:     128 active+clean
What matters: HEALTH_OK, all OSDs up and in, and all PGs active+clean.
Verify pools use the correct disks
ceph osd pool ls detail
Each pool should show the corresponding CRUSH rule. Look for crush_rule in the output — it should match ssd-rule or hdd-rule.
Space usage by pool
ceph df
$ ceph df
--- RAW STORAGE ---
CLASS    SIZE      AVAIL     USED     RAW USED  %RAW USED
hdd      2.70 TiB  2.58 TiB  123 GiB  126 GiB   4.56
ssd      930 GiB   882 GiB   47 GiB   49 GiB    5.24
TOTAL    3.64 TiB  3.46 TiB  170 GiB  175 GiB   4.72

--- POOLS ---
POOL      ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
ssd-pool   1   64  45 GiB       612  45 GiB    4.88    864 GiB
hdd-pool   2   64  123 GiB      634  123 GiB   4.58   2.44 TiB
Here you can clearly see that ssd-pool uses only SSD disks and hdd-pool only HDDs. If you see a pool using disks of the wrong type, the CRUSH rule is not configured correctly.
Continuous monitoring
Proxmox integrates Ceph monitoring into its GUI: Ceph → Status shows IOPS, throughput, and latency per OSD in real time.
Configure notifications: Datacenter → Notifications — add an endpoint (email, Gotify, webhook) and activate alerts for:
- Ceph health warnings
- OSD down
- Pool usage > 80%
- Backup job failures
Multi-node: scaling the setup
The same design scales without changes. Here’s what you gain:
Adding a node to the cluster
From the new node’s GUI: Datacenter → Cluster → Join Cluster. Proxmox handles corosync configuration, certificates, and configuration replication.
Adding disks from the new node
Create OSDs on the new node’s disks just like before. Ceph detects them, assigns device classes, and integrates them into the CRUSH map automatically.
Enabling real replication
Now that you have more than one node, increase pool size:
# Replication across 2 nodes
ceph osd pool set ssd-pool size 2
ceph osd pool set ssd-pool min_size 1
ceph osd pool set hdd-pool size 2
ceph osd pool set hdd-pool min_size 1
Ceph starts rebalancing data automatically — each object gets replicated to an OSD on another node. The CRUSH rules we created with host as failure domain ensure replicas are on different nodes.
Moving PBS to another node
On multi-node, you can migrate the PBS VM to a different node from the one hosting production VMs. This way, a hardware failure on the production node doesn’t affect backups.
Tips and gotchas
Don’t fill Ceph above 80%. Performance degrades drastically as space runs out. At 85%, Ceph starts issuing warnings. At 95%, it blocks writes (nearfull/full flags) and recovering from that state is painful. Monitor usage and add capacity before you get there.
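A quick way to keep yourself honest: turn the raw numbers from ceph df into a headroom figure. A sketch using the example cluster’s values (in GiB — substitute your own):

```shell
# How much usable capacity remains before the 80% comfort line.
# RAW and USED in GiB, taken from `ceph df` (example values).
RAW=3726       # ~3.64 TiB total raw capacity
USED=175
ceiling=$(( RAW * 80 / 100 ))
headroom=$(( ceiling - USED ))
echo "80% ceiling: ${ceiling}GiB, headroom left: ${headroom}GiB"
```

When headroom approaches zero, it’s time to add OSDs — not time to start deleting things in a panic while Ceph throws nearfull warnings.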
Scrubs and performance. Ceph performs periodic scrubs — data integrity checks. On single-node with HDDs, a deep scrub can cause noticeable latency on VMs. Schedule scrubs during low-activity hours:
# Scrubs only between 2:00 and 6:00
ceph config set osd osd_scrub_begin_hour 2
ceph config set osd osd_scrub_end_hour 6
Recovery after OSD failure. If an HDD dies, the HDD pool becomes degraded but the SSD pool remains intact — thanks to separate CRUSH rules. Each pool is independent. Replace the disk, create a new OSD, and Ceph rebuilds the data automatically (if you have replication > 1).
Snapshots before risky operations. Ceph RBD supports native snapshots that are instantaneous and take no space until the original data changes. Use them before upgrading a VM or changing critical configurations:
# Snapshot a VM disk
rbd snap create ssd-pool/vm-100-disk-0@pre-upgrade
Don’t mix workloads. Production VMs on the SSD pool, everything else on the HDD pool. The temptation of “just putting this small VM on the SSD” ends with the pool full and all your production VMs affected. Be disciplined about the separation.
Conclusion
With Proxmox + Ceph + PBS you have a complete virtualization platform: compute, performance-tiered storage, and backups with deduplication. All open source. The cost is just the hardware.
The setup scales from 1 node to a cluster without redesigning anything. CRUSH rules, pools, backup jobs — everything keeps working when you add nodes. You only change pool sizes to activate real replication.
PBS in a VM is a pragmatic compromise. For a homelab it’s perfect. For serious production, PBS should be on separate hardware, or at minimum backups should be replicated to a second destination — another remote PBS, an S3 bucket, or a rotated external USB drive (yes, seriously — an offline copy is worth more than a thousand online replicas if ransomware gets in).
If you want to run Kubernetes on top of this Proxmox, check out the first post on this blog — it covers the full K8s stack on a single server, which is exactly what you can do with a VM in this setup.