When ZFS and Ceph Problems Collide: Diagnosing Overlapping Failures on Proxmox
A routine ZFS scrub alert on harlan turned into a multi-hour debugging session when a hostid mismatch fix collided with a pre-existing Ceph OSD failure from a dead USB drive. Here's how overlapping storage problems can mask each other and how to untangle them.
Ceph OSD Recovery After Power Failure: SAN Switch Was Dead the Whole Time
A power outage knocked my Ceph cluster from 15 healthy OSDs down to 4. The recovery took days of debugging: heartbeat cascades, a ceph.conf misconfiguration, and a dead SAN switch hiding behind NO-CARRIER flags on every node.