McGarrah Technical Blog

Caddy Reverse Proxy for Ceph Dashboard


The Ceph Dashboard has a frustrating quirk — it runs on whichever node is the active ceph-mgr, and that can change during failovers. One day it’s on https://192.168.86.12:8443, the next it’s on .13. Since I already have a Caddy reverse proxy LXC handling Proxmox Web UI access, adding the Ceph Dashboard as a second site block is straightforward and solves the floating-IP problem.

The Problem

As covered in Adding Ceph Dashboard to Your Proxmox Cluster, the dashboard follows the active ceph-mgr service. In my cluster, all six nodes run ceph-mgr, and the dashboard is only accessible on the currently active manager. When a failover happens, your bookmark breaks.

The fix: proxy through Caddy with health checks across all mgr nodes. Caddy automatically detects which node is serving the dashboard and routes traffic there. This is the same active/standby service discovery pattern you’d use for any floating-VIP service — database clusters, message brokers, or any HA service where the active endpoint moves between nodes on failover.
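To make the selection logic concrete, here is a minimal shell sketch of the same idea: probe each candidate in order and use the first one that answers. It only illustrates the pattern and is not how Caddy is implemented. The helper names first_healthy and probe_dashboard are hypothetical, and the probe assumes the active dashboard answers HTTP 200 on /api/health (the health_uri used in this post's Caddyfile).

```shell
#!/bin/sh
# Hypothetical helper: print the first host for which the probe succeeds.
# Usage: first_healthy PROBE_CMD host1 host2 ...
first_healthy() {
    probe=$1
    shift
    for host in "$@"; do
        if "$probe" "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
    done
    return 1   # no host passed the probe
}

# Assumed probe: HTTP 200 from the dashboard port within 3 seconds.
# -k skips certificate verification because the dashboard cert is self-signed.
probe_dashboard() {
    code=$(curl -k -s -o /dev/null -m 3 -w '%{http_code}' "https://$1:8443/api/health") || return 1
    [ "$code" = "200" ]
}

# Example, run from a machine that can reach the mgr nodes:
# first_healthy probe_dashboard 192.168.86.11 192.168.86.12 192.168.86.13
```

Passing the probe as an argument keeps the ordering logic separate from the network call, so the same loop works for any active/standby service with a different probe.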

Prerequisites

Current State

Before this change, the Ceph Dashboard is accessible at whichever mgr node is active:

# Check which node has the active dashboard
ceph mgr services

Output shows something like:

{
    "dashboard": "https://192.168.86.12:8443/"
}
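If you want just the host of the active dashboard, for example to feed into another script, the JSON above can be reduced with sed. This is a rough sketch that assumes the output has the shape shown here and an IPv4 address or hostname (an IPv6 literal would need a different pattern). The function name dashboard_host is made up for illustration. If jq is installed, `ceph mgr services | jq -r .dashboard` gives you the full URL more robustly.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph mgr services` JSON on stdin and print
# only the host part of the "dashboard" URL.
dashboard_host() {
    sed -n 's|.*"dashboard": *"https\{0,1\}://\([^:/"]*\).*|\1|p'
}

# Usage on a cluster node:
# ceph mgr services | dashboard_host
```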

The dashboard uses HTTPS on port 8443 with self-signed certificates.

Adding Ceph Dashboard to the Caddyfile

SSH into the Caddy LXC (192.168.86.30) and edit /etc/caddy/Caddyfile to add a second site block. Caddy will serve the Ceph Dashboard on port 8443, the same port the mgr nodes use, so the port convention stays the same:

# Proxmox Web UI (existing)
https://192.168.86.30 {
	reverse_proxy * {
		to 192.168.86.11:8006
		to 192.168.86.12:8006
		to 192.168.86.13:8006
		to 192.168.86.14:8006
		to 192.168.86.15:8006
		to 192.168.86.16:8006

		lb_policy ip_hash
		health_uri /
		health_interval 10s
		health_timeout 2s
		health_status 200

		transport http {
			tls_insecure_skip_verify
		}

		header_up Upgrade {http.request.header.Upgrade}
		header_up Connection {http.request.header.Connection}
	}
}

# Ceph Dashboard
https://192.168.86.30:8443 {
	reverse_proxy * {
		# All nodes run ceph-mgr
		to 192.168.86.11:8443
		to 192.168.86.12:8443
		to 192.168.86.13:8443
		to 192.168.86.14:8443
		to 192.168.86.15:8443
		to 192.168.86.16:8443

		lb_policy first
		health_uri /api/health
		health_interval 10s
		health_timeout 3s
		health_status 200

		transport http {
			tls_insecure_skip_verify
		}
	}
}

Why This Configuration Works

Apply the Configuration

# Validate the config first
caddy validate --config /etc/caddy/Caddyfile

# Reload without downtime
caddy reload --config /etc/caddy/Caddyfile

# Or restart the service
systemctl restart caddy

Verify it’s listening on both ports:

ss -tlnp | grep caddy

Expected output:

LISTEN 0  4096  127.0.0.1:2019  0.0.0.0:*  users:(("caddy",pid=133,fd=16))
LISTEN 0  4096          *:8443      *:*  users:(("caddy",pid=133,fd=17))
LISTEN 0  4096          *:443       *:*  users:(("caddy",pid=133,fd=20))
LISTEN 0  4096          *:80        *:*  users:(("caddy",pid=133,fd=19))

You’ll see listeners on :443 (Proxmox UI), :8443 (Ceph Dashboard), :80 (which Caddy opens to redirect HTTP to HTTPS), and 127.0.0.1:2019. That last one is the Caddy Admin API, which is enabled by default and useful for future config management.
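The listener check can also be scripted instead of read by eye. The sketch below is one way to do it, assuming the column layout of `ss -tln` shown above (state in column 1, local address:port in column 4). The function name ports_listening is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -tln` output on stdin and succeed only if
# every port given as an argument has a LISTEN socket.
ports_listening() {
    ss_out=$(cat)
    for port in "$@"; do
        # The port is the last colon-separated field of the local address.
        printf '%s\n' "$ss_out" | awk -v p="$port" '
            $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
            END { exit !found }' || {
            echo "port $port: not listening" >&2
            return 1
        }
    done
}

# Usage on the Caddy LXC:
# ss -tln | ports_listening 80 443 8443 && echo "all listeners present"
```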

Verify It Works

  1. Open https://192.168.86.30:8443/ in your browser
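  2. Accept the browser’s certificate warning (for an IP-address site, Caddy serves a certificate from its own internal CA, which your browser doesn’t trust by default)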
  3. You should see the Ceph Dashboard login page
  4. Log in with your Ceph Dashboard credentials

This is what you’ve been working toward — a stable, single URL for the full Ceph cluster overview, regardless of which node is currently the active ceph-mgr:

Figure: Ceph Dashboard overview showing cluster health, OSD status, and storage utilization.

The dashboard shows cluster health, OSD status, pool utilization, and active alerts — everything you need to know about your Ceph cluster at a glance, now accessible from one bookmark that never breaks.

Test Failover

To verify the proxy handles mgr failovers correctly:

# Check current active mgr
ceph mgr stat

# Force a failover (needs jq to pull active_name out of the JSON)
ceph mgr fail $(ceph mgr stat | jq -r '.active_name')

# Wait 10-15 seconds for health checks, then reload the dashboard
# It should still work through the proxy
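
Rather than waiting and reloading by hand, the check can be polled. The following is a minimal sketch, not a definitive test harness. The helper names wait_until and proxy_ok are hypothetical, and the probe assumes the dashboard login page returns HTTP 200 through the proxy at https://192.168.86.30:8443/.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, up to TRIES attempts,
# sleeping DELAY seconds after each failed attempt.
# Usage: wait_until TRIES DELAY CMD [ARGS...]
wait_until() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            echo "ok after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed probe: the proxy answers HTTP 200 within 5 seconds.
# -k skips verification because the proxy's certificate is not publicly trusted.
proxy_ok() {
    code=$(curl -k -s -o /dev/null -m 5 -w '%{http_code}' "https://192.168.86.30:8443/") || return 1
    [ "$code" = "200" ]
}

# Usage, right after forcing the failover:
# wait_until 10 3 proxy_ok || echo "proxy did not recover in time"
```

With a 10s health_interval, ten attempts three seconds apart leaves room for a couple of health-check cycles before giving up.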

Troubleshooting

Dashboard Returns 503

All mgr nodes are failing health checks. Verify the dashboard is actually running:

ceph mgr services

If the dashboard key is missing, the module may need to be re-enabled:

ceph mgr module disable dashboard
ceph mgr module enable dashboard

Slow Dashboard Loading

The Ceph Dashboard can be sluggish, especially the first load after a mgr failover. The health_timeout 3s setting accounts for this, but if you’re seeing consistent timeouts, increase it:

health_timeout 5s

Port Conflict

If something else is already using port 8443 on the LXC, pick a different port:

https://192.168.86.30:9443 {
    # ... same config
}

The Complete Caddyfile

The full /etc/caddy/Caddyfile combines both site blocks shown above — the Proxmox Web UI proxy (from the companion article) and the Ceph Dashboard proxy added here. Both live in the same file and Caddy serves them simultaneously on different ports (443 and 8443).

Future Improvements

Categories: technical, homelab

About the Author: Michael McGarrah is a Cloud Architect with 25+ years in enterprise infrastructure, machine learning, and system administration. He holds an M.S. in Computer Science (AI/ML) from Georgia Tech and a B.S. in Computer Science from NC State University, and is currently pursuing an Executive MBA at UNC Wilmington.