McGarrah Technical Blog

Posts tagged with "homelab"

Upcoming Articles Roadmap: September - December 2025

I’ve got a pile of articles I want to get out before the end of 2025, and I’m trying to stick to at least one post per week. That’s roughly 16 more articles between now and December, which sounds doable if I don’t get distracted by shiny new projects.

Debian 12 SystemD nightly reboots on Dell Wyse 3040s

My super lean Proxmox 8.3 testbed cluster running Ceph occasionally just locks up a node because the hardware is so incredibly limited on RAM and CPU. As much as I hate rebooting Linux/UNIX systems, this is a case where a nightly reboot of the nodes might help with reliability.
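The shape of what I have in mind is just a plain systemd service/timer pair, something like the sketch below. The nightly-reboot unit names are placeholders I made up, and the 03:30 time would get staggered per node so the whole Ceph cluster never reboots at once.

```bash
# Minimal sketch of a nightly reboot timer (unit names are placeholders).
cat <<'EOF' > /etc/systemd/system/nightly-reboot.service
[Unit]
Description=Scheduled nightly reboot

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl reboot
EOF

cat <<'EOF' > /etc/systemd/system/nightly-reboot.timer
[Unit]
Description=Run nightly-reboot.service at 03:30 local time

[Timer]
OnCalendar=*-*-* 03:30:00
Persistent=false

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now nightly-reboot.timer
systemctl list-timers nightly-reboot.timer   # confirm the next trigger time
```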

Backlog of Posts from 2024

My past write-up, Backlog of Posts, had all the things I wanted to write about in mid-2024. It has since been updated with links to the released posts that covered each item as I finished them in 2024. I got a lot of them written, but the backlog of drafts and topics also grew as I picked off drafts and added new posts.

Power Supply upgrade for GPUs in the Homelab

I want an extra ~350W of power available for a GPU that cannot run off the 75W (or 25W) of PCIe bus power in some very old Dell Optiplex 990 Mini Tower nodes in my Proxmox cluster.

When one of my power supplies died earlier, I bought a NEW 750W Dell OptiPlex 9010 990 790 Power Supply Replace / Upgrade on eBay that was ~750W and the same form factor as those nodes’ PSUs. It was a fast purchase to grab something that would ship the next day, with no plan for an upgrade, but I did pay attention to getting something that was both better and newer, with a warranty.

So I have one machine with the extra wattage available for a much better GPU, like an Nvidia GeForce RTX 3060 12GB.

Ceph Cluster Complete Removal on Proxmox for the Homelabs

My test Proxmox Cluster is there for experimenting, and along the way I broke the Ceph Cluster part of it badly while doing a lot of physical media replacements. The test cluster is the right place to try out risky stuff instead of my main cluster that is loaded up with my data. Fixing it often teaches you something, but in this case I already know the lessons and just want to fast-track getting a clean Ceph cluster back online.

I need it back in place to test the Proxmox 8.2 to Proxmox 8.3 upgrade of my main cluster. So this is a quick guide on how to completely clean out your Ceph installation as if it never existed on your Proxmox 8.2 or 8.3 cluster environment.
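As a teaser, the cleanup on each node boils down to something roughly like this. Treat it as a sketch from memory and double-check against the full post, since every line here is destructive:

```bash
# Rough sketch of tearing Ceph out of a Proxmox node -- destructive, test cluster only.
systemctl stop ceph-mon.target ceph-mgr.target ceph-osd.target ceph-mds.target

# Let Proxmox remove its Ceph configuration and data directories.
pveceph purge

# Sweep up anything left behind (wipes all local Ceph state on this node).
rm -rf /etc/ceph /var/lib/ceph
rm -f /etc/pve/ceph.conf            # shared config, remove once per cluster

# Optionally remove the packages so the next install starts clean.
apt purge -y ceph ceph-mon ceph-mgr ceph-osd ceph-mds
apt autoremove -y
```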

Proxmox Ceph install dialog

Linux Disk I/O Performance in the Homelab

I swapped my physical disks around in my low-end testing hardware cluster. I now have a mixture of eMMC soldered to the motherboard and external USB3 thumb drives serving as the root file systems and external /usr volumes. I would like a quick performance check on reading and writing to those file systems. I also don’t want to set up a huge performance benchmark suite or additional tooling. I just want some quick results at this point.

My basic question is what I lost in the decision to break /usr out to an external USB3 drive. How much performance did I lose?
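My quick-and-dirty approach is nothing fancier than dd and hdparm, roughly along these lines. The device name and test paths are examples, not my exact layout:

```bash
# Sequential write test: 1 GiB to the filesystem under test.
# conv=fdatasync makes dd wait for data to reach the device before reporting a speed.
dd if=/dev/zero of=/usr/ddtest.bin bs=1M count=1024 conv=fdatasync status=progress

# Sequential read test of the same file, bypassing the page cache with direct I/O.
dd if=/usr/ddtest.bin of=/dev/null bs=1M iflag=direct status=progress
rm /usr/ddtest.bin

# Raw device read timing (device name is an example; check lsblk for the real one).
hdparm -tT /dev/sda
```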

Powerline Networking for the Homelabs

I inherited, from a stack of old junk hardware, two Netgear Powerline 500 Nano XAVB5101 plugs. I thought I would try it out for a quick network connection between two floors in my new house using the existing power cabling.

Powerline NIC

Wow did I learn a lesson in a combination of networking and electrical power the hard way… with a repeatedly blown breaker.

Proxmox VE 8.1 to 8.2 upgrade issues in the Homelabs

An extended power loss for my primary Proxmox 8 cluster, while I was remote, took half of my cluster nodes out of commission and into an unbootable state. That half of the cluster would not show up on the network after the power came back, even with manual physical rebooting. The other half would boot up and show on the network. All the nodes had a second problem: they would not open a PVE WebUI Console Shell or show any output on any of the video ports, for either the Nvidia PCIe GPU or the Intel iGPU. So I had to figure out what looked to be a set of overlapping issues and clean up this mess. There were several lessons learned and re-learned along the way.

First, I need a “crash cart” to recover these to a bootable state. What is a “crash cart”? It is usually a rolling cart found in a data center that you roll up to a broken server. They typically include some sort of serial terminal and/or a monitor, keyboard and mouse, with a lot of connectors and adapters to hook up to whatever random ports the equipment you are fixing has. Mine includes adapters for VGA, DVI, DisplayPort, HDMI and both USB and PS/2 keyboards and mice. I’ve even thrown in a spare known-good Nvidia K600 video card for troubleshooting graphics cards. A trusty and up-to-date Ventoy bootable USB is sitting on there as well. I have a laptop that I could use as a serial terminal if we get to that point, but I was hoping I didn’t need it since those are mostly for network equipment.

Crash Cart

Here is my quickly thrown together trash can crash cart (TC3) for this adventure.
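For reference, the rough recovery flow I planned to follow from the Ventoy stick looks something like this. It assumes the default Proxmox LVM layout, and the device names are examples, so check lsblk on the actual node first:

```bash
# Rough recovery flow from a live environment booted off the Ventoy stick.
lsblk -f                               # identify the real devices first

# Mount the Proxmox root (default LVM layout assumed) plus the usual bind mounts.
vgchange -ay
mount /dev/pve/root /mnt
mount /dev/sda2 /mnt/boot/efi          # EFI partition, if present
for d in dev proc sys run; do mount --bind /$d /mnt/$d; done

# Chroot in, look at what happened, and put the bootloader back if needed.
chroot /mnt /bin/bash
journalctl -b -1 -p err                # errors from the previous boot, if the journal is persistent
proxmox-boot-tool refresh              # or: update-grub && grub-install /dev/sda
```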

ProxMox 8.2.4 Upgrade on Dell Wyse 3040s

My earlier post, ProxMox 8.2.2 Cluster on Dell Wyse 3040s, mentioned the tight RAM and disk-space constraints of the cluster. There are some extra steps involved in keeping a very lean Proxmox 8 cluster running on these extremely resource-limited boxes. I am running Proxmox 8.2 and Ceph Reef on them, which leaves them slightly under-resourced by default. So when Ceph would not start the Ceph Monitors after my upgrade from Proxmox 8.2.2 to 8.2.4, I had to dig a bit to find the problem.

Proxmox SFF Cluster

A Ceph Monitor will not start up if there is not at least 5% free disk space on the root partition. My root volumes were sitting right at 95% used. So our story begins…
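If you hit the same wall, the checks and cleanup look roughly like this. Note that the mon_data_avail_crit tweak at the end is a testbed-only workaround, and it only applies once you have a working monitor quorum again:

```bash
# The root volume is also the mon's data volume on these boxes, so check it first.
df -h /
du -sh /var/lib/ceph/mon/*

# Claw back space on a tiny root volume; the mon wants at least 5% free to start.
apt clean
journalctl --vacuum-size=50M

# Once quorum is back, you can inspect (or, on a testbed, lower) the 5% cutoff.
ceph config get mon mon_data_avail_crit
ceph config set mon mon_data_avail_crit 3   # testbed-only workaround, not a real fix
```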

Tailscale on Dell Wyse 3040 with Debian 12

I have been using Dell Wyse 3040s as awesome little systems for my Tailscale nodes across my multiple joined homelab networks. These systems consume very little power and are physically small enough to just plug in and go. Truly, deploying a WireGuard®-based VPN solution could not be any easier. I have four of these units connecting my homelab networks across three geographically diverse locations.
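The per-box setup is short enough to sketch here. The subnet below is a placeholder for whatever each site actually uses, and the sysctl bits are only needed if the node advertises routes:

```bash
# Tailscale's stock install script for Debian, then join the tailnet.
curl -fsSL https://tailscale.com/install.sh | sh

# Advertise the local homelab subnet so the other sites can reach it
# (replace 192.168.10.0/24 with whatever that site actually uses).
tailscale up --advertise-routes=192.168.10.0/24 --accept-routes

# Subnet routing needs IP forwarding enabled on the little box.
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' >> /etc/sysctl.d/99-tailscale.conf
sysctl -p /etc/sysctl.d/99-tailscale.conf
```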

Proxmox Ceph settings for the Homelab

What is Ceph? Ceph is an open-source, software-defined storage system designed and built to address block, file and object storage needs, and it fits a modern homelab nicely. Proxmox Virtual Environment (PVE) makes creating and managing a Hyper-Converged Ceph Cluster relatively easy to configure and set up initially.

Why would you want a hyper-converged storage system like Ceph? So that the Virtual Machines and Linux Containers your PVE runs have a highly available shared storage service, making them portable between the nodes in your cluster and thus highly available services.

There is a significant learning curve involved in understanding how the pieces of Ceph fit together, and the Proxmox documentation does a decent job of helping you along. Proxmox VE sets some decent defaults for the Ceph Cluster that are good for an enterprise environment. What it does not do is help you set defaults that reduce wear and load on your homelab systems. This is where I am going to try out a few things to reduce load and wear on my homelab equipment while maintaining a relatively high-availability environment.
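To give a flavor of what I mean, the knobs I am looking at are things like scrub scheduling. The values below are illustrative examples rather than recommendations:

```bash
# Push scrubbing into an overnight window so it is not competing with daytime use.
ceph config set osd osd_scrub_begin_hour 1
ceph config set osd osd_scrub_end_hour 6
ceph config set osd osd_max_scrubs 1

# Stretch the deep-scrub interval from one week to two to cut read load on the drives.
ceph config set osd osd_deep_scrub_interval 1209600

ceph config dump | grep scrub    # confirm what actually took effect
```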

My earlier post on a Ceph Cluster rebalance issue came from figuring out problems in an unbalanced cluster caused by how data was loaded into it. This post is focused on a regular running cluster that needs some optimization for the homelab.

ProxMox 8.2.2 Cluster on Dell Wyse 3040s

I want a place to test and try out new features and capabilities in Proxmox 8.2.2 SDN (Software Defined Networking). I would also like to be able to test some risky Ceph Cluster configuration changes. I do not want to do either on the semi-production, Ceph-enabled Proxmox 8.2.2 cluster that I have mentioned in earlier posts. With 55TiB of raw storage and 29TiB of it loaded up with content, that would be painful to rebuild or reload if I made a mistake during my testing of SDN or Ceph capabilities.

Test in Prod, what could go wrong?

Dell Wyse 3040 CMOS CR2032 Battery Replacement

I have collected nine (9) mostly functional Dell Wyse 3040 thin clients for use in my experimentation with Proxmox clusters, SDN, and site-to-site VPN configurations with Tailscale. Yes, I know I have a problem. :)

Dell Wyse 3040 with bad CMOS battery

On the upside, they are very small, low-power Debian 12 servers that have a 1Gbps NIC and run headless nicely once you fix the BIOS settings and Debian configuration correctly. What is not nice is that their CMOS batteries are all mostly dying on me, and their connector is an odd type that is not supported by many vendors and runs between $8-$12 USD to replace. For example, the Rome Tech CR2032 CMOS BIOS Battery for Dell Wyse 3040 is about $9.89 USD as of posting this. This bothers me intensely, as the bare CR2032 cell can be picked up for well under a dollar ($1 USD) each; the LiCB CR2032 3V Lithium Battery 10-pack runs about $6 USD. Also, I’m picking these units up with power adapters for between $20 and $45 on eBay, and the $10 battery bite jacks my price per unit up a good bit. So what to do?

Backlog of Posts

I’ve got a backlog of posts I want to do on various topics. For a couple of recent posts, I figured getting out something rough was better than not getting it out at all, so I posted material that felt like it needed another draft or two. I’ll likely review those quickly posted items and update them as time permits.

ProxMox 8.2 for the Homelabs

I am in the process of building a Proxmox 8 Cluster with Ceph in an HA (high availability) configuration using very low-end hardware and questionable options for the various hardware buses. I’m going for HA, cheap frugal choices, and reuse of hardware that I’ve gathered up over the years.

Over the COVID lockdown, I was running a Plex Media Server (PMS) on an older Dell Optiplex 390 SFF Desktop onto which I cobbled several Seagate USB3 portable drives, slapped on as I needed more space. It hosted my extensive VHS, DVD and BluRay library as I ripped them into digital formats. To improve the experience, I threw an Nvidia Quadro P400 into the mix and a PCIe USB3 card for faster access to the drives. Eventually, I had some drive issues and wanted to get some additional reliability into the mix, so I tried out Microsoft Windows Storage Spaces (MWSS). Windows and the associated fun I had with MWSS left me incredibly frustrated; I was trying to make an enterprise product work on a low-end workstation with a bunch of USB drives. The thing that made me fully abandon MWSS was the recovery options when you had a bad drive. MWSS probably works well with solid enterprise equipment but was misery on the stuff I cobbled together. So exit Windows OS.

For about ten (10) years, I had run a VMware ESXi server that let me play with new technology and host some content and services. I let it go a while back while I was in graduate school and working full-time, but have missed having that option ever since. So adding a homelab server or cluster will let me get some of that back.

Ceph Cluster rebalance issue

This is a rough draft that I’m just pushing out, since it might be useful to someone rather than staying in my drafts folder forever… Good enough beats perfect-that-never-ships, every time.

I think I have mentioned my ProxMox/Ceph combo cluster in an earlier post. A quick summary: it is a five (5) node cluster for ProxMox HA, and three of those nodes run Ceph with three (3) OSDs each, for a total of nine (9) 5TB OSDs. They are in a 3/2 Ceph configuration, with three copies of each piece of data, allowing the cluster to keep running as long as two copies are active. Those OSDs / hard drives were added in batches of three (3), one on each node, as I could get drives cleaned and available. So I added them piecemeal: a set of three OSDs, then three more, and finally the last batch of three. I’m also committing the sin of not using 10Gbps SAN networking for the Ceph cluster and using 1Gbps instead, so performance is impacted.

Adding them in pieces while I also loaded up the CephFS with media content is what is hurting me now. My first three OSDs, spread across the three nodes, are pretty full at 75-85%, and as I added the next batches, the cluster never fully caught up and rebalanced the initial contents. This skews my ‘ceph osd df tree’ output, showing less space than I actually have available.

Something I’m navigating is that Ceph will go into read-only mode when you approach the fill limits, typically 95% of available space. It starts alerting like crazy at 85% full, with warnings of dire things coming. Notice in my OSD status below that I have massive imbalances between the initial OSDs 0,1,2 versus 3,4,5 and 6,7,8.

Ceph OSD Status
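The two usual ways I know of to chase this down are the balancer module or a one-shot reweight, roughly like this. On a 1Gbps Ceph network, the resulting data movement takes a good while either way:

```bash
# See how unevenly the PGs and data are spread across OSDs.
ceph osd df tree

# Option 1: let the balancer module even things out gradually.
ceph balancer on
ceph balancer mode upmap
ceph balancer status

# Option 2: a one-shot nudge that down-weights the fullest OSDs (110% of average).
ceph osd reweight-by-utilization 110
```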

Buying a 10Gbps or higher network on a homelab budget

This is a project I’ve been thinking about for a long time… how to get 10GbE+ networking in a homelab without breaking the bank.

The first option is just getting some DAC cables and dual-port 10GbE NICs, then building a point-to-point ring network. That is relatively cheap and would set me up for future switched networks. DACs could later be swapped out for transceivers (GBICs) that use copper (thermal issues) or fiber (delicate).

Next is a relatively cheap switch, at a couple hundred dollars, with likely a low number of SFP+ ports. This is ~$200-$500 for anywhere from 2 to 16 ports at 10Gbps. Switches advertised as 10Gbps often have only one or two ports at that speed, so shop carefully. You still have the cost of the NICs and cabling, but only need one port on each NIC.

Lastly, you could go all in with an enterprise switch like the HP ProCurve 5406zl, a module-hosting monster of a switch. These are massively upgradable but come with a lot of complexity to set up and manage. They are also incredibly loud (intended for server rooms) and suck a ton of power, which generates lots of heat (thermal load). They are getting cheaper but are heavy to ship and still usually several hundred dollars, with modules that can cost thousands. Don’t expect a warranty on these, as they are being pushed out of enterprise usage at end of life.
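Whichever route you take, it is worth sanity-checking that a new link actually delivers 10Gbps. A quick iperf3 run between two nodes does the job; the IP below is a placeholder:

```bash
# On the receiving node, start an iperf3 server:
iperf3 -s

# On the sending node, run four parallel streams for 30 seconds (host IP is a placeholder):
iperf3 -c 10.0.0.2 -P 4 -t 30

# A healthy point-to-point DAC link should report somewhere in the 9+ Gbits/sec range.
```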