OKD for the Homelab
Honestly, this is not the blog article I wanted to write. I was hoping to be able to give OKD a resounding endorsement and be able to tell you, reader, how awesome and easy to use it is. Unfortunately, my experience in using OKD - for my intended goal, which I'll elaborate on a bit further - has not been without it's challenges and pitfalls. That being said, is OKD a viable platform for general consumption in a homelab? Yes, absolutely it is. However, in the interest of complete transparency, I must tell you that OKD is not a drop in replacement for Openshift. Sure, the two share a common code base and implement more or less the same core components, but I cannot stress enough that OKD is not "free" Openshift.
Perhaps the real value of Openshift lies in the extensive catalog of 1st party and certified 3rd party operators that make it incredibly easy to extend the functionality and features of Openshift in dramatic and powerful ways. By comparison, OKD has only a very small subset of community operators with many of them simply not having a free version. It's pretty hit or miss, for instance there is a community ArgoCD operator, but there isn't one for Openshift Pipelines (upstream Tekton).
My Goal
Initially, my goal was to evaluate whether OKD could be a viable, and perhaps more importantly -zero cost- alternative to VMware for use in a homelab setting with the intent to compete head-to-head with ProxMox. In a previous article, I touched on the Broadcom acquisition of VMware and the ripple effect that that has had throughout the IT world, from enterprises down to the home labs of the engineers who use their software. An alternative to VMware that has garnered a lot of praise recently is ProxMox VE. Compared to Openshift and OKD, ProxMox VE is actually a much more suitable alternative when doing a head to head comparison of vSphere features. Like vSphere, ProxMox is a traditional hypervisor platform. ProxMox also offers a lot of enterprise grade features that VMware offers only with a paid enterprise license; a central management console, clustering services, high availability, live migration, software defined storage, VM templating, snapshots, backup and restore, etc.
That is not to say that Openshift, or OKD are not capable of doing many (if not all) of those things as well. But it must be pointed out that Openshift is a container orchestration platform first, and it just happens to be capable of running VM workloads. With the average VMware refugee in mind, at least on paper, Proxmox seems like a much more logical choice. However, I would argue that increasingly, many people leverage virtualization primarily as a way to host container platforms first, and for managing VM workloads themselves second. In a homelab environment, often times people will go with whatever the path of least resistance is; "how do I spin up this app in as few steps as possible?". In a lot of cases, this might be something as simple as a Docker plugin included on their NAS, and for a non-production environment that is perfectly adequate.
It all really comes down to what the purpose of your homelab is and what you intend to accomplish with it. For some, the goal may simply be a way to host a few specific workloads. Things like Plex/Jellyfin, Minecraft servers and Nextcloud are some of the more common services you'll see homelabbers self hosting, and there are about a dozen different ways to deploy all of them ranging from a straightforward install on a VM to fully automated and managed application lifecycle on Kubernetes using CI/CD. Granted, running Openshift and using things like Tekton pipelines to build custom images and ArgoCD to do rollouts is really overkill if the goal is simply spin up a Plex server to share your movie collection with friends. Don't get me wrong, if that's all you're after then go with whatever works for you.
However, while my lab does host all of the apps mentioned above, the real function of my lab is for learning. I hone my craft as a containerization and app modernization expert in my home lab where if I make a change that ends up breaking things, it doesn't cause a production outage costing a company money. Usually the worst that ever happens for me is my Internet might go down for a couple of minutes, and while the other people in my house seem to have an unreasonable expectation of a five nines SLA for getting on Youtube and Fortnite, they're not the ones paying the bills, so if I need to make a change that makes something go boom, that's up to me. Honestly, the best way to learn new technology is by breaking it and learning either how to fix it, or how to prevent from breaking it again the same way.
The fact is, the world is moving towards containers, microservices and cloud-native development. At this point, if you're not using Kubernetes, you're behind the power curve, and Openshift is the premier container orchestration platform - for many reason. Openshift is the most consistent, robust, easy to use and fully platform agnostic container solution available, which means that standardizing on Openshift allows you to deploy your applications anywhere from public or private cloud, on-prem, virtual or bare metal, or at the edge without having to spend any time refactoring your applications. Not only that, but with CNV (Container Native Virtualization, aka; kubevirt), it allows you run your containers and virtual machines side by side on a single cohesive platform, both providing IT organizations with a cost effective hosting solution for VMs, and also opening the door to new ways of deploying legacy workloads, allowing you to modernize in place, giving you more runway to work with so you can start reducing technical debt incrementally without having to fully refactor legacy applications from the ground up.
Why Enterprise?
You may be asking yourself why you would want to go to all the trouble of deploying a container orchestration platform in your homelab if your end goal is simply to run a couple of VMs? Well, as I already pointed out - the world is moving away from VMs. Not that VMs are going away, just that they're the old way of doing things, and containers are the modern way to design and deploy applications. VMs still have their uses and always will. If all you want is to be able to run a couple of VMs to host some applications and something like ProxMox checks enough boxes for you, then more power to you. Have at it. However, as I also pointed out, I use my lab to learn skills that are valuable to my job and the fact of the matter is - there isn't a single IT organization out there using ProxMox for hosting any kind of production workloads. There just aren't. So to my mind, I wonder why anyone would waste time and effort to learn a platform that provides zero real world value. Anecdotally, I used to be Director of Infrastructure at my last job, and if I saw a resume cross my desk and a candidate even mentioned ProxMox, I'd assume they have no actual experience in IT. Yeah, that's gonna be a no from me, dawg...
I'm not trying to bag on ProxMox - it's a fine solution for what it is. I'm just saying it's not a serious enterprise platform, even though proponents of the project like to tout it's list of 'enterprise features'. In my opinion, if you're going to go to the trouble of building and running a home lab, it only makes sense to use real enterprise solutions that provide actual marketable skills and value. As we say in the military, "train like you fight, fight like you train". For a long time, the general recommendation for homelab was to just pony up the cash for VMUG Advantage membership which got you access for one year to licenses for most of the products in VMware's portfolio, and that made sense because up until very recently, VMware was the dominant on-prem hosting platform. I mean, they still are, but with the recent Broadcom acquisition, pretty much everyone is looking for an exit strategy. For some that might mean going all-in on the cloud. For others that might mean looking at other virtualization solutions like Nutanix. However, I would argue that no other platform is better poised to provide a migration path off of vSphere than Openshift. First and foremost, we are core-for-core substantially less expensive than VMware. Second, the fact is that many (if not most) companies running vSphere on-prem are running a container platform on top of it, and in terms of pure market share, that container platform is more often than not Openshift. So it makes sense to cut out the middle man, avoid paying the Broadcom tax and just going all-in on Openshift.
Free as in beer... and freedom
So where does OKD fit into this conversation, and what does Broadcom have to do with my homelab? Well... OKD is the upstream of Openshift, meaning it is for all intents and purposes the same bits as Openshift. Like Openshift, OKD is FOSS (free open source software). However, unlike Openshift, OKD comes with no support or warranty intended or implied and it lacks some of the value-add pieces that come included with a paid Openshift subscription, like some certified operators. But under the hood, OKD has the same engine that runs on Openshift and it comes with the low-low price of - free. Gratis. Zero dollars. OKD is not "freemium", it does not come with an eval period. It's just free, like ProxMox is. However, unlike ProxMox, OKD is for all intents and purposes functionally identical to Openshift. Any workload you can deploy on OKD can be deployed on Openshift and vice-versa using the exact same tooling and configuration without modification. Oh, and what I said about Container Native Virtualization on Openshift? Yeah, OKD has that too.
So what I'm saying is this; if you are currently running vSphere in your home lab and are looking for a replacement, I would urge you to at least take OKD for a spin to see if it might be a good fit. It's not going to cost you anything, and you might learn a few things along the way.
MVP
In the world of sports, an MVP is the "most valuable player". While I'd argue that my main OKD cluster is the most important thing I run in my lab, what MVP means in the IT world is "minimum viable product". Meaning, what features or functionality must it have in order to be considered usable. For Openshift/OKD to be a viable alternative to vSphere for a home lab, the list (for me) looks like this:
- container orchestration
- RWX block and file CSI drivers
- virtualization
- bridged network for virtualization
- loadbalancer services
With the exception of the plaform itself of coures, every one of these things is available as an operator in the Operator hub on Openshift. On OKD, some of those things are available as operators, others need to be configured manually.
Platform
Deploying OKD is not like a typical operating system install. You can't just boot up an ISO, follow a wizard and reboot to a functioning cluster. Deploying OKD is done generally by crafting an install-config.yaml, generating ignition configs, booting into the FCOS live installer and applying the appropriate ignition config for the node. There are 3 kinds of nodes - a control plane, a compute node and a bootstrap. Technically, there is a 4th kind of node called an infrastructure node, but they are essentially just unschedulable compute nodes designated for infrastructure services like storage workloads, and do not have any relevance for the process of bootstrapping a cluster.
You can grab the necessary OKD binaries here https://github.com/okd-project/okd/releases. I've noticed a few bugs in the web UI with OKD 4.15, so it might be best for now to stick with OKD 4.14. You'll need to download the appropriate openshift-client
and openshift-install
packages for your OS and architecture. Unpack the tar files and copy the binaries oc
kubectl
and openshift-install
to somewhere in your $PATH
. I usually put them in /usr/local/bin
.
Once you've installed the binaries, you can download the appropriate boot files you'll need to deploy OKD on your nodes with the following command
Then scroll down until you see a section like this
"metal": {
"release": "38.20231002.3.1",
"formats": {
"4k.raw.xz": {
"disk": {
"location": "https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/38.20231002.3.1/x86_64/fedora-coreos-38.20231002.3.1-metal4k.x86_64.raw.xz",
"signature": "https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/38.20231002.3.1/x86_64/fedora-coreos-38.20231002.3.1-metal4k.x86_64.raw.xz.sig",
"sha256": "5d1250fbadefc9b787e45201a464825b62f2817b544be82adeddbc0b43c29339",
"uncompressed-sha256": "6597a3b7e9ae53d3a0af6b603f933e60a055af3abe90f93ec6679ebdc063c023"
}
},
"iso": {
"disk": {
"location": "https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/38.20231002.3.1/x86_64/fedora-coreos-38.20231002.3.1-live.x86_64.iso",
"signature": "https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/38.20231002.3.1/x86_64/fedora-coreos-38.20231002.3.1-live.x86_64.iso.sig",
"sha256": "d7839771d1810176f71e8b998d23d60eb580c941914264b5f5dec35798ac7503"
}
},
which gives you the URL of the live ISO you'll need (as well as the artifacts you'd need if you were to PXE boot your nodes). Download the ISO, eg;
wget https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/38.20231002.3.1/x86_64/fedora-coreos-38.20231002.3.1-live.x86_64.iso
From there, the process is relatively straightforward. Burn the ISO to a thumb drive, boot your nodes one at a time with the thumb drive, apply the appropriate ignition config and reboot. Then... you wait about 30 minutes.
I've outlined in much greater detail how to bootstrap a cluster here. The instructions are for regular OCP, however the process, and the networking prerequisites are identical.
Storage
I'm a big fan of Ceph storage for it's native integration with Kubernetes. I use ODF (Openshift Data Foundation) on my OCP clusters, which use an external Ceph cluster for storage. ODF can also be installed in such a way that it consumes local disks on your nodes to create a hyper-converged Ceph pool. Personally I don't like deploying ODF this way because it puts all your eggs in one basket, and if your Openshift cluster becomes unrecoverable, so does all your data. It's worth pointing out that ODF in either configuration is only available as an add-on to Openshift (either as part of Openshift Platform Plus, or ODF/ODF Advanced which are separate paid products). ODF is installed as an operator.
On OKD, you can use Rook (https://rook.io), the upstream project of ODF. Rook is not available through the operator hub, so you'll have to install it manually. You'll need to determine ahead of time whether you're deploying rook in internal (hyperconverged mode) or external (using a dedicated Ceph cluster) mode. I'm not going to cover deploying rook internal but the process is largely the same.
First you need to gather some information from your Ceph cluster.
# on the external Ceph cluster
ceph health
ceph fsid
ceph auth get-key client.admin
#output should be something like this
HEALTH_OK
86bf38ca-7f3d-11ee-a2d8-02cc0f00000b
AQBTP01l4nMEJxAAceVBUNVoAxUr+ZQgcAT6hA==
# Then you need to use the IP address of the primary MON node.
# export the proper values as aliases
# on your workstation
export NAMESPACE=rook-ceph
export ROOK_EXTERNAL_FSID=86bf38ca-7f3d-11ee-a2d8-02cc0f00000b
export ROOK_EXTERNAL_ADMIN_SECRET=AQBTP01l4nMEJxAAceVBUNVoAxUr+ZQgcAT6hA==
export ROOK_EXTERNAL_CEPH_MON_DATA=a=192.168.20.41:3300
# clone the rook git repo
git clone https://github.com/rook/rook.git
# create the rook namespace
oc new-project rook-ceph
# create a service account with anyuid privs for rook to use
oc create serviceaccount rook-ceph-system -n rook-ceph
oc adm policy add-scc-to-user anyuid -z rook-ceph-system -n rook-ceph
# apply the appropriate yaml files
oc apply -f rook/deploy/examples/crds.yaml
oc apply -f rook/deploy/examples/common.yaml
oc apply -f rook/deploy/examples/operator.yaml
# wait for the operator pods to spin up
oc apply -f rook/deploy/examples/operator-openshift.yaml
# rename some references to rook-ceph-external
sed -i "s/rook-ceph-external/rook-ceph/g" rook/deploy/examples/common-external.yaml
oc apply -f rook/deploy/examples/common-external.yaml
# rename some references to rook-ceph-external
sed -i "s/rook-ceph-external/rook-ceph/g" rook/deploy/examples/import-external-cluster.sh
sh rook/deploy/examples/import-external-cluster.sh
Then you need to edit a secret
oc edit secret rook-ceph-mon -n rook-ceph
and delete these two lines
ceph-secret: ""
ceph-username: ""
# rename some references to rook-ceph-eternal
sed -i ’s/rook-ceph-external/rook-ceph/g’ rook/deploy/examples/cluster-external.yaml
oc apply -f rook/deploy/examples/cluster-external.yaml
#wait until all the pods come up, which will probably take a few minutes. you can monitor the status with this command
oc get pod -n rook-ceph
# you can check the status of your ceph cluster with this command
watch -n5 oc get CephCluster --all-namespaces
# once it says status is connected you can create your storage classes
NAMESPACE NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph rook-ceph 4h14m Connected Cluster connected successfully HEALTH_OK true 86bf38ca-7f3d-11ee-a2d8-02cc0f00000b
Example Ceph RBD storage class:
Example Ceph FS storage class:
#set rook-cephrbd as your default storage class
oc annotate storageclass rook-cephrbd "storageclass.kubernetes.io/is-default-class=true"
# create snapshot classes
oc apply -f rook/deploy/examples/csi/cephfs/snapshotclass.yaml
oc apply -f rook/deploy/examples/csi/rbd/snapshotclass.yaml
Test to make sure you can create PVCs
and while we're at it, let's make sure we can take snapshots too
Virtualization
Thankfully, it's very easy to deploy OKD's hypervisor called kubevirt
. It is deployed as an operator and the install is very straightforward. The only prerequisite is that you need to have a RWX block storage class set as your cluster default storage class (which we did in the previous step). Worth noting - if you use ceph-rbd as your default storage class for kubevirt, you must have the following two parameters specified in your storage class (the examples provided by rook don't):
mounter: rbd
mapOptions: "krbd:rxbounce"
Otherwise, you'll have alerts on the console that won't go away.
While logged in to the web UI as cluster admin, navigate to the Operator hub and search for kubevirt
Install the operater, which will take a few minutes. Once it's installed, create a hyperconverged
instance
You can simply accept the default values. The defaults are sensible and can be changed after the fact if needed. It's going to spin up a couple dozen pods and create some PVCs which will be used to create the boot sources for some of the VM templates (CentOS 7-9 and Fedora latest).
That's all there is to deploying the kubvirt operator.
Bridged Networking
Bridged networking is what every traditional virtualization platform uses to provide connectivity to VMs. VLANs are trunked into the hypervisor, a bridge interface is created, VM is attached to the bridge and gets an IP on the appropriate VLAN. By default, kubevirt does not provide this capability, probably because there's almost no way the upstream project could implement a solution that fits every single use case. However, Openshift, and by extension OKD, have only ever had two options, OpenshiftSDN and OVN-k, and one of them is deprecated and will eventually go away, so effectively the only network stack available on OKD is OVN-k. Thankfully, adding the functionality to kubevirt is easy. You'll need to install the kubernetes nmstate operater, which unfortunately on OKD is not available on the Operator hub.
You can deploy nmstate on your OKD cluster in one shot with the following commands
oc apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.81.0/nmstate.io_nmstates.yaml
oc apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.81.0/namespace.yaml
oc apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.81.0/service_account.yaml
oc apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.81.0/role.yaml
oc apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.81.0/role_binding.yaml
oc apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.81.0/operator.yaml
cat <<EOF | oc create -f -
apiVersion: nmstate.io/v1
kind: NMState
metadata:
name: nmstate
EOF
Once installed, you will be able to create a couple new kinds of CRs, most importantly - NodeNetworkConfigurationPolicies, or nncp
In a nutshell, the way that bridge networking works on OKD with OVN-k is you create a bridge that is tagged with a VLAN, and then you attach VMs to that bridge without specifying the VLAN tag, effectively inheriting the 802.11Q tag from the bridge interface. It sounds more complicated than it really is, but here's an example. Let's use VLAN20
where enp0s31f6
is the name of the physical NIC on the nodes
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
name: worker-vlan20-bridge-policy
spec:
nodeSelector:
node-role.kubernetes.io/worker: ''
desiredState:
interfaces:
- name: vlan20
state: up
type: vlan
ipv4:
enabled: false
ipv6:
enabled: false
vlan:
base-iface: enp0s31f6
id: 20
- name: br20
description: bridge for vlan20
type: linux-bridge
state: up
ipv4:
enabled: false
ipv6:
enabled: false
bridge:
options:
stp:
enabled: false
port:
- name: vlan20
Then once you've created one or more nncp
, you can create a NetworkAttachmentDefinition
. These are namespaced CRs and can be created easily through the web UI. Here's an example using VLAN20. You just have to specify the name of the bridge interface, in this case br20
which was defined in the nncp
we just created
and for good measure, here's the yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
annotations:
description: vlan20
k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br20
name: vlan20
namespace: default
spec:
config: '{"name":"vlan20","type":"cnv-bridge","cniVersion":"0.3.1","bridge":"br20","macspoofchk":false,"ipam":{},"preserveDefaultVlan":false}'
So now let's create a VM to test our bridge. Navigate to Virtualization > create Virtual Machine > From Template
and select Fedora.
Navigate to customize > configure > Network
and configure the default interface to use the network default/vlan20
we just created with Bridge
as the type
Click save, and create the virtual machine. It will provision the machine and boot up shortly - on my cluster this took less than a minute. Once the VM is booted qemu guest agent service should report the IP address in the console
Now let's try pinging it
Success!
Load Balancers
Now, this is one of the places where I actually struggled with OKD, but to be fair, for the first several years I was running Openshift, I had similar problems. In the cloud, a load balancer service is generated and can use the cloud APIs to grab an IP address out of a pool that you've already paid for. On prem, load balancers are a little more difficult. In the past, as a dirty hack/workaround, I would set a load balancer service external IP address to the ingress or API VIP of my Openshift cluster. It works in a pinch, but it's not the best idea, and quickly becomes a problem when we're talking about using services on a VM which are common to Openshift, eg; HTTP/HTTPS and SSH, ports 80, 443 and 22 respectively.
This is where MetalLB comes in. On Openshift, MetalLB is available as an operator. It's super easy to set up and use, you just install the operator, create an IPAddressPool and you're off to the races. However, on OKD there is no community MetalLB operator, and trying to deploy it from the official git repo, either using static manifests or as an operator fails with all kinds of problems.
With a little help from a colleague and a little bit of hacking on it, I was finally able to get MetalLB deployed on OKD. Shout out to Ken Moine who shared his (mostly) working kustomize with me. Granted it's targeted at vanilla k8s on Fedora, so it took some extra work to get it running on OKD
oc create ns metallb-system
&
oc apply -n metallb-system -k https://raw.githubusercontent.com/kenmoini/ansible-fedora-k8s/main/cluster-config/install-metallb/operator/kustomization.yml
Which pulls in everything for you and creates among other things, 2 service accounts, a deployment and daemonset, all of which have to be kicked around a little bit to get them to play nice with OKD. Basically I had to set the securityContext on both the ds and the deployment to privileged: true
and then patch the service accounts, speaker
and controller
with privileged
SCC. Not the best idea from a security standpoint and definitely not OKD friendly, but hey... it works.
oc create serviceaccount speaker -n metallb-system
oc adm policy add-scc-to-user anyuid -z speaker
oc adm policy add-scc-to-user privileged -z speaker
oc adm policy add-scc-to-user hostaccess -z speaker
oc create serviceaccount controller -n metallb-system
oc adm policy add-scc-to-user anyuid -z controller
oc adm policy add-scc-to-user privileged -z controller
oc adm policy add-scc-to-user hostaccess -z controller
controller deployment yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
labels:
app: metallb
component: controller
name: controller
namespace: metallb-system
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 3
selector:
matchLabels:
app: metallb
component: controller
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "7472"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: metallb
component: controller
spec:
containers:
- args:
- --port=7472
- --log-level=info
env:
- name: METALLB_ML_SECRET_NAME
value: memberlist
- name: METALLB_DEPLOYMENT
value: controller
image: quay.io/metallb/controller:v0.13.12
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /metrics
port: monitoring
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: controller
ports:
- containerPort: 7472
name: monitoring
protocol: TCP
- containerPort: 9443
name: webhook-server
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /metrics
port: monitoring
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /tmp/k8s-webhook-server/serving-certs
name: cert
readOnly: true
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: controller
serviceAccountName: controller
terminationGracePeriodSeconds: 0
volumes:
- name: cert
secret:
defaultMode: 420
secretName: webhook-server-cert
speaker daemonset yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
labels:
app: metallb
component: speaker
name: speaker
namespace: metallb-system
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: metallb
component: speaker
template:
metadata:
annotations:
prometheus.io/port: "7472"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: metallb
component: speaker
spec:
containers:
- args:
- --port=7472
- --log-level=info
env:
- name: METALLB_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: METALLB_HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: METALLB_ML_BIND_ADDR
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: METALLB_ML_LABELS
value: app=metallb,component=speaker
- name: METALLB_ML_SECRET_KEY_PATH
value: /etc/ml_secret_key
image: quay.io/metallb/speaker:v0.13.12
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /metrics
port: monitoring
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: speaker
ports:
- containerPort: 7472
hostPort: 7472
name: monitoring
protocol: TCP
- containerPort: 7946
hostPort: 7946
name: memberlist-tcp
protocol: TCP
- containerPort: 7946
hostPort: 7946
name: memberlist-udp
protocol: UDP
readinessProbe:
failureThreshold: 3
httpGet:
path: /metrics
port: monitoring
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/ml_secret_key
name: memberlist
readOnly: true
- mountPath: /etc/metallb
name: metallb-excludel2
readOnly: true
dnsPolicy: ClusterFirst
hostNetwork: true
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: speaker
serviceAccountName: speaker
terminationGracePeriodSeconds: 2
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Exists
volumes:
- name: memberlist
secret:
defaultMode: 420
secretName: memberlist
- configMap:
defaultMode: 256
name: metallb-excludel2
name: metallb-excludel2
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
Eventually, I'll clean all this up and turn it into something that can be applied with a one liner with regular yaml files and not have to fiddle with things.
Once all the pods were up (there will be one controller pod and one speaker per node) you can create an IPAddressPool
iMac-Pro:metallb rcuda$ oc get pod
NAME READY STATUS RESTARTS AGE
controller-569cd88bcf-jll7s 1/1 Running 0 4h16m
speaker-4hrlv 1/1 Running 0 4h45m
speaker-5sr45 1/1 Running 0 4h45m
speaker-b2c4t 1/1 Running 0 4h45m
speaker-b6pnr 1/1 Running 0 4h45m
speaker-n7xp4 1/1 Running 0 4h45m
speaker-q9fxc 1/1 Running 0 4h45m
speaker-ql8wt 1/1 Running 0 4h45m
speaker-rh6tc 1/1 Running 0 4h45m
speaker-s7kxx 1/1 Running 0 4h45m
speaker-x5zlr 1/1 Running 0 4h45m
IPAddressPool yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: okd-201-250
namespace: metallb-system
spec:
addresses:
- 192.168.10.201-192.168.10.250
autoAssign: true
avoidBuggyIPs: false
And now we can try and create a load balancer service. I spun up another VM and exposed SSH as a service to test it out.
You can quickly create services from VMs using virtctl
. virtctl
can be downloaded from the Web console. Just click the ?
icon next to the username up in the top right corner and select Command Line Tools
from the drop down. Then download the appropriate binary for your OS and architecture
I usually put mine in /usr/local/bin/
but anywhere in you $PATH
is fine.
Expose a port on a VM as a service with a command like this
virtctl expose vm fedora-0n32r6e4cc80kwox --port=22 --type=LoadBalancer --name=fedora-0n32r6e4cc80kwox
which will create a load balancer service with the same name as the VM exposing port 22 (SSH).
We can see it created a service with the external IP of 192.168.10.201, so lets try to connect to it
Success!
Wrapping up
We've covered quite a few things thus far; deploying OKD, and configuring the addon services that I consider necessary for it to be a viable platform for general use in a home lab. Granted, there are still tons of other features that we could discuss like backup/restore (there's an operator for that called OADP), live migration (it works out of the box), high availability (there's an operator for that called Node Health Check) and CI/CD pipelines (it's a little more complicated), and some nice to haves like managing SSL certs (there's an operator for that called Cert Manager) or LDAP auth for user access and RBAC (you just need to configure an ID Provider). None of those are really within the scope of this article, I really wanted to limit my focus to what I consider MVP for a lab. In the end, I've got to say that as long as you're willing to forego some of the creature comforts of downstream Openshift and it's extensive Operator catalog and you're willing to get your hands dirty configuring some things manually, OKD is pretty damn good. If we're being honest, it's a hell of a lot better than trying to roll your own vanilla Kubernetes cluster and bolt on all the functionality that OKD provides. Ultimately, as a zero cost alternative to ProxMox, I'd argue that while admittedly a lot more complex, learning and working with a leading industry standard platform is just better, and having a strong community of tinkerers using OKD is only going to help make it even better.