Understanding OpenShift MachineConfigs and MachineConfigPools
By Mark DeNeve
Introduction
OpenShift 4 is built upon Red Hat CoreOS (RHCOS), and RHCOS is managed differently than most traditional operating systems. Unlike other Kubernetes distributions, where you must manage the base operating system as well as the Kubernetes distribution itself, with OpenShift 4 the RHCOS operating system and the Kubernetes platform are tightly coupled. RHCOS, including any system-level configuration, is managed through MachineConfigs and MachineConfigPools. These constructs allow you to manage system configuration and detect configuration drift on your control plane and worker nodes.
MachineConfigs are responsible for creating and maintaining local RHCOS configuration settings on each node. These settings can be any of the following:
- user creation/deletion
- kernel configs
- file system directories and permissions
- configuration files
- systemd units
In this blog post, we will use a dummy configuration file as an example to distribute to our worker nodes. We will create a file called /etc/sensitive.conf containing the text “critical_config_data”, distribute it to our worker nodes, and see how the MachineConfigPool handles changes to this file, both through the proper channels and through manual, out-of-band changes. We will then cover the creation of additional MachineConfigPools and see how they can be used to manage multiple pools of hardware in heterogeneous clusters.
Terminology
We will be working with a few kinds of Kubernetes objects in this blog.
- Machine - the object that describes the host for a node. A Machine has a providerSpec which describes the attributes of a host for a cloud provider, such as AWS, Azure, or vSphere.
- Node - a Kubernetes construct that can run workloads as pods. A node can be part of at most one MachineConfigPool.
- MachineConfigs - MachineConfig objects define the configuration that you want to apply to a given machine. These can be things like kernel parameters, files, systemd units, etc. A full list of items that are configurable by MachineConfigs can be found here: Machine Configuration Tasks
- MachineConfigPools - A MachineConfigPool is a collection of MachineConfigs, selected based on labels defined on the MachineConfigs. A machine can belong to only one MachineConfigPool; you cannot apply multiple MachineConfigPools to the same machine. MachineConfigPools are responsible for pulling together all the MachineConfigs for a given type of node and applying them to its Machines.
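Both object types can be inspected with read-only commands on any cluster; oc also registers the short names mc and mcp for them:
$ oc get machineconfigs       # short form: oc get mc
$ oc get machineconfigpools   # short form: oc get mcp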
Prerequisites
- OpenShift Cluster 4.10 or later
- Cluster Admin privileges on an OpenShift Cluster
- oc command
Test MachineConfig
We will start by creating our test configuration file, /etc/sensitive.conf, which will contain one line of data: “critical_config_data”. Larger, more complex files should be created using Butane, a tool that simplifies the creation of MachineConfig files.
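For reference, here is a minimal Butane sketch that should render an equivalent MachineConfig (untested here; it assumes the openshift variant at spec version 4.10.0, which maps to Ignition 3.2.0). One nicety is that Butane accepts the familiar octal file mode:
# 100-critical-config.bu
# Render with: butane 100-critical-config.bu -o 100-critical-config.yaml
variant: openshift
version: 4.10.0
metadata:
  name: 100-critical-config
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/sensitive.conf
      mode: 0644        # octal here; Butane converts it to decimal in the rendered Ignition
      overwrite: true
      contents:
        inline: critical_config_data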
Create a new file called “100-critical-config.yaml” and put the following contents in it.
 1  ---
 2  apiVersion: machineconfiguration.openshift.io/v1
 3  kind: MachineConfig
 4  metadata:
 5    labels:
 6      machineconfiguration.openshift.io/role: worker
 7    name: 100-critical-config
 8  spec:
 9    config:
10      ignition:
11        version: 3.2.0
12      storage:
13        files:
14          - contents:
15              source: data:,critical_config_data%0A
16            mode: 420
17            overwrite: true
18            path: /etc/sensitive.conf
Note lines 6, 15, and 16. Line 6 is where we define which role we want our configuration file applied to; we will start by applying this config file only to the “worker” role. Line 15 embeds the file contents as a URL-encoded data URL (the %0A is a trailing newline). The mode on line 16 is DECIMAL, not octal. Normally when setting file permissions in Linux one thinks of “0644” as being “-rw-r--r--”, but this octal setting must be stored as a decimal in an Ignition file, which means that 0644 becomes 420. You can use your favorite octal-to-decimal calculator to make this easier.
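If you have a shell handy, bash’s printf builtin will do the conversion in both directions (a leading zero makes it parse the argument as octal):
$ printf '%d\n' 0644   # octal to decimal
420
$ printf '%o\n' 420    # decimal back to octal
644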
With our MachineConfig file created, we will apply it to our OpenShift Cluster:
$ oc login
$ oc create -f 100-critical-config.yaml
machineconfig.machineconfiguration.openshift.io/100-critical-config created
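Behind the scenes, the MachineConfig controller merges our new object with the existing worker MachineConfigs into a fresh rendered-worker-<hash> config, which you can watch appear with:
$ oc get mc | grep rendered-worker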
With the new MachineConfig applied to our cluster, we will look at our MachineConfigPools to see how it is applied. Run the following command and note that the “worker” MachineConfigPool shows as “Updating”.
$ oc get mcp
NAME     CONFIG              UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6   True      False      False      3              3                   3                     0                      8d
worker   rendered-worker-b   False     True       False      4              0                   0                     0                      8d
Run oc get nodes and notice that one of your worker nodes is now “NotReady,SchedulingDisabled”:
$ oc get nodes
NAME                        STATUS                        ROLES    AGE   VERSION
ocp410-zh4dg-master-0       Ready                         master   8d    v1.23.3+e419edf
ocp410-zh4dg-master-1       Ready                         master   8d    v1.23.3+e419edf
ocp410-zh4dg-master-2       Ready                         master   8d    v1.23.3+e419edf
ocp410-zh4dg-worker-d98q7   NotReady,SchedulingDisabled   worker   8d    v1.23.3+e419edf
ocp410-zh4dg-worker-q8q2x   Ready                         worker   8d    v1.23.3+e419edf
ocp410-zh4dg-worker-qsv9s   Ready                         worker   8d    v1.23.3+e419edf
ocp410-zh4dg-worker-r2hcx   Ready                         worker   14h   v1.23.3+e419edf
Run the oc get mcp command again and you should now see that one of your machines has updated:
$ oc get mcp
NAME     CONFIG              UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6   True      False      False      3              3                   3                     0                      8d
worker   rendered-worker-b   False     True       False      4              1                   1                     0                      8d
Allow this process to complete as it works through each of your machines. Wait until the oc get mcp command shows UPDATED as “True” for the worker pool before continuing to the next section.
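If you would rather block than poll, oc wait can watch for the same Updated condition you will see later in the oc describe mcp output (the timeout value here is an arbitrary choice):
$ oc wait mcp/worker --for=condition=Updated --timeout=30m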
Out of Band Change
Now that our file has been applied to all our worker nodes, let’s examine it and make a small change locally. Use the oc debug node command to connect to one of your worker nodes:
$ oc debug node/ocp410-zh4dg-worker-d98q7
Starting pod/ocp410-zh4dg-worker-d98q7-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.16.25.127
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cat /etc/sensitive.conf
critical_config_data
sh-4.4#
Open a new terminal window and run oc get mcp --watch. Then go back to your oc debug command window and run the following command:
# echo "more data" >> /etc/sensitive.conf
You should see the oc get mcp command output change, showing a DEGRADED state. To get more detail, run the oc describe mcp/worker command:
$ oc describe mcp/worker
...
Status:
  Conditions:
    Last Transition Time:  2022-03-14T19:50:08Z
    Message:
    Reason:
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2022-03-23T15:53:14Z
    Message:
    Reason:
    Status:                False
    Type:                  Updated
    Last Transition Time:  2022-03-23T15:53:14Z
    Message:               All nodes are updating to rendered-worker-af144fcfd50fb859d796318769bb4a66
    Reason:
    Status:                True
    Type:                  Updating
    Last Transition Time:  2022-03-23T15:53:14Z
    Message:               Node ocp410-zh4dg-worker-d98q7 is reporting: "content mismatch for file \"/etc/sensitive.conf\""
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
Note that the status message shows there is an issue with node “ocp410-zh4dg-worker-d98q7” and that the problem is a “content mismatch”. You can also see that the annotation machineconfiguration.openshift.io/state: Degraded has been added to the node, indicating that there is an issue with it:
$ oc describe node/ocp410-zh4dg-worker-d98q7
Name:               ocp410-zh4dg-worker-d98q7
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        csi.volume.kubernetes.io/nodeid: {"csi.vsphere.vmware.com":"ocp410-zh4dg-worker-d98q7"}
                    k8s.ovn.org/host-addresses: ["172.16.25.127","172.16.25.91"]
                    ...
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-af144fcfd50fb859d796318769bb4a66
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-af144fcfd50fb859d796318769bb4a66
                    machineconfiguration.openshift.io/reason: content mismatch for file "/etc/sensitive.conf"
                    machineconfiguration.openshift.io/state: Degraded
...
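For a quick cluster-wide view of this state annotation, rather than describing nodes one at a time, a jsonpath one-liner works; note that the dots inside the annotation key must be escaped:
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\n"}{end}'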
One thing to note is that in this state, the Machine Config Operator will not fix the issue on its own. Additional steps are required to remediate this out-of-band change.
Manually Forcing MCO to resync
To force the machine-config-daemon on the affected node to re-validate and re-apply its configuration, connect to the node and create the file /run/machine-config-daemon-force:
$ oc debug node/ocp410-zh4dg-worker-d98q7
Starting pod/ocp410-zh4dg-worker-d98q7-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.16.25.127
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# touch /run/machine-config-daemon-force
sh-4.4#
You may need to wait for up to 15 minutes for the force command to take effect. The node will be drained and rebooted to re-apply the configuration.
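If you prefer a one-shot command to an interactive session, oc debug can create the force file directly; the debug pod is torn down as soon as the touch completes:
$ oc debug node/ocp410-zh4dg-worker-d98q7 -- chroot /host touch /run/machine-config-daemon-force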
Creating a new MachineConfigPool
So what if you need a different configuration file on some of your nodes? How do you handle that? The best way is to create a new MachineConfigPool that applies to only certain machines.
Start by creating a new file called “gpu-mcp.yaml” containing a MachineConfigPool that will apply to nodes with a role called “gpu”:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: gpu
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,gpu]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/gpu: ""
Now let’s create a new file called “100-gpu-config.yaml” containing a MachineConfig that will target only nodes with the role “gpu”:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: gpu
  name: 100-gpu-config
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:,gpuenabled%0A
          mode: 420
          overwrite: true
          path: /etc/gpu.conf
Apply these two changes to your cluster and then review the status of the MachineConfigPools:
$ oc create -f gpu-mcp.yaml
machineconfigpool.machineconfiguration.openshift.io/gpu created
$ oc create -f 100-gpu-config.yaml
machineconfig.machineconfiguration.openshift.io/100-gpu-config created
$ oc get mcp
NAME     CONFIG               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
gpu      rendered-gpu-af14    True      False      False      0              0                   0                     0                      47s
master   rendered-master-67   True      False      False      3              3                   3                     0                      8d
worker   rendered-worker-af   True      False      False      4              4                   4                     0                      8d
You will see that we now have a new MCP called “gpu”, but it has no machines. We can fix this by adding an additional role to one of our machines:
$ oc label node/ocp410-zh4dg-worker-d98q7 node-role.kubernetes.io/gpu=
$ oc get mcp
NAME     CONFIG               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
gpu      rendered-gpu-900e    False     True       False      1              0                   0                     0                      3m55s
master   rendered-master-67   True      False      False      3              3                   3                     0                      8d
worker   rendered-worker-af   True      False      False      3              3                   3                     0                      8d
Notice now that the gpu MCP has one node in it, and the worker MCP’s machine count has decreased by one. Because the gpu pool’s machineConfigSelector matches MachineConfigs labeled for either the worker or the gpu role, its rendered configuration is a superset of the worker configuration. If we connect to the machine we tagged as “gpu”, we will see that both the /etc/sensitive.conf and /etc/gpu.conf files are present:
$ oc debug node/ocp410-zh4dg-worker-d98q7
Starting pod/ocp410-zh4dg-worker-d98q7-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.16.25.127
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cat /etc/sensitive.conf
critical_config_data
sh-4.4# cat /etc/gpu.conf
gpuenabled
sh-4.4#
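To see exactly which MachineConfigs were merged into the gpu pool’s rendered config, you can also read the source list from the pool’s status (the field names are from the machineconfiguration.openshift.io/v1 API; the exact entries will vary by cluster):
$ oc get mcp gpu -o jsonpath='{range .status.configuration.source[*]}{.name}{"\n"}{end}'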
Cleanup
The MachineConfigs that we created here do nothing useful, and we shouldn’t leave unused configurations lying around.
First we will remove the label from the machine that we tagged as node-role=gpu:
$ oc label node/ocp410-zh4dg-worker-d98q7 node-role.kubernetes.io/gpu-
node/ocp410-zh4dg-worker-d98q7 unlabeled
This will remove the node from the “gpu” MachineConfigPool. The node will reboot and the /etc/gpu.conf file will be removed from the node. We can then delete the “gpu” MachineConfigPool and the “100-gpu-config” MachineConfig and finally the “100-critical-config” MachineConfig from the cluster.
$ oc delete mcp/gpu
machineconfigpool.machineconfiguration.openshift.io "gpu" deleted
$ oc delete mc/100-gpu-config
machineconfig.machineconfiguration.openshift.io "100-gpu-config" deleted
$ oc delete mc/100-critical-config
machineconfig.machineconfiguration.openshift.io "100-critical-config" deleted
The Machine Config Operator will go through and remove the customizations we made to all of our worker nodes, deleting the “/etc/sensitive.conf” file one node at a time, just as when we added the file:
$ oc get mcp
NAME     CONFIG              UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-6   True      False      False      3              3                   3                     0                      11d
worker   rendered-worker-a   False     True       False      4              0                   0                     0                      11d
When the oc get mcp command shows all pools as Updated, your cluster is back in the same state it was at the beginning of this post.
Conclusion
Managing custom OS configurations on nodes in OpenShift is now handled by the Machine Config Operator. By using MachineConfigs and MachineConfigPools you can be sure that the configuration that you want to apply to your nodes is applied, and if there is drift, it is called out so that you can address and remediate the node.