Below you will find pages that utilize the taxonomy term “Infrastructure”
OpenShift Machine Remediation
Kubernetes, and thus OpenShift, is designed to host applications in such a way that if a node hosting your application fails, the app is automatically rescheduled on another node and everything “just keeps working”. This happens without any intervention by an administrator, letting you continue on with your life without being bothered by some on-call alert system. But what about the node that failed? While the app may be up and running, you now have a node that is no longer pulling its weight. Your cluster capacity is reduced, and if enough nodes fail, other apps may be affected or your cluster may fail.