Upgrading worker nodes with Istio (node drain, pod disruption budgets, and Pilot)

My team is currently working on our worker node upgrade automation and we’ve hit a snag with the Istio control plane. A frequently suggested pattern for upgrading Kubernetes versions on worker nodes (sketched as commands below the list) is to:

  1. Spin up a set of worker nodes on the new Kubernetes version, doubling the capacity of the current cluster.
  2. Cordon all old nodes so no new workloads are scheduled on them.
  3. Drain each node one-by-one, causing the scheduler to start new pods on the new nodes.
  4. Delete the drained nodes.
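
For concreteness, steps 2 through 4 might look roughly like this; the nodegroup=old label is made up, so substitute whatever identifies your old node pool:

    # Step 2: cordon every old node so the scheduler stops placing pods there
    for node in $(kubectl get nodes -l nodegroup=old -o jsonpath='{.items[*].metadata.name}'); do
      kubectl cordon "$node"
    done

    # Step 3: drain the old nodes one at a time; evicted pods land on the new nodes
    # (--delete-local-data is called --delete-emptydir-data on newer kubectl)
    for node in $(kubectl get nodes -l nodegroup=old -o jsonpath='{.items[*].metadata.name}'); do
      kubectl drain "$node" --ignore-daemonsets --delete-local-data --timeout=10m
    done

    # Step 4: remove the empty nodes (or terminate the instances at your cloud provider)
    # kubectl delete node <node-name>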

However, we’re finding that this process stalls on Istio control plane components like Pilot. Pods like the Pilot pod are never evicted and rescheduled onto the new nodes; when we ask the node to drain, those pods just stay put and the drain never finishes.

Our guess is that this is happening because those Deployments run a single pod while shipping with a PodDisruptionBudget that requires at least one available pod (minAvailable: 1), so evicting the only replica would violate the budget and the eviction requests issued by the drain are refused rather than handled “smartly”. If so, it would seem like a general Kubernetes problem… but I’m still interested in whether the Istio community is aware of this behavior and has any suggested solutions.
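
To illustrate the guess: on a Helm-based install, where the Pilot budget is typically named istio-pilot in istio-system (names may differ for you), it looks something like this:

    kubectl get pdb -n istio-system
    # NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    # istio-pilot   1               N/A               0                     30d
    #
    # With a single replica and minAvailable: 1, ALLOWED DISRUPTIONS is 0, so the
    # eviction API rejects the request (HTTP 429) and kubectl drain retries forever.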

Does anyone out there have a strategy for moving components like Pilot to new worker nodes?


We solved this by increasing the number of replicas.

Out of curiosity, are you setting min replicas on the HPA to 2 “permanently” when you deploy Istio? Or do you temporarily bump the replicas up for the worker node migration and then move them back down?
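
(If it were temporary, I could imagine something along these lines, assuming the HPA created by the Helm charts is named istio-pilot; that name is an assumption:)

    # Before the migration: raise the HPA floor so the PDB can tolerate an eviction
    kubectl -n istio-system patch hpa istio-pilot -p '{"spec": {"minReplicas": 2}}'

    # ... cordon and drain the old nodes ...

    # Afterwards: drop the floor back down if one replica is really what you want
    kubectl -n istio-system patch hpa istio-pilot -p '{"spec": {"minReplicas": 1}}'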

We just set it permanently

Running just 1 replica of the control plane components is a bad idea anyway, so I’d suggest changing the minimum to 2 or 3 permanently.
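
If you install with the Helm charts, the knob is the pilot.autoscaleMin value (the exact setting may differ by Istio version or with the IstioOperator API; the release name and chart path below are placeholders):

    # Permanently set Pilot's HPA minimum at install/upgrade time
    helm upgrade istio <path-to-istio-chart> \
      --namespace istio-system \
      --set pilot.autoscaleMin=2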