Enabling Multi-Primary without Downtime

Hi all!

We’re running 6 K8s clusters (2 prod, 2 dev, 2 test) and are in the process of enabling multicluster between the 2 clusters in each environment.
These clusters are all “in production” and already run workloads.

When rolling out the shared CA cert to the dev clusters, we noticed that all Pods on the cluster where the new CA cert had been deployed lost communication with the control plane, because their sidecars still had the cert from the “old”, non-multicluster setup.

In hindsight, this was kinda to be expected, but it took us a moment to notice. We fixed it by restarting all Pods with sidecars, which seems to be the only way to make them talk to the control plane again?
(FYI: both the test and the dev clusters have it enabled now, and work as expected)

Before we roll out a new, shared CA cert to our 2 prod clusters and get an outage there too until all Pods with sidecars have been restarted, we were wondering if there is any chance of doing this with zero downtime. For example, could the old and the new CA certs be present in the control plane at the same time, so that the already running sidecars, which only trust the old CA cert, can still talk to a control plane that has the new CA cert?
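To make the "old and new at the same time" idea concrete: a combined root file is just the PEM blocks of both roots in one file. Below is a minimal Python sketch of assembling one. The function name `combine_roots` is made up for illustration, and it only concatenates text; it does not validate the certificates.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def combine_roots(*pem_texts: str) -> str:
    """Concatenate every certificate block found in the inputs.

    Order is preserved and exact duplicates are dropped, so passing the
    same root twice does not bloat the bundle. (Hypothetical helper, not
    part of Istio or any library.)
    """
    seen = set()
    blocks = []
    for text in pem_texts:
        for block in PEM_CERT_RE.findall(text):
            # Normalize whitespace so identical certs compare equal.
            normalized = "\n".join(
                line.strip() for line in block.strip().splitlines()
            )
            if normalized not in seen:
                seen.add(normalized)
                blocks.append(normalized)
    return ("\n".join(blocks) + "\n") if blocks else ""
```

In practice you would read the old and the new root PEM files from disk, pass both strings in, and write the result out as the root certificate file for the secret. The file names are up to you.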

The steps are:

  1. Extract the auto-generated Istio CA key and CA cert.
  2. Generate the new root certificate.
  3. Create the intermediate certificate, signed by the new root certificate.
  4. Prepare a combined root certificate (old and new roots) and an empty cert-chain.pem file (the auto-generated CA has no cert-chain.pem).
  5. Delete the cacerts secret if it exists, then create a new cacerts secret in istio-system with the auto-generated ca-cert and ca-key, the empty cert-chain.pem, and the combined root certificate.
  6. Restart istiod to propagate the new certificate to the individual namespaces.
  7. Restart all workloads so they pick up the new certificate.
  8. Recreate cacerts in istio-system with the intermediate ca-cert, ca-key and cert-chain.pem, plus the combined root certificate.
  9. Restart istiod to propagate the certificate to the individual namespaces.
  10. Restart all workloads so they pick up the new certificates.
  11. (Optional clean-up) Recreate cacerts in istio-system with the intermediate ca-cert, ca-key and cert-chain.pem, and the new root-cert only.
  12. Restart istiod to propagate the new certificates to the individual namespaces.
  13. Restart all workloads so they pick up the new certificates.
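
To see why the steps above go through a combined root instead of switching directly, here is a small toy model in Python. It is only a sketch under simplifying assumptions: every name in it (`Config`, `first_broken_phase`, `STAGED`, `DIRECT`) is made up for illustration, a "cert" is reduced to the name of the root it chains to, and real Istio behaviour such as cert lifetimes or config propagation delays is not modelled. Each phase says which root istiod signs under and which roots are in the trust bundle; the check walks through a rolling restart and verifies that every sidecar can still validate istiod and every other sidecar.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    signer: str          # root that issued certs chain up to
    trusted: frozenset   # roots present in the trust bundle


def trusts(verifier: Config, presenter: Config) -> bool:
    """The verifier accepts the presenter if it trusts the presenter's root."""
    return presenter.signer in verifier.trusted


def first_broken_phase(phases: List[Config], n_workloads: int = 3) -> Optional[int]:
    """Return the index of the first phase that breaks connectivity, or None."""
    workloads = [phases[0]] * n_workloads
    for idx, phase in enumerate(phases[1:], start=1):
        istiod = phase  # istiod is restarted first and uses the new config
        # Rolling restart: 0, 1, ..., n workloads have picked up the new config.
        for restarted in range(n_workloads + 1):
            current = [phase] * restarted + workloads[restarted:]
            if not all(trusts(w, istiod) for w in current):
                return idx  # some sidecar cannot validate the control plane
            for a, b in combinations(current, 2):
                if not (trusts(a, b) and trusts(b, a)):
                    return idx  # mutual TLS between two workloads would fail
        workloads = [phase] * n_workloads
    return None


BOTH = frozenset({"old", "new"})
STAGED = [
    Config("old", frozenset({"old"})),  # starting point: auto-generated CA
    Config("old", BOTH),                # steps 5-7: old CA, combined root
    Config("new", BOTH),                # steps 8-10: new intermediate, combined root
    Config("new", frozenset({"new"})),  # steps 11-13: clean-up, new root only
]
DIRECT = [STAGED[0], STAGED[-1]]        # switching in one go

print(first_broken_phase(STAGED))  # None
print(first_broken_phase(DIRECT))  # 1
```

In this model the direct switch fails right after the istiod restart, because sidecars that only trust the old root cannot validate a control plane signed under the new one. That matches what we saw on dev. The staged sequence never hits a moment where two parties lack a common trusted root.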