Istiod/pilot out of sync with virtual services defined in cluster

We recently upgraded from Istio 1.11 to Istio 1.15. Since then, there have been a number of times where VirtualServices are added to the cluster and they either:
A) Don’t program the istio-ingressgateway at all, or
B) Only program SOME of the istio-ingressgateways (even worse IMO)

I believe the issue is not with the ingressgateways (which show RDS: synced) but with istiod (pilot): the istio.pilot.virt_services metric shows that some pilots report more virtual services than others.

Some screen shots below.

What would cause istiod to not track cluster state appropriately? Any debugging tips? Right now our remediation is a rolling restart of istiod.


The most likely cause I can see is Istio losing its watch on the k8s api-server and either failing to retry, OR not noticing the watch was lost, so it never attempts to restart the watch and also stops getting new updates. I haven’t seen the latter occur, personally.

To verify this is the issue you could:

  • Run istioctl x internal-debug configz and see whether istiod knows about the virtual services
  • Create/modify a VS and look for logs in istiod. If istiod doesn’t log an update for that VS, then it is likely this issue
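
If you want to make the first check repeatable across all istiod replicas, here is a minimal Python sketch of the comparison step. It assumes you have already collected the VirtualService names from the cluster (for example with kubectl get virtualservices -A) and from each istiod pod's configz output. The function name, the pod names and the "namespace/name" key format are all hypothetical, and parsing the configz output is left out on purpose since its exact shape may differ between Istio versions.

```python
from typing import Dict, Set


def missing_per_replica(
    cluster_vs: Set[str], replica_vs: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """For each istiod replica, return the VirtualServices that exist in
    the cluster but that the replica does not report knowing about.

    cluster_vs: "namespace/name" keys taken from the Kubernetes API.
    replica_vs: maps an istiod pod name to the keys that pod reports.
    Replicas that know about everything are left out of the result.
    """
    result: Dict[str, Set[str]] = {}
    for pod, known in replica_vs.items():
        missing = cluster_vs - known
        if missing:
            result[pod] = missing
    return result


if __name__ == "__main__":
    # Hypothetical sample data; replace with values gathered from your cluster.
    cluster = {"default/reviews", "default/ratings", "shop/cart"}
    replicas = {
        "istiod-a": {"default/reviews", "default/ratings", "shop/cart"},
        "istiod-b": {"default/reviews"},
    }
    for pod, missing in sorted(missing_per_replica(cluster, replicas).items()):
        print(pod, "is missing", sorted(missing))
```

A replica that shows up in the result is a candidate for the lost-watch problem described above, and it should line up with the pilots reporting a lower istio.pilot.virt_services value.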