Pilot failed to sync updates to sidecars

We’re trying to use Istio 1.0.5 in our production cluster with ~750 services and ~8,000 pods, with only Pilot and Citadel enabled. (We would actually like to disable Citadel if possible, but our testing in a PoC cluster suggested that Pilot needs it.) Despite the large number of pods, we injected the sidecar into only a single service with 5 pods.

After three days of running, our service suddenly became inaccessible: all HTTP calls returned 5xx errors. We ran istioctl proxy-status, which showed that the LDS and RDS information in the sidecars was stale:

[Screenshot: istioctl proxy-status output]
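For anyone who wants to watch for the same symptom, here is a rough sketch of a filter that lists proxies with a STALE column. It is not something we ran during the incident. The function name stale_proxies is made up for illustration, and it assumes the proxy name is the first whitespace-separated field of each row, which may differ between istioctl versions.

```shell
# Print the first field (assumed to be the proxy name) of every row
# that mentions STALE. Header and fully synced rows are skipped because
# they do not contain that word.
stale_proxies() {
  awk '/STALE/ { print $1 }'
}

# Example (needs istioctl and cluster access, so left commented out):
# istioctl proxy-status | stale_proxies
```

Empty output means no row reported STALE at the time of the check; it does not prove the sidecars are healthy in any other respect.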

We then looked at the Pilot logs, which showed many errors with this message: Error adding/updating listener <address>: unable to read file: /etc/certs/root-cert.pem.

[Screenshot: Pilot log errors]

To recover service immediately, we deleted the Citadel pod to force a restart. Several seconds after Citadel restarted, we confirmed that the pods were able to serve traffic again.

Because this is a production environment, we didn’t gather many logs while troubleshooting; our focus was on restoring service.

With this information, could anyone help with the following:

  • How can we prevent this (or a similar error) from happening again?
  • Could we completely disable Citadel?

Any help is appreciated. We’re trying to fully deploy Istio in production and hope to get this cleared up.

You can disable Citadel by using “istio” instead of “istio-auth”.

If you are using Helm, you can go to “istio-1.0.6/install/kubernetes/helm/istio/” and change the values.yaml file to get the Istio setup you want.
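As a rough sketch of the Helm route, the same change can be passed on the command line instead of editing values.yaml. This assumes the chart toggles Citadel through a security.enabled value (please confirm that key in your copy of values.yaml) and that you are on Helm 2, where helm template accepts --name. The function name render_without_citadel and the output file name are only illustrative.

```shell
# Render the Istio chart with Citadel turned off.
# $1: path to the chart directory, e.g. install/kubernetes/helm/istio
# Assumption: this chart version controls Citadel via security.enabled;
# check values.yaml before relying on it.
render_without_citadel() {
  helm template "$1" \
    --name istio \
    --namespace istio-system \
    --set security.enabled=false
}

# Example (needs helm and kubectl, so left commented out):
# render_without_citadel install/kubernetes/helm/istio > istio-no-citadel.yaml
# kubectl apply -f istio-no-citadel.yaml
```

Since the original report says Pilot appeared to need Citadel in a PoC cluster, it is worth reviewing the rendered manifest and trying it outside production first.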