Pilot failed to sync updates to sidecars

#1

We’re trying to use Istio 1.0.5 in our production cluster with ~750 services and ~8,000 pods, with only Pilot and Citadel enabled. (We would actually like to disable Citadel if possible, but our testing in a PoC cluster suggested that Pilot needs it.) Despite the large number of pods, we injected the sidecar into only one service, which has 5 pods.

After three days of running, our service suddenly became inaccessible: all HTTP calls returned 5xx errors. We ran istioctl proxy-status, and it showed that the LDS and RDS information in the sidecars was stale.
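
For anyone who wants to automate this kind of check, here is a minimal Python sketch that scans text in the shape of istioctl proxy-status output and reports which xDS columns are marked STALE for each proxy. The column layout and the sample rows are assumptions for illustration (they are not our real output), and find_stale_proxies is a hypothetical helper, not part of Istio.

```python
import re

# xDS columns to inspect. These names are assumed from the proxy-status header.
XDS_COLUMNS = ("CDS", "LDS", "EDS", "RDS")


def split_cells(line):
    """Split a table row on runs of 2+ spaces so cells like 'SYNCED (100%)' stay whole."""
    return re.split(r"\s{2,}", line.strip())


def find_stale_proxies(status_text):
    """Return {proxy_name: [columns marked STALE]} for a proxy-status style table."""
    lines = [ln for ln in status_text.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = split_cells(lines[0])
    stale = {}
    for line in lines[1:]:
        cells = split_cells(line)
        row = dict(zip(header, cells))
        bad = [c for c in XDS_COLUMNS if row.get(c, "").startswith("STALE")]
        if bad:
            # The first column is taken to be the proxy name.
            stale[cells[0]] = bad
    return stale


# Illustrative sample only; proxy and Pilot names are made up.
SAMPLE = """\
PROXY                        CDS       LDS       EDS              RDS       PILOT
web-1.default                SYNCED    STALE     SYNCED (100%)    STALE     istio-pilot-0
web-2.default                SYNCED    SYNCED    SYNCED (100%)    SYNCED    istio-pilot-0
"""

if __name__ == "__main__":
    print(find_stale_proxies(SAMPLE))
```

In practice the text could come from running istioctl proxy-status on a schedule and alerting when the returned dictionary is non-empty. That would not prevent the problem, but it could surface stale sidecars before traffic starts failing.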

We then looked at the Pilot logs, which showed many errors with this message: Error adding/updating listener <address>: unable to read file: /etc/certs/root-cert.pem.

To recover service immediately, we deleted the Citadel pod to restart it. Several seconds after Citadel restarted, we confirmed that the pods were able to serve traffic again.

Because this is a production environment, we didn’t gather many logs while troubleshooting; our focus was on restoring service.

With this information, could anyone help with the following:

  • How can we prevent this, or a similar error, from happening again?
  • Could we completely disable Citadel?

#2

Any help? We’re trying to fully deploy Istio in production and are hoping to get this cleared up.

#3

You can disable Citadel by using “istio” instead of “istio-auth”.

If you are using Helm, you can go to “istio-1.0.6/install/kubernetes/helm/istio/” and change the values.yaml file to get the Istio setup you want.