Stale EDS in ingressgateway when pilot gets killed

tomoyat1 · March 12, 2020, 12:54pm

Hi,

At our company, we are seeing an issue where ingressgateways fail to receive updated EDS data after one or more pilot instances are killed.

After a pilot stops, either by being manually scaled in or through an AWS spot instance termination, the remaining ingressgateways do not get fully updated.
A quick check with istioctl proxy-status reveals that EDS is stuck at "STALE (Never Acknowledged)".
However, all other sidecars are shown as “SYNCED”.
This state persists after the pilot(s) are brought back, which confuses us, since we believe that all the Envoy sidecars/gateways should reconnect to whichever pilot instances exist.
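
For anyone who wants to spot this state automatically, here is a minimal sketch of a script that scans proxy-status output for proxies with a STALE column. It assumes the table has NAME, CDS, LDS, EDS, RDS, PILOT and VERSION columns separated by two or more spaces, which may differ between Istio versions. The sample rows, pod names and version string are hypothetical and are not taken from our cluster.

```python
import re

# Hypothetical sample shaped like `istioctl proxy-status` output.
# The column layout is an assumption; names and version are made up.
SAMPLE = """\
NAME                                     CDS       LDS       EDS                           RDS       PILOT              VERSION
istio-ingressgateway-abc.istio-system    SYNCED    SYNCED    STALE (Never Acknowledged)    SYNCED    istio-pilot-xyz    1.4.3
app-123.default                          SYNCED    SYNCED    SYNCED                        SYNCED    istio-pilot-xyz    1.4.3
"""

XDS_COLUMNS = ("CDS", "LDS", "EDS", "RDS")


def stale_proxies(text):
    """Map each proxy name to the list of xDS columns reported as STALE."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    # Cells are separated by runs of 2+ spaces, so multi-word values such as
    # "STALE (Never Acknowledged)" stay in one cell.
    header = re.split(r"\s{2,}", lines[0].strip())
    result = {}
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        row = dict(zip(header, cells))
        stale = [col for col in XDS_COLUMNS if row.get(col, "").startswith("STALE")]
        if stale:
            result[row.get("NAME", "")] = stale
    return result


if __name__ == "__main__":
    for name, columns in stale_proxies(SAMPLE).items():
        print(name, ",".join(columns))
```

To use it against a real cluster, the captured text of istioctl proxy-status could be passed to stale_proxies in place of SAMPLE. This only detects the symptom and does not explain why the gateway stays stale.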

When this is triggered by a spot instance termination, the termination also kills other pods. Since the ingressgateways are not updated to reflect this, a “no healthy upstream” error is returned to our users.

Is this sort of behaviour expected out of pilot and ingressgateway, or is it something abnormal?

We run Istio on EKS version v1.14.9-eks-502bfb with spot instances as nodes.

Any help debugging this problem would be very much appreciated.

tomoyat1 · April 7, 2020, 6:04am

We worked around this problem by scheduling our pilot and ingressgateway on an on-demand instance.
We were also able to somewhat reliably reproduce this problem by restarting application pods while under load.

Still not sure what the root cause was.
