Istio 503 no healthy upstream after pod killed

Ryan_Harley · December 16, 2020, 9:20pm

We are using Istio 1.5.4 and have started using a headless service to get direct pod-to-pod communication working with Istio mTLS. This all works, but we recently noticed that after killing one of our pods we get 503 "no healthy upstream" errors for many minutes afterwards. If we go back to a ‘normal’ service, we get a few 503 errors and then the problem clears up very quickly. We traced the Envoy container's traffic using kubectl sniff and can see that connections are maintained for a long time after the 503s are received, and that new connections are even established to the IP of the previously killed pod.
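For readers unfamiliar with the setup, a headless Service differs from a ‘normal’ one only in setting `clusterIP` to the string `None`. The sketch below builds such a manifest as JSON, which kubectl accepts alongside YAML. The service name, namespace, selector label, and port are hypothetical and not taken from the cluster discussed in this thread.

```python
import json


def headless_service(name: str, namespace: str, port: int) -> dict:
    """Build a headless Service manifest. All names and ports are illustrative."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The literal string "None" is what makes the Service headless:
            # no virtual IP is allocated and DNS returns the pod IPs directly.
            "clusterIP": "None",
            # Assumed label; the real selector depends on the workload.
            "selector": {"app": name},
            "ports": [{"name": "tcp-app", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(headless_service("my-app", "default", 8080), indent=2))
```

The output can be piped to `kubectl apply -f -` if you want to try it in a test namespace.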

We have configured circuit breaking in a destination rule for the service in question, but that doesn’t seem to have helped either. We also tried setting ‘PILOT_ENABLE_EDS_FOR_HEADLESS_SERVICES’, which seemed to reduce the 503 errors but, strangely, interfered with our direct pod-to-pod IP configuration.
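The actual destination rule is not shown in this thread. As a rough sketch of what a circuit breaker configuration of this kind usually looks like, the snippet below builds a DestinationRule with `connectionPool` and `outlierDetection` settings and prints it as JSON. The host, names, and every threshold are assumptions chosen for illustration. `consecutiveErrors` is the field name in the v1alpha3 API of that era; newer Istio releases deprecate it in favor of `consecutive5xxErrors`.

```python
import json


def destination_rule(name: str, namespace: str, host: str) -> dict:
    """Build a DestinationRule with circuit breaking. All values are illustrative."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "host": host,
            "trafficPolicy": {
                # Limits on connections and pending requests per destination.
                "connectionPool": {
                    "tcp": {"maxConnections": 100},
                    "http": {
                        "http1MaxPendingRequests": 10,
                        "maxRequestsPerConnection": 1,
                    },
                },
                # Eject a host after repeated errors, then retry it later.
                "outlierDetection": {
                    "consecutiveErrors": 5,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                    "maxEjectionPercent": 100,
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service host, for illustration only.
    rule = destination_rule(
        "my-app-cb", "default", "my-app.default.svc.cluster.local"
    )
    print(json.dumps(rule, indent=2))
```

Whether these settings take effect for a headless service depends on how the sidecar builds its cluster for that host, which may be part of what is being observed here.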

Does anyone have any suggestions on why we are receiving the 503 errors or how to avoid them?

Jonathan_Perlow · May 11, 2023, 10:13pm

Did you ever figure out the issue here? We’re observing the exact same behavior.
