I’m seeing all inbound TCP connections to my app (on Kubernetes) being terminated every 30 minutes or so, ever since we upgraded Istio from 1.1 to 1.3.
Both the downstream client and my app (the upstream server) see the close as initiated by the remote peer. A TCP packet capture shows the sidecar proxy sending a FIN to both the downstream and the upstream.
I don’t understand what might be triggering it. All I have from Istio are the access logs.
Help?
It looks like this might be the issue: “Established TCP connections were destroyed when Envoy receives configuration from Pilot”
I’m consistently seeing xDS updates being pushed to the app just a second before the connections terminate.
The issue that @zhaohuabing reported seems to have been a consul thing.
According to the comment on the issue, it seems that the termination of downstream connections is to be expected.
I don’t know what could be causing the xDS updates, though. We don’t use Consul.
Still looking for solutions to this problem. Could anyone please point me to something that might help?
Hi. Updates to the sidecar can happen for a variety of reasons. If new services are brought online, or new configurations are added, the sidecars need to be updated so that the app can reliably talk to the desired destinations.
If you feel that unrelated updates may be causing Envoy drains [that’s what causes the TCP connection breakage], then use the Istio Sidecar resource to define the exact set of dependencies of the app. For example, if the app does not depend on any service in another namespace, then you can specify “./*” in the egress section [along with istio-system]. Or if you want to be even more restrictive, define the exact set of services that the app communicates with in the Sidecar resource.
If none of these dependencies change, then the app’s sidecar will not get any updates and your long running connections will still be maintained as is.
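As a rough sketch (the name and namespace here are placeholders, substitute your app’s namespace), a restrictive Sidecar resource could look something like this:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: my-app-ns        # placeholder: the namespace your app runs in
spec:
  egress:
  - hosts:
    - "./*"                   # only services in the app's own namespace
    - "istio-system/*"        # plus the Istio control plane
```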
Hi Shriram. Thanks for the reply.
The strange thing is that when I dumped the Envoy config before and after the connections were drained, I didn’t see any meaningful diff other than timestamps and reordering of keys.
I suspect that it’s the LDS push that’s causing the drain, but I don’t see actual changes to any of the listeners.
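For reference, this is roughly how I’ve been capturing and comparing the dumps (assuming curl and jq are available, and $POD is my app’s pod; 15000 is the Envoy admin port inside the istio-proxy container):

```sh
# Dump the sidecar's full Envoy config from the admin endpoint, before and after a drain
kubectl exec "$POD" -c istio-proxy -- curl -s localhost:15000/config_dump > before.json
# ...wait for the next round of connection terminations...
kubectl exec "$POD" -c istio-proxy -- curl -s localhost:15000/config_dump > after.json

# Sort object keys with jq so that pure reordering doesn't show up in the diff
diff <(jq -S . before.json) <(jq -S . after.json)
```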
The behaviour you are describing is caused by the Any matching done in Envoy to decide whether a listener has changed and hence needs to be drained. With inconsistent hashing of Any, we run into exactly this issue. It was fixed as part of https://github.com/istio/istio/issues/17139
Wow. That seems to be it. Thanks so much!