We are running Istio 1.2.3 on EKS and seeing a high rate of xDS push errors: the error rate reaches 5-8% every 5 to 10 minutes.
I'm using this query to calculate the error percentage, and it frequently shows 5-8%:
sum(rate(pilot_xds_push_errors{job="pilot"}[1m])) / sum(rate(pilot_xds_pushes{job="pilot"}[1m]))
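To make clear what that percentage means, here is a minimal Python sketch of the arithmetic behind the ratio. The sample values and function names are made up for illustration, and the rate calculation is a simplification: the real Prometheus rate() also extrapolates to the window edges and handles counter resets differently.

```python
def counter_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value)
    samples ordered oldest first. Simplified stand-in for PromQL rate()."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return 0.0
    # Simplification: a counter reset (negative delta) counts as zero increase.
    return max(v1 - v0, 0.0) / elapsed


def push_error_percent(error_samples, push_samples):
    """Error rate as a percentage of the push rate over the same window."""
    pushes = counter_rate(push_samples)
    if pushes == 0:
        return 0.0  # no pushes in the window, so no meaningful ratio
    return 100.0 * counter_rate(error_samples) / pushes


# Hypothetical one-minute window: 18 new errors against 300 new pushes.
errors = [(0, 100), (60, 118)]
pushes = [(0, 4000), (60, 4300)]
print(f"{push_error_percent(errors, pushes):.1f}%")
```

With a 1m window, a handful of errors during a quiet minute can move this ratio a lot, so the absolute error rate is worth looking at alongside the percentage.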
I also notice that our pushes spike as high as 900 ops/s, and frequently reach 300 ops/s every 5 minutes:
sum(rate(pilot_xds_pushes{type!~".*_senderr"}[1m]))
When I check the Pilot logs, I frequently see entries like this:
transport: http2Server.HandleStreams failed to read frame: read tcp 172.99.99.xx:15010->172.99.99.99:36838: use of closed network connection
istio-pilot-aaa-bbbb discovery 2019-08-21T14:00:54.823655Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
istio-pilot-aaa-bbbb discovery 2019-08-21T14:00:54.823703Z info ads ADS: "172.99.99.xx::36838" sidecar~172.99.99.99~podname.svc.cluster.local-34312 terminated rpc error: code = Canceled desc = context canceled
What more can I do to debug this issue, and what might be causing it?