What controls Envoy's xds_grpc retry rate?


I ran into a scenario where the Istio Proxy was logging timeouts while trying to talk to the xds_grpc Pilot cluster. On each timeout, the envoy.cluster.xds_grpc.http2.rx_reset metric would increase by one.

gRPC config stream closed: 2, stream timeout

These log messages would occur every 5 minutes and 10 seconds. The timeout for the cluster is set to 10 seconds so I assume the retry rate is every 5 minutes.

I’m interested in submitting a PR upstream to add jitter to this retry rate, but I’m having trouble hunting down the relevant code.


Where is this 5 minute retry rate configured by Istio and where is the Envoy code that handles the retry rate?

The only 5 minute related value I can find in the Envoy’s /config_dump is this:

"upstream_connection_options": {
  "tcp_keepalive": {
    "keepalive_time": 300