I have an Istio VirtualService for a gRPC service, configured to retry on UNAVAILABLE or RESOURCE_EXHAUSTED. However, requests fail fast with response flag UF and response code details upstream_reset_before_response_started{connection failure}, which the client code sees as an UNAVAILABLE gRPC failure.
Why isn’t this connection reset being retried by Envoy?
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
    meta.helm.sh/release-name: pryon
    meta.helm.sh/release-namespace: pryon
  creationTimestamp: "2020-11-25T20:45:22Z"
  generation: 1
  labels:
    app: d2dgpuexchange
    app.kubernetes.io/managed-by: Helm
    host: int-d2dgpuexchange
  name: mesh-internal-http-int-d2dgpuexchange-grpc
  namespace: pryon
  resourceVersion: "37272196"
  selfLink: /apis/networking.istio.io/v1beta1/namespaces/pryon/virtualservices/mesh-internal-http-int-d2dgpuexchange-grpc
  uid: 2daa13f9-1085-43d8-a2db-30afebddc8bf
spec:
  hosts:
  - int-d2dgpuexchange-grpc.pryon.svc.cluster.local
  http:
  - match:
    - port: 8082
    retries:
      attempts: 100
      perTryTimeout: 15s
      retryOn: resource-exhausted,unavailable
    route:
    - destination:
        host: int-d2dgpuexchange-grpc.pryon.svc.cluster.local
    timeout: 15s
This is the access log entry from the downstream (client-side) sidecar:
jsonPayload: {
  duration_request: "0"
  start_time: "2020-11-30T10:57:13.364339Z"
  duration_response: "-"
  istio_policy_status: "-"
  upstream_host: "10.2.2.3:8082"
  route_name: "-"
  user_agent: "grpc-python/1.31.0 grpc-c/11.0.0 (linux; chttp2)"
  downstream_remote_address: "10.2.18.54:53794"
  protocol: "HTTP/2"
  upstream_transport_failure_reason: "-"
  upstream_service_time: "-"
  duration_tx: "-"
  response_code_details: "upstream_reset_before_response_started{connection failure}"
  requested_server_name: "-"
  bytes_received: "283"
  request_id: "276979d9-7d96-40ba-969c-3488584b9066"
  upstream_local_address: "-"
  method: "POST"
  duration: "0"
  bytes_sent: "0"
  response_code_grpc: "-"
  response_code: "200"
  authority: "int-d2dgpuexchange-grpc.pryon.svc.cluster.local:8082"
  upstream_cluster: "outbound|8082||int-d2dgpuexchange-grpc.pryon.svc.cluster.local"
  response_flags: "UF"
  downstream_local_address: "10.193.194.16:8082"
  x_forwarded_for: "-"
  path: "/pryon.int.oe.v1alpha1.SectionProcessing/CreateSectionExchange"
}
This is the route config from the Envoy config_dump:
{
  "name": "int-d2dgpuexchange-grpc.pryon.svc.cluster.local:8082",
  "domains": [
    "int-d2dgpuexchange-grpc.pryon.svc.cluster.local",
    "int-d2dgpuexchange-grpc.pryon.svc.cluster.local:8082",
    "int-d2dgpuexchange-grpc",
    "int-d2dgpuexchange-grpc:8082",
    "int-d2dgpuexchange-grpc.pryon.svc.cluster",
    "int-d2dgpuexchange-grpc.pryon.svc.cluster:8082",
    "int-d2dgpuexchange-grpc.pryon.svc",
    "int-d2dgpuexchange-grpc.pryon.svc:8082",
    "int-d2dgpuexchange-grpc.pryon",
    "int-d2dgpuexchange-grpc.pryon:8082",
    "10.193.194.16",
    "10.193.194.16:8082"
  ],
  "routes": [
    {
      "match": {
        "prefix": "/",
        "case_sensitive": true
      },
      "route": {
        "cluster": "outbound|8082||int-d2dgpuexchange-grpc.pryon.svc.cluster.local",
        "timeout": "15s",
        "retry_policy": {
          "retry_on": "resource-exhausted,unavailable",
          "num_retries": 100,
          "per_try_timeout": "15s",
          "retry_host_predicate": [
            {
              "name": "envoy.retry_host_predicates.previous_hosts"
            }
          ],
          "host_selection_retry_max_attempts": "5"
        },
        "max_grpc_timeout": "15s"
      },
      "metadata": {
        "filter_metadata": {
          "istio": {
            "config": "/apis/networking.istio.io/v1alpha3/namespaces/pryon/virtual-service/mesh-internal-http-int-d2dgpuexchange-grpc"
          }
        }
      },
      "decorator": {
        "operation": "int-d2dgpuexchange-grpc.pryon.svc.cluster.local:8082/*"
      }
    }
  ],
  "include_request_attempt_count": true
}
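For comparison, here is a sketch of the same retries block with one of Envoy's connection-level retry conditions added. It rests on two assumptions that have not been verified against this cluster: that the UF response flag corresponds to Envoy's connect-failure retry condition, and that the gRPC conditions (resource-exhausted, unavailable) are matched only against a grpc-status returned by the upstream, which never arrives when the connection itself fails.

```yaml
# Hypothetical variant of the spec.http[0].retries block from the
# VirtualService in this post; untested, shown only to illustrate the idea.
retries:
  attempts: 100
  perTryTimeout: 15s
  # connect-failure is an Envoy retry condition for upstream connection
  # failures; the two gRPC conditions are carried over from the original policy.
  retryOn: connect-failure,resource-exhausted,unavailable
```

If this variant were applied, the retry_on field in the config_dump route would be expected to list connect-failure alongside the two gRPC conditions, which gives a quick way to check that the change reached the sidecar.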