VirtualService per-destination timeouts and retries

maxenglander · October 24, 2020, 8:36pm

Hello

I would like to request/contribute a new feature. The CONTRIBUTING doc says to:

"Discuss your idea with the appropriate working groups on the working group’s mailing list."

I can’t find where the mailing lists are, so I’m assuming this discussion board is the right place. If not, please point me to the right place and I’ll move this topic there.

I would like to be able to set per-destination timeouts and retries in a VirtualService.

Current state

Right now, Istio allows you to set timeouts and retries at the route level, like so:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    retries:
      attempts: 3
      perTryTimeout: 2s
    timeout: 5s
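
To make the limitation concrete, here is a minimal sketch of a weighted route under the current API. The 99/1 split between two `ratings` destinations is an illustrative assumption borrowed from the examples further down. The single `retries` and `timeout` block sits at the route level, so both destinations share it:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
      weight: 99
    - destination:
        host: ratings
      weight: 1
    # Route-level settings: they apply to whichever destination is picked,
    # so the two weighted arms cannot be given different values here.
    retries:
      attempts: 3
      perTryTimeout: 2s
    timeout: 5s
```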

Feature request

Ideally, I would like to be able to set timeouts and retries at the destination level as well, e.g.:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
      retries:
        attempts: 3
        perTryTimeout: 2s
      timeout: 5s
      weight: 99
    - destination:
        host: ratings
      retries:
        attempts: 5
        perTryTimeout: 1s
      timeout: 5s
      weight: 1

Why is this useful?

I would like this because I can’t always anticipate the effect of a given retry/timeout configuration on a service. Some configurations work better than others, and it can take some trial and error to find the one that does the most for overall service reliability.

Additionally, when adding retries/timeouts to a service for the first time, especially when adopting Istio as a technology overall, being able to A/B test the retry/timeout logic is useful and helps de-risk adoption.

Current workaround

Although Istio doesn’t have first-class support for per-destination retries/timeouts, I can currently work around this by directly setting Envoy headers like so:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
      weight: 99
      headers:
        request:
          set:
            x-envoy-upstream-rq-timeout-ms: "5000"
            x-envoy-max-retries: "3"
            x-envoy-upstream-rq-per-try-timeout-ms: "2000"
    - destination:
        host: ratings
      weight: 1
      headers:
        request:
          set:
            x-envoy-upstream-rq-timeout-ms: "5000"
            x-envoy-max-retries: "5"
            x-envoy-upstream-rq-per-try-timeout-ms: "1000"
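
One detail of the header-based approach may be worth spelling out. Envoy's router documentation describes `x-envoy-max-retries` as taking effect when a retry policy is in place, and `x-envoy-retry-on` as the header that sets the retry conditions. Whether the workaround needs it depends on the retry policy already attached to the route, which is not shown here. A sketch with the conditions made explicit, where the `5xx,connect-failure` value is an illustrative assumption rather than a recommendation:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
      weight: 99
      headers:
        request:
          set:
            # Assumed retry conditions; pick the ones that fit the service.
            x-envoy-retry-on: "5xx,connect-failure"
            x-envoy-max-retries: "3"
            x-envoy-upstream-rq-per-try-timeout-ms: "2000"
            x-envoy-upstream-rq-timeout-ms: "5000"
    - destination:
        host: ratings
      weight: 1
      headers:
        request:
          set:
            x-envoy-retry-on: "5xx,connect-failure"
            x-envoy-max-retries: "5"
            x-envoy-upstream-rq-per-try-timeout-ms: "1000"
            x-envoy-upstream-rq-timeout-ms: "5000"
```

All header values are quoted so that YAML treats them as strings, matching the workaround above.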

Shortcomings of workaround

I don’t hate the workaround. However, as my org is at the first stage of adopting Istio, we want to provide the best developer ergonomics possible. Learning how to configure various Istio controls has been challenging for us, and an inconsistent interface (sometimes Istio config, sometimes Envoy headers) can add unnecessarily to the learning curve.
