How should one fine-tune circuit breaker values in the context of dynamic service scalability?

Hi guys,

I’m working on adding Istio circuit breakers to a bunch of gRPC microservices that scale dynamically to varying numbers of instances. I’m curious how to configure the http2MaxRequests threshold, given that circuit breaking is enforced on the Envoy clients. That means that when the number of clients for a service increases, the threshold should be set to a smaller value, and when the number of clients decreases, it should be raised.

The simplest example is two services, A and B. Service A calls service B, and the number of instances of service A can increase or decrease. If the threshold is set to http2MaxRequests=1 and #pods(A)=1, then service A can make at most 1 concurrent request to service B. If we scale up to #pods(A)=3, service A as a whole can now make up to 3 concurrent requests to service B (1 per pod), which is not the desired behaviour.

              - A(pod) -
            /           \
           /             \
A(service) -->  A(pod) --> B(service)
           \             /
            \           /
              - A(pod) - 
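
To put numbers on the example above, here is a tiny sketch of the arithmetic. It assumes each A pod enforces the threshold independently in its own client-side Envoy, as described; the function name is made up for illustration and is not part of Istio.

```python
def aggregate_limit(per_client_max_requests: int, client_pods: int) -> int:
    """Upper bound on concurrent requests from A to B when every
    A pod applies the same per-client threshold on its own."""
    return per_client_max_requests * client_pods


# http2MaxRequests=1 with 1 pod, then with 3 pods
for pods in (1, 3):
    print(f"#pods(A)={pods}: up to {aggregate_limit(1, pods)} concurrent request(s) to B")
```

So the effective limit seen by B grows linearly with the number of A pods, even though the configured value never changes.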

Consequently my questions are:

  • Are there any best practices for configuring circuit breaker thresholds specifically for Istio?
  • Is it sane to automatically scale the configured circuit breaking thresholds inversely with the number of clients, so that the total stays roughly constant? Does Istio offer this feature?
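
To make the second question concrete, this is roughly the calculation I have in mind. It is only a sketch: total_budget (the number of concurrent requests B should receive overall) and the function name are my own assumptions, and I don't know of an Istio feature that does this; something external would have to recompute the value and update the configuration whenever A scales.

```python
def per_client_threshold(total_budget: int, client_pods: int) -> int:
    """Split an overall concurrency budget for B evenly across A pods.

    The result is floored at 1 so that every client can still send a
    request; this is a choice made for this sketch, and it means the
    aggregate can exceed total_budget once client_pods > total_budget.
    """
    if client_pods < 1:
        raise ValueError("client_pods must be at least 1")
    return max(1, total_budget // client_pods)


# Example: an overall budget of 100 concurrent requests to B
for pods in (1, 3, 10):
    print(f"#pods(A)={pods}: per-client threshold = {per_client_threshold(100, pods)}")
```

Integer division rounds down, so the aggregate stays at or below the budget except in the floor-at-1 case noted in the docstring.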

Thanks!

Did you ever figure this out? I’m also interested in folks’ experience with this.