Hi,
I am trying to keep my cluster highly available. In my scenario, one of the nodes running the policy pod may go down, but the cluster must not lose the request.
I am testing this scenario: a user sends a request to a service for which JWT authorization is configured using Istio. When a node crashes, I often get a 503 error. I found on the official Istio site that this is related to a failure to connect to Mixer:
UNAVAILABLE: Envoy cannot connect to Mixer and the policy is configured to fail close.
I tried to set up a retry policy for this case, but it looks like Istio does not handle it. I tried:
    retries:
      attempts: 30
      perTryTimeout: 1s
      retryOn: 500,503,504,retriable-status-codes,unavailable
    timeout: 30s
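For reference, a minimal sketch of where these fields sit in a VirtualService. The resource name and the host `my-service` are placeholders, and the `networking.istio.io/v1alpha3` API version is an assumption about the Istio release in use.

```yaml
apiVersion: networking.istio.io/v1alpha3   # assumed API version
kind: VirtualService
metadata:
  name: my-service                         # placeholder name
spec:
  hosts:
  - my-service                             # placeholder host
  http:
  - route:
    - destination:
        host: my-service                   # placeholder destination
    timeout: 30s                           # overall request timeout, set on the HTTP route
    retries:
      attempts: 30                         # maximum number of retries
      perTryTimeout: 1s                    # timeout for each individual attempt
      retryOn: 500,503,504,retriable-status-codes,unavailable
```

Note that `timeout` is a sibling of `retries` on the HTTP route, not a field inside `retries`.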
But I still get the error. I also tried to modify the FAIL_CLOSE policy by adding retries to the HTTP and TCP configs:
    "typed_config": {
      "@type": "type.googleapis.com/istio.mixer.v1.config.client.HttpClientConfig",
      "transport": {
        "network_fail_policy": {
          "policy": "FAIL_CLOSE",
          "max_retry": 10,
          "base_retry_wait": "0.100s",
          "max_retry_wait": "1.300s"
        }
      }
    }

    "name": "mixer",
    "typed_config": {
      "@type": "type.googleapis.com/istio.mixer.v1.config.client.TcpClientConfig",
      "transport": {
        "network_fail_policy": {
          "policy": "FAIL_CLOSE",
          "max_retry": 10,
          "base_retry_wait": "0.100s",
          "max_retry_wait": "1.300s"
        }
      }
    }
I did not see any change from this configuration at all. I also set up outlier detection, and it looks like it really works and ejects the unhealthy endpoint from the connection pool. However, I still get a 503 error on the first request.
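For reference, outlier detection of this kind is configured in a DestinationRule. The following is only a minimal sketch: the name and host `my-service` are placeholders, and the thresholds are illustrative values, not the ones from my cluster.

```yaml
apiVersion: networking.istio.io/v1alpha3   # assumed API version
kind: DestinationRule
metadata:
  name: my-service                         # placeholder name
spec:
  host: my-service                         # placeholder host
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 1                 # illustrative; newer releases use consecutive5xxErrors instead
      interval: 1s                         # how often hosts are scanned for ejection
      baseEjectionTime: 30s                # minimum time an ejected host stays out of the pool
      maxEjectionPercent: 100              # allow every unhealthy host to be ejected
```

Outlier detection is passive: a host is ejected only after requests to it have already failed, which would be consistent with the first request still returning 503.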
I am new to Istio and would like to ask the community whether there is a way to solve my problem, and why retries for this error are ignored even though I listed it in retry-on.