These are the versions of the tools we are currently using:
Istio v1.4.3 (installed via the official Helm chart)
Kubernetes v1.15.7 (installed via kops)
We have set up a Kubernetes cluster on AWS using kops.
We use the aws-alb-ingress-controller Helm chart to provision an ALB as our ingress into the cluster.
We terminate our SSL connections on the ALB using ACM.
The istio-ingressgateway Service is of type NodePort and exposes the traffic port (80) and the status port (15020).
It has externalTrafficPolicy set to Cluster so that all (5) nodes report as healthy to the ALB.
The ALB is configured to forward traffic to the istio-ingressgateway traffic port and to perform health checks against the status port (HTTP GET /healthz/ready); a sketch of the relevant Service and Ingress configuration is included below.
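For reference, this is roughly what the relevant pieces look like (a trimmed sketch; the node ports, certificate ARN, and Ingress annotations are placeholders based on my reading of the aws-alb-ingress-controller docs rather than our exact values):

apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  type: NodePort
  externalTrafficPolicy: Cluster     # every node passes the ALB health check
  selector:
    app: istio-ingressgateway
  ports:
  - name: http2                      # traffic port the ALB forwards to
    port: 80
    targetPort: 80
    nodePort: 31380                  # placeholder
  - name: status-port                # port the ALB health checks
    port: 15020
    targetPort: 15020
    nodePort: 31390                  # placeholder
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: istio-ingress
  namespace: istio-system
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance            # route to the NodePorts
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:... # placeholder; SSL terminated on the ALB via ACM
    alb.ingress.kubernetes.io/healthcheck-path: /healthz/ready
    alb.ingress.kubernetes.io/healthcheck-port: "31390"        # node port of the status port (placeholder)
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: istio-ingressgateway
          servicePort: 80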
The setup seems to be working fine: we see all nodes as healthy in the ALB and traffic is distributed across all backends.
However, when we apply load to the system (around 50 r/s, randomized) by mocking long-running requests (e.g. curl https://some-service.our.domain/wait/750, which simply blocks for 750 ms) and then scale down the istio-ingressgateway deployment itself (e.g. from 3 to 2 replicas), in-flight connections are dropped and the load balancer (ALB) returns 502 responses.
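For what it's worth, the load is roughly equivalent to running something like the following Job inside the cluster (a simplified sketch, not our actual load generator; the image, parallelism, and fixed loop are illustrative, and the real tool randomizes the rate around 50 r/s):

apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-load-test
spec:
  parallelism: 10                    # concurrent workers issuing blocking requests
  completions: 10
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: load
        image: curlimages/curl:7.68.0        # illustrative image
        command: ["/bin/sh", "-c"]
        args:
        - |
          # each worker repeatedly hits the mock long-running endpoint
          for i in $(seq 1 1000); do
            curl -s -o /dev/null -w "%{http_code}\n" https://some-service.our.domain/wait/750
          done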
I was under the impression that the istio-ingressgateway pod would handle the SIGTERM sent to it by Kubernetes gracefully: the terminating pod would stop accepting new requests but keep serving in-flight ones, with Kubernetes waiting terminationGracePeriodSeconds before forcefully killing it with SIGKILL. However, we see that the pod is killed immediately, causing the 502 responses returned from the ALB.
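For context, this is where I understand the grace period to live in the gateway Deployment's pod spec. We have not overridden it, so it should still be the Kubernetes default of 30 seconds; the preStop hook is shown purely as an illustration of the kind of drain delay I assumed would already be in place, not something we have configured:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  template:
    spec:
      # Kubernetes waits this long between sending SIGTERM and SIGKILL (default: 30)
      terminationGracePeriodSeconds: 30
      containers:
      - name: istio-proxy
        # Illustration only: a preStop sleep delays SIGTERM to the container,
        # so the pod keeps serving while endpoint removal / ALB target
        # deregistration propagates, before Envoy starts shutting down.
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "25"]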
The istio-ingressgateway pods are configured with a readinessProbe:
readinessProbe:
  failureThreshold: 30
  httpGet:
    path: /healthz/ready
    port: 15020
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 1
Are we missing something in our setup that would provide the expected behaviour? Or am I misunderstanding how Istio is supposed to handle this scale-down?
Thanks in advance