Hello, I am using Istio v1.10.3 with an AWS NLB as the load balancer for istio-ingressgateway. The Service carries the "service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip" annotation, which registers the IP addresses of the istio-ingressgateway pods as targets in the NLB. I need the istio-ingressgateway pods to shut down gracefully because, according to AWS, an NLB can keep sending traffic to a draining target for up to 180 seconds. AWS advised working around this limitation by failing the health check first and then waiting for a while before the pod exits.
I tried the TERMINATION_DRAIN_DURATION_SECONDS environment variable, but it had no effect because it has been removed; see istio/pilot.go at a48d843bd3302ede4a9dadd491742b562f3f376f · istio/istio · GitHub.
I also tried the http://localhost:15000/healthcheck/fail admin interface mentioned in "endpoint /healthz/ready does not show draining or terminating state · Issue #32703 · istio/istio · GitHub", but that didn't work either because of the cache issue in pilot, as shown in istio/probe.go at a48d843bd3302ede4a9dadd491742b562f3f376f · istio/istio · GitHub.
I then tried waiting for a long time in a preStop lifecycle hook, but that didn't work well either.
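For context, a preStop hook of this kind can be expressed as an IstioOperator overlay roughly like the sketch below. The 180-second sleep, the 210-second grace period, and the assumption that the proxy image provides a `sleep` binary are illustrative, not the exact values I used.

```yaml
# Illustrative sketch only: durations and overlay paths are assumptions.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        overlays:
        - kind: Deployment
          name: istio-ingressgateway
          patches:
          # Delay SIGTERM to the proxy so the NLB has time to stop routing to the pod.
          - path: spec.template.spec.containers.[name:istio-proxy].lifecycle
            value:
              preStop:
                exec:
                  command: ["sleep", "180"]
          # Must be longer than the preStop sleep, otherwise the kubelet kills the pod early.
          - path: spec.template.spec.terminationGracePeriodSeconds
            value: 210
```

The preStop time counts against terminationGracePeriodSeconds, so the grace period has to cover the sleep plus whatever drain time the proxy needs afterwards.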
As a last resort, I used the IstioOperator config below, which sets drainDuration, parentShutdownDuration, terminationDrainDuration and terminationGracePeriodSeconds. It keeps the pods alive longer, but I still see some failures on the NLB side.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istiocontrolplane
spec:
  profile: default
  components:
    ingressGateways:
    - enabled: true
      k8s:
        overlays:
        - kind: Deployment
          name: istio-ingressgateway
          patches:
          - path: spec.template.spec.terminationGracePeriodSeconds
            value: 310
        podAnnotations:
          proxy.istio.io/config: '{ "drainDuration": 301s, "parentShutdownDuration":
            302s, "terminationDrainDuration": 303s }'
        replicaCount: 2
        service:
          ports:
          - name: https
            port: 443
            protocol: TCP
            targetPort: 8443
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz/ready
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "15021"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: HTTP
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "6"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
          service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
      name: istio-ingressgateway
So I wonder what the best way is to shut down the istio-ingressgateway pods gracefully behind an AWS NLB in Istio v1.10.3 or above. Unfortunately, I cannot switch from "nlb-ip" to "nlb" for some reasons.
Thank you,
Eric