Graceful Shutdown of istio-ingressgateway for AWS NLB

Hello, I am using Istio v1.10.3 with an AWS NLB as the load balancer for istio-ingressgateway. The “service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip” annotation registers the IP addresses of the istio-ingressgateway pods as targets in the NLB. I need graceful shutdown of the istio-ingressgateway pods because of an NLB limitation: according to AWS, it can keep sending traffic to drained targets for up to 180 seconds. AWS advised working around this by failing the health check and then waiting for a while.

I tried the TERMINATION_DRAIN_DURATION_SECONDS environment variable, but it didn’t work because it has been removed from istio/pilot.go at a48d843bd3302ede4a9dadd491742b562f3f376f · istio/istio · GitHub.
I also tried the http://localhost:15000/healthcheck/fail admin endpoint mentioned in “endpoint /healthz/ready does not show draining or terminating state · Issue #32703 · istio/istio · GitHub”, but that didn’t work either because of the cache issue in pilot, as shown in istio/probe.go at a48d843bd3302ede4a9dadd491742b562f3f376f · istio/istio · GitHub.
I also tried waiting for a long time in a preStop lifecycle hook, but that didn’t go well either.

As a last resort, I used the IstioOperator config below, which modifies drainDuration, parentShutdownDuration, terminationDrainDuration and terminationGracePeriodSeconds. It keeps the pods alive longer during shutdown, but I still see some failures on the NLB side.

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istiocontrolplane
spec:
  profile: default
  components:
    ingressGateways:
    - enabled: true
      k8s:
        overlays:
        - kind: Deployment
          name: istio-ingressgateway
          patches:
          - path: spec.template.spec.terminationGracePeriodSeconds
            value: 310
        podAnnotations:
          proxy.istio.io/config: '{ "drainDuration": 301s, "parentShutdownDuration":
            302s, "terminationDrainDuration": 303s }'
        replicaCount: 2
        service:
          ports:
          - name: https
            port: 443
            protocol: TCP
            targetPort: 8443
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz/ready
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "15021"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: HTTP
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "6"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
          service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
      name: istio-ingressgateway
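
To make the timing in this config easier to reason about, here is a rough back-of-the-envelope sketch in Python. It assumes (this is not confirmed by AWS or Istio documentation in this thread) that the NLB needs about interval × unhealthy-threshold seconds to mark a target unhealthy, and that the 180 seconds mentioned above start counting after that. The function names are made up for illustration.

```python
# Values copied from the IstioOperator config in this post.
HEALTHCHECK_INTERVAL_S = 10         # ...healthcheck-interval annotation
UNHEALTHY_THRESHOLD = 2             # ...healthcheck-unhealthy-threshold annotation
TERMINATION_DRAIN_DURATION_S = 303  # terminationDrainDuration
TERMINATION_GRACE_PERIOD_S = 310    # terminationGracePeriodSeconds

# From the post: the NLB may send traffic to a drained target for up to 180 s.
NLB_MAX_STALE_TRAFFIC_S = 180


def seconds_until_unhealthy(interval_s, unhealthy_threshold):
    """Approximate time for the NLB to mark a failing target unhealthy.

    Assumption: one failed check per interval, ignoring the check timeout.
    """
    return interval_s * unhealthy_threshold


def minimum_serving_window(interval_s, unhealthy_threshold, stale_traffic_s):
    """Time a pod should keep serving after it starts failing health checks.

    Assumption: the stale-traffic window is added on top of detection time.
    """
    return seconds_until_unhealthy(interval_s, unhealthy_threshold) + stale_traffic_s


needed = minimum_serving_window(
    HEALTHCHECK_INTERVAL_S, UNHEALTHY_THRESHOLD, NLB_MAX_STALE_TRAFFIC_S
)
print(f"pod should keep serving for about {needed} s")
print(f"drain duration covers it: {needed <= TERMINATION_DRAIN_DURATION_S}")
print(f"grace period covers drain: {TERMINATION_DRAIN_DURATION_S <= TERMINATION_GRACE_PERIOD_S}")
```

Under these assumptions the durations above leave enough headroom on paper, so the remaining NLB failures would have to come from something other than the pod exiting too early, for example the readiness endpoint not reporting the draining state.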

So I wonder what the best way is to shut down the istio-ingressgateway pods gracefully behind an AWS NLB in Istio v1.10.3 or above. Unfortunately, I cannot change from “nlb-ip” to “nlb” for some reasons.

Thank you,
Eric

I have the same issue. I’m at my wits’ end.

Not sure if you have tried this method.

I’ll try it myself on the next upgrade

I think I figured it out. Here is an explanation, along with example manifests for the Istio operator.