Service temporarily unresponsive after re-running "istioctl install"

Should I expect istioctl install to be an idempotent, non-disruptive operation if I'm not making changes to the IstioOperator? At the moment I'm finding that it consistently disrupts service.

After recently upgrading to Istio 1.6 and migrating from the Helm-based install to istioctl, I'm having trouble keeping the environment stable when it's managed via a CI/CD pipeline. Whenever the pipeline runs, istioctl install takes my service down for a brief outage. I can reproduce the same problem from my local machine if I authenticate and run the same command the pipeline runs.

The command is this (literal values redacted):
istioctl install -r "1-6-4" --set spec.meshConfig.defaultConfig.tracing.custom_tags.workspace.literal.value="foo" --set spec.meshConfig.defaultConfig.tracing.zipkin.address="bar:9411" -f istio-operator.yaml -f istio-operator-nlb-overlay.yaml

Just repeatedly applying the same IstioOperator manifest yields the same result: after running this command I'm unable to access my HTTP service for a couple of minutes. curl comes back with curl: (52) Empty reply from server, and Postman reports Error: socket hung up.

Is this normal? I have Istio deployed in EKS, and I recently moved from Classic Load Balancers to NLBs.

I am experiencing the same issue. Did you ever find a solution?

We found a workaround, yes. Thanks to this discussion: Istio Operator 1.6.8, install issues, NLB + Target groups being recreated

It seems the issue came down to the NLB and the way it connects to backend targets via NodePorts. Any time our Istio ingress gateway Service got assigned a new NodePort, the NLB's target groups had to be recreated and it would temporarily drop connections.
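
If you want to confirm that's what's happening, one quick check (this assumes the default istio-ingressgateway Service in the istio-system namespace) is to list the assigned NodePorts before and after re-running istioctl install and see whether they change:

# print port name -> NodePort for the ingress gateway Service (default name/namespace assumed)
kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.nodePort}{"\n"}{end}'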

So we avoided the problem by pinning specific NodePorts in our configuration. Here's an example of what we added to our config (note that I've edited it down to just the relevant bits):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway # assuming the default gateway name
        enabled: true
        k8s:
          service:
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
                nodePort: 30000 # we made this number up, range is 30000-32767
              - name: http2
                port: 80
                targetPort: 8080
                nodePort: 30001 # we made this number up, range is 30000-32767
              - name: https
                port: 443
                targetPort: 8443
                nodePort: 30002 # we made this number up, range is 30000-32767
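
With those values pinned, re-running the same istioctl install no longer reassigns the NodePorts, so the NLB target groups stop being recreated. As a sanity check (again assuming the default istio-ingressgateway Service in istio-system), a pinned port should come back unchanged across pipeline runs:

# should keep printing 30001 no matter how many times istioctl install is re-run
kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'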

Great, thank you!

For future readers: this was still an issue in 1.7.0.