Istio ELB - Nodes Out of Service

Hello,

My current setup is AWS EKS with Istio.

EKS Version: 1.11
Istio Version: 1.0.5 (I have tried with 1.0.6 with the same results).
I’m using the most recently updated EKS AMIs for the worker nodes too (I also tried an older image, with the same result).

The fun part is that the exact same setup in my PROD account seems to be working fine.

I follow all the steps to create an EKS cluster and then install Istio (via Helm). I tag my DMZ subnets with the cluster name and I can see the ELB get created. Immediately the nodes show as out of service. I then deploy my applications to the cluster, at which point I would expect things to work correctly.

I have verified that the ELB health check port is open on the nodes. I have noticed some oddities: the istio-ingressgateway logs contain the error "gRPC config stream closed: 14 no healthy upstream | Unable to establish new stream", and checking proxy-status just returns an odd error and no data (the output looks empty).

I’m very new to Kubernetes and Istio, so if anyone could help me figure this out I would greatly appreciate it. I’ve spent a lot of time spinning my wheels.

Thanks! Lee

Something I noticed working with EKS and Istio is that nodes appear out of service when:

  1. You don’t have a gateway that listens on the same port the health check uses. For example, on the default installation, I think it’s port 80. This was hard to debug because, just like in your scenario, the ports were open, but the ingress gateway was refusing connections unless a gateway was defined.

  2. You set the external traffic policy to Local, which causes nodes that don’t run the ingress gateway to fail their health checks.
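
Both conditions above can be checked from the command line. This is a minimal sketch, assuming the default Service name istio-ingressgateway in the istio-system namespace; the function names are made up for illustration.

```shell
# Print the ingress gateway Service's externalTrafficPolicy (Cluster or Local).
# Assumes the default Service name and namespace from a standard Istio install.
ingress_traffic_policy() {
  kubectl get svc istio-ingressgateway -n istio-system \
    -o jsonpath='{.spec.externalTrafficPolicy}'
}

# List Istio Gateway resources in every namespace, so you can see whether
# any of them defines a server on the port the ELB health check uses.
list_gateways() {
  kubectl get gateways.networking.istio.io --all-namespaces
}
```

If ingress_traffic_policy prints Local, only nodes running an ingress gateway pod are expected to pass the health check. If list_gateways returns nothing, the first condition is the more likely cause.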

Hopefully my experience is of use to you.

As @fernando says, the culprit is the ELB health check port. What I usually do is create a dummy gateway on port 80 with no virtual services attached to it. This causes the gateway to return 404, but I configure the ELB to treat gateway port 80 as a TCP port. So an HTTP 404 returned by the gateway is seen as some data being returned on port 80, which keeps the ELB health checks happy.
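
A dummy gateway like the one described might look roughly like this. It is a sketch only: the name elb-health-check is hypothetical, and the selector assumes the default istio: ingressgateway label on the ingress gateway pods. It does not cover the ELB-side TCP configuration.

```shell
# Apply a Gateway that listens on port 80 with no VirtualService bound to it.
# The resource name is a placeholder; adjust the selector if your ingress
# gateway pods carry a different label.
apply_dummy_gateway() {
  kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: elb-health-check
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
EOF
}
```

With no virtual service attached, requests to port 80 should get a 404, which is enough for a TCP-level health check to see the port as open.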

I was able to solve this by adding the correct port.
Istio 1.7.4
After externalTrafficPolicy was changed from Cluster to Local, the HealthCheck NodePort changed:

    kubectl describe svc istio-ingressgateway -n istio-system | grep 'HealthCheck NodePort'
    HealthCheck NodePort:     30549
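
If you want the port in a script rather than reading it from the describe output, the same value can be read from the Service spec. A small sketch, again assuming the default Service name and namespace; the function name is made up.

```shell
# Print the health check NodePort that Kubernetes allocates for the Service
# when externalTrafficPolicy is Local. Prints nothing if the field is unset.
ingress_health_check_nodeport() {
  kubectl get svc istio-ingressgateway -n istio-system \
    -o jsonpath='{.spec.healthCheckNodePort}'
}
```

The ELB health check and the node security group then need to allow this port.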
