I’m trying to debug an intermittent error that occurs when one pod calls another pod through its ‘public’ hostname in EKS.
From what I can gather so far: the originating pod makes a DNS request, gets the IP of the external load balancer (LB), and sends its requests there.
The LB then picks a random node and forwards the request to the NodePort; the ingressgateway pod there accepts the request, determines the target pods, and forwards the request to them.
But sometimes we end up with either a 503 returned by the calling Envoy or just an “empty response from server”.
Meanwhile, we observe TCP resets at the load balancer level, and I can’t determine whether they originate from the node or from the Istio gateway pod (if I read the Envoy logs correctly, the request never reaches the target Envoy proxy).
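To put numbers on how often each symptom occurs, a small probe run from inside the calling pod can help. The following is only a minimal sketch: it assumes the public hostname is served over HTTPS on port 443, and the names `classify_outcome` and `probe` are made up for illustration. It opens a fresh connection for every attempt, so the LB gets a chance to pick a different node each time, and it counts 503s, empty responses, and TCP resets separately.

```python
import collections
import http.client


def classify_outcome(status=None, error=None):
    """Map one attempt to a short label (labels are arbitrary, for counting only)."""
    if error is not None:
        # RemoteDisconnected subclasses ConnectionResetError, so test it first.
        # It means the peer closed the connection without sending any response.
        if isinstance(error, http.client.RemoteDisconnected):
            return "empty_response"
        if isinstance(error, ConnectionResetError):
            return "tcp_reset"
        return "other_error:" + type(error).__name__
    if status == 503:
        return "http_503"
    if status is not None and status < 400:
        return "ok"
    return f"http_{status}"


def probe(host, path="/", attempts=100, port=443, timeout=5.0):
    """Send `attempts` GET requests, one new connection each, and count outcomes."""
    counts = collections.Counter()
    for _ in range(attempts):
        # Assumption: the endpoint speaks HTTPS; swap in HTTPConnection otherwise.
        conn = http.client.HTTPSConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body so the exchange completes
            counts[classify_outcome(status=resp.status)] += 1
        except (OSError, http.client.HTTPException) as exc:
            counts[classify_outcome(error=exc)] += 1
        finally:
            conn.close()
    return counts
```

Calling something like `probe("app.example.com", attempts=500)` (the hostname is a placeholder) returns a `Counter` of labels. Comparing the timestamps of the failed attempts against the LB metrics and the gateway logs might narrow down which hop the resets correlate with.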
So my questions are:
- Shouldn’t an external hostname defined in the Gateway be resolved internally, so that traffic never leaves the cluster?
- How can I debug further to find where the reset packets come from, and why?
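For the first question, one way to confirm what the calling pod actually resolves is to look up the hostname from inside that pod and check whether the returned addresses are public or private. This is a rough sketch with hypothetical helper names (`resolve_addresses`, `describe`). Note that "private" here only means the address falls in a private range as judged by Python's `ipaddress` module; it does not by itself prove that traffic stays inside the cluster, since an internal LB would also have private addresses.

```python
import ipaddress
import socket


def resolve_addresses(hostname, port=443):
    """Return the sorted, de-duplicated IPs that the pod's resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def describe(addresses):
    """Label each address as 'private' or 'public' according to ipaddress.is_private."""
    return {
        addr: "private" if ipaddress.ip_address(addr).is_private else "public"
        for addr in addresses
    }
```

Running `describe(resolve_addresses("app.example.com"))` (placeholder hostname) in the calling pod shows whether the name points at the LB's public IPs, which would match the path described above.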
I’ve been unable to join Slack; the invite link is no longer valid.