Unable to reach internal ingress when pod is running on the same node as the ingress

Nicholas · August 18, 2023, 2:27pm

Hi there,

I have a weird networking issue in my EKS clusters (I can reproduce it in two different EKS clusters provisioned with the same Terraform code). It started as a seemingly random networking issue affecting some pods; then I found a reliable way to reproduce it.

My cluster has two Ingress Gateways:

default-private-gateway
default-public-gateway

Both use an NLB. The private ingress is reachable only internally, while the public one is exposed to the public Internet. The mesh has PeerAuthentication set to STRICT. Sidecars are injected by default in all namespaces. Both ingress pods run in the same istio-ingress namespace, and each ingress has only one pod. The cluster has at least three nodes (one per AZ). Let’s call “NodeA” the node where the private ingress runs.

I deploy an httpbin service with a VirtualService using the private gateway (with a DNS like httpbin.int.mydomain.com). httpbin is now running on NodeB.

I run two Debian pods. DebianA runs on NodeA, while DebianB runs on another node.

Here is what happens:

curl httpbin using its Service from DebianA works fine.
curl httpbin using its Service from DebianB works fine.
curl httpbin using the VirtualService from DebianA fails with "curl: (35) Recv failure: Connection reset by peer".
curl httpbin using the VirtualService from DebianB works fine.
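
For anyone wanting to repeat the four checks above, here is a rough sketch that builds the corresponding kubectl exec commands. The pod names (debian-a, debian-b), the default namespace, and the Service URL are placeholders I made up for illustration; only the httpbin.int.mydomain.com hostname comes from the description above, and the https scheme is an assumption based on curl error 35 being a TLS-level error.

```python
import shlex

# Hypothetical names: adjust to your cluster. Only the hostname
# httpbin.int.mydomain.com comes from the post; everything else is assumed.
PODS = {"DebianA": "debian-a", "DebianB": "debian-b"}
TARGETS = {
    "Service": "http://httpbin.default.svc.cluster.local:8000/get",
    "VirtualService": "https://httpbin.int.mydomain.com/get",
}


def curl_command(pod: str, url: str, namespace: str = "default") -> list[str]:
    """Build a kubectl exec invocation that curls `url` from inside `pod`."""
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "curl", "-sS", "-o", "/dev/null", "-w", "%{http_code}", url,
    ]


def build_matrix() -> list[tuple[str, str, list[str]]]:
    """Return (source, target, command) for every source/target pair."""
    return [
        (src, dst, curl_command(pod, url))
        for src, pod in PODS.items()
        for dst, url in TARGETS.items()
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for src, dst, cmd in build_matrix():
        print(f"# {src} -> {dst}")
        print(shlex.join(cmd))
```

Running the printed commands one by one should make it obvious which source/target pair is the odd one out.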

It seems that a pod running on the same node as the private ingress cannot reach it, whereas the ingress works without problems when I try from other nodes.

I tried running httpbin on NodeA: it is always reachable. So it seems to be an “outbound” problem only.

I found that nothing is logged on the private ingress pod (with debug level) in the “DebianA to httpbin” case, while in all other cases I see the traffic being logged correctly. So I suspect the traffic never reaches the ingress in the first place.

My Istio knowledge is limited, and so far I have no further ideas on how to investigate (and solve) this issue.

Any suggestion is welcome!

Thanks.

rsalmond · August 25, 2023, 5:14pm

Can you elaborate on what you mean by “curl using Service” vs “curl using VirtualService”?

Nicholas · August 28, 2023, 9:09am

By Service I mean the usual Kubernetes Service resource, and by VirtualService the Istio one.

Anyway, after some digging I found the issue was caused by the NLB having “IP preservation” enabled (set through the service.beta.kubernetes.io/aws-load-balancer-target-group-attributes annotation). With this option disabled, it works.
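
For context, this behaviour is consistent with the documented NLB limitation that NAT loopback (hairpinning) is not supported when client IP preservation is enabled, which would likely explain why only clients on the ingress node were affected. As a minimal sketch of the change, assuming the attribute in question is preserve_client_ip.enabled and that the annotation value is a comma-separated list of key=value pairs, the helper below rewrites an existing value so that preservation is turned off while other attributes are kept. The function names are hypothetical, not part of any library.

```python
ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-target-group-attributes"


def parse_attributes(value: str) -> dict[str, str]:
    """Parse a 'k1=v1,k2=v2' annotation value into a dict (order preserved)."""
    attrs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty values and trailing commas
        key, _, val = item.partition("=")
        attrs[key.strip()] = val.strip()
    return attrs


def disable_client_ip_preservation(value: str) -> str:
    """Return the annotation value with preserve_client_ip.enabled=false."""
    attrs = parse_attributes(value)
    # Assumed attribute name; the post only says "IP preservation".
    attrs["preserve_client_ip.enabled"] = "false"
    return ",".join(f"{k}={v}" for k, v in attrs.items())


if __name__ == "__main__":
    current = "preserve_client_ip.enabled=true"  # assumed previous value
    print(f"{ANNOTATION}: {disable_client_ip_preservation(current)}")
```

The resulting line would go under metadata.annotations of the private gateway's Service. Keep in mind that with preservation off, the ingress sees the NLB's address rather than the original client IP.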
