This is a simple question: when handling an in-mesh service call, does Envoy route the call directly to one of the pod endpoints, or does it route to the Kubernetes service ClusterIP and let it (that is, the iptables redirect rules that kube-proxy set up on the host) route to one of the destination pod IPs?
I think Envoy chooses the destination pod directly, based on the routing rules we set up with VirtualServices and DestinationRules, without going through the Kubernetes service ClusterIP. Otherwise the ClusterIP would simply spread requests across all backing pods (round robin or random, depending on the kube-proxy mode) and you would lose the fine-grained traffic control promised by Istio.
But I just want to make sure my understanding is correct. The reason I have a little doubt is that I dumped the Pilot EDS and config JSON files for a single service (say, the “details” service that comes with the BookInfo sample app). I saw the real “details” pod IP in the endpoints section, but I also saw the service ClusterIP in the dynamic_route_configs section.
What is the point of including the ClusterIP in the xDS data if Envoy doesn’t use it for routing? Is it there just in case some client uses the ClusterIP (say 10.96.225.126:9080) rather than the service name /details:9080 in the URL, so that Istio still knows how to route the request to the pod endpoint without actually going through the ClusterIP?
I did some digging and found the answer: according to this Istio document, step 5, once the service is matched (with version if applicable), Envoy looks up the endpoints it obtained from Pilot via ADS, using the matching service name as the key, and sends the request directly to one of those endpoints, without going through the Kubernetes service ClusterIP.
To be sure, I got into the productpage Envoy sidecar container and ran “lsof -i” to list all connections. The service ClusterIP was not there, only the direct pod IPs.
This confirms my belief, and it does make sense. If Envoy routed to the service ClusterIP, it would lose the ability to do fine-grained L7 traffic control.
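To make that reasoning concrete, here is a toy Python model of the two paths. It is only a sketch: the pod IPs, the version labels and the 90/10 weights are made up for illustration, and the two functions are simplified stand-ins, not how kube-proxy or Envoy are actually implemented.

```python
import random
from collections import Counter

# Hypothetical endpoints of a "reviews"-like service: pod IP -> version label.
ENDPOINTS = {
    "10.0.0.1": "v1",
    "10.0.0.2": "v2",
    "10.0.0.3": "v3",
}


def pick_via_cluster_ip(rng):
    """Toy model of the ClusterIP path: any backing pod, no notion of versions."""
    return rng.choice(sorted(ENDPOINTS))


def pick_via_sidecar(rng, weights):
    """Toy model of the sidecar path: pick a version by weight, then a pod in it."""
    versions = sorted(weights)
    version = rng.choices(versions, weights=[weights[v] for v in versions])[0]
    pods = sorted(ip for ip, v in ENDPOINTS.items() if v == version)
    return rng.choice(pods)


rng = random.Random(0)
weights = {"v1": 90, "v2": 10, "v3": 0}  # illustrative 90/10 split, v3 excluded

via_cluster_ip = Counter(ENDPOINTS[pick_via_cluster_ip(rng)] for _ in range(1000))
via_sidecar = Counter(ENDPOINTS[pick_via_sidecar(rng, weights)] for _ in range(1000))

print("via ClusterIP:", dict(via_cluster_ip))
print("via sidecar:  ", dict(via_sidecar))
```

In the first counter every version receives traffic, because the choice is made without looking at labels. In the second, v3 never receives a request, which is the kind of control that only works if the proxy picks the pod itself.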
I guess I was fooled by this Kiali picture of the call flow of the Istio sample app BookInfo:
The triangle represents the service and the square represents the pod. It gives the impression that any web service call will go through the service first, and that the service will then route the request to the backend pod. That is certainly true for the normal Kubernetes flow (without an Envoy sidecar), but it is a little misleading for the Istio service mesh flow.
It gives an impression that any web service call will go through the service first and then the service will route the request to the backend pod.
The flow is a bit different. The request for a service is first processed at the source proxy and routed to a destination workload based on any applicable route rules. The triangle represents the resulting destination service. The square in this case doesn’t represent a pod but rather a versioned app based on app and version labels applied to the destination workload. You can change the graph type in Kiali to “Workload” to see workload nodes (circles). A workload is not necessarily equivalent to a pod, but it does represent the entity physically servicing the request. Also, if you prefer not to see the service nodes you can optionally remove them from the graph by unchecking the “Service Nodes” option in the Display dropdown.
You are right that the graph can make it look like a request “goes through” the service but we wanted to be able to represent that the source requests were routed to the service, and then serviced by a specific app/workload. For example, it allows us to visualize that productpage is making requests for reviews, and that those requests are in turn routed to three different versions of the reviews service.
I have certain doubts about this… I’m trying to test the scenario. I deployed a ClusterIP service called nc, with netcat listening on port 8888. I also have a test pod, network-multitool, from which I’m trying telnet/netcat to the ClusterIP address and port (10.109.209.96:8888) directly. I can see the following output from various istioctl commands:
./bin/istioctl proxy-config listener network-multitool.default | grep nc.default.svc.cluster.local
10.109.209.96 8888 Trans: raw_buffer; App: HTTP Route: nc.default.svc.cluster.local:8888
10.109.209.96 8888 ALL Cluster: outbound|8888||nc.default.svc.cluster.local
./bin/istioctl proxy-config endpoint network-multitool.default | grep nc.default.svc.cluster.local
172.17.0.16:8888 HEALTHY OK outbound|8888||nc.default.svc.cluster.local
This indicates to me that istio-proxy captures traffic destined for the ClusterIP 10.109.209.96:8888 and then creates a new TCP connection directly to the endpoint IP belonging to this route/cluster, 172.17.0.16:8888, completely bypassing the standard k8s service rules created in iptables/IPVS by kube-proxy.
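The lookup I am inferring from that output can be sketched in a few lines of Python. This only parses the text pasted above; the column layout is assumed from what istioctl printed here and is not a stable format, and the function names are my own.

```python
# Output of "istioctl proxy-config listener" and "... endpoint", copied from above.
LISTENERS = """\
10.109.209.96 8888 Trans: raw_buffer; App: HTTP Route: nc.default.svc.cluster.local:8888
10.109.209.96 8888 ALL Cluster: outbound|8888||nc.default.svc.cluster.local
"""
ENDPOINTS = """\
172.17.0.16:8888 HEALTHY OK outbound|8888||nc.default.svc.cluster.local
"""


def listener_clusters(text):
    """Map (address, port) -> cluster name for rows that name a cluster directly."""
    result = {}
    for line in text.splitlines():
        if "Cluster: " not in line:
            continue  # rows that point at a Route instead are skipped in this sketch
        address, port = line.split()[:2]
        result[(address, int(port))] = line.split("Cluster: ", 1)[1].strip()
    return result


def cluster_endpoints(text):
    """Map cluster name -> list of endpoint addresses (first column of each row)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        result.setdefault(fields[-1], []).append(fields[0])
    return result


def resolve(address, port):
    """Follow listener -> cluster -> endpoints for traffic sent to address:port."""
    cluster = listener_clusters(LISTENERS).get((address, port))
    return cluster_endpoints(ENDPOINTS).get(cluster, [])


print(resolve("10.109.209.96", 8888))  # ['172.17.0.16:8888']
```

So, at least according to the proxy config, a connection addressed to the ClusterIP should end up at the pod IP.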
However, I can’t prove this idea using netstat. Running netstat in either the istio-proxy or the ‘main’ container, I can see only this line:
tcp 0 0 172.17.0.17:53294 10.109.209.96:8888 ESTABLISHED 123/nc
I think this might have something to do with the iptables redirect, and the fact that netstat sees connections before they are redirected by iptables? But even in that case I would expect to see a line such as 172.17.0.17:xxxxx 172.17.0.16:8888 initiated by the envoy process.
After some testing I can see this has something to do with pure TCP. If I make an HTTP request using curl, it is obviously unsuccessful, with a 503 returned by the Envoy proxy. But the TCP connection remains, and I can suddenly see a new connection in netstat (btw. normally in the istio-proxy container you can’t see what process established the original connection… I mixed netstat output from
tcp 0 0 172.17.0.17:54132 172.17.0.16:8888 ESTABLISHED 12/envoy
tcp 0 0 172.17.0.17:52882 10.109.209.96:8888 ESTABLISHED 107/curl
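To make it explicit which connection is which, here is a small Python sketch that labels the two netstat lines above by owning process and by destination type. The seven-column layout is assumed from the output as pasted, and the IPs are the ones from my test setup.

```python
CLUSTER_IP = "10.109.209.96"  # ClusterIP of the nc service in this test
POD_IP = "172.17.0.16"        # endpoint IP reported by istioctl for that service

NETSTAT = """\
tcp 0 0 172.17.0.17:54132 172.17.0.16:8888 ESTABLISHED 12/envoy
tcp 0 0 172.17.0.17:52882 10.109.209.96:8888 ESTABLISHED 107/curl
"""


def classify(text):
    """Return (program, destination kind) for each netstat row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # not a row in the expected "netstat -tnp"-like layout
        remote_ip = fields[4].rsplit(":", 1)[0]
        program = fields[6].split("/", 1)[1]
        if remote_ip == CLUSTER_IP:
            kind = "ClusterIP"
        elif remote_ip == POD_IP:
            kind = "pod endpoint"
        else:
            kind = "other"
        rows.append((program, kind))
    return rows


for program, kind in classify(NETSTAT):
    print(f"{program:6s} -> {kind}")
```

This prints envoy against the pod endpoint and curl against the ClusterIP: the application socket still shows the address it dialed, while the upstream connection is owned by envoy.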
So the options that come to my mind are:
- netstat is telling the truth and in the first case envoy is bypassed. I have always thought ALL the egress/ingress traffic goes through istio-proxy no matter what. It would also mean that the istioctl proxy-config output is very misleading.
- netstat is not telling the truth… there is in fact some connection established directly to the pod… I just can’t see it for some reason?