But this doesn't solve my problem; I didn't provide enough info in the description. Basically I want Prometheus (with Istio sidecar + STRICT mTLS) to talk to the application pod (with Istio sidecar + STRICT mTLS). Prometheus talks directly to the pod IPs discovered from the k8s Endpoints, and there is no way to provide a Host header with Prometheus.
I am trying to make “pod to pod” communication work with Istio over STRICT mTLS. Theoretically, shouldn't it work, because the Prometheus istio-proxy will intercept the traffic and make an mTLS connection to the application's istio-proxy?
There is a solution that avoids using Istio with Prometheus, but I am trying to make everything mTLS, including Prometheus and its other components.
Well, I saw both links during troubleshooting. All they require is to not use istio-proxy with Prometheus and then use tls_config to scrape the targets, but I am trying to make it work when Prometheus has the Istio sidecar enabled, so there isn't any need for tls_config.
Do you know why Istio doesn't allow pod-to-pod communication without adding a Host header? Is it possible to make that happen?
I narrowed down the problem: pod-to-pod IP traffic goes through the PassthroughCluster because the pod IP isn't in the Istio proxy's known endpoints. With the PassthroughCluster the Istio sidecar acts as a plain TCP proxy and doesn't interfere with (it bypasses) the connection, so the TLS connection doesn't happen.
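For anyone who wants to verify the same thing, something like this should show it (the pod name and namespace are placeholders for the Prometheus pod; the target IP is just an example):

```sh
# List the endpoints the Prometheus sidecar knows about and check whether the
# target pod IP is among them.
istioctl proxy-config endpoints prometheus-0.monitoring | grep 10.178.180.3

# If it is not, the outbound call typically ends up in the PassthroughCluster:
istioctl proxy-config clusters prometheus-0.monitoring | grep -i passthrough
```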
I don't know how to make it work, or whether it is like that by design.
@ianmiell pod-to-service communication works; I am looking for pod-to-pod.
My use case is to make Prometheus scrape metrics from a service pod which is running in a STRICT mTLS enabled namespace. Prometheus gets the pod IPs from Endpoints and scrapes the pods directly rather than going via the Service, which makes sense because it wants to scrape all the pods behind a service.
Other stateful applications like Redis, Kafka, and Elasticsearch also communicate with their peer pods directly; I think those applications also can't work with Istio due to this problem.
Yes, I know you meant pod-to-pod - we had to solve the same thing recently for a kafka-connect pod that needed to communicate pod IP <=> pod IP via STRICT mTLS through Envoy.
I don't fully understand why adding a headless service solves this problem, but from looking at the code, it seems that the headless service triggers EDS to treat the pod IP addresses as legitimate endpoints, and adds the service to the Host header matching.
/prometheus $ wget -SqO /dev/null http://httpbin-headless.default.svc.cluster.local:8000/
HTTP/1.1 503 Service Unavailable
wget: server returned error: HTTP/1.1 503 Service Unavailable
The side effect of making this problem go away isn’t documented anywhere we could find. There are hints in recent changes to the code, but frankly, I don’t think I understand those changes.
@deepak_deore, I think we got it working on our side, but we still need to test everything properly.
So far we found a few “acceptable” ways to make the direct pod-to-pod calls through istio-proxy:
Via a headless service (as was already pointed out).
Adding a headless Service in k8s results in an ORIGINAL_DST cluster in Envoy (LB will be: CLUSTER_PROVIDED). Making calls through that cluster should allow you to connect directly to pods.
You might need to provide a Host header, i.e. something like curl -vvv 10.178.180.3:8000 -H "HOST: httpbin-headless.default.svc.cluster.local:8000", where 10.178.180.3 is the pod IP you are trying to connect to. The Host header allows Envoy to pick the correct VirtualService/route.
You might also need to verify that your requests are flowing through the VirtualOutbound. In our scenario, we had to use the traffic.sidecar.istio.io/includeOutboundPorts annotation.
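To illustrate the headless-service approach, a minimal sketch reusing the httpbin-headless name from above (the selector, namespace and port are assumptions about the target app):

```yaml
# Headless Service: clusterIP: None is what makes Istio program an
# ORIGINAL_DST (CLUSTER_PROVIDED) cluster for the backing pods.
apiVersion: v1
kind: Service
metadata:
  name: httpbin-headless
  namespace: default
spec:
  clusterIP: None
  selector:
    app: httpbin          # assumed label on the target pods
  ports:
  - name: http
    port: 8000
    targetPort: 8000
```

And if outbound capture is restricted in your install, the client pod template may need the annotation mentioned above, e.g.:

```yaml
metadata:
  annotations:
    traffic.sidecar.istio.io/includeOutboundPorts: "8000"
```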
Via a “subset” defined in a DestinationRule.
This approach is similar to #1, but works when you have a regular Service defined (i.e. not headless) and you also want to enable direct calls to the pods backing that service.
Change the DestinationRule and define a subset that uses the PASSTHROUGH LB:
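Something along these lines (the rule name and host are examples, not our actual config):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: httpbin-direct
spec:
  host: httpbin.default.svc.cluster.local
  subsets:
  - name: direct
    trafficPolicy:
      loadBalancer:
        simple: PASSTHROUGH   # per-subset LB override; Envoy builds an ORIGINAL_DST cluster for it
```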
When that's done, a new “direct” ORIGINAL_DST cluster will appear in Envoy's /config_dump, and you should be able to make direct calls with something like:
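For illustration only (the pod name, namespace and IP below are placeholders): first confirm the new cluster exists, then try a direct call with the Host header so Envoy can match the route:

```sh
# The new per-subset ORIGINAL_DST cluster should show up in the client's sidecar:
istioctl proxy-config clusters <client-pod>.<namespace> | grep direct

# Direct call to one of the backing pods:
curl -vvv http://10.178.180.3:8000/ -H "Host: httpbin.default.svc.cluster.local:8000"
```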
Our setup of Istio is a bit “non-standard” for various reasons, but I would be curious to see if you can get this working in your environment.
A few things that may help you debug your configs (a rough sketch of the commands follows after the list):
collect outputs from envoy’s /clusters and /stats admin endpoints and save them to “before” files
try to make curl/wget calls
collect the stats again and compare with the “before” versions (diff before after)
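A rough sketch of that loop (pod and container names are placeholders; this assumes curl is available in the istio-proxy container, which is the case for the default proxyv2 image):

```sh
# 1. Snapshot Envoy admin data from the sending sidecar (the admin endpoint listens on localhost:15000).
kubectl exec <client-pod> -c istio-proxy -- curl -s localhost:15000/clusters > clusters.before
kubectl exec <client-pod> -c istio-proxy -- curl -s localhost:15000/stats    > stats.before

# 2. Make the test call from the application container.
kubectl exec <client-pod> -c <app-container> -- wget -SqO /dev/null http://10.178.180.3:8000/

# 3. Snapshot again and diff to see which clusters the call flowed through.
kubectl exec <client-pod> -c istio-proxy -- curl -s localhost:15000/stats > stats.after
diff stats.before stats.after | grep -i -e passthrough -e upstream_cx
```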
In the stats you will see where your calls are flowing to.
If they are ending up in the PassthroughCluster on the sending side, then probably a correct VirtualService/route wasn't matched.
If your calls are ending up in the InboundPassthroughCluster on the receiving side, then something might be wrong with the listeners config.
@npshenichnyy thanks for the reply; I mentioned in my later comment why I want pod-to-pod communication to work over IP:
Prometheus gets the pod’s ips from endpoints and scrapes the pods directly and doesn’t go via service which makes sense because it wants to scrape all the pods behind a service.
Prometheus doesn't and won't support adding a Host header to the scrape request, which makes it difficult to call the targets with a Host header.
I can exclude the target IPs from STRICT mTLS, but I am trying to keep everything under STRICT mTLS.
The closest I got was with a ServiceEntry and a DestinationRule, but the ServiceEntry's addresses field isn't working with a CIDR as described in the documentation; a single IP in the ServiceEntry's addresses works fine. I have raised a bug which explains what I am saying here, but there is no response so far to that bug.
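For context, this is roughly the shape of ServiceEntry I mean (the names are placeholders; a single pod IP in addresses works for me, a CIDR does not):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: scrape-targets
spec:
  hosts:
  - scrape-targets.internal        # placeholder hostname
  addresses:
  - 10.178.180.3                   # a single pod IP works; a CIDR here did not
  ports:
  - number: 8000
    name: http-metrics
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: NONE
```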
Creating a headless service and adding a Host header solves the same problem for me.
But still, when the IP address is that of the current pod itself, it doesn't work.
My use case is: one of the pods gets an external request (through the regular Service) and invalidates its cache, and then it uses pod-to-pod communication to notify the peer pods to invalidate their caches. I get the IP addresses of all pods from the Endpoints of the headless service, which includes the pod that was chosen for the external request… and I don't think there is a way for it to exclude itself from the loop, but the good thing is that it doesn't matter even if that call fails, because that pod has already finished its own cache invalidation.
Using a headless service for the port in question fixed the problem for us. We are using raw TCP only with STRICT mTLS and can establish connections. Example service def:
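Something like this (the name, selector and port are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-peers
spec:
  clusterIP: None        # headless
  selector:
    app: myapp
  ports:
  - name: tcp-peer       # "tcp-" prefix so Istio treats the port as raw TCP
    port: 9300
    protocol: TCP
    targetPort: 9300
```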
This whole thread could and should be converted into a bug. Can someone in this thread open an issue and report the final solution? That way this can be documented. Again, I think a headless service is the way to go, although I don't understand all of the moving parts of this system and the mechanics they represent.