503 on pod-to-pod communication (Istio 1.5.1)

There are 2 namespaces (source and target) with STRICT mTLS.

  1. 200 from a source-namespace pod to the target service:
curl -s -o /dev/null -w "%{http_code}" alertmanager-operated.target.svc.cluster.local:9093; echo
200

  2. but a 503 error when the source pod accesses the target pod directly:
curl -s -o /dev/null -w "%{http_code}" 172.17.0.13:9093; echo
503

Is this really an issue, and how can I fix it?

Could you try the curl with a Host header? I’m not sure if this will help, though. I thought the proxy knew about all the endpoints. :slight_smile:

@Tomas_Kohout
Yes, it works with the Host header :slight_smile:!

curl -s -o /dev/null -w "%{http_code}" -H "Host: alertmanager-operated.target.svc.cluster.local" 172.17.0.11:9093
200

but this doesn’t solve my problem. I didn’t provide more info in the description: basically I want Prometheus (with Istio sidecar + STRICT mTLS) to talk to the application pod (with Istio sidecar + STRICT mTLS). Prometheus talks directly to the pod IPs discovered from the k8s endpoints, and there is no way to provide a Host header with Prometheus.

I am trying to make “pod to pod” communication work with Istio over STRICT mTLS. In theory it should work, because the Prometheus Istio proxy will intercept the traffic and make an mTLS connection to the application’s Istio proxy, right?

There is a solution that avoids using Istio with Prometheus, but I am trying to put everything under mTLS, including Prometheus and its other components.
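
For context, the scrape job looks roughly like this (just a sketch, not my exact config; the job name and relabeling are placeholders). With role: endpoints the discovered targets are bare pod IP:port pairs, so there is nowhere to put a Host header:

scrape_configs:
- job_name: kubernetes-endpoints         # hypothetical job name
  kubernetes_sd_configs:
  - role: endpoints                      # discovers the pod IPs behind each Service
    namespaces:
      names:
      - target                           # the STRICT mTLS namespace from the example above
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: alertmanager-operated
    action: keep
  # __address__ ends up as <pod-ip>:<port>, e.g. 172.17.0.13:9093, with no Host header set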


Check out this BYO Prometheus with mTLS :slight_smile:

Or even better :slight_smile: https://github.com/istio/istio/issues/7352#issuecomment-439617432

Well, I saw both links during troubleshooting. All that is needed is to not use the istio-proxy with Prometheus and then use tls_config to scrape the targets, but I am trying to make it work when Prometheus has the Istio sidecar enabled, so there isn’t any need for tls_config.
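
For reference, the tls_config approach from those links looks roughly like this (a sketch only; it assumes the Istio-issued certs are mounted into the Prometheus container at /etc/istio-certs, which is an assumption about the mount path, and the job name is a placeholder):

scrape_configs:
- job_name: istio-mtls-scrape            # hypothetical job name
  scheme: https
  tls_config:
    ca_file: /etc/istio-certs/root-cert.pem      # assumed mount path for the sidecar certs
    cert_file: /etc/istio-certs/cert-chain.pem
    key_file: /etc/istio-certs/key.pem
    insecure_skip_verify: true           # typically needed because pod IPs are not in the certificate
  kubernetes_sd_configs:
  - role: endpoints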

Do you know why Istio doesn’t allow pod-to-pod communication without adding a Host header? Is it possible to make that happen?

I think that it’s not possible. :slight_smile: But I’ve never tried it, so maybe it is. :slight_smile:

I narrowed down the problem: pod-to-pod IP traffic goes through the PassthroughCluster because the pod IP isn’t in the Istio proxy’s known endpoints. With the PassthroughCluster, the Istio sidecar acts as a plain TCP proxy and doesn’t interfere with (bypasses) the connection, so the mTLS connection doesn’t happen.

I don’t know how to make it work, or whether it is like that by design.

Try setting up a headless service. Since 1.5.x this seems to add the IP addresses as legitimate endpoints and adds the Host header for you.


@ianmiell pod-to-service communication works; I am looking for pod-to-pod.

My use case is to make Prometheus scrape metrics from a service pod running in a STRICT mTLS enabled namespace. Prometheus gets the pod IPs from the endpoints and scrapes the pods directly; it doesn’t go via the service, which makes sense because it wants to scrape all the pods behind a service.

Other stateful applications like Redis, Kafka, and Elasticsearch also communicate with their peer pods directly; I think those applications can’t work with Istio either, due to this problem.


Yes, I know you meant pod-to-pod - we had to solve the same thing recently for a kafka-connect pod that needed to communicate pod IP <=> pod IP via STRICT mTLS through Envoy :slight_smile:

I don’t fully understand why adding a headless service solves this problem, but from looking at the code, it seems that the headless service triggers EDS to treat the IP addresses as legitimate endpoints, and adds the service to the Host header.


headless service gives 503:

/prometheus $ wget -SqO /dev/null http://httpbin-headless.default.svc.cluster.local:8000/
  HTTP/1.1 503 Service Unavailable
wget: server returned error: HTTP/1.1 503 Service Unavailable

works with direct service:

/prometheus $ wget -SqO /dev/null http://httpbin.default.svc.cluster.local:8000/
  HTTP/1.1 200 OK
  server: envoy
  date: Tue, 21 Apr 2020 18:00:18 GMT
  content-type: text/html; charset=utf-8
  content-length: 9593
  access-control-allow-origin: *
  access-control-allow-credentials: true
  x-envoy-upstream-service-time: 3
  connection: close

fails with the pod IP (the pod listens on 80):

/prometheus $ wget -SqO /dev/null http://172.17.0.8/
  HTTP/1.1 503 Service Unavailable
wget: server returned error: HTTP/1.1 503 Service Unavailable

Do you have more details or links on this? Thanks.

Only really the headless service docs on kubernetes:

https://kubernetes.io/docs/concepts/services-networking/service/#headless-services

The side effect of making this problem go away isn’t documented anywhere we could find. There are hints in recent changes to the code, but frankly, I don’t think I understand those changes.

@deepak_deore, I think we got it working on our side, but we still need to test everything properly.
So far we have found a few “acceptable” ways to make direct pod-to-pod calls go through the istio-proxy:

  1. Via a headless service (as was already pointed out).
  • Adding a headless Service in k8s results in an ORIGINAL_DST cluster in Envoy (the LB will be CLUSTER_PROVIDED). Making calls through that cluster should allow you to connect directly to pods.
  • You might need to provide a Host header, i.e. something like curl -vvv 10.178.180.3:8000 -H "HOST: httpbin-headless.default.svc.cluster.local:8000", where 10.178.180.3 is the Pod IP to which you are trying to connect. The Host header allows Envoy to pick the correct VirtualService/route.
  • You might also need to verify that your requests are flowing through the virtualOutbound listener. In our scenario, we had to use the traffic.sidecar.istio.io/includeOutboundPorts annotation (see the sketch after this list).
  2. Via a “subset” defined in a DestinationRule.
  • This approach is similar to #1, but works when you have a regular service defined (i.e. not headless) and you also want to enable direct calls to the pods backing that service.
  • Change the DestinationRule and define a subset that uses the PASSTHROUGH LB:
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: httpbin
      namespace: default
    spec:
      host: httpbin.default.svc.cluster.local
      subsets:
      - name: direct
        trafficPolicy:
          loadBalancer:
            simple: PASSTHROUGH
    ...
    
  • Modify your VirtualService to pick the direct subset based on some criteria:
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: httpbin
      namespace: default
    spec:
      hosts:
      - httpbin.default.svc.cluster.local
      http:
      - match:
        - headers:
            use-direct:
              exact: "true"
        route:
        - destination:
            host: httpbin.default.svc.cluster.local
            port:
              number: 8000
            subset: direct
    ...
    
  • When that’s done, a new “direct” ORIGINAL_DST cluster will appear in Envoy’s /config_dump, and you should be able to make direct calls with something like:
    curl -vvv 10.178.180.3:8000 -H "HOST: httpbin.default.svc.cluster.local:8000" -H "use-direct: true"
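
Here is roughly what the includeOutboundPorts annotation from #1 looks like in a pod template (only the relevant fields shown; the deployment name and port are assumptions for illustration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus               # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # redirect outbound traffic on port 8000 to the sidecar,
        # regardless of the destination IP ranges
        traffic.sidecar.istio.io/includeOutboundPorts: "8000"
...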
    

Our setup of Istio is a bit “non-standard” for various reasons, but I would be curious to see if you can get this working in your environment.
A few things that may help you debug your configs:

  • collect outputs from envoy’s /clusters and /stats admin endpoints and save them to “before” files
  • try to make curl/wget calls
  • collect the stats again and compare with the “before” versions (diff before after)

In the stats you will see where your calls are flowing to.
If they are ending up in the PassthroughCluster on the sending side, then probably the correct VirtualService/route wasn’t matched.
If your calls are ending up in the InboundPassthroughCluster on the receiving side, then something might be wrong with the listeners config.


@npshenichnyy thanks for the reply. I mentioned in my later comment why I want pod-to-pod communication to work over IP:

Prometheus gets the pod IPs from the endpoints and scrapes the pods directly; it doesn’t go via the service, which makes sense because it wants to scrape all the pods behind a service.

Prometheus doesn’t, and won’t, support adding a Host header to the request, which makes this approach difficult.

I could exclude the target IPs from STRICT mTLS, but I am trying to keep everything under STRICT mTLS.

The closest I got was with a ServiceEntry and DestinationRule, but the ServiceEntry’s addresses field isn’t working with a CIDR as the documentation says it should; a single IP in the ServiceEntry’s addresses works fine. I have raised a bug which explains what I am saying here; there has been no response to that bug so far.
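
For reference, the kind of ServiceEntry I mean looks roughly like this (a sketch, not my exact config; the name, hostname, and port name are placeholders):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: scrape-target-pod
  namespace: source
spec:
  hosts:
  - scrape-target.internal       # placeholder hostname
  addresses:
  - 172.17.0.13                  # a single pod IP works; a CIDR did not for me
  ports:
  - number: 9093
    name: http-metrics
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: NONE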

A guess: does something in your app demand a certain host header?

503 is odd, but I didn’t dwell on it before: if memory serves we were getting 502s, not 503s

This wiki may help troubleshoot: https://github.com/istio/istio/wiki/Troubleshooting-Istio

Creating a headless service and adding the Host header solves the same problem for me.
But still, when the IP address is the current pod’s own, it doesn’t work.

My use case is: one of the pods gets an external request (through a regular service) and invalidates its cache, and then it uses pod-to-pod communication to notify the peer pods to invalidate their caches. I get the IP addresses of all pods from the endpoints of the headless service, which includes the pod that was chosen for the external request… I don’t think there is a way to exclude itself from the loop, but the good thing is that it doesn’t matter even if that call fails… because that pod has already finished its own cache invalidation.

Using a headless service for the port in question fixed the problem for us. We are using raw TCP only with STRICT mTLS and can establish connections. Example service definition:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: MyApp
  ports:
    - name: tcp-my-port
      port: 3100

This whole thread could and should be converted into a bug. Can someone in this thread open an issue and report the final solution? That way this can be documented. Again, I think a headless service is the way to go, although I don’t understand all of the moving parts of this system and the mechanics they represent.