503 on pod-to-pod communication (Istio 1.5.1)

There are 2 namespaces (source and target) with STRICT mTLS.

  1. 200 from a source-namespace pod to the target service:
curl -s -o /dev/null -w "%{http_code}" alertmanager-operated.target.svc.cluster.local:9093; echo
200

  2. but a 503 error when the source pod accesses the target pod directly:
curl -s -o /dev/null -w "%{http_code}" 172.17.0.13:9093; echo
503

Is this really an issue, and how can I fix it?

Could you try the curl with a Host header? I’m not sure if this will help, though. I thought the proxy knew about all the endpoints. :slight_smile:

@Tomas_Kohout
Yes, it works with the Host header :slight_smile:!

curl -s -o /dev/null -w "%{http_code}" -H "Host: alertmanager-operated.target.svc.cluster.local" 172.17.0.11:9093
200

but this doesn’t solve my problem. I didn’t provide more info in the description: basically I want Prometheus (with Istio sidecar + STRICT mTLS) to talk to the application pod (with Istio sidecar + STRICT mTLS). Prometheus talks directly to the pod IPs discovered from the k8s endpoints, and there is no way to provide a Host header with Prometheus.

I am trying to make “pod to pod” communication work with Istio over STRICT mTLS. In theory it should work, because the Prometheus Istio proxy will intercept the traffic and make an mTLS connection to the application’s Istio proxy, right?

There is a solution that avoids using Istio with Prometheus, but I am trying to put everything under mTLS, including Prometheus and its other components.
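
For context, the scrape job looks roughly like this (just a sketch, not my exact config; the job name and relabeling are placeholders). With role: endpoints the discovered targets are bare pod IP:port pairs, so there is nowhere to put a Host header:

scrape_configs:
- job_name: kubernetes-endpoints         # hypothetical job name
  kubernetes_sd_configs:
  - role: endpoints                      # discovers the pod IPs behind each Service
    namespaces:
      names:
      - target                           # the STRICT mTLS namespace from the example above
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: alertmanager-operated
    action: keep
  # __address__ ends up as <pod-ip>:<port>, e.g. 172.17.0.13:9093, with no Host header set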


Check out this BYO Prometheus with mTLS :slight_smile:

Or even better :slight_smile: https://github.com/istio/istio/issues/7352#issuecomment-439617432

Well, I saw both links during troubleshooting. All that is needed is to not use the istio-proxy with Prometheus and then use tls_config to scrape the targets, but I am trying to make it work when Prometheus has the Istio sidecar enabled, so there isn’t any need for tls_config.
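
For reference, the tls_config approach from those links looks roughly like this (a sketch only; it assumes the Istio-issued certs are mounted into the Prometheus container at /etc/istio-certs, which is an assumption about the mount path, and the job name is a placeholder):

scrape_configs:
- job_name: istio-mtls-scrape            # hypothetical job name
  scheme: https
  tls_config:
    ca_file: /etc/istio-certs/root-cert.pem      # assumed mount path for the sidecar certs
    cert_file: /etc/istio-certs/cert-chain.pem
    key_file: /etc/istio-certs/key.pem
    insecure_skip_verify: true           # typically needed because pod IPs are not in the certificate
  kubernetes_sd_configs:
  - role: endpoints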

Do you know why Istio doesn’t allow pod-to-pod communication without adding a Host header? Is it possible to make that happen?

I think that it’s not possible. :slight_smile: But I’ve never tried it, so maybe it is. :slight_smile:

I narrowed down the problem: pod-to-pod IP traffic goes through the PassthroughCluster because the pod IP isn’t in the Istio proxy’s known endpoints. With the PassthroughCluster, the Istio sidecar acts as a plain TCP proxy and doesn’t interfere with (bypasses) the connection, so the mTLS connection doesn’t happen.

I don’t know how to make it work, or whether it is like that by design.

Try setting up a headless service. Since 1.5.x this seems to add the IP addresses as legitimate endpoints and adds the Host header for you.


@ianmiell pod-to-service communication works; I am looking for pod-to-pod.

My use case is to make Prometheus scrape metrics from a service pod running in a STRICT mTLS enabled namespace. Prometheus gets the pod IPs from the endpoints and scrapes the pods directly; it doesn’t go via the service, which makes sense because it wants to scrape all the pods behind a service.

Other stateful applications like Redis, Kafka, and Elasticsearch also communicate with their peer pods directly; I think those applications can’t work with Istio either, due to this problem.


Yes, I know you meant pod-to-pod - we had to solve the same thing recently for a kafka-connect pod that needed to communicate pod IP <=> pod IP via STRICT mTLS through Envoy :slight_smile:

I don’t fully understand why adding a headless service solves this problem, but from looking at the code, it seems that the headless service triggers EDS to treat the IP addresses as legitimate endpoints, and adds the service to the Host header.


headless service gives 503:

/prometheus $ wget -SqO /dev/null http://httpbin-headless.default.svc.cluster.local:8000/
  HTTP/1.1 503 Service Unavailable
wget: server returned error: HTTP/1.1 503 Service Unavailable

works with direct service:

/prometheus $ wget -SqO /dev/null http://httpbin.default.svc.cluster.local:8000/
  HTTP/1.1 200 OK
  server: envoy
  date: Tue, 21 Apr 2020 18:00:18 GMT
  content-type: text/html; charset=utf-8
  content-length: 9593
  access-control-allow-origin: *
  access-control-allow-credentials: true
  x-envoy-upstream-service-time: 3
  connection: close

fails with the pod IP (the pod listens on 80):

/prometheus $ wget -SqO /dev/null http://172.17.0.8/
  HTTP/1.1 503 Service Unavailable
wget: server returned error: HTTP/1.1 503 Service Unavailable

Do you have more details or links on this? Thanks.

Only really the headless service docs on kubernetes:

https://kubernetes.io/docs/concepts/services-networking/service/#headless-services

The side effect of making this problem go away isn’t documented anywhere we could find. There are hints in recent changes to the code, but frankly, I don’t think I understand those changes.

@deepak_deore, I think we got it working on our side, but we still need to test everything properly.
So far we have found a few “acceptable” ways to make direct pod-to-pod calls go through the istio-proxy:

  1. Via a headless service (as was already pointed out).
  • Adding a headless Service in k8s results in an ORIGINAL_DST cluster in Envoy (the LB will be CLUSTER_PROVIDED). Making calls through that cluster should allow you to connect directly to pods.
  • You might need to provide a Host header, i.e. something like curl -vvv 10.178.180.3:8000 -H "HOST: httpbin-headless.default.svc.cluster.local:8000", where 10.178.180.3 is the Pod IP to which you are trying to connect. The Host header allows Envoy to pick the correct VirtualService/route.
  • You might also need to verify that your requests are flowing through the virtualOutbound listener. In our scenario, we had to use the traffic.sidecar.istio.io/includeOutboundPorts annotation (see the sketch after this list).
  2. Via a “subset” defined in a DestinationRule.
  • This approach is similar to #1, but works when you have a regular service defined (i.e. not headless) and you also want to enable direct calls to the pods backing that service.
  • Change the DestinationRule and define a subset that uses the PASSTHROUGH LB:
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: httpbin
      namespace: default
    spec:
      host: httpbin.default.svc.cluster.local
      subsets:
      - name: direct
        trafficPolicy:
          loadBalancer:
            simple: PASSTHROUGH
    ...
    
  • Modify your VirtualService to pick the direct subset based on some criteria:
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: httpbin
      namespace: default
    spec:
      hosts:
      - httpbin.default.svc.cluster.local
      http:
      - match:
        - headers:
            use-direct:
              exact: "true"
        route:
        - destination:
            host: httpbin.default.svc.cluster.local
            port:
              number: 8000
            subset: direct
    ...
    
  • When that’s done, a new “direct” ORIGINAL_DST cluster will appear in Envoy’s /config_dump, and you should be able to make direct calls with something like:
    curl -vvv 10.178.180.3:8000 -H "HOST: httpbin.default.svc.cluster.local:8000" -H "use-direct: true"
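
Here is roughly what the includeOutboundPorts annotation from #1 looks like in a pod template (only the relevant fields shown; the deployment name and port are assumptions for illustration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus               # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # redirect outbound traffic on port 8000 to the sidecar,
        # regardless of the destination IP ranges
        traffic.sidecar.istio.io/includeOutboundPorts: "8000"
...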
    

Our setup of Istio is a bit “non-standard” for various reasons, but I would be curious to see if you can get this working in your environment.
A few things that may help you debug your configs:

  • collect outputs from envoy’s /clusters and /stats admin endpoints and save them to “before” files
  • try to make curl/wget calls
  • collect the stats again and compare with the “before” versions (diff before after)

In the stats you will see where your calls are flowing to.
If they are ending up in the PassthroughCluster on the sending side, then probably the correct VirtualService/route wasn’t matched.
If your calls are ending up in the InboundPassthroughCluster on the receiving side, then something might be wrong with the listeners config.


@npshenichnyy thanks for the reply. I mentioned in my later comment why I want pod-to-pod communication to work over IP:

Prometheus gets the pod IPs from the endpoints and scrapes the pods directly; it doesn’t go via the service, which makes sense because it wants to scrape all the pods behind a service.

Prometheus doesn’t, and won’t, support adding a Host header to the request, which makes this approach difficult.

I could exclude the target IPs from STRICT mTLS, but I am trying to keep everything under STRICT mTLS.

The closest I got was with a ServiceEntry and DestinationRule, but the ServiceEntry’s addresses field isn’t working with a CIDR as the documentation says it should; a single IP in the ServiceEntry’s addresses works fine. I have raised a bug which explains what I am saying here; there has been no response to that bug so far.
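
For reference, the kind of ServiceEntry I mean looks roughly like this (a sketch, not my exact config; the name, hostname, and port name are placeholders):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: scrape-target-pod
  namespace: source
spec:
  hosts:
  - scrape-target.internal       # placeholder hostname
  addresses:
  - 172.17.0.13                  # a single pod IP works; a CIDR did not for me
  ports:
  - number: 9093
    name: http-metrics
    protocol: HTTP
  location: MESH_INTERNAL
  resolution: NONE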

A guess: does something in your app demand a certain host header?

503 is odd, but I didn’t dwell on it before: if memory serves we were getting 502s, not 503s

This wiki may help troubleshoot: https://github.com/istio/istio/wiki/Troubleshooting-Istio

Creating a headless service and adding the Host header solves the same problem for me.
But still, when the IP address is the current pod’s own, it doesn’t work.

My use case is: one of the pods gets an external request (through a regular service) and invalidates its cache, and then it uses pod-to-pod communication to notify the peer pods to invalidate their caches. I get the IP addresses of all pods from the endpoints of the headless service, which includes the pod that was chosen for the external request… I don’t think there is a way to exclude itself from the loop, but the good thing is that it doesn’t matter even if that call fails… because that pod has already finished its own cache invalidation.

Using a headless service for the port in question fixed the problem for us. We are using raw TCP only with STRICT mTLS and can establish connections. Example service definition:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: MyApp
  ports:
    - name: tcp-my-port
      port: 3100

This whole thread could and should be converted into a bug. Can someone in this thread open an issue and report the final solution? That way this can be documented. Again, I think a headless service is the way to go, although I don’t understand all of the moving parts of this system and the mechanics they represent.