Headless TLS to Kafka (with Pod IPs)

OK, I’m desperate. I can’t get it to work. I have a Kafka cluster in a non-injected namespace
in the same cluster. It’s managed by an operator, so I don’t have a lot of control over what
the pod spec and service look like. Here is how Istio (1.6.5) is installed:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cps-istio
  namespace: istio-system
spec:
  components:
    cni:
      enabled: true
      namespace: kube-system
  values:
    cni:
      excludeNamespaces:
       - istio-system
       - kube-system
      logLevel: info
  meshConfig:
    outboundTrafficPolicy:
#      mode: REGISTRY_ONLY
      mode: ALLOW_ANY
  profile: demo

This is what the spec of the cluster’s pods looks like:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/port: "9020"
    prometheus.io/scrape: "true"
  name: kafka-0-pjt47
spec:
  containers:
  - name: kafka
    ports:
    - containerPort: 9094
      name: tcp-external
      protocol: TCP
    - containerPort: 29092
      name: tcp-ssl
      protocol: TCP
    - containerPort: 9020
      name: metrics
      protocol: TCP
status:
  phase: Running
  qosClass: Burstable

And the troublesome headless service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka
  name: kafka-headless
  namespace: cloud-platform-workload
spec:
  clusterIP: None
  ports:
  - name: tcp-ssl
    port: 29092
    protocol: TCP
    targetPort: 29092
  - name: metrics
    port: 9020
    protocol: TCP
    targetPort: 9020
  selector:
    app: kafka
    kafka_cr: kafka
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Using istioctl gives the expected warnings, but I can’t change this, as the service is not managed by us:

istioctl analyze

Info [IST0118] (Service kafka-headless.cloud-platform-workload) Port name metrics (port: 9020, targetPort: 9020) doesn't follow the naming convention of Istio port.
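(If the Service were editable, Istio’s port-naming convention is `<protocol>[-<suffix>]`, so the operator would need to emit something like the following — just an illustration of what the analyzer is asking for, not something I can apply:)

```yaml
ports:
- name: tcp-ssl      # already follows the <protocol>[-<suffix>] convention
  port: 29092
  protocol: TCP
  targetPort: 29092
- name: tcp-metrics  # renamed from "metrics" so Istio treats it as raw TCP
  port: 9020
  protocol: TCP
  targetPort: 9020
```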

So, without the proxy, the service (in another namespace) happily connects to Kafka over TLS!
But as soon as I inject the proxy I see this:

istioctl proxy-config cluster axsh

kafka-headless.cloud-platform-workload.svc.cluster.local                                   9020      -              outbound      ORIGINAL_DST
kafka-headless.cloud-platform-workload.svc.cluster.local                                   29092     -              outbound      ORIGINAL_DST

Shelling into the container, an nslookup of the Kafka headless service returns one entry per broker pod:

nslookup kafka-headless.cloud-platform-workload.svc.cluster.local


Name:	kafka-headless.cloud-platform-workload.svc.cluster.local
Name:	kafka-headless.cloud-platform-workload.svc.cluster.local
Name:	kafka-headless.cloud-platform-workload.svc.cluster.local

Looking at the listeners, the pod IPs are listed, and if you look at the cluster info
it’s listed as type ORIGINAL_DST:

istioctl proxy-config listener axsh

29092     TCP
29092     TCP
29092     TCP

In the logs of the service I see TLS trouble:

[kafka-admin-client-thread | adminclient-1] WARN org.apache.kafka.clients.NetworkClient - [AdminClient clientId=adminclient-1] Connection to node -1 (kafka-headless.cloud-platform-workload.svc/ terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue.

And the proxy gives this:

[2020-07-11T13:48:29.045Z] "- - -" 0 UF,URX "-" "-" 0 0 5 - "-" "-" "-" "-" "" outbound|29092||kafka-headless.cloud-platform-workload.svc.cluster.local - - -

I tried a lot of things: adding ServiceEntries, VirtualServices, DestinationRules, etc. I’ve also read a lot of articles, but nothing works. I think
my main trouble is that the headless service resolves directly to pod IPs (and that Pods don’t get service entries).
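(For reference, one of the ServiceEntry variants I tried looked roughly like this — `resolution: NONE` so Envoy passes traffic through to the original destination IP; it did not help:)

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: kafka-headless-tls
spec:
  hosts:
  - kafka-headless.cloud-platform-workload.svc.cluster.local
  location: MESH_INTERNAL
  ports:
  - number: 29092
    name: tcp-ssl
    protocol: TCP
  # NONE: forward to the IP the client actually dialed (the pod IP)
  resolution: NONE
```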

But with that in mind (no per-Pod DNS entries), what can I try next? For my first attempt, I just want to make it work… after that, I want to move the TLS wrapping into Envoy.

We are facing a similar issue: our Kafka cluster is not part of the Istio mesh, while the producers and consumers are. It worked totally fine when we were using only the PLAINTEXT protocol for Kafka broker communication. Once we enabled the SASL listener, things stopped working. In the logs I found a similar message saying the client could not fetch metadata from the broker and could not connect to it.

Let me know if you find a solution as well.

I have tried passing this annotation to my producers/consumers, and it helped me bypass the Envoy proxy for that port: traffic.sidecar.istio.io/excludeOutboundPorts: "29092"
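The annotation goes on the client pod template, not on the Kafka Service. A minimal sketch (the Deployment name is hypothetical, only the annotation matters):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-producer  # hypothetical client workload
spec:
  template:
    metadata:
      annotations:
        # Tell the sidecar's iptables setup not to redirect outbound
        # traffic on the Kafka TLS port through Envoy at all.
        traffic.sidecar.istio.io/excludeOutboundPorts: "29092"
```

With the port excluded, the client talks TLS to the broker directly, so you lose Istio telemetry for that traffic but the handshake is no longer interfered with.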