Headless TLS to Kafka (with Pod IPs)

OK, I’m desperate. I can’t get this to work. I have a Kafka cluster in a non-injected namespace
in the same cluster. It’s managed by an operator, so I don’t have a lot of control over what the
pod spec and service look like. Here is how Istio (1.6.5) is installed:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: cps-istio
  namespace: istio-system
spec:
  components:
    cni:
      enabled: true
      namespace: kube-system
  values:
    cni:
      excludeNamespaces:
       - istio-system
       - kube-system
      logLevel: info
    meshConfig:
      outboundTrafficPolicy:
#        mode: REGISTRY_ONLY
        mode: ALLOW_ANY
  profile: demo

This is what the spec of the cluster’s pods looks like:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 100.121.134.33/32
    prometheus.io/port: "9020"
    prometheus.io/scrape: "true"
  name: kafka-0-pjt47
spec:
  containers:
  - ...
    name: kafka
    ports:
    - containerPort: 9094
      name: tcp-external
      protocol: TCP
    - containerPort: 29092
      name: tcp-ssl
      protocol: TCP
    - containerPort: 9020
      name: metrics
      protocol: TCP
status:
  phase: Running
  podIP: 100.121.134.33
  qosClass: Burstable

And the troublesome headless service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka
  name: kafka-headless
  namespace: cloud-platform-workload
spec:
  clusterIP: None
  ports:
  - name: tcp-ssl
    port: 29092
    protocol: TCP
    targetPort: 29092
  - name: metrics
    port: 9020
    protocol: TCP
    targetPort: 9020
  selector:
    app: kafka
    kafka_cr: kafka
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Using istioctl gives the expected warning, but I can’t change this as the Service is not managed
by me:

istioctl analyze

Info [IST0118] (Service kafka-headless.cloud-platform-workload) Port name metrics (port: 9020, targetPort: 9020) doesn't follow the naming convention of Istio port.
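For reference only, since the operator owns this Service: Istio expects port names of the form <protocol>[-<suffix>], so silencing IST0118 would just mean renaming the metrics port, e.g.:

  - name: tcp-metrics   # was "metrics"; tcp-ssl above already follows the convention
    port: 9020
    protocol: TCP
    targetPort: 9020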

So, without the proxy the service (in another namespace) happily connects to Kafka over TLS!
But as soon as I inject the proxy I see this:

istioctl proxy-config cluster axsh

kafka-headless.cloud-platform-workload.svc.cluster.local                                   9020      -              outbound      ORIGINAL_DST
kafka-headless.cloud-platform-workload.svc.cluster.local                                   29092     -              outbound      ORIGINAL_DST

Shelling into the container, nslookup on the headless service returns the broker pod IPs:

nslookup kafka-headless.cloud-platform-workload.svc.cluster.local

Server:		100.64.0.10
Address:	100.64.0.10#53

Name:	kafka-headless.cloud-platform-workload.svc.cluster.local
Address: 100.121.134.33
Name:	kafka-headless.cloud-platform-workload.svc.cluster.local
Address: 100.108.9.188
Name:	kafka-headless.cloud-platform-workload.svc.cluster.local
Address: 100.119.65.217

Looking at the listeners, the pod IPs are listed, and in the cluster output above those clusters
are of type ORIGINAL_DST, i.e. for a headless service Envoy doesn’t load-balance; it just forwards
to whatever pod IP the client resolved via DNS:

istioctl proxy-config listener axsh

100.121.134.33     29092     TCP
100.108.9.188      29092     TCP
100.119.65.217     29092     TCP

In the logs of the service I see TLS trouble:

[kafka-admin-client-thread | adminclient-1] WARN org.apache.kafka.clients.NetworkClient - [AdminClient clientId=adminclient-1] Connection to node -1 (kafka-headless.cloud-platform-workload.svc/100.121.134.33:29092) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            

And the proxy gives this (UF = upstream connection failure, URX = upstream retry limit exceeded):

[2020-07-11T13:48:29.045Z] "- - -" 0 UF,URX "-" "-" 0 0 5 - "-" "-" "-" "-" "100.121.134.33:29092" outbound|29092||kafka-headless.cloud-platform-workload.svc.cluster.local - 100.121.134.33:29092 100.108.9.147:55432 - -

I tried a lot of things: ServiceEntries, VirtualServices, DestinationRules, etc… I’ve also seen a lot of articles, but nothing
works. I think my main trouble is that the headless service resolves straight to pod IPs (and that Pods don’t get service entries).

But with that in mind (no per-Pod DNS entries in the registry), what can I try next? For my first try, I just want to make it work… after that, I want to move the TLS wrapping to Envoy.
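For reference, the most targeted thing I’ve been pointed at so far is a DestinationRule that disables TLS origination toward the headless host, the idea being that auto mTLS may be making the sidecar wrap my already-encrypted Kafka connection a second time. A sketch (the resource name is mine):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: kafka-headless-no-mtls
  namespace: cloud-platform-workload
spec:
  host: kafka-headless.cloud-platform-workload.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE   # pass the client's own TLS through untouched, no Istio mTLS

Moving the TLS wrapping to Envoy later would then mean switching DISABLE to SIMPLE (plus certificate settings), so the app talks plaintext and the sidecar originates TLS.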

We are facing a similar issue where our Kafka cluster is not part of the Istio mesh while the producers and consumers are. It worked totally fine when we were using only the PLAINTEXT protocol for Kafka broker communication. Once we enabled a SASL listener, things stopped working. In the logs I found a similar message saying that the client could not fetch metadata from the broker and could not connect to it.

Let me know if you find a solution as well.

I have tried passing this annotation to my producers/consumers, and that helped by excluding the Envoy proxy for the Kafka port: traffic.sidecar.istio.io/excludeOutboundPorts: "29092"
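For anyone copying this: the annotation goes on the client’s pod template, and the value must be a quoted string. A sketch of the relevant part of a Deployment:

spec:
  template:
    metadata:
      annotations:
        # outbound traffic to 29092 bypasses the sidecar entirely,
        # so no mesh telemetry or policy for the Kafka connections
        traffic.sidecar.istio.io/excludeOutboundPorts: "29092"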

I have met a similar issue; I summarized what I found when auto mTLS is enabled in https://github.com/istio/istio/issues/24082
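If auto mTLS turns out to be the culprit, a coarser alternative to per-host DestinationRules (untested here, just a sketch against the IstioOperator at the top of the thread) is disabling it mesh-wide:

  values:
    meshConfig:
      enableAutoMtls: false   # sidecars no longer auto-upgrade plain TCP to Istio mTLS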