Unable to resolve long SRV DNS entries when mTLS is enabled

Hi Istio Community,

I’m running into an issue where, with mTLS enabled, my pods are unable to resolve long SRV DNS entries. I’m hoping you can point me in the right direction and let me know whether (1) I’m being dumb and missing something obvious, (2) I’ve found a bug in Istio that I should report, or (3) I’ve found a bug in Envoy that I should report to that team.

If I have a pod with an injected sidecar in a cluster without mTLS enabled, it can resolve long SRV records against the kube-system DNS servers like so:

```
/ # nslookup -type=srv istio-example.reecemath.com
;; Truncated, retrying in TCP mode.
Server:             10.96.0.10
Address:    10.96.0.10#53

Non-authoritative answer:
istio-example.reecemath.com service = 1 10 5000 istio-example1.reecemath.com.
istio-example.reecemath.com service = 10 10 5000 istio-example10.reecemath.com.
istio-example.reecemath.com service = 11 10 5000 istio-example11.reecemath.com.
....
```

The same goes for pods without injected sidecars; they resolve long SRV records just fine. However, if I enable mTLS for my cluster, redeploy my pod, and then try to resolve long SRV records, the lookup fails:

```
/ # nslookup -type=srv istio-example.reecemath.com
;; Truncated, retrying in TCP mode.
;; communications error to 10.96.0.10#53: connection reset
```

The same pods that fail to resolve the SRV record above can still resolve google.com’s A record, shorter SRV records (like istio-example2.reecemath.com), and longer CNAME records (like api.netflix.com). It seems to be only long SRV records that they can’t resolve, and only when they have an mTLS-enabled sidecar.
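One plausible reason only the long SRV set is affected: without EDNS0, a DNS response over UDP is capped at 512 bytes (RFC 1035), and a larger answer comes back with the TC (truncated) bit set, which is what makes nslookup print “Truncated, retrying in TCP mode”. The rough estimate below is a sketch, not a measurement; it assumes the owner names are compressed to 2-byte pointers, the SRV targets are uncompressed (as RFC 2782 requires), and there are no authority or additional records. It shows how quickly a set of istio-exampleN targets crosses that limit.

```python
# Rough wire-size estimate for an SRV answer, to show why a long SRV record set
# overflows the 512-byte UDP limit. Assumptions (not measured from a real server):
#   - no EDNS0 OPT record, no authority or additional records
#   - each answer's owner name is compressed to a 2-byte pointer to the question
#   - SRV targets are written uncompressed (RFC 2782 forbids compressing them)
from typing import Sequence

UDP_LIMIT = 512  # classic DNS-over-UDP message limit (RFC 1035, section 2.3.4)


def encoded_name_len(name: str) -> int:
    """Length of a domain name in DNS wire format: length-prefixed labels + root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def estimate_srv_response_size(qname: str, targets: Sequence[str]) -> int:
    header = 12
    question = encoded_name_len(qname) + 4  # QNAME + QTYPE + QCLASS
    # Per answer: owner pointer (2) + TYPE/CLASS/TTL/RDLENGTH (10)
    #             + priority/weight/port (6) + uncompressed target name.
    answers = sum(2 + 10 + 6 + encoded_name_len(t) for t in targets)
    return header + question + answers


if __name__ == "__main__":
    qname = "istio-example.reecemath.com"
    for n in range(1, 13):
        targets = [f"istio-example{i}.reecemath.com" for i in range(1, n + 1)]
        size = estimate_srv_response_size(qname, targets)
        verdict = "truncated over UDP" if size > UDP_LIMIT else "fits in UDP"
        print(f"{n:2d} records: ~{size} bytes ({verdict})")
```

Under these assumptions the estimate crosses 512 bytes at the tenth record (about 526 bytes), so a set like the one above, with at least eleven targets, would be expected to fall back to TCP, while short SRV sets and single CNAME answers stay on UDP.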

I’ve reproduced this in Istio 1.4.0 as well as 1.3.4, both in docker-desktop’s K8s deployment and in GCP. In GCP I tried K8s versions 1.13 and 1.14, with the same result.

Note that in my first example, where mTLS was disabled, I still saw the “Truncated, retrying in TCP mode” line, but the TCP retry succeeded. Maybe my pods can reach the DNS service over UDP but not over TCP? What could I do to open up TCP access to kube-system’s DNS service when mTLS is enabled?
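To test the UDP-versus-TCP theory directly, a small script run from inside the affected pod could send the same SRV query to kube-dns over both transports. This is a hedged sketch using only Python’s standard library; it assumes Python 3 is available in the pod and uses the kube-dns address 10.96.0.10 from the nslookup output. If UDP answers with the TC bit set while the TCP retry is reset, that would be consistent with the sidecar intercepting the TCP connection and trying to originate mTLS to kube-dns, which has no sidecar to terminate it.

```python
# Minimal probe: send the same SRV query to kube-dns over UDP and, if the UDP
# answer is truncated, again over TCP. No EDNS0 is sent, so UDP answers are
# capped at 512 bytes and a long SRV set should come back truncated.
import socket
import struct
import sys

DNS_SERVER = "10.96.0.10"  # kube-dns ClusterIP seen in the nslookup output
QTYPE_SRV = 33             # SRV record type (RFC 2782)
QCLASS_IN = 1


def build_query(name: str, qtype: int = QTYPE_SRV, txid: int = 0x1234) -> bytes:
    """Build a one-question DNS query with the RD (recursion desired) flag set."""
    labels = name.rstrip(".").split(".")
    if any(not 0 < len(label) < 64 for label in labels):
        raise ValueError(f"invalid domain name: {name!r}")
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def parse_header(msg: bytes) -> dict:
    """Decode the fields of the 12-byte DNS header that matter here."""
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    return {
        "id": txid,
        "truncated": bool(flags & 0x0200),  # TC bit
        "rcode": flags & 0x000F,
        "answers": ancount,
    }


def query_udp(msg: bytes, server: str = DNS_SERVER, timeout: float = 3.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(msg, (server, 53))
        data, _addr = sock.recvfrom(4096)
        return data


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-response")
        buf += chunk
    return buf


def query_tcp(msg: bytes, server: str = DNS_SERVER, timeout: float = 3.0) -> bytes:
    # Over TCP every DNS message carries a 2-byte length prefix (RFC 1035, 4.2.2).
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(msg)) + msg)
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)


def main(name: str) -> None:
    query = build_query(name)
    try:
        udp = parse_header(query_udp(query))
    except OSError as err:  # timeout, unreachable network, etc.
        print(f"UDP query failed: {err}")
        return
    print(f"UDP: {udp}")
    if udp["truncated"]:
        try:
            print(f"TCP: {parse_header(query_tcp(query))}")
        except OSError as err:  # e.g. ConnectionResetError, socket.timeout
            print(f"TCP query failed: {err}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "istio-example.reecemath.com")
```

Saved as, say, a hypothetical `dns_probe.py`, it would be run with `python3 dns_probe.py istio-example.reecemath.com` in a pod with and without the mTLS-enabled sidecar, and the two outputs compared.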

I appreciate the help in advance; thanks!

This definitely seems related to https://github.com/istio/istio/issues/11658

After applying a ServiceEntry for kube-dns and a DestinationRule that disables mTLS to it, I can resolve DNS records over TCP. I’m still concerned, though, since the issue above was marked “resolved”.

```yaml
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: dns
  namespace: kube-system
spec:
  hosts:
  - "kube-dns.kube-system.svc.cluster.local"
  ports:
  - number: 53
    name: tcp
    protocol: TCP
  resolution: NONE
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: disable-mtls-to-dns
  namespace: kube-system
spec:
  host: "kube-dns.kube-system.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
```