DNS Unusable in Multicluster Installation


We’ve set up three clusters in our environment (these are GKE clusters): istio (contains the control plane installation), remote-1, remote-2. Both remote-1 and remote-2 contain application deployments that must communicate with each other.

Both remotes (test and staging) are registered in the multicluster configuration and appear to be properly synchronizing service definitions with the proxy configuration available in the istio cluster. Using istioctl proxy-config clusters istio-ingressgateway-xxxx, I can see that the services in the remotes have outbound rules set up.

However, when I attempt to resolve someservice.remote-1 from any pod in the remote-2 cluster, it fails. I took this to mean that DNS is not automatically propagated to remotes from the Istio control plane. I have found a couple of Google Groups messages regarding this from a few months ago, but the official documentation makes no mention of how remotes actually resolve services between clusters. Can somebody clarify what may be going wrong in this instance?



DNS across clusters has to be set up explicitly. By default, names are resolved by kube-dns, and the kube-dns instance running in a Kubernetes cluster can only resolve services deployed in that same cluster, which is why the lookup you tried failed.

You can use the Istio CoreDNS plugin and configure DNS lookup for cross-cluster domain names to achieve what you described above.

Here is the CoreDNS plugin: https://github.com/istio-ecosystem/istio-coredns-plugin
Here is an example of how it can be used (different multi-cluster example): https://github.com/rshriram/istio_federation_demo
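
To check which names a pod can actually resolve before and after changing the DNS setup, a small lookup helper can be run from inside the pod. This is only a sketch: the function name resolve is made up for illustration, and it assumes a Python interpreter is available in the container. It uses the pod's configured resolver, so it reports exactly what an application in that pod would see.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the pod's resolver cannot resolve the name.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The resolver configured for this pod does not know the name.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("someservice.remote-1") from a pod in remote-2 should return an empty list with the default kube-dns setup described above, and a non-empty list once cross-cluster lookup is configured.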



Thanks for the reply! I’m a bit confused by the explanation, however. While kube-dns is the primary DNS delegate for pods within the cluster, the spec seems to allow for customization in this regard: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-config

This method seems to be alluded to by this issue: https://github.com/istio/istio/issues/10971
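
For reference, here is roughly what that per-pod customization looks like, sketched as a Python script that prints a JSON pod manifest. The dnsPolicy and dnsConfig fields come from the Kubernetes pod spec linked above; the helper name, pod name, image, nameserver IP, and search domain are all placeholders I made up, and I have not confirmed that this alone is enough for cross-cluster resolution.

```python
import json


def pod_with_custom_dns(name, image, nameserver, searches):
    """Build a pod manifest that bypasses the cluster default DNS settings.

    With dnsPolicy "None", Kubernetes ignores the cluster DNS settings and
    uses only what dnsConfig provides, so at least one nameserver is required.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "dnsPolicy": "None",
            "dnsConfig": {
                "nameservers": [nameserver],
                "searches": list(searches),
            },
            "containers": [{"name": name, "image": image}],
        },
    }


# All values below are hypothetical placeholders, not taken from a real setup.
manifest = pod_with_custom_dns(
    name="dns-example",
    image="nginx",
    nameserver="10.0.0.10",
    searches=["remote-1.svc.cluster.local"],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since Kubernetes accepts JSON manifests as well as YAML.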

I think the fundamental problem is that the process of cross-cluster service resolution is not communicated effectively.


The new setup task attempts to describe how resolution works - does it help explain what’s happening? (If not, how should we improve it?)