I’m having an issue setting up Istio 1.11.4 as a multicluster, single-network deployment between two OKD (OpenShift) clusters (alpha and beta). The installation itself seems to go smoothly, but when I validate the installation, cross-cluster requests time out.
I’m using istio-csr to provide the common trust and certificates between the two clusters.
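For reference, istio-csr was installed on each cluster with roughly the following Helm values (a sketch from memory; the istio-ca issuer name is specific to my setup, and the cluster ID is beta on the other cluster):

$ helm repo add jetstack https://charts.jetstack.io
$ helm install -n cert-manager cert-manager-istio-csr jetstack/cert-manager-istio-csr \
    --set "app.certmanager.issuer.name=istio-ca" \
    --set "app.certmanager.issuer.kind=Issuer" \
    --set "app.server.clusterID=alpha" \
    --set "app.tls.trustDomain=cluster.local"

The issuers on both clusters chain to the same root CA, which is what should give the two meshes a common trust root.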
Deployment
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
spec:
  profile: openshift
  hub: gcr.io/istio-release
  values:
    global:
      caAddress: cert-manager-istio-csr.cert-manager.svc:443
      meshID: cyber
      multiCluster:
        clusterName: alpha
      network: cyber
  components:
    pilot:
      k8s:
        env:
          # Disable istiod CA server functionality
          - name: ENABLE_CA_SERVER
            value: "false"
        overlays:
          - apiVersion: apps/v1
            kind: Deployment
            name: istiod
            patches:
              # Mount istiod serving and webhook certificate from Secret mount
              - path: spec.template.spec.containers.[name:discovery].args[-1]
                value: "--tlsCertFile=/etc/cert-manager/tls/tls.crt"
              - path: spec.template.spec.containers.[name:discovery].args[-1]
                value: "--tlsKeyFile=/etc/cert-manager/tls/tls.key"
              - path: spec.template.spec.containers.[name:discovery].args[-1]
                value: "--caCertFile=/etc/cert-manager/ca/root-cert.pem"
              - path: spec.template.spec.containers.[name:discovery].volumeMounts[-1]
                value:
                  name: cert-manager
                  mountPath: "/etc/cert-manager/tls"
                  readOnly: true
              - path: spec.template.spec.containers.[name:discovery].volumeMounts[-1]
                value:
                  name: ca-root-cert
                  mountPath: "/etc/cert-manager/ca"
                  readOnly: true
              - path: spec.template.spec.volumes[-1]
                value:
                  name: cert-manager
                  secret:
                    secretName: istiod-tls
              - path: spec.template.spec.volumes[-1]
                value:
                  name: ca-root-cert
                  configMap:
                    defaultMode: 420
                    name: istio-ca-root-cert
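The beta cluster uses the same operator spec with clusterName: beta (and the matching istio-csr cluster ID). Endpoint discovery between the clusters was enabled with remote secrets, following the multi-primary, same-network docs (paraphrasing my setup script):

$ istioctl x create-remote-secret --context="${CTX_CLUSTER2}" --name=beta | \
    kubectl apply -f - --context="${CTX_CLUSTER1}"
$ istioctl x create-remote-secret --context="${CTX_CLUSTER1}" --name=alpha | \
    kubectl apply -f - --context="${CTX_CLUSTER2}"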
Using the sample app, I can see both endpoints:
$ istioctl proxy-status | grep sample
helloworld-v1-776f57d5f6-cgcc9.sample SYNCED SYNCED SYNCED SYNCED istiod-5cf45d4fbc-7tvwr 1.11.4
sleep-557747455f-6464g.sample SYNCED SYNCED SYNCED SYNCED istiod-5cf45d4fbc-7tvwr 1.11.4
$ istioctl proxy-config ep sleep-557747455f-6464g.sample | grep helloworld
10.128.4.117:5000 HEALTHY OK outbound|5000||helloworld.sample.svc.cluster.local
10.135.0.254:5000 HEALTHY OK outbound|5000||helloworld.sample.svc.cluster.local
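(The 10.135.0.254 endpoint is the helloworld-v2 pod on beta; an easy way to map endpoint IPs to clusters is to list the pods with their IPs on each side:)

$ kubectl --context="${CTX_CLUSTER2}" -n sample get pods -o wide | grep helloworld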
Both endpoints are directly reachable by pod IP, but when I curl the service itself there is a long delay and I only ever get a response from the local cluster:
$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello
Hello version: v1, instance: helloworld-v1-776f57d5f6-cgcc9
$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS 10.128.4.117:5000/hello
Hello version: v1, instance: helloworld-v1-776f57d5f6-cgcc9
$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS 10.135.0.254:5000/hello
Hello version: v2, instance: helloworld-v2-54df5f84b-6d65k
I’m not seeing any errors in any of the logs (istiod, the Istio gateways, istio-proxy). The only indication of an error comes from the sidecar stats queried from the sleep pod:
$ kubectl --context "${CTX_CLUSTER1}" -n sample exec "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -s localhost:15000/clusters | grep helloworld | grep cx_
outbound|5000||helloworld.sample.svc.cluster.local::10.128.4.117:5000::cx_active::1
outbound|5000||helloworld.sample.svc.cluster.local::10.128.4.117:5000::cx_connect_fail::0
outbound|5000||helloworld.sample.svc.cluster.local::10.128.4.117:5000::cx_total::1
outbound|5000||helloworld.sample.svc.cluster.local::10.135.0.254:5000::cx_active::0
outbound|5000||helloworld.sample.svc.cluster.local::10.135.0.254:5000::cx_connect_fail::2
outbound|5000||helloworld.sample.svc.cluster.local::10.135.0.254:5000::cx_total::2
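If I understand Envoy’s routing correctly, the direct pod-IP curls above go through the passthrough cluster without mTLS, which would explain why they succeed while the load-balanced requests fail. To check whether the failures happen during the TLS handshake, the per-cluster SSL stats can be queried the same way (this is the query I’m using; no output shown):

$ kubectl --context "${CTX_CLUSTER1}" -n sample exec "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -s localhost:15000/stats | grep helloworld | grep ssl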
I’m new to Istio, so I’ve pretty well exhausted my knowledge of what to try. I had previously set up a multi-network deployment and didn’t have any issues with that one, so I’m not sure where I’ve gone wrong here.
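In case it helps someone spot the problem, this is how I’m verifying that both clusters actually share the same root of trust (the two fingerprints should match if istio-csr is wired up correctly):

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get configmap istio-ca-root-cert \
    -o jsonpath='{.data.root-cert\.pem}' | openssl x509 -noout -fingerprint
$ kubectl --context="${CTX_CLUSTER2}" -n istio-system get configmap istio-ca-root-cert \
    -o jsonpath='{.data.root-cert\.pem}' | openssl x509 -noout -fingerprint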