I’m having an issue setting up Istio 1.11.4 as a multicluster, single-network deployment between two OKD (OpenShift) clusters (alpha and beta). The installation itself seems to go smoothly, but when I validate the installation, cross-cluster requests time out.
I’m using istio-csr to provide the common trust and certificates between the two clusters.
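For reference, istio-csr was installed on each cluster with roughly the following Helm values (a sketch from memory; the istio-ca issuer name is specific to my setup, and the cluster ID is beta on the other cluster):

$ helm repo add jetstack https://charts.jetstack.io
$ helm install -n cert-manager cert-manager-istio-csr jetstack/cert-manager-istio-csr \
    --set "app.certmanager.issuer.name=istio-ca" \
    --set "app.certmanager.issuer.kind=Issuer" \
    --set "app.server.clusterID=alpha" \
    --set "app.tls.trustDomain=cluster.local"

The issuers on both clusters chain to the same root CA, which is what should give the two meshes a common trust root.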
Deployment
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
spec:
  profile: openshift
  hub: gcr.io/istio-release
  values:
    global:
      caAddress: cert-manager-istio-csr.cert-manager.svc:443
      meshID: cyber
      multiCluster:
        clusterName: alpha
      network: cyber
  components:
    pilot:
      k8s:
        env:
          # Disable istiod CA server functionality
          - name: ENABLE_CA_SERVER
            value: "false"
        overlays:
          - apiVersion: apps/v1
            kind: Deployment
            name: istiod
            patches:
              # Mount istiod serving and webhook certificate from Secret mount
              - path: spec.template.spec.containers.[name:discovery].args[-1]
                value: "--tlsCertFile=/etc/cert-manager/tls/tls.crt"
              - path: spec.template.spec.containers.[name:discovery].args[-1]
                value: "--tlsKeyFile=/etc/cert-manager/tls/tls.key"
              - path: spec.template.spec.containers.[name:discovery].args[-1]
                value: "--caCertFile=/etc/cert-manager/ca/root-cert.pem"
              - path: spec.template.spec.containers.[name:discovery].volumeMounts[-1]
                value:
                  name: cert-manager
                  mountPath: "/etc/cert-manager/tls"
                  readOnly: true
              - path: spec.template.spec.containers.[name:discovery].volumeMounts[-1]
                value:
                  name: ca-root-cert
                  mountPath: "/etc/cert-manager/ca"
                  readOnly: true
              - path: spec.template.spec.volumes[-1]
                value:
                  name: cert-manager
                  secret:
                    secretName: istiod-tls
              - path: spec.template.spec.volumes[-1]
                value:
                  name: ca-root-cert
                  configMap:
                    defaultMode: 420
                    name: istio-ca-root-cert
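The beta cluster uses the same operator spec with clusterName: beta (and the matching istio-csr cluster ID). Endpoint discovery between the clusters was enabled with remote secrets, following the multi-primary, same-network docs (paraphrasing my setup script):

$ istioctl x create-remote-secret --context="${CTX_CLUSTER2}" --name=beta | \
    kubectl apply -f - --context="${CTX_CLUSTER1}"
$ istioctl x create-remote-secret --context="${CTX_CLUSTER1}" --name=alpha | \
    kubectl apply -f - --context="${CTX_CLUSTER2}"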
Using the sample app, I can see both endpoints:
$ istioctl proxy-status | grep sample
helloworld-v1-776f57d5f6-cgcc9.sample SYNCED SYNCED SYNCED SYNCED istiod-5cf45d4fbc-7tvwr 1.11.4
sleep-557747455f-6464g.sample SYNCED SYNCED SYNCED SYNCED istiod-5cf45d4fbc-7tvwr 1.11.4
$ istioctl proxy-config ep sleep-557747455f-6464g.sample | grep helloworld
10.128.4.117:5000 HEALTHY OK outbound|5000||helloworld.sample.svc.cluster.local
10.135.0.254:5000 HEALTHY OK outbound|5000||helloworld.sample.svc.cluster.local
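(The 10.135.0.254 endpoint is the helloworld-v2 pod on beta; an easy way to map endpoint IPs to clusters is to list the pods with their IPs on each side:)

$ kubectl --context="${CTX_CLUSTER2}" -n sample get pods -o wide | grep helloworld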
Both endpoints are directly reachable by pod IP, but when I curl the service itself there is a long delay and I only ever get a response from the local cluster:
$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello
Hello version: v1, instance: helloworld-v1-776f57d5f6-cgcc9
$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS 10.128.4.117:5000/hello
Hello version: v1, instance: helloworld-v1-776f57d5f6-cgcc9
$ kubectl exec --context="${CTX_CLUSTER1}" -n sample -c sleep "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l \
app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS 10.135.0.254:5000/hello
Hello version: v2, instance: helloworld-v2-54df5f84b-6d65k
I’m not seeing any errors in any of the logs (istiod, the Istio gateways, istio-proxy). The only indication of an error comes from the sidecar stats queried from the sleep pod:
$ kubectl --context "${CTX_CLUSTER1}" -n sample exec "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -s localhost:15000/clusters | grep helloworld | grep cx_
outbound|5000||helloworld.sample.svc.cluster.local::10.128.4.117:5000::cx_active::1
outbound|5000||helloworld.sample.svc.cluster.local::10.128.4.117:5000::cx_connect_fail::0
outbound|5000||helloworld.sample.svc.cluster.local::10.128.4.117:5000::cx_total::1
outbound|5000||helloworld.sample.svc.cluster.local::10.135.0.254:5000::cx_active::0
outbound|5000||helloworld.sample.svc.cluster.local::10.135.0.254:5000::cx_connect_fail::2
outbound|5000||helloworld.sample.svc.cluster.local::10.135.0.254:5000::cx_total::2
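If I understand Envoy’s routing correctly, the direct pod-IP curls above go through the passthrough cluster without mTLS, which would explain why they succeed while the load-balanced requests fail. To check whether the failures happen during the TLS handshake, the per-cluster SSL stats can be queried the same way (this is the query I’m using; no output shown):

$ kubectl --context "${CTX_CLUSTER1}" -n sample exec "$(kubectl get pod --context="${CTX_CLUSTER1}" -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -s localhost:15000/stats | grep helloworld | grep ssl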
I’m new to Istio, so I’ve pretty well exhausted my knowledge of what to try. I had previously set up a multi-network deployment and didn’t have any issues with that one, so I’m not sure where I’ve gone wrong here.
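In case it helps someone spot the problem, this is how I’m verifying that both clusters actually share the same root of trust (the two fingerprints should match if istio-csr is wired up correctly):

$ kubectl --context="${CTX_CLUSTER1}" -n istio-system get configmap istio-ca-root-cert \
    -o jsonpath='{.data.root-cert\.pem}' | openssl x509 -noout -fingerprint
$ kubectl --context="${CTX_CLUSTER2}" -n istio-system get configmap istio-ca-root-cert \
    -o jsonpath='{.data.root-cert\.pem}' | openssl x509 -noout -fingerprint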