Getting 503 errors with a multicluster replicated control plane

Hi,

I am following this doc https://istio.io/latest/docs/setup/install/multicluster/gateways/ to set up a multicluster mesh. I get 503 errors when calling the httpbin .global URL from the sleep microservice. I am using two AWS EKS clusters.

Istio.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  addonComponents:
    istiocoredns:
      enabled: true
    kiali:
      enabled: true
    tracing:
      enabled: true

  components:
    egressGateways:
      - name: istio-egressgateway
        enabled: true
    telemetry:
      enabled: true
    citadel:
      enabled: true
  meshConfig:
    accessLogFile: "/dev/stdout"
    accessLogEncoding: JSON
  values:
    global:
      podDNSSearchNamespaces:
        - global
      multiCluster:
        enabled: true
      controlPlaneSecurityEnabled: true
      proxy:
        privileged: true
    gateways:
      istio-egressgateway:
        env:
          ISTIO_META_REQUESTED_NETWORK_VIEW: "external"

Service entry:

kubectl apply -n foo -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-bar
spec:
  hosts:
  - httpbin.bar.global
  location: MESH_INTERNAL
  ports:
  - name: http1
    number: 8000
    protocol: http
  resolution: DNS
  addresses:
  - 240.0.0.5
  endpoints:
  - address: <load-balancer-url>
    network: external
    ports:
      http1: 15443
  - address: istio-egressgateway.istio-system.svc.cluster.local
    ports:
      http1: 15443
EOF

Calling httpbin from the sleep pod:

❯ kubectl exec -it sleep-6bdb595bcb-ssb58 -c sleep -n foo -- curl -I httpbin.bar.global:8000/headers
HTTP/1.1 503 Service Unavailable
content-length: 91
content-type: text/plain
date: Wed, 01 Jul 2020 20:37:19 GMT
server: envoy

Access log of the sleep pod's sidecar:

{"upstream_cluster":"outbound|8000||httpbin.bar.global","downstream_remote_address":"192.168.96.232:42178","authority":"httpbin.bar.global:8000","path":"/headers","protocol":"HTTP/1.1","upstream_service_time":"-","upstream_local_address":"-","duration":"11","upstream_transport_failure_reason":"-","route_name":"default","downstream_local_address":"240.0.0.5:8000","user_agent":"curl/7.69.1","response_code":"503","response_flags":"UF,URX","start_time":"2020-07-01T20:32:19.419Z","method":"HEAD","request_id":"6084edf4-a3fd-41ce-8b2b-04a28f6e3cb9","upstream_host":"10.100.46.111:15443","x_forwarded_for":"-","requested_server_name":"-","bytes_received":"0","istio_policy_status":"-","bytes_sent":"0"}
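In the log line above, the two fields that say the most are upstream_host (where the sleep sidecar tried to send the request) and response_flags. A minimal bash sketch for pulling those out of a log line and explaining the flags, assuming the flat JSON format shown here; log_field and explain_flags are hypothetical helper names, and the flag descriptions are paraphrased from the Envoy access log documentation (jq would be a more robust parser):

```shell
#!/usr/bin/env bash
# Sketch only: assumes a flat JSON access-log line whose values are all strings,
# like the one above. Not a general JSON parser.

# log_field LINE KEY -> prints the string value stored under "KEY" in LINE
log_field() {
  local line="$1" key="$2"
  printf '%s\n' "$line" | sed -n "s/.*\"${key}\":\"\([^\"]*\)\".*/\1/p"
}

# explain_flags FLAGS -> one line per comma-separated Envoy response flag
explain_flags() {
  local flag
  local IFS=','   # split the flags string on commas (scoped to this function)
  for flag in $1; do
    case "$flag" in
      UF)  echo "UF: upstream connection failure" ;;
      URX) echo "URX: upstream retry limit (HTTP) or max connect attempts (TCP) reached" ;;
      UH)  echo "UH: no healthy upstream hosts" ;;
      NR)  echo "NR: no route configured" ;;
      -)   echo "-: no flags set" ;;
      *)   echo "${flag}: see the Envoy access log documentation" ;;
    esac
  done
}

# Usage with the relevant fields from the log line above:
line='{"response_code":"503","response_flags":"UF,URX","upstream_host":"10.100.46.111:15443"}'
echo "upstream_host: $(log_field "$line" upstream_host)"
explain_flags "$(log_field "$line" response_flags)"
```

Read this way, the 503 says the sleep sidecar could not open a connection to 10.100.46.111:15443 and gave up after its retries, which would be consistent with nothing showing up in the egress gateway logs.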

I don't see any corresponding entries in the egress gateway pod logs, so I suspect the egress gateway is not working properly, but I'm not sure how to debug this further. Any help would be appreciated.
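One way to narrow this down, as a hedged sketch assuming the default istio-egressgateway Service name and istio=egressgateway pod label in istio-system (adjust if your install differs):

```shell
# Sketch: inspect the egress gateway the sleep sidecar tried to reach.
# Assumes the default Service name and pod label in istio-system.

# Does the egress gateway Service expose 15443, the port used in the ServiceEntry endpoint?
kubectl get svc istio-egressgateway -n istio-system \
  -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\n"}{end}'

# Is 10.100.46.111 (upstream_host in the sleep access log) the egress gateway's cluster IP?
kubectl get svc istio-egressgateway -n istio-system -o jsonpath='{.spec.clusterIP}{"\n"}'

# Follow the egress gateway logs while repeating the curl from the sleep pod.
kubectl logs -n istio-system -l istio=egressgateway -f
```

If 15443 is missing from the port list, connections to that address would likely fail before reaching the gateway, which would show up as UF in the sidecar log.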

Thanks

Can anyone help me with this?

Try checking the Envoy logs of the ingress gateway in istio-system on the cluster that runs the httpbin service. When you curl from the first cluster, you can see there whether the request is reaching the other cluster.

This assumes you have completed all the setup steps, especially the DNS stubbing part.
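The two checks above might look like this. A sketch, assuming the default istio=ingressgateway label; <remote-context> is a placeholder for the kubeconfig context of the cluster running httpbin, and the Corefile block in the comment is the form I recall from the Istio multicluster gateways guide, not a verbatim copy:

```shell
# Sketch: <remote-context> is a placeholder for the httpbin cluster's kubeconfig context.

# Follow the ingress gateway (Envoy) logs in the httpbin cluster,
# then repeat the curl from the sleep pod in the other cluster.
kubectl --context=<remote-context> logs -n istio-system -l istio=ingressgateway -f

# In the client cluster, confirm CoreDNS forwards the .global domain to istiocoredns.
# Expect a block similar to:
#   global:53 {
#       errors
#       cache 30
#       forward . <istiocoredns-cluster-ip>:53
#   }
kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}'

# The IP that the forward line should point at:
kubectl get svc istiocoredns -n istio-system -o jsonpath='{.spec.clusterIP}{"\n"}'
```

Ingress gateway access logs only appear if the remote cluster's install also sets meshConfig.accessLogFile, as the Istio.yaml above does.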

Thanks for the reply.

  1. With STATIC resolution:
resolution: STATIC
endpoints:
- address: <remote_cluster_ingress_ip>
  network: external
  ports:
    http1: 15443
- address: <cluster_ip>
  ports:
    http1: 15443

This seems to work. In the sleep sidecar's Envoy logs, the upstream host is the remote cluster's ingress IP, so I assume the traffic is not going through the egress gateway. However, I don't see any logs on the ingress gateway of the remote cluster.

  2. With DNS resolution:
resolution: DNS
endpoints:
- address: <load-balancer-url>
  network: external
  ports:
    http1: 15443
- address: istio-egressgateway.istio-system.svc.cluster.local
  ports:
    http1: 15443

In this case I get a 503, and the upstream host in the sleep sidecar's Envoy logs is the egress gateway's cluster IP.

I'm not sure why the egress path stops working after changing the resolution to DNS.

I have updated the CoreDNS config.
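To see why the DNS variant behaves differently from the STATIC one, it can help to compare the endpoints the sleep sidecar actually holds for the httpbin.bar.global cluster under each setting. A hedged sketch: the istioctl subcommand and --cluster filter are as I understand them for Istio 1.6, and nslookup being available in the sleep image is an assumption:

```shell
# Sketch: pod name taken from the curl above; replace with your own.

# Endpoints (and their health status) the sleep sidecar has for the httpbin.bar.global cluster.
# Run once with resolution: STATIC and once with resolution: DNS, then compare.
istioctl proxy-config endpoint sleep-6bdb595bcb-ssb58.foo \
  --cluster "outbound|8000||httpbin.bar.global"

# With resolution: DNS, Envoy resolves the endpoint hostnames itself,
# so the load balancer hostname must resolve from inside the pod.
# Assumes the sleep image ships nslookup.
kubectl exec -n foo sleep-6bdb595bcb-ssb58 -c sleep -- nslookup <load-balancer-url>
```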