Help Debugging a VirtualService (Envoy URX)


I just installed Argo CD in a cluster where Istio was installed via Helm (the demo profile, without auth). I’m using the default ingress gateway in the istio-system namespace, with a VirtualService in each namespace that needs external access. The Argo service is defined as follows (note that I changed the host to a generic one):

kind: VirtualService
metadata:
  name: argo-virtual-service
  namespace: argo
spec:
  hosts:
  - ""
  gateways:
  - public-gateway.istio-system.svc.cluster.local
  http:
  - match:
    - uri:
        prefix: /argo
    rewrite:
      uri: /
    route:
    - destination:
        port:
          number: 80
        host: argo-ui.argo.svc.cluster.local

I can access the service from within the cluster, from namespaces both with and without Istio sidecars. However, when accessing it from outside the cluster I get “upstream connect error or disconnect/reset before headers. reset reason: connection failure”, and the Mixer logs show the following:

    "level": "info",
    "time": "2019-04-26T13:25:22.499226Z",
    "instance": "accesslog.logentry.istio-system",
    "apiClaims": "",
    "apiKey": "",
    "clientTraceId": "",
    "connection_security_policy": "unknown",
    "destinationApp": "argo-ui",
    "destinationIp": "",
    "destinationName": "argo-ui-588d8d898f-47hrm",
    "destinationNamespace": "argo",
    "destinationOwner": "kubernetes://apis/apps/v1/namespaces/argo/deployments/argo-ui",
    "destinationPrincipal": "",
    "destinationServiceHost": "argo-ui.argo.svc.cluster.local",
    "destinationWorkload": "argo-ui",
    "grpcMessage": "",
    "grpcStatus": "",
    "httpAuthority": "",
    "latency": "29.746198ms",
    "method": "GET",
    "permissiveResponseCode": "none",
    "permissiveResponsePolicyID": "none",
    "protocol": "https",
    "receivedBytes": 659,
    "referer": "",
    "reporter": "source",
    "requestId": "1ba1ed13-fe5f-9db9-b559-261508e96861",
    "requestSize": 0,
    "requestedServerName": "",
    "responseCode": 503,
    "responseFlags": "UF,URX",
    "responseSize": 91,
    "responseTimestamp": "2019-04-26T13:25:22.528772Z",
    "sentBytes": 189,
    "sourceApp": "istio-ingressgateway",
    "sourceIp": "",
    "sourceName": "istio-ingressgateway-6599dd7679-bpll8",
    "sourceNamespace": "istio-system",
    "sourceOwner": "kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-ingressgateway",
    "sourcePrincipal": "",
    "sourceWorkload": "istio-ingressgateway",
    "url": "/argo/",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
    "xForwardedFor": ""

What worries me is that Envoy is saying the upstream is unhealthy (UF, URX). However, I can access the pods normally via the service in all namespaces and via port forwarding. I also already have other services in other namespaces working correctly with this setup (ingress in istio-system -> VirtualService in the service namespace).

I’m really new to Istio, and I was wondering if anyone has any insight into what might be going on and how it can be fixed.
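
For anyone reading the responseFlags value in the log above, here is a minimal sketch that decodes it. The table is a hand-picked, partial subset of the flags, with descriptions paraphrased from Envoy's access log documentation; the names RESPONSE_FLAGS and explain_flags are made up for this example and are not part of Istio or Envoy.

```python
# Partial lookup table of Envoy access-log response flags.
# Descriptions paraphrase the Envoy access log documentation;
# this is deliberately not the full list.
RESPONSE_FLAGS = {
    "UH": "no healthy upstream hosts in the upstream cluster",
    "UF": "upstream connection failure",
    "UO": "upstream overflow (circuit breaking)",
    "NR": "no route configured for the request",
    "URX": "upstream retry limit (HTTP) or maximum connect attempts (TCP) reached",
}


def explain_flags(raw):
    """Split a comma-separated responseFlags value and describe each flag."""
    flags = [f.strip() for f in raw.split(",") if f.strip()]
    return [
        (f, RESPONSE_FLAGS.get(f, "unknown flag (not in this partial table)"))
        for f in flags
    ]


if __name__ == "__main__":
    # Value taken from the "responseFlags" field of the log entry.
    for flag, meaning in explain_flags("UF,URX"):
        print(f"{flag}: {meaning}")
```

Read this way, UF,URX suggests that connection attempts from the ingress gateway to the pod failed until the retries ran out. A failed health check would normally show up as UH instead.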


Has the auto-injection been enabled in this specific namespace?

Yes, auto-injection is enabled manually in all namespaces.

Could you post all the corresponding YAML files (ingress, deployment, service, gateway)?

Sure, I had to host them here because Discourse didn’t let me post them all.

The Argo yaml is autogenerated and I only copied it to host it in my repo.

Thanks for the help!

I’m experiencing the same issue with an Istio setup, but without a VirtualService.
I have one cluster in the EU and one in the US, both on GKE, with a single control plane.

I’ve set up httpbin in the EU cluster and call it from the US cluster, which works fine.
However, when I call another service, it fails with a similar error.

The gateways field in your VirtualService is wrong. Instead of an FQDN, it’s supposed to be the gateway’s namespace/name:

  gateways:
  - istio-system/public-gateway

I’ve always used FQDN notation and it works perfectly. I found that notation here

In my case, it ended up being an incorrectly configured livenessProbe in the Argo-generated files.

The syntax has changed to use /. This is clearly documented now.

FQDN was previously allowed but not documented. Looks like it’s still supported by this hack:
but I would recommend using the proper documented syntax now.
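
For anyone migrating existing manifests, here is a rough sketch of how an FQDN-style gateway reference maps to the documented namespace/name form. The helper to_namespaced_gateway is hypothetical (not an Istio API), and it assumes the FQDN has the shape name.namespace.svc.domain, as in the values quoted in this thread.

```python
def to_namespaced_gateway(ref):
    """Convert '<name>.<namespace>.svc.<domain>' to '<namespace>/<name>'.

    Hypothetical helper for illustration only. References that are already
    in '<namespace>/<name>' form, or that do not look like
    '<name>.<namespace>.svc...', are returned unchanged.
    """
    if "/" in ref:
        return ref
    parts = ref.split(".")
    if len(parts) >= 3 and parts[2] == "svc":
        name, namespace = parts[0], parts[1]
        return f"{namespace}/{name}"
    return ref


if __name__ == "__main__":
    # Gateway reference from the VirtualService posted at the top of the thread.
    print(to_namespaced_gateway("public-gateway.istio-system.svc.cluster.local"))
```

With the value from the original VirtualService, this yields the istio-system/public-gateway form suggested above.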