Kiali reports red health status for grpc-web

Hello, community.

I’m testing a deployment with Istio 1.7.3 on a DigitalOcean Kubernetes cluster.
I have an HTTP service with a web application and a number of gRPC services that proxy the API to the web application with grpc-web.

Thanks to Istio, everything works almost the same as in local development with microk8s.
But on the public cluster, the health status of the gRPC services is shown as Failure.

I have not found any failed requests in any of the logs I checked, nor in the traces, in Grafana, or in the browser network tab.

At the same time, the status of the web application is reasonable.
We have a number of 404 routes that are not implemented, so the Kiali status is Yellow, and the failed requests can easily be found in traces, are visible in Grafana, etc.

I installed the dashboard exactly as described in the documentation, since we are only testing for now, and I run it with

istioctl dashboard kiali

and likewise for the other dashboards.

For reference, the configuration:
Gateway:

kind: Gateway
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: environment-common
  namespace: default
spec:
  servers:
    - hosts:
        - '*'
      port:
        name: http
        number: 80
        protocol: HTTP2
      tls:
        httpsRedirect: true
    - hosts:
        - '*'
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        credentialName: ingress-cert
        mode: SIMPLE
  selector:
    istio: ingressgateway

Service:

apiVersion: v1
kind: Service
metadata:
  name: interests
  labels:
    app: interests
    helm.sh/chart: interests-0.0.1
    app.kubernetes.io/name: interests
    app.kubernetes.io/instance: interests
    app.kubernetes.io/version: "latest"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 9090
      name: grpc-web
  selector:
    app: interests
    app.kubernetes.io/name: interests
    app.kubernetes.io/instance: interests

VirtualService:

kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: interests
spec:
  hosts:
  - '*'
  gateways:
  - environment-common
  http:
  - match:
    - uri:
        prefix: /<path-to-grpc-package>.interests.Interests
    route:
    - destination:
        host: interests
        port:
          number: 9090
    corsPolicy:
      allowOrigins:
        - exact: "*"
      allowMethods:
        - POST
        - GET
        - OPTIONS
        - PUT
        - DELETE
      allowHeaders:
        - grpc-timeout
        - keep-alive
        - user-agent
        - cache-control
        - content-type
        - content-transfer-encoding
        - x-accept-content-transfer-encoding
        - x-accept-response-streaming
        - x-user-agent
        - x-grpc-web
      maxAge: 1728s
      exposeHeaders:
        - grpc-status
        - grpc-message
      allowCredentials: true

What could I be missing?

It looks like half of the requests being made to interests are failing. Try clicking on the edge leading into the service node (triangle) and see what the side panel shows. You could also try the Hosts or Flags tab to see a breakdown of error codes.

Hi jshaughn,
Thanks for your interest in this post.
It is very strange.
I see 100% successful requests on the “Traffic” tab, but the Health Overview still shows 50% failures.
Overview: [screenshot]

Traffic: [screenshot]

I wonder if it could be a rounding error. Given the low rate of 0.03 requests per second, precision may come into play. Could you confirm the metrics in Prometheus by executing this query:

sum(istio_requests_total{destination_service_name="interests"}) by (reporter, response_code, source_workload, destination_workload)

For info about querying Prometheus, see https://kiali.io/documentation/staging/faq/#prometheus

@jshaughn,
Thanks again for your interest :)

I generated 1000 requests in a loop in the browser console against the API built with grpc-web.
The health status still shows 50% failures.
The Prometheus query

istio_requests_total{reporter="source", destination_service_name="interests"}

returns everything with status 200:

istio_requests_total{app="istio-ingressgateway",chart="gateways",connection_security_policy="unknown",destination_app="interests",destination_canonical_revision="latest",destination_canonical_service="interests",destination_principal="spiffe://cluster.local/ns/default/sa/interests",destination_service="interests.default.svc.cluster.local",destination_service_name="interests",destination_service_namespace="default",destination_version="unknown",destination_workload="interests",destination_workload_namespace="default",heritage="Tiller",instance="10.244.0.4:15090",istio="ingressgateway",job="kubernetes-pods",kubernetes_namespace="istio-system",kubernetes_pod_name="istio-ingressgateway-59cf75bf7-sclzz",pod_template_hash="59cf75bf7",release="istio",reporter="source",request_protocol="http",response_code="200",response_flags="-",service_istio_io_canonical_name="istio-ingressgateway",service_istio_io_canonical_revision="latest",source_app="istio-ingressgateway",source_canonical_revision="latest",source_canonical_service="istio-ingressgateway",source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}	17
istio_requests_total{app="istio-ingressgateway",chart="gateways",connection_security_policy="unknown",destination_app="interests",destination_canonical_revision="latest",destination_canonical_service="interests",destination_principal="spiffe://cluster.local/ns/default/sa/interests",destination_service="interests.default.svc.cluster.local",destination_service_name="interests",destination_service_namespace="default",destination_version="latest",destination_workload="interests",destination_workload_namespace="default",heritage="Tiller",instance="10.244.0.4:15090",istio="ingressgateway",job="kubernetes-pods",kubernetes_namespace="istio-system",kubernetes_pod_name="istio-ingressgateway-59cf75bf7-sclzz",pod_template_hash="59cf75bf7",release="istio",reporter="source",request_protocol="http",response_code="200",response_flags="-",service_istio_io_canonical_name="istio-ingressgateway",service_istio_io_canonical_revision="latest",source_app="istio-ingressgateway",source_canonical_revision="latest",source_canonical_service="istio-ingressgateway",source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}	32
istio_requests_total{app="istio-ingressgateway",chart="gateways",connection_security_policy="unknown",destination_app="interests",destination_canonical_revision="ce649f0fe1801fbc53c987522f9fc8b6bf122645",destination_canonical_service="interests",destination_principal="spiffe://cluster.local/ns/default/sa/interests",destination_service="interests.default.svc.cluster.local",destination_service_name="interests",destination_service_namespace="default",destination_version="ce649f0fe1801fbc53c987522f9fc8b6bf122645",destination_workload="interests",destination_workload_namespace="default",heritage="Tiller",instance="10.244.0.4:15090",istio="ingressgateway",job="kubernetes-pods",kubernetes_namespace="istio-system",kubernetes_pod_name="istio-ingressgateway-59cf75bf7-sclzz",pod_template_hash="59cf75bf7",release="istio",reporter="source",request_protocol="http",response_code="200",response_flags="-",service_istio_io_canonical_name="istio-ingressgateway",service_istio_io_canonical_revision="latest",source_app="istio-ingressgateway",source_canonical_revision="latest",source_canonical_service="istio-ingressgateway",source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}	3
istio_requests_total{app="istio-ingressgateway",chart="gateways",connection_security_policy="unknown",destination_app="interests",destination_canonical_revision="d2f3b0fe1170935d54ee98591883746b9e6f78bf",destination_canonical_service="interests",destination_principal="spiffe://cluster.local/ns/default/sa/interests",destination_service="interests.default.svc.cluster.local",destination_service_name="interests",destination_service_namespace="default",destination_version="d2f3b0fe1170935d54ee98591883746b9e6f78bf",destination_workload="interests",destination_workload_namespace="default",heritage="Tiller",instance="10.244.0.4:15090",istio="ingressgateway",job="kubernetes-pods",kubernetes_namespace="istio-system",kubernetes_pod_name="istio-ingressgateway-59cf75bf7-sclzz",pod_template_hash="59cf75bf7",release="istio",reporter="source",request_protocol="http",response_code="200",response_flags="-",service_istio_io_canonical_name="istio-ingressgateway",service_istio_io_canonical_revision="latest",source_app="istio-ingressgateway",source_canonical_revision="latest",source_canonical_service="istio-ingressgateway",source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}	8144
istio_requests_total{app="istio-ingressgateway",chart="gateways",connection_security_policy="unknown",destination_app="interests",destination_canonical_revision="2d5b9e86823db2e25eba69fe33af5c86f62a103a",destination_canonical_service="interests",destination_principal="spiffe://cluster.local/ns/default/sa/interests",destination_service="interests.default.svc.cluster.local",destination_service_name="interests",destination_service_namespace="default",destination_version="2d5b9e86823db2e25eba69fe33af5c86f62a103a",destination_workload="interests",destination_workload_namespace="default",heritage="Tiller",instance="10.244.0.4:15090",istio="ingressgateway",job="kubernetes-pods",kubernetes_namespace="istio-system",kubernetes_pod_name="istio-ingressgateway-59cf75bf7-sclzz",pod_template_hash="59cf75bf7",release="istio",reporter="source",request_protocol="http",response_code="200",response_flags="-",service_istio_io_canonical_name="istio-ingressgateway",service_istio_io_canonical_revision="latest",source_app="istio-ingressgateway",source_canonical_revision="latest",source_canonical_service="istio-ingressgateway",source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}	5376

There is simply nothing with an error… Very strange.
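To double-check the totals, here is a rough Python sketch that mimics the suggested `sum(...) by (reporter, response_code, source_workload, destination_workload)` grouping on the five series above. The counter values are copied from that output; the helper name `sum_by` is only illustrative, and only the four grouping labels are kept.

```python
from collections import defaultdict

# Labels shared by all five series returned above (only the grouping labels are kept).
COMMON_LABELS = {
    "reporter": "source",
    "response_code": "200",
    "source_workload": "istio-ingressgateway",
    "destination_workload": "interests",
}

# Counter values copied from the query output above, one per destination_version.
VALUES = [17, 32, 3, 8144, 5376]

series = [(dict(COMMON_LABELS), value) for value in VALUES]

GROUP_BY = ("reporter", "response_code", "source_workload", "destination_workload")


def sum_by(samples, keys):
    """Sum counter values per distinct combination of the given label keys."""
    totals = defaultdict(int)
    for labels, value in samples:
        group = tuple(labels.get(key, "") for key in keys)
        totals[group] += value
    return dict(totals)


totals = sum_by(series, GROUP_BY)
for group, count in totals.items():
    print(group, count)
```

With these numbers everything collapses into a single group with response code 200 and 13572 requests, so the source-side (ingress gateway) telemetry really contains no HTTP-level failures.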

Try with reporter="destination"

@pereslava Did you reach any conclusion? I have a similar issue here. Everything looks OK everywhere, but Kiali shows red health/failure for a grpc-web service.

@bergemalm If you select the node in the graph and then hover over the health icon in the side panel, it should explain why it thinks the service is unhealthy.

@jshaughn That only shows “Inbound traffic failure 100%”. Very low traffic atm.

Hmmm, perhaps it is a bug with very low traffic, perhaps rounding related. Maybe it is specific to gRPC, I’m not sure. Can you query Prometheus for

istio_requests_total{destination_service_name="YOUR_SERVICE_NAME"}

and check for any time series with a response_code or grpc_response_status that looks suspicious?
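In case it helps, that check can be scripted. This is only a sketch: it assumes Prometheus is reachable over its HTTP API (for example at http://localhost:9090 after `istioctl dashboard prometheus`; the URL is an assumption), and the function names `fetch_series` and `suspicious` are made up for illustration. It flags any series whose `response_code` is not 2xx or whose `grpc_response_status` is present and non-zero, which is deliberately broad.

```python
import json
import urllib.parse
import urllib.request


def fetch_series(prometheus_url, service_name):
    """Run an instant query via the Prometheus HTTP API and return the result list."""
    query = 'istio_requests_total{destination_service_name="%s"}' % service_name
    params = urllib.parse.urlencode({"query": query})
    url = prometheus_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return body["data"]["result"]


def suspicious(result):
    """Return (reporter, response_code, grpc_response_status, value) for odd-looking series."""
    flagged = []
    for sample in result:
        labels = sample["metric"]
        http_code = labels.get("response_code", "")
        grpc_status = labels.get("grpc_response_status", "")
        bad_http = not http_code.startswith("2")   # anything outside 2xx
        bad_grpc = grpc_status not in ("", "0")    # label present and not OK
        if bad_http or bad_grpc:
            flagged.append(
                (labels.get("reporter"), http_code, grpc_status, sample["value"][1])
            )
    return flagged
```

Usage would look like `suspicious(fetch_series("http://localhost:9090", "YOUR_SERVICE_NAME"))`; an empty list means nothing stood out under these rules.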

I have the same problem. We also use grpc-web with appProtocol, and all inbound traffic is shown as errors in the Kiali dashboard.

Below are the results of the query from Prometheus.

istio_requests_total{app="yorkie", app_kubernetes_io_instance="yorkie", app_kubernetes_io_version="0.3.1", connection_security_policy="mutual_tls", destination_app="yorkie", destination_canonical_revision="0.3.1", destination_canonical_service="yorkie", destination_cluster="Kubernetes", destination_principal="spiffe://cluster.local/ns/yorkie/sa/default", destination_service="yorkie.yorkie.svc.cluster.local", destination_service_name="yorkie", destination_service_namespace="yorkie", destination_version="0.3.1", destination_workload="yorkie", destination_workload_namespace="yorkie", grpc_response_status="2", instance="10.0.5.55:15020", job="kubernetes-pods", namespace="yorkie", pod="yorkie-6989876c7d-7rk9v", pod_template_hash="6989876c7d", reporter="destination", request_protocol="grpc", response_code="200", response_flags="-", security_istio_io_tlsMode="istio", service_istio_io_canonical_name="yorkie", service_istio_io_canonical_revision="0.3.1", source_app="istio-ingressgateway", source_canonical_revision="latest", source_canonical_service="istio-ingressgateway", source_cluster="Kubernetes", source_principal="spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account", source_version="unknown", source_workload="istio-ingressgateway", source_workload_namespace="istio-system", version="0.3.1"}

grpc_response_status="2" is the gRPC status code UNKNOWN (“Unknown error”). I think Kiali is reporting errors because the telemetry is reporting errors. See “GRPC Core: Status codes and their use in gRPC”.
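To make that concrete, here is a small Python sketch that decodes the two status labels of a series like the one above. The code-to-name table follows the gRPC status code list; treating every non-zero gRPC status as a failure is an assumption made for illustration and is not taken from Kiali's source. The function name `describe` is hypothetical.

```python
# gRPC status codes and their names, per the gRPC status code documentation.
GRPC_STATUS_NAMES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}


def describe(labels):
    """Summarise one istio_requests_total series by its HTTP and gRPC status labels."""
    http_code = labels.get("response_code", "?")
    if labels.get("request_protocol") != "grpc":
        return f"HTTP {http_code}"
    raw = labels.get("grpc_response_status", "")
    if not raw.isdigit():
        return f"HTTP {http_code}, gRPC status missing"
    code = int(raw)
    name = GRPC_STATUS_NAMES.get(code, "unrecognised")
    # Assumption for this sketch: any non-zero gRPC status counts as a failure.
    verdict = "success" if code == 0 else "failure"
    return f"HTTP {http_code}, gRPC {code} ({name}): {verdict}"


# Relevant labels from the series reported above.
labels = {"request_protocol": "grpc", "response_code": "200", "grpc_response_status": "2"}
print(describe(labels))  # HTTP 200, gRPC 2 (UNKNOWN): failure
```

This is why the HTTP-level views can look clean: the transport response is 200, while the gRPC status carried with the call is not OK.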