Detecting misconfigured VirtualServices

We’ve had instances where a bug in our automation caused VirtualServices to be misconfigured. I’d like to detect when this happens. The most obvious side effect is that requests through the mesh or ingress don’t reach their intended pods. Is there any metric I can track to detect this? It’d probably be enough to track when ingress requests have nowhere to be routed.

I think that really depends on the type of misconfiguration.

You could try a Prometheus query like this:

sum(irate(istio_requests_total{source_workload="istio-ingressgateway"}[1h])) by (destination_service, response_code)

If it’s about VirtualServices pointing into empty space, looking at the results with destination_service="unknown" might help.

The last misconfiguration we dealt with had every route entry requiring an exact match on a header that isn’t normally present on a request, so normal requests weren’t being routed to any subset.

I think you should be able to catch that by looking for requests with destination_workload="unknown" or destination_service="unknown". destination_version might also work; I think that label is about subsets.
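
As a rough sketch of that idea, the earlier query can be narrowed to ingress traffic whose destination could not be resolved. This assumes your telemetry setup actually reports such requests with destination_service="unknown", which may differ between Istio versions, and the 5m window is an arbitrary choice:

sum(rate(istio_requests_total{source_workload="istio-ingressgateway", destination_service="unknown"}[5m])) by (response_code)

If that returns a non-zero rate, requests are probably reaching the gateway without a matching route. Appending a comparison such as > 0 would turn it into a condition that an alerting rule could evaluate, though the right threshold depends on how much stray traffic (scanners, bad hostnames) your gateway normally sees.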

Please try out istioctl analyze (https://preliminary.istio.io/docs/ops/diagnostic-tools/istioctl-analyze/); it catches many issues.

In addition to the Istio analyze feature, you may also want to see whether Kiali (kiali.io) validations spot a problem, or whether you can detect an issue visually by looking at the Kiali traffic graph.

Thanks everyone. Some great debugging tips, but I’m looking for more of an alerting system. Going to play with dgn’s idea, seems promising.