I have been trying to resolve an issue for a couple of weeks that just doesn’t seem to make sense to me.
First, the basics:
Kubernetes 1.14.3, deployed via RKE.
Istio 1.1.8 and 1.3.3 (both tested), installed via Helm.
When routing via the istio-ingressgateway, we see increased latency (an 80–120 ms increase) on a subset of traffic (from a security scanning service), while normal user traffic is handled as usual.
When we filter out the traffic from the security service there is no impact, and when we direct the impacted traffic through an nginx-ingress instead there is no increase in latency. When routing via the nginx-ingress, the traffic uses the same pods as those behind the istio-ingressgateway/VirtualService.
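For context, the Istio side of the routing looks roughly like the following. All names and hosts here are placeholders, not our actual config; the point is that it is a plain Gateway plus a single-route VirtualService, nothing exotic:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: app-gateway            # hypothetical name
spec:
  selector:
    istio: ingressgateway      # default istio-ingressgateway selector
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "app.example.com"        # placeholder host
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app-vs                 # hypothetical name
spec:
  hosts:
  - "app.example.com"
  gateways:
  - app-gateway
  http:
  - route:
    - destination:
        host: app-svc          # the same backend Service the nginx-ingress points at
        port:
          number: 80
```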
The increase in latency does not appear to be load related, as the same number of pods is in service whether routing via nginx or Istio. In both cases the traffic reaches Kubernetes via a NetScaler load balancer.
The traffic from the scanning service generates a large number of 400- and 500-series errors; however, I have not configured any circuit breakers that I am aware of, and the query rate is quite low. Once received, the requests are handled in approximately the same timeframe as the other "normal" requests.
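As I understand it, circuit breaking in Istio would show up as an `outlierDetection` (or `connectionPool`) block in a DestinationRule, something along these lines (hypothetical example, not from our cluster):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: app-dr                   # hypothetical name
spec:
  host: app-svc                  # placeholder backend Service
  trafficPolicy:
    outlierDetection:            # ejects hosts after repeated errors, i.e. circuit breaking
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 30s
    connectionPool:              # limits that could also queue or reject requests
      http:
        http1MaxPendingRequests: 1024
```

Nothing like this exists in our manifests as far as I can tell (checked via `kubectl get destinationrules --all-namespaces -o yaml`), which is why the behaviour is confusing.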
Any hints as to where to start looking would be greatly appreciated.