Some Context:
My team provides a Platform-as-a-Service-like offering to our company’s microservice teams. We provide CI/CD and the runtime environment via K8s with Istio. We’re committed to running Istio in production, but we’re often faced with the question:
“What is the Istio overhead?”
On a case-by-case basis, this isn’t too difficult to derive or even visualize (it’s worth noting that we’re using Google Cloud and the Stackdriver adapter, and a lot can be deduced from Stackdriver tracing). But I wanted to know whether anyone else has come up with a more elegant answer to this question. Rephrased from the perspective of a microservices team, it is: “What cost do I incur for the features and benefits of Istio?” The answer is either latency, or $$ for more resources, or a combination of both, I suppose. As this platform is our team’s responsibility, we would like to be able to monitor, in some way, the ingress-to-sidecar latency, the sidecar-to-app-container latency, and the sidecar-to-sidecar latency. As we tweak the resources of the cluster, can we visualize the effects?
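To make the breakdown I have in mind concrete, here is a rough Python sketch. It assumes we can get three latencies for the same request: one seen at the ingress, one seen by the destination sidecar, and one measured inside the app container. The `HopTimings` type, its field names, and the numbers are all made up for illustration; nothing here comes from Istio or Stackdriver itself.

```python
from dataclasses import dataclass


@dataclass
class HopTimings:
    """Latencies in milliseconds for one request (hypothetical fields)."""
    ingress_ms: float   # total time as seen at the ingress
    sidecar_ms: float   # time as seen by the destination sidecar
    app_ms: float       # time measured inside the app container


def overhead_breakdown(t: HopTimings) -> dict:
    """Split the mesh-introduced latency into per-hop pieces."""
    total = t.ingress_ms - t.app_ms
    return {
        "ingress_to_sidecar_ms": t.ingress_ms - t.sidecar_ms,
        "sidecar_to_app_ms": t.sidecar_ms - t.app_ms,
        "total_overhead_ms": total,
        # Share of the end-to-end latency that is not spent in the app.
        "overhead_fraction": total / t.ingress_ms if t.ingress_ms else 0.0,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    print(overhead_breakdown(HopTimings(ingress_ms=12.0, sidecar_ms=10.5, app_ms=9.0)))
```

The subtraction only makes sense if all three numbers describe the same request (or the same percentile over the same window), which is the part I don’t have a clean answer for.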
We are in early production days, so we’ve been able to address this question manually, but I’d like to be able to report on it as a metric, or get alerts if there is an upward trend in the Istio-introduced overhead, or more easily visualize the effects of any Istio/K8s config tweaks.
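For the alerting part, the kind of check I’m picturing is something like the sketch below: take a series of overhead samples (say, one value per time window), fit a least-squares line through them, and flag it when the slope exceeds a threshold. The function names and the threshold value are my own assumptions for illustration, and in practice this would more likely live in whatever monitoring system holds the metric.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0  # not enough points to define a trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def overhead_trending_up(samples_ms, threshold_ms_per_sample=0.05):
    """True if overhead grows faster than the (assumed) threshold per sample."""
    return slope(samples_ms) > threshold_ms_per_sample


if __name__ == "__main__":
    # Illustrative overhead samples in milliseconds, not real data.
    print(overhead_trending_up([3.0, 3.1, 3.3, 3.6, 4.0]))
```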
Hope that makes some sense. Just curious what others think.
Thanks!