Pilot cluster scalability results


On today’s community call, Mitch and Mandar gave really good overviews of the performance enhancements that went into 1.1. First, thanks to Mitch, Mandar, and everyone who put in the work!

I asked where the configs-under-test live, and Mandar pointed me to them; thanks again. For this question I’m mostly interested in “cluster scale”, by which I mean how much CPU and memory pilot and pilot-agent/envoy consume in proportion to the number of services, pods, and configs.

My team was doing some pre-testing of an Istio 1.1 RC (deploying roughly 100 bookinfos), and we saw:

  1. An average 68% reduction in pod memory (it’s bookinfo, so memory is dominated mostly by the sidecar) just from moving from 1.0 to 1.1, regardless of config (that is, without graph pruning).
  2. An average 2% reduction in pod memory from turning on graph pruning.
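
To be clear about what “average reduction” means here, this is a minimal sketch of one way to compute it: the mean of the per-pod percentage reductions between two runs. The function name, the pod names, and the numbers are made up for illustration and are not our measurements; it also assumes pod memory has already been collected as MiB per pod by some other means.

```python
from statistics import mean


def mean_reduction_pct(before, after):
    """Mean per-pod memory reduction, in percent.

    `before` and `after` map a pod name to its memory in MiB.
    Only pods present in both runs are compared.
    """
    common = sorted(set(before) & set(after))
    if not common:
        raise ValueError("no pods in common between the two runs")
    return mean(100.0 * (before[p] - after[p]) / before[p] for p in common)


# Placeholder values in MiB, purely illustrative (NOT measured data).
baseline = {"productpage": 100.0, "reviews": 200.0}
candidate = {"productpage": 50.0, "reviews": 150.0}

print(f"{mean_reduction_pct(baseline, candidate):.1f}%")
```

Note that averaging per-pod percentages is not the same as taking the percentage change of total memory across all pods, so the two can differ when pods vary a lot in size.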

So “1” is good, but it is unrelated to the graph pruning/sidecar resource. We were expecting the boost from “2” to be much bigger, and Mandar reported today that Istio’s own testing found “2” to be much bigger (the release notes say so as well). So we’re trying to reconcile the two: did we configure something wrong? We’re looking through the configs at https://github.com/istio/tools/tree/master/perf, but a few other things would help if available:

What version of the 1.1 RC was tested? (We used 1.1.0-rc0.)
Is this the right place to look for “results”: https://github.com/istio/istio/issues/9961 and the linked raintank snapshots, https://snapshot.raintank.io/dashboard/snapshot/KvgVBadMUtTlPSwU5NAMAjItuLv37Gd7?refresh=5s&orgId=2 and https://snapshot.raintank.io/dashboard/snapshot/sAVoeBv98auA0V5lfLNUo2D89gpnFV9y?orgId=2? We’re still trying to digest them.

Thanks in advance for any info or pointers.
