I noticed that Pilot had CPU spikes from time to time. The spikes happened whenever it logged a lot of push errors (timeouts) over a period of 2-3 minutes. During those periods, eds pushes went from 2 ops to 60+ and back to 2; cds, lds, and rds pushes did the same, but with lower values.
In the Pilot pods I found log entries like these:
...
2019-09-02T12:04:37.188365Z warn ads Failed to push, client busy sidecar~10.200.144.16~<POD1>-25524
2019-09-02T12:04:37.079611Z warn ads Failed to push, client busy sidecar~10.200.134.218~<POD2>-25596
...
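To see whether the "client busy" warnings are concentrated on a few sidecars or spread across all of them, the log lines can be tallied per proxy ID. This is a minimal sketch that assumes every warning has exactly the shape shown above, with the proxy ID as the token after "client busy"; the function name and the sample list are only illustrative.

```python
import re
from collections import Counter

# Assumption: the proxy ID is the first whitespace-delimited token after
# "client busy", as in the warnings quoted above.
BUSY_RE = re.compile(r"Failed to push, client busy\s+(\S+)")

def count_busy_clients(lines):
    """Return a Counter mapping proxy ID -> number of 'client busy' warnings."""
    counts = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The two warnings from the post, used as sample input.
SAMPLE_LINES = [
    "2019-09-02T12:04:37.188365Z warn ads Failed to push, client busy sidecar~10.200.144.16~<POD1>-25524",
    "2019-09-02T12:04:37.079611Z warn ads Failed to push, client busy sidecar~10.200.134.218~<POD2>-25596",
]

for proxy, n in count_busy_clients(SAMPLE_LINES).most_common():
    print(n, proxy)
```

Run against the full Pilot log for one spike window, a handful of proxies dominating the count would point at specific slow clients, while an even spread would point at Pilot itself.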
To investigate, I ran
istioctl proxy-status
and got the following output:
NAME            CDS      LDS      EDS             RDS      PILOT                          VERSION
<POD1_SVC_A>    SYNCED   SYNCED   SYNCED (100%)   SYNCED   istio-pilot-864c44c7f8-rn7v7   1.0-dev
<POD1_SVC_B>    SYNCED   SYNCED   SYNCED (100%)   SYNCED   istio-pilot-864c44c7f8-bww7j   1.2.4
<POD2_SVC_B>    SYNCED   SYNCED   SYNCED (100%)   SYNCED   istio-pilot-864c44c7f8-9bspg   1.0-dev
...
// many more pods from different services with same values as above
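One way to check whether the VERSION value follows the Pilot instance or varies from proxy to proxy is to group the rows by the PILOT column. This is a rough sketch that assumes PILOT and VERSION are always the last two whitespace-separated fields of a data row, as in the output above; the function name is hypothetical and only the three rows shown in the post are used as input.

```python
from collections import defaultdict

def versions_by_pilot(output):
    """Map each Pilot pod name to the set of VERSION values seen in its rows."""
    result = defaultdict(set)
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to be a data row.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        # Assumption: PILOT and VERSION are the last two fields of every data row.
        pilot, version = fields[-2], fields[-1]
        result[pilot].add(version)
    return dict(result)

SAMPLE_OUTPUT = """\
NAME CDS LDS EDS RDS PILOT VERSION
<POD1_SVC_A> SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-864c44c7f8-rn7v7 1.0-dev
<POD1_SVC_B> SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-864c44c7f8-bww7j 1.2.4
<POD2_SVC_B> SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-864c44c7f8-9bspg 1.0-dev
"""

for pilot, versions in sorted(versions_by_pilot(SAMPLE_OUTPUT).items()):
    print(pilot, sorted(versions))
```

If every Pilot pod maps to exactly one version, the difference is tied to the Pilot instance rather than to the individual proxies.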
Everything looked OK except the VERSION column. Why are there multiple versions? Both Pilot and istio-proxy were running the official istio:1.2.4 image.
Moreover, what should I look for when Pilot shows this kind of intermittent behaviour?
I am running Istio 1.2.4, installed via Helm, on an on-premise cluster.