Istio-ingressgateway High-CPU

iciolus · November 1, 2019, 6:41pm

After upgrading from 1.2.* to 1.3.* (I’ve gone from 1.3.0 to 1.3.3 with the same results), I’m seeing a massive jump in CPU utilization in the istio-ingressgateway pods. On 1.2.5 each pod used less than 50m CPU; immediately after upgrading to 1.3.0 or 1.3.3 the CPU spikes to 500-800m and the deployment scales out to the maximum number of instances the autoscaler allows.

I’ve tested this against low- and high-traffic clusters (we’re running about 10 small ones) and the result is the same everywhere, from a cluster with around 300 RPS down to one with 0 RPS.

When exec-ing into the pod itself and running top/htop, the envoy process spikes to over 100% about every 30 seconds. Is there information on what went into the latest versions of proxyv2 that would cause this? I can make adjustments to the clusters to accommodate this for the time being, but I’m a little concerned that the system won’t cope if I see spikes in traffic.
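For anyone who wants to check whether they see the same roughly 30-second pattern, here is a minimal sketch, assuming you have already collected timestamped CPU readings for the envoy process (for example by noting values from top at a fixed interval). The function names, the threshold, and the sample numbers are all hypothetical and only illustrate the idea; nothing here comes from Istio or Envoy tooling.

```python
from statistics import median


def find_spikes(samples, threshold):
    """Return the timestamps where usage rises from below the threshold
    to at or above it. `samples` is a list of (seconds, cpu_value) pairs
    in time order."""
    spikes = []
    above = False
    for ts, value in samples:
        if value >= threshold and not above:
            spikes.append(ts)
        above = value >= threshold
    return spikes


def spike_interval(spike_times):
    """Return the median gap in seconds between consecutive spikes,
    or None when there are fewer than two spikes."""
    if len(spike_times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]
    return median(gaps)


if __name__ == "__main__":
    # Made-up readings (seconds, millicores), purely for illustration.
    readings = [(0, 40), (5, 45), (10, 700), (15, 50), (20, 40), (25, 45),
                (30, 40), (35, 42), (40, 750), (45, 720), (50, 40),
                (55, 41), (60, 39), (65, 40), (70, 800)]
    spikes = find_spikes(readings, threshold=500)
    print("spike starts:", spikes)
    print("median gap (s):", spike_interval(spikes))
```

A steady median gap across pods and clusters, regardless of traffic, would suggest something periodic rather than load-driven.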

iciolus · November 4, 2019, 4:39pm

So digging through the issue above, I narrowed the cause down to all 1.3.* images of istio-proxyv2. When downgrading the proxy to 1.2.8 (or anything lower) I saw a dramatic decrease in utilization, with no change in traffic.

Being a less than stellar Golang dev, I have yet to find a cause within the upgrade to 1.3, but I’d love to hear some thoughts!

I know that going forward with upgrades will be an issue now though… Any ideas?

Matthias_Jg · November 5, 2019, 7:45pm

I have noticed the same issue. After going from 1.2.x to 1.3.2, the CPU usage of the Istio ingress gateway was way higher than before. Did you find the cause? Maybe a wrong Helm chart flag?

iciolus · November 5, 2019, 8:04pm

There’s an open issue on the repo for it. I haven’t found a root cause in the minor version upgrade as to why it’s happening. Here’s the link:

iciolus · November 5, 2019, 8:05pm

Also, I opened this issue:

dwradcliffe · November 6, 2019, 3:12pm

I haven’t looked, but perhaps it’s related to istio/istio issue #12162, “Regression: Istio 1.1 sidecar cpu peaks periodically far beyond 1.0.x (PILOT_DISABLE_XDS_MARSHALING_TO_ANY)”.

In 1.3 we have re-enabled PILOT_DISABLE_XDS_MARSHALING_TO_ANY
