Istio-ingressgateway high CPU after upgrading to 1.3.x

After upgrading from 1.2.* to 1.3.* (I’ve tried both 1.3.0 and 1.3.3 with the same results), I’m seeing a dramatic jump in CPU utilization in the istio-ingressgateway pods. On 1.2.5 the pod used less than 50m CPU; immediately after upgrading to 1.3.0 or 1.3.3, CPU spikes to 500-800m and the deployment scales out to the maximum number of replicas the autoscaler allows.
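For context, the numbers above come from roughly the following checks (resource names assume a default istio-system install, so adjust for your setup):

```
# Pod-level CPU for the gateway (needs metrics-server or heapster)
kubectl -n istio-system top pods -l app=istio-ingressgateway

# The gateway HPA sitting at its max replica count after the upgrade
kubectl -n istio-system get hpa istio-ingressgateway
```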

I’ve tested this against both low- and high-traffic clusters (we run about 10 small ones), and the result is the same everywhere, from a cluster handling around 300 RPS down to one at 0 RPS.

When I exec into the pod and run top/htop, the envoy process spikes above 100% CPU roughly every 30 seconds. Is there any information on what changed in the recent proxyv2 releases that could cause this? I can adjust the clusters to compensate for the time being, but I’m concerned that if traffic spikes the system won’t be able to handle it.
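For anyone who wants to reproduce the observation, this is roughly what I was running (the pod name is a placeholder, and top/htop have to be present in the proxy image):

```
# Pick one of the ingressgateway pods
kubectl -n istio-system get pods -l app=istio-ingressgateway

# Watch per-process CPU inside it; the envoy process is the one spiking
kubectl -n istio-system exec -it <ingressgateway-pod-name> -- top
```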

Digging into the issue above, I narrowed the cause down to the 1.3.* images of istio-proxyv2. When I downgraded the proxy to 1.2.8 (or anything lower), utilization dropped dramatically with no change in traffic.
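If anyone wants to try the same workaround, this is roughly how I pinned the gateway back to the older proxy image (the deployment, container, and namespace names assume a default install, so double-check against your own setup):

```
# Point just the ingress gateway at the older proxyv2 image
# (container name istio-proxy and namespace istio-system are the defaults)
kubectl -n istio-system set image deployment/istio-ingressgateway \
  istio-proxy=docker.io/istio/proxyv2:1.2.8

# Note: a Helm upgrade will put the 1.3.x image back; the global.hub /
# global.tag chart values are where the image is normally controlled.
```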

Not being the strongest Golang dev, I have yet to find the cause within the 1.3 upgrade itself, but I’d love to hear some thoughts!

I do know this will make future upgrades an issue, though… Any ideas?


I have noticed the same issue after going from 1.2.x to 1.3.2: the CPU usage of the Istio ingressgateway is way higher than before. Did you ever find the cause? Maybe a wrong Helm chart flag?

There’s an open issue on the repo for it. I still haven’t found the root cause of why the minor version upgrade triggers it. Here’s the link:

Also, I opened this issue:

I haven’t looked, but perhaps it’s related to Regression: Istio 1.1 sidecar cpu peaks periodically far beyond 1.0.x (PILOT_DISABLE_XDS_MARSHALING_TO_ANY) · Issue #12162 · istio/istio · GitHub

In 1.3 we have re-enabled PILOT_DISABLE_XDS_MARSHALING_TO_ANY.
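If anyone wants to check or experiment with that flag on their own install, something along these lines should work. It’s a Pilot environment variable; the deployment and container names below assume a default istio-system install where the Pilot container is called discovery:

```
# See whether the flag is currently set on Pilot
kubectl -n istio-system get deployment istio-pilot \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="discovery")].env}'

# Set it explicitly and let Pilot roll out
kubectl -n istio-system set env deployment/istio-pilot \
  -c discovery PILOT_DISABLE_XDS_MARSHALING_TO_ANY=1

# Remove it again (a trailing "-" unsets an env var)
kubectl -n istio-system set env deployment/istio-pilot \
  -c discovery PILOT_DISABLE_XDS_MARSHALING_TO_ANY-
```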