Istio Sidecar consuming high CPU

I was able to collect an strace summary while the issue was occurring:
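
For anyone wanting to collect the same thing: an aggregated summary like the one below comes from strace's counting mode. The `pidof` lookup and having a shell next to the sidecar process are assumptions about your setup:

```sh
# attach to the sidecar's envoy process, follow all threads (-f),
# and count syscalls; Ctrl-C detaches and prints the summary table (-c)
PID=$(pidof envoy)
strace -c -f -p "$PID"
```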

```
strace: Process 5296 attached with 15 threads
^Cstrace: Process 5296 detached
strace: Process 5329 detached
strace: Process 5330 detached
strace: Process 5331 detached
strace: Process 5332 detached
strace: Process 5338 detached
strace: Process 5340 detached
strace: Process 5341 detached
strace: Process 5342 detached
strace: Process 5343 detached
strace: Process 5344 detached
strace: Process 5345 detached
strace: Process 5346 detached
strace: Process 5347 detached
strace: Process 6356 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.39    0.176384         256       687        44 futex
  9.04    0.021728          49       436           nanosleep
  8.82    0.021201         122       173           epoll_pwait
  6.45    0.015504        3100         5         2 restart_syscall
  0.64    0.001548          30        50           write
  0.50    0.001208          11       109        50 read
  0.31    0.000753          75        10           sched_yield
  0.30    0.000715          14        50        25 accept4
  0.24    0.000567          22        25           close
  0.16    0.000385           7        50           epoll_ctl
  0.08    0.000181           7        25           getsockname
  0.07    0.000178           7        25           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.240352                  1645       121 total
```

Any update on this? The error keeps happening on istio/proxyv2:1.3.2-distroless.

Still seeing this issue in 1.3.3… Any word?

Hi @iciolus, we’re still trying to find a way to reproduce this. Have you seen any patterns? Would you be able to share your Envoy config?
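
Something along these lines should capture the full config (pod name and namespace are placeholders, and it assumes the sidecar’s Envoy admin endpoint is on the default port 15000 and that the image ships curl; the distroless variant may not):

```sh
# dump the complete Envoy configuration from the sidecar's admin endpoint
kubectl exec <pod-name> -n <namespace> -c istio-proxy -- \
  curl -s localhost:15000/config_dump > config_dump.json
```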

Thanks,
Brian

We have the same problem with 1.3.3. In our situation we pinned the problem down to file uploads and downloads. I am now building a simple service that does just that in order to reproduce it reliably, and I will report back with the results.
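
Roughly what I have in mind for the repro, sketched as a load loop (the service name and endpoints are placeholders for whatever the test service ends up exposing):

```sh
# push a ~100 MB payload through the sidecar in a tight upload/download loop
dd if=/dev/urandom of=/tmp/payload bs=1M count=100
while true; do
  curl -s -o /dev/null --data-binary @/tmp/payload http://file-service/upload
  curl -s -o /dev/null http://file-service/download
done
```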

We are also having the same issue. I was working with @Francois last night to try and provide the outputs from dump_kubernetes.sh etc.

We are using 1.3.3 in production.

If we can provide any more info, we’re willing to try and help… We had been running 1.3.0 on our staging environment for 2-3 weeks and didn’t notice any issues; we also upgraded that cluster to 1.3.3 two days ago and everything was still fine.

So I guess it’s caused by high load somehow?

In our case, we are running 3 clusters on Istio 1.3.3. On two of them (which are very similar: flannel, kube-dns, Kubernetes 1.11.4) we are encountering this issue, while on the 3rd (Cilium, CoreDNS, Kubernetes 1.12.7) there are no pods with high-CPU Envoys. All of them run workloads ranging from low to high ops with similar application patterns.

To be sure I’m providing the right configs: which ones are you looking for? I’ll gladly share them. The biggest impact, though, is actually in istio-ingressgateway (also running proxyv2).

I had a sandbox cluster on 1.2.2 that I wanted to use to see if I could reproduce this…

With a single pod, the ingress-gateway was only using about 43m of CPU on the 1.2.2 image. After upgrading to 1.3.3, it hit the max replicas in the HPA within minutes, with every replica sitting close to 800m CPU.
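
For reference, this is roughly how we watched it happen (it assumes the stock istio-system namespace and gateway labels, and `kubectl top` needs metrics-server installed):

```sh
# per-pod CPU of the ingress gateway replicas
kubectl top pods -n istio-system -l app=istio-ingressgateway
# HPA state: current vs. max replicas and observed CPU utilization
kubectl get hpa -n istio-system
```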

This cluster does not have any inbound traffic at all; why would it see that big a spike in utilization?

Just adding my two cents: I’ve had these symptoms on 1.4 as well.