High CPU on istiod pod

Hello,

We are using Istio v1.12.6, and during an upgrade to v1.13.9 we observed high CPU on one of the istiod pods (the Istio Pilot discovery service). Because of this performance issue we rolled Istio back to v1.12.6.

The current CPU allocation for Pilot is 0.5 cores. We still observed high CPU even after increasing it to 1 core.

We have approximately 20 services running across more than 25 pods.
A few errors we observed in the istiod logs:

DR (DestinationRule) log excerpts:

gc 316 @1901.164s 1%: 0.23+288+0.085 ms clock, 0.93+468/288/0+0.34 ms cpu, 96->101->53 MB, 99 MB goal, 4 P
2022-12-07T11:29:26.630981Z info ads RDS: PUSH for node:sre-hello-world-sec-1-1-18-845c69c99b-9dm9h.prod-sre resources:18 size:19.8kB cached:18/18
2022-12-07T11:29:26.634582Z error status Encountered unexpected error updating status for &{{networking.istio.io/v1alpha3/DestinationRule ebdd27d9-af3f-11e9-b118-42010ab4081a citadel-blackbox ops-sre cluster.local map map[kubectl.kubernetes
] 768207009 2019-07-26 00:54:28 +0000 UTC 1} host:"istio-citadel.istio-system.svc.cluster.local" traffic_policy:<port_level_settings:<port:<number:15014 > tls:<> > > conditions:<type:"Reconciled" status:"True" last_probe_time:<seconds:1670412552 n

2022-12-07T11:59:00.052811Z error status Encountered unexpected error updating status for &{{networking.istio.io/v1alpha3/DestinationRule 2b1af442-3688-43e4-a494-3a62889f6ac6 sre-hello-
pipeline-uuid:637b7d666127567b6357920b short-sha:ba3fcba traffic:single-destination] 768227975 2022-02-24 06:16:36 +0000 UTC 9} host:"sre-hello-world.prod-sre.svc.cluster.local" traffic_poli
gc 51 @201.167s 0%: 0.17+88+0.045 ms clock, 0.70+2.3/25/118+0.18 ms cpu, 99->100->48 MB, 104 MB goal, 4 P
gc 52 @206.470s 0%: 0.18+94+0.12 ms clock, 0.75+2.8/94/163+0.48 ms cpu, 94->95->48 MB, 97 MB goal, 4 P
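Side note on reading those logs: the `gc` lines are Go runtime GC traces (the `GODEBUG=gctrace=1` format), and the percentage after the `@<seconds>s` field is the cumulative fraction of time spent in GC. A minimal sketch of extracting that number from a saved log (the `/tmp/gc.log` path and the sample lines are just illustrations copied from above):

```shell
# Write the gctrace lines copied from the istiod log to a scratch file.
cat <<'EOF' > /tmp/gc.log
gc 316 @1901.164s 1%: 0.23+288+0.085 ms clock, 0.93+468/288/0+0.34 ms cpu, 96->101->53 MB, 99 MB goal, 4 P
gc 51 @201.167s 0%: 0.17+88+0.045 ms clock, 0.70+2.3/25/118+0.18 ms cpu, 99->100->48 MB, 104 MB goal, 4 P
EOF

# Field 2 is the GC cycle number; field 4 is "<pct>%:", the cumulative
# percentage of wall time the process has spent in GC so far.
awk '/^gc /{gsub("%:", "", $4); print "cycle " $2 ": " $4 "% of time in GC"}' /tmp/gc.log
```

At 0-1%, the collector itself is not where the CPU is going, which points back at the config push and status-update work shown in the error lines.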

Even after removing these DestinationRules, we still saw high CPU on one of the istiod pods.

Other configurations:

% kubectl get envoyfilters.networking.istio.io -A

NAMESPACE      NAME                           AGE
istio-system   stats-filter-1.10-1-12-6       14d
istio-system   stats-filter-1.11-1-12-6       14d
istio-system   stats-filter-1.12-1-12-6       14d
istio-system   tcp-stats-filter-1.10-1-12-6   14d
istio-system   tcp-stats-filter-1.11-1-12-6   14d
istio-system   tcp-stats-filter-1.12-1-12-6   14d

% kubectl get configmaps -A | grep istiod
istio-system istiod-1-12-6-7c9c5bf87b-45gvd-distribution 1 14d
istio-system istiod-1-12-6-7c9c5bf87b-5j2t8-distribution 1 10d
istio-system istiod-1-12-6-7c9c5bf87b-66b84-distribution 1 14d
istio-system istiod-1-12-6-7c9c5bf87b-fr7m7-distribution 1 10d
istio-system istiod-1-12-6-7c9c5bf87b-q7b56-distribution 1 10d
istio-system istiod-1-12-6-7c9c5bf87b-qgqlz-distribution 1 14d
istio-system istiod-1-13-9-6685766656-2h7z2-distribution 1 14d
istio-system istiod-1-13-9-6685766656-772wd-distribution 1 14d
istio-system istiod-1-13-9-6685766656-97t4x-distribution 1 14d
istio-system istiod-1-13-9-6685766656-fvplk-distribution 1 14d
istio-system istiod-1-13-9-6685766656-kmz6x-distribution 1 14d
istio-system istiod-1-13-9-6685766656-tfzqr-distribution 1 14d
istio-system istiod-1-13-9-6685766656-z7mhz-distribution 1 14d
istio-system istiod-1-13-9-6b595f5bb7-cmf88-distribution 1 14d
istio-system istiod-1-13-9-6b595f5bb7-jkql2-distribution 1 14d
istio-system istiod-1-13-9-6b595f5bb7-vnppm-distribution 1 14d
istio-system istiod-1-13-9-6b595f5bb7-wzjbd-distribution 1 14d
istio-system istiod-1-13-9-86ddc47f8c-7274b-distribution 1 14d
istio-system istiod-1-13-9-86ddc47f8c-lk9rv-distribution 1 14d
istio-system istiod-1-13-9-86ddc47f8c-tfhfw-distribution 1 14d
istio-system istiod-1-13-9-86ddc47f8c-wlfn9-distribution 1 14d

Do you have any thoughts on this performance issue?

Thanks in advance!

Suhas

Do you see a lot of error or failure messages in the containers of the high-CPU pod?

Yes, there are many similar error messages related to VirtualServices and DestinationRules in the istiod pod logs,
e.g. Encountered unexpected error updating status for &{{networking.istio.io/v1alpha3/DestinationRule

What is the output of the following two commands?
istioctl analyze
istioctl proxy-status

Here is the analyze output. The selector already exists, yet analyze is still showing errors:

$ ./istioctl analyze -A | grep -E 'Error|Warning'
Error: Analyzers found issues when analyzing all namespaces.
See Istioldie 1.13 / Configuration Analysis Messages for more information about causes and resolutions.
Error [IST0101] (Gateway egress/api-gateway) Referenced selector not found: "istio=sec-egressgateway"
Error [IST0101] (Gateway egress/api-token) Referenced selector not found: "istio=sec-egressgateway"
Error [IST0101] (Gateway ingress/global-gateway) Referenced selector not found: "istio=sec-ingressgateway"
Error [IST0101] (Gateway egress/mediation) Referenced selector not found: "istio=sec-egressgateway"
Error [IST0101] (Gateway egress/sre-testing) Referenced selector not found: "istio=sec-egressgateway"

./istioctl proxy-status showed SYNCED after some time. (I do not have the command output from production.)

How did you check that the selector is already there?

Hello

Here is the selector available in my Gateway:


apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: global-gateway
  namespace: ingress
spec:
  selector:
    istio: sec-ingressgateway

Thanks.

Your errors are for: Gateway egress/api-gateway, Gateway egress/api-token, Gateway egress/mediation, Gateway egress/sre-testing.

But what you posted is: Gateway ingress/global-gateway.

We have errors for Gateway egress/api-gateway, Gateway egress/api-token, Gateway egress/mediation, Gateway egress/sre-testing, and others, but I shared the YAML for Gateway ingress/global-gateway only to show that the selector is present.

The YAMLs for Gateway egress/api-gateway, Gateway egress/api-token, Gateway egress/mediation, and Gateway egress/sre-testing all contain a selector as well, even though analyze reports errors for them.

Thanks for posting your error as well as the suspected troublemaker for analysis.

If the K8s Service label matches your K8s pod selector, analyze should not complain. This high-CPU pod could be the only one that is serving.
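One way to cross-check the IST0101 errors by hand is to list the pods that actually carry the label each Gateway selector references. This is a sketch; the label values come from the analyze output in this thread, and the namespaces are whatever your gateway deployments actually run in:

```shell
# If these return "No resources found", the IST0101 errors are legitimate
# (no workload matches the Gateway selector). If pods are listed, that
# supports the theory that analyze itself is reporting incorrectly.
kubectl get pods -A -l istio=sec-ingressgateway --show-labels
kubectl get pods -A -l istio=sec-egressgateway --show-labels
```

These commands need access to the live cluster, so run them where your kubeconfig points at the affected environment.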

We found that the analyze utility itself is not working correctly and gives incorrect results with 1.13.
Ref: istioctl analyze: showing invalid error on release 1.13.2 · Issue #38148 · istio/istio · GitHub

Could you please let us know what the reason for the high CPU might be?

Yes, let me know what other guidance you need to fix your error.

I need a solution for the performance issue on the istiod pod; CPU usage has been at 100% since the upgrade.

@CoderCooker What would be the solution for the high CPU on istiod?

Can someone help me find the cause of the high CPU on istiod?

@CoderCooker Do you have anything for the high CPU on the istiod pod?

Can you reproduce the issue with at least a supported Istio version (1.16.3), and ideally with the latest?

Consider capturing a pprof profile when you observe the high CPU usage, or, if it is easy to reproduce, open an issue with steps to reproduce it?
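For reference, a typical way to grab a CPU profile from istiod looks like the following. This is a sketch under assumptions: it assumes the default `istiod` deployment name in `istio-system`, and that your istiod build serves the Go pprof handlers on its debug port 8080 (some installs expose them on the monitoring port 15014 instead); adjust names and ports for your environment.

```shell
# Forward the istiod debug port to localhost (run against the high-CPU pod's
# deployment; use "pod/<name>" instead of "deploy/istiod" to target one pod).
kubectl -n istio-system port-forward deploy/istiod 8080:8080 &

# Capture a 30-second CPU profile while the CPU spike is happening,
# then inspect the hottest functions.
curl -s "http://localhost:8080/debug/pprof/profile?seconds=30" -o istiod.prof
go tool pprof -top istiod.prof
```

Attaching the resulting profile to a GitHub issue makes it much easier for maintainers to tell whether the CPU is going to pushes, status updates, or something else.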