I suddenly started getting 503s from the ingress gateways on a cluster running 1.1.4.
istio-ingressgateway-6f5c8645ff-2h7px.istio-system SYNCED SYNCED SYNCED (87%) SYNCED istio-pilot-bb96c89f5-kkzf8 1.1.3
istio-ingressgateway-6f5c8645ff-4f7k9.istio-system SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-bb96c89f5-v54zm 1.1.3
istio-ingressgateway-6f5c8645ff-4pk2t.istio-system NOT SENT SYNCED SYNCED (77%) SYNCED istio-pilot-bb96c89f5-nfhv9 1.1.3
istio-ingressgateway-6f5c8645ff-7bgdq.istio-system SYNCED SYNCED SYNCED (88%) SYNCED istio-pilot-bb96c89f5-nfhv9 1.1.3
istio-ingressgateway-6f5c8645ff-7v6cz.istio-system NOT SENT SYNCED SYNCED (78%) SYNCED istio-pilot-bb96c89f5-kkzf8 1.1.3
istio-ingressgateway-6f5c8645ff-fd9cm.istio-system NOT SENT SYNCED SYNCED (88%) SYNCED istio-pilot-bb96c89f5-dmj7w 1.1.3
istio-ingressgateway-6f5c8645ff-kxgv8.istio-system SYNCED SYNCED SYNCED (93%) SYNCED istio-pilot-bb96c89f5-9xd46 1.1.3
istio-ingressgateway-6f5c8645ff-lxkgs.istio-system SYNCED SYNCED SYNCED (94%) SYNCED istio-pilot-bb96c89f5-55zd6 1.1.3
istio-ingressgateway-6f5c8645ff-q662p.istio-system SYNCED SYNCED SYNCED (89%) SYNCED istio-pilot-bb96c89f5-kkzf8 1.1.3
istio-ingressgateway-6f5c8645ff-qkwn2.istio-system SYNCED SYNCED SYNCED (89%) SYNCED istio-pilot-bb96c89f5-kkzf8 1.1.3
istio-ingressgateway-6f5c8645ff-qmvg2.istio-system NOT SENT SYNCED SYNCED (86%) SYNCED istio-pilot-bb96c89f5-zk9fb 1.1.3
istio-ingressgateway-6f5c8645ff-qtghv.istio-system SYNCED SYNCED SYNCED (89%) SYNCED istio-pilot-bb96c89f5-hnvxb 1.1.3
istio-ingressgateway-6f5c8645ff-x5h7v.istio-system NOT SENT SYNCED SYNCED (78%) SYNCED istio-pilot-bb96c89f5-hnvxb 1.1.3
istio-ingressgateway-6f5c8645ff-xp586.istio-system SYNCED SYNCED SYNCED (100%) SYNCED istio-pilot-bb96c89f5-8xdtj 1.1.3
istio-ingressgateway-6f5c8645ff-z5zlc.istio-system NOT SENT SYNCED SYNCED (79%) SYNCED istio-pilot-bb96c89f5-v54zm 1.1.3
All the pods marked CDS: NOT SENT are the ones returning 503 NR. I compared the proxy-config of the pods that say CDS: SYNCED with those that say CDS: NOT SENT. The synced pods had all the latest deployments and virtual service configuration, while the erroring pods were missing the recent deployments and virtual service configuration.
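In case it helps anyone reproduce the comparison, here is a rough sketch of how the status listing above can be filtered for the CDS: NOT SENT pods. It assumes the columns are name, CDS, LDS, EDS (with an optional percentage), RDS, pilot, and version, and that the only status values are the two seen above (SYNCED and NOT SENT). The function names `parse_proxy_status` and `cds_not_sent` are made up for illustration and are not part of any Istio tooling.

```python
import re

# Assumption: only the two status values that appear in the listing above.
# "NOT SENT" contains a space, so a plain whitespace split would misalign columns.
_STATUS = r"(NOT SENT|SYNCED)"

_ROW = re.compile(
    r"^(?P<name>\S+)\s+"
    + _STATUS + r"\s+"                    # CDS
    + _STATUS + r"\s+"                    # LDS
    + _STATUS + r"(?:\s+\((\d+)%\))?\s+"  # EDS, optional "(NN%)"
    + _STATUS + r"\s+"                    # RDS
    + r"(?P<pilot>\S+)\s+(?P<version>\S+)$"
)


def parse_proxy_status(text):
    """Parse status lines into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if not m:
            continue
        cds, lds, eds, pct, rds = m.group(2, 3, 4, 5, 6)
        rows.append({
            "name": m.group("name"),
            "cds": cds,
            "lds": lds,
            "eds": eds,
            "eds_percent": int(pct) if pct is not None else None,
            "rds": rds,
            "pilot": m.group("pilot"),
            "version": m.group("version"),
        })
    return rows


def cds_not_sent(text):
    """Return the names of proxies whose CDS column reads NOT SENT."""
    return [r["name"] for r in parse_proxy_status(text) if r["cds"] == "NOT SENT"]


if __name__ == "__main__":
    # Two lines copied from the listing above.
    sample = (
        "istio-ingressgateway-6f5c8645ff-2h7px.istio-system SYNCED SYNCED SYNCED (87%) SYNCED istio-pilot-bb96c89f5-kkzf8 1.1.3\n"
        "istio-ingressgateway-6f5c8645ff-4pk2t.istio-system NOT SENT SYNCED SYNCED (77%) SYNCED istio-pilot-bb96c89f5-nfhv9 1.1.3\n"
    )
    for name in cds_not_sent(sample):
        print(name)
```

The resulting names are the pods whose proxy-config is worth diffing against a pod that reports CDS: SYNCED.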
I see zero error logs in my pilot and ingressgateway pods, and I am not sure why this is happening.
In terms of memory and CPU, everything looks good in Grafana. Traffic is only the occasional tests that we run.