Hello,
Since earlier today, istioctl proxy-status in our cluster has been alternating between SYNCED and STALE across services in the mesh, resulting in sporadic 404s for calls through the ingressgateway.
When we inspect the routes on the ingressgateway, it alternates between having only the single blackhole:80 route (which returns the 404s) and having all 191 valid routes.
We’re running Istio 1.1.3 in EKS and currently have 4 pilot replicas. There are 100 services in the cluster, across 186 pods running on 22 worker nodes in AWS.
We are also trying to scale pilot to see whether that has any impact (CPU and memory are somewhat higher than baseline right now, but not yet at capacity).
It appears that the ingress gateway’s sidecar cannot consistently reach the pilot discovery container to resolve the routes for the services, but it is not clear why this would happen. We do see warnings like these in the ingress gateway sidecar container’s logs:
[2019-05-22 20:23:47.935][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, no healthy upstream
[2019-05-22 20:23:47.935][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:49] Unable to establish new stream
[2019-05-22 20:24:01.436][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 13,
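To see how often each close reason occurs, the warnings above can be tallied by gRPC status code (14 is UNAVAILABLE, 13 is INTERNAL). This is only a minimal sketch, assuming the gateway logs are piped in on stdin; the function name count_stream_closes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of istioctl or Envoy): reads Envoy log lines
# on stdin and prints "<grpc_status_code> <occurrences>" for every
# "gRPC config stream closed: <code>" warning, sorted by code.
count_stream_closes() {
  grep -o 'gRPC config stream closed: [0-9][0-9]*' \
    | awk '{ n[$NF]++ } END { for (c in n) print c, n[c] }' \
    | sort -n
}
```

It could be fed with something like `kubectl logs istio-ingressgateway-747ff57cc5-lzd94 -n istio-system | count_stream_closes`.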
Routes reported by the ingressgateway when proxy-status is SYNCED:
istioctl proxy-config route istio-ingressgateway-747ff57cc5-lzd94 -n istio-system
NOTE: This output only contains routes loaded via RDS.
NAME        VIRTUAL HOSTS
http.80     191
            1
Routes reported when proxy-status is STALE:
istioctl proxy-config route istio-ingressgateway-747ff57cc5-lzd94 -n istio-system
NOTE: This output only contains routes loaded via RDS.
NAME        VIRTUAL HOSTS
http.80     1
            1
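To timestamp when the gateway flips between the two states, the virtual host count for http.80 can be sampled periodically. This is a rough sketch, assuming the table layout shown above; vhost_count, watch_routes, and the 5-second interval are illustrative choices, not part of istioctl.

```shell
#!/bin/sh
# Hypothetical helper: reads `istioctl proxy-config route` table output on
# stdin and prints the VIRTUAL HOSTS value for the route named http.80.
vhost_count() {
  awk '$1 == "http.80" { print $2 }'
}

# Hypothetical loop: prints a UTC timestamp and the http.80 virtual host
# count every 5 seconds until interrupted.
# $1 = gateway pod name, $2 = namespace (defaults to istio-system).
watch_routes() {
  pod=$1
  ns=${2:-istio-system}
  while true; do
    count=$(istioctl proxy-config route "$pod" -n "$ns" | vhost_count)
    echo "$(date -u +%FT%TZ) http.80 virtual hosts: ${count:-unknown}"
    sleep 5
  done
}
```

Running `watch_routes istio-ingressgateway-747ff57cc5-lzd94` alongside the pilot logs should make it easier to line up each drop to a single virtual host with whatever pilot was doing at that moment.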
istioctl proxy-config route istio-ingressgateway-747ff57cc5-lzd94 -n istio-system -o json
[
    {
        "name": "http.80",
        "virtualHosts": [
            {
                "name": "blackhole:80",
                "domains": [
                    "*"
                ],
                "routes": [
                    {
                        "match": {
                            "prefix": "/"
                        },
                        "directResponse": {
                            "status": 404
                        },
                        "perFilterConfig": {
                            "mixer": {
                                "disable_check_calls": true
                            }
                        }
                    }
                ]
            }
        ],
        "validateClusters": false
    },
    {
        "virtualHosts": [
            {
                "name": "backend",
                "domains": [
                    "*"
                ],
                "routes": [
                    {
                        "match": {
                            "prefix": "/stats/prometheus"
                        },
                        "route": {
                            "cluster": "prometheus_stats"
                        }
                    }
                ]
            }
        ]
    }
]
We have also looked into the possibility of a bad route, ServiceEntry, or VirtualService in the cluster, but nothing has jumped out yet.
We would appreciate any help or pointers for troubleshooting this issue further.
Thanks,
Aish