Istio 1.2
We have a service whose requests, after some time, begin to hang and fail with EOF. The ingressgateway access log shows the DC response flag for these requests. The downstream pods and clusters are all alive, the proxy status on the ingress gateway reports them all HEALTHY, and everything is SYNCED. Another observation: while this is occurring, I can exec into the ingressgateway pod and curl the service's endpoint just fine. At the same time the service has about 9K websockets connected to it that are still receiving data, and other services in the cluster are routed to without issue.
The istio-ingressgateway pod shows no restarts and no errors in its logs. I do see some "warn Omitting for collision" messages in Pilot, but I also see those when everything is working.
This is very unpredictable: it only happens after the service has been running for a while, by which point it is maintaining 8-10K connections.
What does the DC response flag actually mean?
How can we further debug why requests going through the ingress gateway never make it to the service, while a curl from the ingress pod itself does?
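One way to get more signal, assuming you can exec into the gateway pod: Envoy's admin endpoint on port 15000 lets you raise individual logger levels at runtime, which should show exactly how and when the downstream connection gets closed. A sketch (logger names are Envoy's own; the deployment details are from this setup):

```shell
# bump_loggers: raise Envoy's connection-level log levels on the gateway
# at runtime via the admin API (no proxy restart needed).
# Reset afterwards by posting e.g. /logging?connection=info.
bump_loggers() {
  # $1 = admin address, e.g. localhost:15000
  for logger in connection http http2 pool router; do
    curl -s -X POST "http://$1/logging?${logger}=debug"
  done
}
# usage (exec into the ingressgateway pod first):
# bump_loggers localhost:15000
```

Then tail the gateway container logs while reproducing the hang.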
[2019-09-05T20:45:10.093Z] "GET /endpoint HTTP/2" 0 DC "-" "-" 0 0 59525 - "10.0.12.231" "curl/7.50.3" "d0d91b5c-ffee-4c08-af7a-db134e5c5ccc" "somehost.com" "-" - - 10.0.75.246:443 10.0.12.231:13828 somehost.com
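The log line above is Envoy's default access log format, where the response flag (DC here) is the sixth whitespace-separated field, right after the response code. A small helper can tally how often each flag occurs across the gateway logs (the deployment name below is assumed from a default Istio install):

```shell
# flag_histogram: tally Envoy response flags across access log lines.
# With the default access log format, the flag (DC, UC, UF, "-", ...)
# is the sixth whitespace-separated field.
flag_histogram() {
  awk '{print $6}' | sort | uniq -c | sort -rn
}
# usage:
# kubectl -n istio-system logs deploy/istio-ingressgateway | flag_histogram
```

If DC spikes line up with the hangs, that narrows the problem to the gateway's side of the connection rather than the upstream.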
root@istio-ingressgateway-5bd8488489-4ztfs:/# curl localhost:15000/clusters | grep "outbound|80||someservice" | grep "cx_active"
outbound|80||someservice.svc.cluster.local::10.0.10.174:8084::cx_active::332
outbound|80||someservice.svc.cluster.local::10.0.13.198:8084::cx_active::1347
outbound|80||someservice.svc.cluster.local::10.0.8.140:8084::cx_active::1002
outbound|80||someservice.svc.cluster.local::10.0.33.252:8084::cx_active::1563
outbound|80||someservice.svc.cluster.local::10.0.44.168:8084::cx_active::1946
outbound|80||someservice.svc.cluster.local::10.0.67.235:8084::cx_active::149
outbound|80||someservice.svc.cluster.local::10.0.78.248:8084::cx_active::606
outbound|80||someservice.svc.cluster.local::10.0.92.145:8084::cx_active::3055
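To get a quick total of active upstream connections across all endpoints (the per-endpoint counts above), the `::cx_active::` fields can be summed:

```shell
# cx_total: sum the ::cx_active:: counters from Envoy's /clusters output
# to get the total active upstream connections for one cluster.
cx_total() {
  grep '::cx_active::' | awk -F'::' '{sum += $NF} END {print sum}'
}
# usage (from inside the gateway pod):
# curl -s localhost:15000/clusters | grep 'outbound|80||someservice' | cx_total
```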
root@istio-ingressgateway-5bd8488489-4ztfs:/# curl localhost:15000/clusters | grep "outbound|80||someservice" | grep "rq_active"
outbound|80||someservice.svc.cluster.local::10.0.10.174:8084::rq_active::1015
outbound|80||someservice.svc.cluster.local::10.0.13.198:8084::rq_active::2033
outbound|80||someservice.svc.cluster.local::10.0.8.140:8084::rq_active::1687
outbound|80||someservice.svc.cluster.local::10.0.33.252:8084::rq_active::466
outbound|80||someservice.svc.cluster.local::10.0.44.168:8084::rq_active::2630
outbound|80||someservice.svc.cluster.local::10.0.67.235:8084::rq_active::835
outbound|80||someservice.svc.cluster.local::10.0.78.248:8084::rq_active::1291
outbound|80||someservice.svc.cluster.local::10.0.92.145:8084::rq_active::27
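With thousands of connections and requests in flight per endpoint, it may also be worth checking whether Envoy's per-cluster circuit breakers are tripping; Envoy's built-in defaults (1024 max connections and max pending requests, unless a DestinationRule raises them) are well below these numbers, and a trip can surface downstream as an abruptly closed connection. Trips are counted in /stats:

```shell
# overflow_check: filter Envoy /stats for circuit-breaker trips on a
# cluster. upstream_cx_overflow and upstream_rq_pending_overflow are
# standard Envoy counters; non-zero values mean the cluster's
# max_connections / max_pending_requests limits are being exceeded.
overflow_check() {
  grep "$1" | grep -E 'overflow|circuit_breakers'
}
# usage (from inside the gateway pod):
# curl -s localhost:15000/stats | overflow_check someservice
```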