I’ve already asked in the Prometheus community, but I think it’s worth asking here too, to get another perspective on a behaviour we’re currently observing.
We’ve got a GKE Kubernetes cluster (v1.16.15-gke.7800) where Istio 1.8.3 has been installed and is managing pods in the default namespace.
Istio was installed with the default “PERMISSIVE” mTLS mode, which (as far as I understand) means that every Envoy sidecar also accepts plain HTTP traffic.
Everything is deployed in the default namespace, and every pod BUT prometheus/alertmanager/grafana is managed by Istio (i.e. the monitoring stack is outside the mesh). We’ve achieved this by using the neverInjectSelector key in the sidecar injector configuration.
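For reference, the relevant excerpt of our injector configuration looks roughly like this (a sketch of the `neverInjectSelector` setting in the `istio-sidecar-injector` ConfigMap; the `app` label values are illustrative, not our exact labels):

```yaml
# Pods matching ANY of these label selectors never get the Envoy sidecar injected,
# even though automatic injection is enabled for the namespace.
neverInjectSelector:
  - matchExpressions:
      - key: app
        operator: In
        values: ["prometheus", "alertmanager", "grafana"]
```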
Prometheus can successfully scrape almost all of its targets (defined via ServiceMonitors), except for a few that it consistently fails to scrape.
For example, in the Prometheus logs I can see:
```
level=debug ts=2021-02-19T11:15:55.595Z caller=scrape.go:927 component="scrape manager" scrape_pool=default/divolte/0 target=http://10.172.22.36:7070/metrics msg="Scrape failed" err="server returned HTTP status 503 Service Unavailable"
```
But if I log into the Prometheus pod, I can successfully reach the pod that Prometheus is failing to scrape:
```
/prometheus $ wget -SqO /dev/null http://10.172.22.36:7070/metrics
  HTTP/1.1 200 OK
  date: Fri, 19 Feb 2021 11:27:57 GMT
  content-type: text/plain; version=0.0.4; charset=utf-8
  content-length: 75758
  x-envoy-upstream-service-time: 57
  server: istio-envoy
  connection: close
  x-envoy-decorator-operation: divolte-srv.default.svc.cluster.local:7070/*
```
What am I missing? The 503 appears to be the actual response from the target, which means Prometheus does reach it during the scrape but gets an error back.
What I cannot understand is the difference, in terms of networking path and components involved, between the scrape (which fails) and the wget from inside the Prometheus pod (which succeeds). I also don’t know how to debug this further.
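For what it’s worth, these are the commands I’ve been considering to dig deeper (a sketch; the pod name is the one from the logs below, and I’m not sure these are the right places to look):

```shell
# Inspect the target sidecar's inbound listener for the scrape port,
# to see how Envoy handles traffic arriving on 7070
istioctl proxy-config listeners divolte-dpy-594d8cb676-vgd9l --port 7070

# Raise the Envoy log level on the target's sidecar to debug
istioctl proxy-config log divolte-dpy-594d8cb676-vgd9l --level debug

# Then tail the sidecar logs while a scrape happens
kubectl logs divolte-dpy-594d8cb676-vgd9l -c istio-proxy -f
```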
Here are the relevant logs from the main container and the Envoy/Istio proxy:
```
❯ k logs divolte-dpy-594d8cb676-vgd9l prometheus-jmx-exporter
DEBUG: Environment variables set/received...
 Service port (metrics): 7070
 Destination host: localhost
 Destination port: 5555
 Rules to appy: divolte
 Local JMX: 7071
CONFIG FILE not found, enabling PREPARE_CONFIG feature
Preparing configuration based on environment variables
Configuration preparation completed, final cofiguration dump:
############
---
hostPort: localhost:5555
username:
password:
lowercaseOutputName: true
lowercaseOutputLabelNames: true
########
Starting Service..
```
```
❯ k logs divolte-dpy-594d8cb676-vgd9l istio-proxy -f
2021-02-22T07:41:15.450702Z	info	xdsproxy	disconnected from XDS server: istiod.istio-system.svc:15012
2021-02-22T07:41:15.451182Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed: 0,
2021-02-22T07:41:15.894626Z	info	xdsproxy	Envoy ADS stream established
2021-02-22T07:41:15.894837Z	info	xdsproxy	connecting to upstream XDS server: istiod.istio-system.svc:15012
2021-02-22T08:11:25.679886Z	info	xdsproxy	disconnected from XDS server: istiod.istio-system.svc:15012
2021-02-22T08:11:25.680655Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed: 0,
2021-02-22T08:11:25.936956Z	info	xdsproxy	Envoy ADS stream established
2021-02-22T08:11:25.937120Z	info	xdsproxy	connecting to upstream XDS server: istiod.istio-system.svc:15012
2021-02-22T08:39:56.813543Z	info	xdsproxy	disconnected from XDS server: istiod.istio-system.svc:15012
2021-02-22T08:39:56.814249Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed: 0,
2021-02-22T08:39:57.183354Z	info	xdsproxy	Envoy ADS stream established
2021-02-22T08:39:57.183653Z	info	xdsproxy	connecting to upstream XDS server: istiod.istio-system.svc:150
```