I’ve already asked in the Prometheus community, but I think it’s worth asking here too, to get another perspective on a behaviour we’re currently observing.
We’ve got a GKE Kubernetes cluster (v1.16.15-gke.7800) where Istio 1.8.3 has been installed and is managing pods in the default namespace.
Istio was installed with the default “PERMISSIVE” mTLS mode, which (as far as I understand) means that every Envoy sidecar also accepts plain HTTP traffic.
Everything is deployed in the default namespace, and every pod BUT prometheus/alertmanager/grafana is managed by Istio (i.e. the monitoring stack is outside the mesh). We’ve achieved this by using the neverInjectSelector key in the sidecar injector configuration.
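For reference, the relevant excerpt of our injector configuration looks roughly like this (a sketch of the `neverInjectSelector` setting in the `istio-sidecar-injector` ConfigMap; the `app` label values are illustrative, not our exact labels):

```yaml
# Pods matching ANY of these label selectors never get the Envoy sidecar injected,
# even though automatic injection is enabled for the namespace.
neverInjectSelector:
  - matchExpressions:
      - key: app
        operator: In
        values: ["prometheus", "alertmanager", "grafana"]
```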
Prometheus can successfully scrape almost all of its targets (defined via ServiceMonitors), except for a few that it consistently fails to scrape.
For example, in the Prometheus logs I can see:
```
level=debug ts=2021-02-19T11:15:55.595Z caller=scrape.go:927 component="scrape manager" scrape_pool=default/divolte/0 target=http://10.172.22.36:7070/metrics msg="Scrape failed" err="server returned HTTP status 503 Service Unavailable"
```
But if I log into the Prometheus pod, I can successfully reach the pod that Prometheus is failing to scrape:
```
/prometheus $ wget -SqO /dev/null http://10.172.22.36:7070/metrics
  HTTP/1.1 200 OK
  date: Fri, 19 Feb 2021 11:27:57 GMT
  content-type: text/plain; version=0.0.4; charset=utf-8
  content-length: 75758
  x-envoy-upstream-service-time: 57
  server: istio-envoy
  connection: close
  x-envoy-decorator-operation: divolte-srv.default.svc.cluster.local:7070/*
```
What am I missing? The 503 appears to be the actual response from the target, which means Prometheus does reach it during the scrape but gets an error back.
What I cannot understand is the difference, in terms of networking path and components involved, between the scrape (which fails) and the wget from inside the Prometheus pod (which succeeds). I also don’t know how to debug this further.
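For what it’s worth, these are the commands I’ve been considering to dig deeper (a sketch; the pod name is the one from the logs below, and I’m not sure these are the right places to look):

```shell
# Inspect the target sidecar's inbound listener for the scrape port,
# to see how Envoy handles traffic arriving on 7070
istioctl proxy-config listeners divolte-dpy-594d8cb676-vgd9l --port 7070

# Raise the Envoy log level on the target's sidecar to debug
istioctl proxy-config log divolte-dpy-594d8cb676-vgd9l --level debug

# Then tail the sidecar logs while a scrape happens
kubectl logs divolte-dpy-594d8cb676-vgd9l -c istio-proxy -f
```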
Here are the relevant logs from the main container and the Envoy/Istio proxy:
```
❯ k logs divolte-dpy-594d8cb676-vgd9l prometheus-jmx-exporter
DEBUG: Environment variables set/received...
 Service port (metrics): 7070
 Destination host: localhost
 Destination port: 5555
 Rules to appy: divolte
 Local JMX: 7071
CONFIG FILE not found, enabling PREPARE_CONFIG feature
Preparing configuration based on environment variables
Configuration preparation completed, final cofiguration dump:
############
---
hostPort: localhost:5555
username:
password:
lowercaseOutputName: true
lowercaseOutputLabelNames: true
########
Starting Service..
```
```
❯ k logs divolte-dpy-594d8cb676-vgd9l istio-proxy -f
2021-02-22T07:41:15.450702Z	info	xdsproxy	disconnected from XDS server: istiod.istio-system.svc:15012
2021-02-22T07:41:15.451182Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed: 0,
2021-02-22T07:41:15.894626Z	info	xdsproxy	Envoy ADS stream established
2021-02-22T07:41:15.894837Z	info	xdsproxy	connecting to upstream XDS server: istiod.istio-system.svc:15012
2021-02-22T08:11:25.679886Z	info	xdsproxy	disconnected from XDS server: istiod.istio-system.svc:15012
2021-02-22T08:11:25.680655Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed: 0,
2021-02-22T08:11:25.936956Z	info	xdsproxy	Envoy ADS stream established
2021-02-22T08:11:25.937120Z	info	xdsproxy	connecting to upstream XDS server: istiod.istio-system.svc:15012
2021-02-22T08:39:56.813543Z	info	xdsproxy	disconnected from XDS server: istiod.istio-system.svc:15012
2021-02-22T08:39:56.814249Z	warning	envoy config	StreamAggregatedResources gRPC config stream closed: 0,
2021-02-22T08:39:57.183354Z	info	xdsproxy	Envoy ADS stream established
2021-02-22T08:39:57.183653Z	info	xdsproxy	connecting to upstream XDS server: istiod.istio-system.svc:150
```