enablePrometheusMerge and ISTIO_PROMETHEUS_ANNOTATIONS

Trying to understand the doc for using enablePrometheusMerge and env var ISTIO_PROMETHEUS_ANNOTATIONS.

The doc does not mention what port the user's app should be using in order for the merge with the Envoy metrics to work. What port should be used?

Setting enablePrometheusMerge will automatically add the annotations:

prometheus.io/path: /stats/prometheus
prometheus.io/port: "15020"
prometheus.io/scrape: "true"

That is the “merged” metrics port, but I want to understand the Envoy and app ports that feed into it.

From the code, it looks like the env var ISTIO_PROMETHEUS_ANNOTATIONS is used and defaults to port 80 with path /metrics.
That env var is also missing from this doc: https://istio.io/latest/docs/reference/commands/pilot-agent/#envvars

cc @howardjohn @rvennam

The relevant code is at https://github.com/istio/istio/blob/f508fdd78eb0d3444e2bc2b3f36966d904c5db52/pilot/cmd/pilot-agent/status/server.go#L131-L144, and https://istio.io/latest/docs/reference/commands/pilot-agent/#envvars is missing documentation for the ISTIO_PROMETHEUS_ANNOTATIONS env var.

The doc does not mention what port the user's app should be using in order for the merge with the Envoy metrics to work. What port should be used?

Whatever port you want. If you configure prometheus.io/port=1234, we use 1234. If you don’t set that annotation, we fall back to 80 (aligned with Prometheus’s defaulting).
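For example, a minimal sketch of what that looks like on the Deployment’s pod template (port 1234 matches the example above and is just a placeholder, not a required value):

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "1234"     # the agent pulls the app metrics from localhost:1234
        prometheus.io/path: "/metrics" # optional; /metrics is the default path

The agent then serves the merged app and Envoy metrics on the rewritten :15020/stats/prometheus endpoint.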

I see that if the app already has the annotations defined, the injection will create the following env vars with that info, so the app can continue using the same port for scrapes/merges of the metrics.

    - name: ISTIO_METAJSON_ANNOTATIONS
      value: |
        {"kubernetes.io/psp":"ibm-privileged-psp","prometheus.io/path":"/metrics","prometheus.io/port":"8080","prometheus.io/scrape":"true","prometheus.istio.io/merge-metrics":"true"}
    - name: ISTIO_PROMETHEUS_ANNOTATIONS
      value: '{"scrape":"true","path":"/metrics","port":"8080"}'

Otherwise it will look for port 80 and path /metrics.

ISTIO_PROMETHEUS_ANNOTATIONS is an implementation detail; you don’t need to touch it yourself. You just configure the prometheus.io annotations.

Thx @howardjohn, maybe the doc needs a bit of tweaking/examples to help with this understanding.

@howardjohn just to clarify: is the merge support looking for existing Deployment annotations only at the deployment.metadata.annotations level? If they are set at the spec.template.metadata level, it won’t find them, right?

It’s looking at the pods (i.e. spec.template.metadata.annotations).

@howardjohn Another question. With the merge, we are seeing that the metrics now come from the istio-proxy container. For Sysdig consuming these, we can only see the metrics under the istio-proxy container; it can no longer see the true application container's name. Therefore, with multiple data-plane pods, they all have the same container name.

Is this just a limitation of the merge, or is there some way to make both scrapes work, one for the proxy and one for the app container, so that both show up in Sysdig?
Or could the customer's app name be applied to the merged metrics instead of the fixed istio-proxy name?

You would need to go up to the Pod labels and use something there in the queries to make them unique, I believe.
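Just a sketch of that idea in Prometheus terms (I’m not sure how Sysdig maps this, and the “app” label name is an assumption about the pods): copy pod labels onto the scraped series so queries don’t depend on the container name.

relabel_configs:
  # copy the pod's "app" label (assumed to exist) onto every scraped series
  - source_labels: [__meta_kubernetes_pod_label_app]
    target_label: app
  # keep the pod name as well, so individual workloads stay distinguishable
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod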

@markvan I don’t know the definitive answer, but I would guess that metrics merging is more of a convenience feature that allows you to get the application metrics without setting up a dedicated Prometheus job (since the Prometheus job for scraping the 15020 “merged” port of the Envoy proxy is provided by the Istio installation).

If you need more advanced behaviour, you can just scrape the application metrics from the application container directly, as you would normally do with Prometheus when not using Istio. And you can also scrape the Istio Envoy metrics from port 15090 of the Envoy sidecar container (not 15020, since this port serves the merged metrics).

So, you would have two Prometheus jobs instead of one, one for the Istio Envoy metrics and one for the application metrics, and you can fully control how the metrics are scraped for each one independently.

PS: see an overview of the ports used by the Istio Envoy proxy: https://istio.io/latest/docs/ops/deployment/requirements/#ports-used-by-istio
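If you go that route, a rough sketch of the two scrape jobs might look like this (the job names, port-name regexes, and keep rules are my assumptions, not something the Istio install provides):

scrape_configs:
  # Raw Envoy sidecar metrics from port 15090 (not the merged 15020 endpoint)
  - job_name: envoy-stats
    metrics_path: /stats/prometheus
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: '.*-envoy-prom'
  # Application metrics scraped directly from the app container's own port
  # (the port name "http-metrics" is an assumption about the app's Pod spec)
  - job_name: app-metrics
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: 'http-metrics'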

I’d like to reactivate this subject. I think I get the idea of merging metrics:

Let me prove that:

  1. I specify my app port and path for scraping, e.g.
     annotations:
       prometheus.io/path: "/metrics"
       prometheus.io/port: "2112"
  2. The Istio mutating admission webhook does the magic:
     • sets the env var ISTIO_PROMETHEUS_ANNOTATIONS to
       {"scrape":"","path":"/metrics","port":"2112"}
     • rewrites the prometheus.io annotations to:
       prometheus.io/path: /stats/prometheus
       prometheus.io/port: "15020"

OK, but that’s all. Now the “chaos” begins:

Trying to curl the metrics from another workload in the mesh ends with:

/ $ curl http://prombin:15020/stats/prometheus
curl: (56) Recv failure: Connection reset by peer

where prombin is a simple server generating a counter metric on the 0.0.0.0:2112/metrics endpoint.

The istio-sidecar log on the client/downstream side:

[2022-07-21T08:44:32.313Z] "- - -" 0 UH - - "-" 0 0 0 - "-" "-" "-" "-" "-" BlackHoleCluster - 100.113.27.186:15020 100.113.24.220:35186 - -

Looking at the listeners of the upstream (prombin) gives no result:

istioctl pc listener -n test-istio deploy/prombin | grep 15020

Looking at clusters on the client/downstream side:

istioctl pc cluster -n test-istio deploy/sleep | grep -E '2112|15020' 
prombin.test-istio.svc.cluster.local                                         2112      -          outbound      EDS 

So the question is: Could anyone explain to me how to use merged metrics?

EDITED:
I think I got it, but I do not understand it yet:
If I curl via the Pod IP from outside the mesh (in STRICT mode), I get the metrics:

curl http://100.113.25.28:15020/stats/prometheus | grep mlk
# HELP mlk_counter The total number of processed events
# TYPE mlk_counter counter
mlk_counter 2392

That’s what I meant.

Now I must dig into iptables and the Envoy config_dump to understand the rest of it.

I am using the Bitnami Prometheus operator and Istio sidecar injection based on namespace annotations. After half a day of debugging I have made three discoveries about this configuration.

  1. The sidecar injector exposes port 15090. That’s not useful on its own, because it only has the istio-proxy metrics - it does not have the merged metrics.

  2. So I saved one pod of my service to a file and deleted it from Kubernetes, then re-launched it with an added containerPort 15020 on istio-proxy. Now that I’ve done that, I can look at both 15090 and 15020 side by side, but unfortunately they are exactly identical. No merging is happening!

  3. So after reading many web pages, including this one, I realized I was missing ISTIO_PROMETHEUS_ANNOTATIONS in my istio-proxy. Both of the operative config lines are included for completeness:

    - name: ISTIO_META_POD_PORTS
      value: '[{"name":"http","containerPort":9538,"protocol":"TCP"}]'
    - name: ISTIO_PROMETHEUS_ANNOTATIONS
      value: '{"scrape":"true","path":"/metrics","port":"9538"}'

This looks truly weird - you really do have to make BOTH declarations, and the second one, ISTIO_PROMETHEUS_ANNOTATIONS, is what gets istio-agent to merge the metrics. And I think my sidecar injector is virtually useless unless I can teach it to perform steps (2) and (3) on every pod in my system!

Best wishes and good luck!

After further testing, if you add the annotations below to your Deployment, it will indeed merge the metrics properly.

spec:
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "9538"
        prometheus.io/scrape: "true"

When the istio-proxy sidecar is created, these annotations result in a pod that looks like this:

metadata:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: '15090'
    prometheus.io/scrape: 'true'
    sidecar.istio.io/status: >-
      {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}
spec:
  volumes:
  containers:
    - name: istio-proxy
      image: docker.io/istio/proxyv2:1.16.2
      args:
        - proxy
        - sidecar
        - '--domain'
        - $(POD_NAMESPACE).svc.cluster.local
        - '--proxyLogLevel=warning'
        - '--proxyComponentLogLevel=misc:error'
        - '--log_output_level=default:info'
        - '--concurrency'
        - '2'
      ports:
        - name: http-envoy-prom
          containerPort: 15090
          protocol: TCP

And sadly, it STILL does not expose an endpoint for 15020, even though it’s scraping and merging metrics! You have to add further hacks to get what you want!

I am looking into ways to get the endpoint exposed properly. We are using the Bitnami Prometheus operator and the tetrate-istio charts.

One way is to hand-edit the istio-sidecar-injector ConfigMap in istio-system and change the containerPort from 15090 to 15020 (it’s at roughly line #346 of that ConfigMap - the first template is used because of 'defaultTemplates: [sidecar]' near the top of the ConfigMap).
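For reference, the edited snippet in the injection template ends up looking roughly like this (the exact surrounding lines vary by Istio version):

        ports:
        - containerPort: 15020   # was 15090; 15020 is where pilot-agent serves the merged metrics
          protocol: TCP
          name: http-envoy-prom

This only changes which port the sidecar declares; the merging itself is still driven by the prometheus.io annotations described above.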