Unable to use istio authz policy alongside prometheus

Hi all,

Recently I decided to try out istio’s AuthorizationPolicy to lock down one of my workloads in kubernetes. We monitor this cluster with a standalone Prometheus managed by the prometheus operator. Once I enabled the authorization policy, Prometheus could no longer scrape that workload. In particular, no matter what I put into the rules, as long as I had a default deny policy in place, the scrapes failed with a 403.

E.g. I tried this:

spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
    - source:
        namespaces:
        - prometheus
    to:
    - operation:
        methods:
        - GET
        - POST
        - OPTIONS
  selector:
    matchLabels:
      app: my-app
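
(By “default deny” here I mean the usual allow-nothing policy from the istio docs, applied namespace-wide. Mine is essentially this; the name and namespace are illustrative:)

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing      # illustrative name
  namespace: my-namespace  # illustrative namespace
spec: {}                   # empty ALLOW policy matches nothing, so everything is denied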

The ingress works fine, but prometheus is blocked. I figured that maybe this was because the Prometheus pod didn’t have a sidecar injected; injecting one didn’t help. After reading some more, I stumbled upon the Istio / Prometheus integration docs.
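
(For anyone who hasn’t seen that page: as I understand it, the idea is to give the prometheus pod a sidecar that intercepts no traffic but writes the mesh certificates out to a shared volume, and then have prometheus itself present those certs when scraping. Roughly, from memory, so treat the exact values as illustrative:)

# annotations on the prometheus pod template
traffic.sidecar.istio.io/includeInboundPorts: ""       # intercept nothing inbound
traffic.sidecar.istio.io/includeOutboundIPRanges: ""   # intercept nothing outbound
proxy.istio.io/config: |
  proxyMetadata:
    OUTPUT_CERTS: /etc/istio-output-certs
sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs", "mountPath": "/etc/istio-output-certs"}]'

# and in the scrape config, use those certs and skip hostname verification
scheme: https
tls_config:
  ca_file: /etc/prom-certs/root-cert.pem
  cert_file: /etc/prom-certs/cert-chain.pem
  key_file: /etc/prom-certs/key.pem
  insecure_skip_verify: true   # workload certs carry SPIFFE IDs, not hostnames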

Doing this didn’t really help. All it did was convert the 403s into NR (filter_chain_not_found) logs like this:

[my-pod-8db45bf9-5jg2r istio-proxy] [2022-11-21T18:41:18.577Z] "- - -" 0 NR filter_chain_not_found - "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.1.134.107:8888 10.1.134.119:55546 my-pod.my-namespace.svc.cluster.local - 

Looking at the underlying envoy listeners, it looks like istio installs inbound filter chains that only match its own non-standard ALPN values. If that’s the case, I don’t see how the approach suggested above could ever work, since the whole point of it is to bypass the istio sidecar’s handling of the client side of the TLS handshake entirely. Without prometheus knowing to offer a non-standard ALPN (and I could not find a setting for that), I don’t see any way to make it work.
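
(In case anyone wants to check their own workload, this is roughly how I looked at the inbound filter chains; the pod name and namespace are from my setup above:)

istioctl proxy-config listeners my-pod-8db45bf9-5jg2r.my-namespace --port 15006 -o json \
  | grep -B2 -A6 applicationProtocols

The filter chain matches list istio-specific ALPN values (istio, istio-peer-exchange, istio-http/1.1, and so on), which is what led me to the conclusion above.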

Has anybody tried a relatively recent prometheus + istio combo and had it work when communicating with a sidecar enforcing authz policy?

For reference:

client version: 1.15.1
control plane version: 1.15.1
data plane version: 1.15.1 (31 proxies)

Prometheus operator version 0.61.0 (I tried 0.51.1 earlier as well).

Thanks

Reading more on this, I stumbled on ALPN filter incorrectly applies to non-Istio TLS traffic · Issue #40680 · istio/istio · GitHub. This makes me wonder: how the heck is the sidecar supposed to tell that this TLS traffic is intended for the application itself rather than meant to be terminated by the proxy? Presumably that’s why the ALPN filter exists in the first place. So… how the heck could this ever work? Is there a better solution here with modern istio? I suspect the method in the istio doc link above no longer functions at all. I think it came from Secure communication between Prometheus and Istio components · Issue #7352 · istio/istio · GitHub, which is relatively old (2018).

Hello, this is how I used the prometheus operator: check my GitHub for the configuration.

Hi, thanks for your response.

I’m not having any trouble scraping the sidecar (envoy) stats. Rather, I’m having trouble scraping from the application container. And this only happens once I enable an istio AuthorizationPolicy on the application pod.

I do not see anything in your yaml to suggest that you have attempted to integrate prometheus into the service mesh, beyond hooking it up to probe the sidecars. Have you done so?

Did you tell istio where to scrape? Provide the port and path of the metrics to scrape, like this. If it is an https endpoint, provide the scheme as well.
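
(For example with the common prometheus.io pod annotations — these only take effect if your scrape config, or istio’s metrics merging, honours them; the values here are illustrative:)

metadata:
  annotations:
    prometheus.io/scrape: "true"    # opt the pod in to scraping
    prometheus.io/port: "8888"      # port serving the metrics
    prometheus.io/path: "/metrics"  # path of the metrics endpoint
    prometheus.io/scheme: "https"   # only if the endpoint is served over https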

Yes. Prometheus scraping worked properly until I enabled istio authorization.

Let me tag some people who might be able to help.

@Tomas_Kohout
@vadimeisenbergibm
@YangminZhu

I don’t have the original reference anymore, but for me the solution was to move the endpoint prometheus scrapes to a headless service. The rationale is that prometheus tries to connect to each application pod directly, and istio won’t allow that because, by default, it only knows the cluster IP, not the individual pod IPs. So putting /metrics behind a headless service makes istio aware of the individual pod IP addresses and allows the communication.

Also, for me, it was necessary to declare the metrics service port as TCP (note the tcp-metrics port name below). Using http didn’t work and I really didn’t investigate further…

dmz-gateway2-lb        LoadBalancer   172.20.167.89   internal-XXXXXX.amazonaws.com   443:30201/TCP   26h
dmz-gateway2-metrics   ClusterIP      None            <none>                          4111/TCP

-------------------         
apiVersion: v1
kind: Service
metadata:
  labels:
    app: dmz-gateway2
    pvt-service: dmz-gateway2-metrics  <<<<<< differentiates from the loadbalancer
  name: dmz-gateway2-metrics
  namespace: dmz-gateway
spec:
  clusterIP: None
  ports:
  - name: tcp-metrics
    port: 4111
    protocol: TCP
    targetPort: 4111
  selector:
    app: dmz-gateway2

-------------------------
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: dmz-gateway2
  name: dmz-gateway2
  namespace: dmz-gateway
spec:
  endpoints:
  - honorLabels: true
    interval: 30s
    path: /metrics
    port: tcp-metrics
  namespaceSelector:
    matchNames:
    - dmz-gateway
  selector:
    matchLabels:
      pvt-service: dmz-gateway2-metrics  <<< not using "app" selector here. Just "pvt-service"