Issue with HTTP communication from Prometheus to Alertmanager

guusvw · June 28, 2021, 5:00pm

Bug description

Since the upgrade from Istio 1.7 to 1.8 we’re seeing issues, specifically with the communication from our Prometheus to our Alertmanager (pushing alerts via HTTP).

Error message:

level=error ts=2021-05-31T07:16:45.210Z caller=notifier.go:527 component=notifier alertmanager=http://100.96.9.151:9093/api/v2/alerts count=1 msg="Error sending alert" err="bad response status 404 Not Found"

Both Prometheus and Alertmanager are installed via the prometheus-operator, so two headless Kubernetes Services exist, called prometheus-operated and alertmanager-operated. Those services are hard-coded into the prometheus-operator and cannot be changed.
In addition, each “installation” of those two components has its own service, which is not headless and is under our full control (alertmanager-main and prometheus-k8s).

One thing we tried that worked was to scale the prometheus-operator down to 0 and add appProtocol with the value http to the -operated Kubernetes services. This works, but the change is overwritten unless the operator stays scaled to zero.

If we set appProtocol on the services we control ourselves, it does not work.
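For readers unfamiliar with the field, the manual change described above can be sketched roughly as follows. This is only an illustration: the selector label and the port name `web` are assumptions (the thread later mentions `web` as the Alertmanager port name), and the operator reverts the edit whenever it reconciles the Service.

```yaml
# Sketch of the hand-edited headless Service (not the operator's actual output).
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
spec:
  clusterIP: None            # headless, as created by the prometheus-operator
  selector:
    app: alertmanager        # hypothetical label, check your own pods
  ports:
  - name: web                # assumed port name
    port: 9093
    targetPort: 9093
    appProtocol: http        # explicit protocol hint that Istio reads
```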

Any clue how to fix it?

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

Istio version:

➜ istioctl version --remote
client version: 1.10.0
control plane version: 1.8.5
data plane version: 1.8.5 (78 proxies)

Kubernetes version:

➜ kubectl version --short
Client Version: v1.19.7
Server Version: v1.19.10

How was Istio installed?

via the istio-operator also in version 1.8.5

Environment where the bug was observed (cloud vendor, OS, etc)

Running on a kOps cluster on AWS.

Dmitriy_Rozentsvay · July 29, 2021, 2:59pm

We are seeing the same issue.
@guusvw, let me know if you managed to find an acceptable solution.

deepak_deore · August 4, 2021, 2:31pm

Alertmanager no longer listening on localhost seems to be causing this issue. Check out https://github.com/prometheus-operator/prometheus-operator/pull/4038

You can set alertmanager.alertmanagerSpec.listenLocal=true (the kube-prometheus-stack Helm value that sets listenLocal on the Alertmanager CR) to make it listen on localhost.

Istio 1.9 and lower require the app to listen on localhost; 1.10 and higher don't.
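As a sketch, assuming the kube-prometheus-stack Helm chart is in use, the setting above corresponds to this values fragment:

```yaml
# values.yaml fragment for kube-prometheus-stack (sketch)
alertmanager:
  alertmanagerSpec:
    listenLocal: true   # Alertmanager binds to localhost only
```

If the Alertmanager custom resource is managed directly instead, the equivalent field should be `spec.listenLocal`.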

lesh366 · August 17, 2021, 12:19pm

Same here.
With listenLocal=true, the Prometheus POST to the alerts endpoint returns 503:
level=error ts=2021-08-17T12:11:54.179Z caller=notifier.go:527 component=notifier alertmanager=http://172.31.36.94:9093/api/v2/alerts count=1 msg="Error sending alert" err="bad response status 503 Service Unavailable"

and shows "upstream connect error or disconnect/reset before headers. reset reason: connection failure"

With listenLocal disabled, the alerts endpoint returns 404:

"POST /api/v2/alerts HTTP/1.1" 404 NR route_not_found - "-" 0 0 0 - "-" "Prometheus/2.24.0" "d673b25b-ead0-429a-8f59-d881c1804ab5" "172.31.36.35:9093" "-" - - 172.31.36.35:9093 172.31.36.94:34952 - -

level=error ts=2021-08-17T11:45:22.465Z caller=notifier.go:527 component=notifier alertmanager=http://172.31.36.35:9093/api/v2/alerts count=1 msg="Error sending alert" err="bad response status 404 Not Found"

The only workaround is to disable Istio sidecar injection in the monitoring namespace.

Please help!
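For reference, the injection workaround mentioned in this post is usually expressed as a namespace label. A minimal sketch, assuming the namespace is literally named `monitoring` and that label-based injection (not revision labels) is in use:

```yaml
# Sketch: opt the monitoring namespace out of automatic sidecar injection.
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring              # assumed namespace name
  labels:
    istio-injection: disabled   # injection webhook skips pods created here
```

Pods that already have a sidecar keep it until they are recreated.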

deepak_deore · August 17, 2021, 2:42pm

If you run the Prometheus operator with strict mTLS set within your monitoring namespace, it needs extra work. See "[kube-prometheus-stack] tlsConfig support for servicemonitors and default alertingEndpoints to work with istio strict mtls" (issue #145 in prometheus-community/helm-charts on GitHub).

deepak_deore · August 30, 2021, 4:26pm

With 1.10 and 1.11, Alertmanager breaks. One workaround I have found is to make Alertmanager listen on localhost only and use a Sidecar resource to route traffic to localhost:9093:

apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: alertmanager
spec:
  workloadSelector:
    labels:
      alertmanager: prom-kube-prometheus-stack-alertmanager
  ingress:
  - port:
      number: 9093
      protocol: TCP
      name: tcp
    defaultEndpoint: 127.0.0.1:9093

I am still trying to figure out how to make it work without this on Istio 1.10+, with Alertmanager listening on all IPs instead of localhost only.

Update: it worked after removing the duplicate service. I had two services for Alertmanager, alertmanager-operated and prom-kube-prometheus-stack-alertmanager. I removed alertmanager-operated and now everything is working. I had to delete this service 3-4 times; it looks like the operator was recreating it.

It looks like a conflict occurred because two services covered the same port, 9093.

lesh366 · September 1, 2021, 11:31am

Thanks! Will check it out
Should listenLocal be true for Prometheus as well, or for Alertmanager only?

Still, I hope Istio will solve this properly without all those workarounds.

deepak_deore · September 1, 2021, 12:02pm

I set listenLocal to true only for Alertmanager.

Actually, the prometheus-operator Helm chart should come with all these options.

FeBus · November 17, 2021, 4:32pm

None of the above solutions worked for me using kube-prometheus-stack on Istio 1.11.4.

The trick that fixed everything was apparently changing the alertmanager port name from web to http-web via alertmanager.alertmanagerSpec.portName.

I believe this made Istio identify the protocol correctly (see "Protocol Selection" in the Istio docs), with the chart propagating the port name where needed.
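A sketch of the corresponding kube-prometheus-stack values fragment, using only the setting named in this post:

```yaml
# values.yaml fragment for kube-prometheus-stack (sketch)
alertmanager:
  alertmanagerSpec:
    portName: http-web   # was "web"; the "http-" prefix declares the protocol to Istio
```

Istio's explicit protocol selection reads port names of the form `<protocol>[-<suffix>]`, so `http-web` is treated as HTTP, while a port named just `web` carries no protocol hint.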
