Hello!
We’re using custom certificates, issued by our own CA, in Istio.
During the last certificate rotation, one of our istio-ingress-gateway pods wasn’t restarted due to human error (a restart is required for the gateway to work correctly with the new certificates). This led to the following error messages in the log:
{"level":"error","time":"2021-12-13T13:24:04.890986Z","scope":"xdsproxy","msg":"failed to create upstream grpc client: rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\""}
That’s fine, and I know how to fix it: restarting the pod is enough. But I noticed one interesting thing: the readiness probe of this pod still reports healthy.
To my mind, this is not correct behaviour, because in fact the pod can’t serve customer traffic.
readinessProbe:
  failureThreshold: 30
  httpGet:
    path: /healthz/ready
    port: 15021
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 1
As a workaround, I wanted to create an alert for this state of the istio-ingress-gateway pod, but I didn’t find any Istio metric that shows the pod is unhealthy due to the “certificate signed by unknown authority” error.
If I missed something, could somebody point me to the right metric?
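Until a suitable metric turns up, the only fallback I can think of is a log-based check. Below is a minimal Python sketch of that idea, not an Istio feature. It assumes the proxy logs are JSON lines shaped like the one quoted above, with `level` and `msg` fields. The names `CERT_ERROR`, `is_cert_error` and `count_cert_errors` are made up for illustration.

```python
import json

# Assumption: this substring is copied from the error message quoted above.
CERT_ERROR = "x509: certificate signed by unknown authority"


def is_cert_error(line: str) -> bool:
    """Return True if a JSON log line is an error-level entry mentioning the x509 failure."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return False  # not a JSON log line, so ignore it
    if not isinstance(entry, dict):
        return False  # valid JSON, but not a log object
    return entry.get("level") == "error" and CERT_ERROR in str(entry.get("msg", ""))


def count_cert_errors(lines) -> int:
    """Count matching entries in any iterable of log lines (a file, sys.stdin, a list)."""
    return sum(1 for line in lines if is_cert_error(line))
```

For example, the output of `kubectl logs` for the gateway pod could be piped into a small script that passes `sys.stdin` to `count_cert_errors` and raises an alert when the count is non-zero. How the count reaches the alerting system is left open here.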
Version
istioctl version
client version: 1.8.2
control plane version: 1.8.2
data plane version: 1.8.2 (13 proxies)
kubectl version --short
Client Version: v1.16.1
Server Version: v1.16.9