Hi,
We’re happy Istio users, but we occasionally hit a recurring issue where we lose connectivity between applications running in the mesh after restarting one of the services (postgres-pooler in this case).
When this happens, the sidecar appears to be routing the main container's traffic to pods that no longer exist.
Here’s an example with two pods:
- `xgckw` is working: the endpoints from `istioctl proxy-config` match those from `kubectl get endpointslice` (slice lookup shown below).
- `dhjdj` is not working: it is trying to connect to non-existent pods, via config provided by a different `istiod`; this config doesn't match the endpoints of the relevant Kubernetes `Service`.
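
For reference, the EndpointSlice name below is generated, so we look the slice up via its service label (this assumes the standard `kubernetes.io/service-name` label set by the EndpointSlice controller):

```
$ kubectl get endpointslice -n prod -l kubernetes.io/service-name=postgres-pooler
```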
```
$ kubectl get endpointslice postgres-pooler-x9pvh -n prod
NAME                    ADDRESSTYPE   PORTS   ENDPOINTS                  AGE
postgres-pooler-x9pvh   IPv4          5432    10.240.6.85,10.240.3.250   667d
```
```
$ istioctl proxy-status
NAME                               CLUSTER      CDS      LDS      EDS      RDS      ISTIOD                    VERSION
deployment-7f78785784-xgckw.prod   Kubernetes   SYNCED   SYNCED   SYNCED   SYNCED   istiod-6ffd54b448-v9nck   1.13.8
deployment-788b6d8c6d-dhjdj.prod   Kubernetes   SYNCED   SYNCED   SYNCED   SYNCED   istiod-6ffd54b448-9fwv4   1.13.8
```
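
Since the two proxies are synced against different istiod replicas, we suspect one replica holds a stale view. Something like the following should let us compare each replica's internal endpoint state (this assumes the standard istiod debug server on port 15014; the debug paths may vary between versions):

```
# Port-forward the first istiod replica and dump its view of the endpoints.
$ kubectl -n istio-system port-forward pod/istiod-6ffd54b448-v9nck 15014 &
$ curl -s localhost:15014/debug/endpointz | grep postgres-pooler
# Repeat against istiod-6ffd54b448-9fwv4 and diff the two outputs.
```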
```
$ istioctl proxy-config endpoint -n prod deployment-7f78785784-xgckw
ENDPOINT            STATUS    OUTLIER CHECK   CLUSTER
10.0.171.251:5432   HEALTHY   OK              outbound|5432||postgres-pooler.prod.svc.cluster.local
10.240.3.250:5432   HEALTHY   OK              outbound|5432||postgres-pooler.prod.svc.cluster.local
10.240.6.85:5432    HEALTHY   OK              outbound|5432||postgres-pooler.prod.svc.cluster.local
```
```
$ istioctl proxy-config endpoint -n prod deployment-788b6d8c6d-dhjdj
ENDPOINT            STATUS    OUTLIER CHECK   CLUSTER
10.0.171.251:5432   HEALTHY   OK              outbound|5432||postgres-pooler.prod.svc.cluster.local
10.240.4.229:5432   HEALTHY   OK              outbound|5432||postgres-pooler.prod.svc.cluster.local
10.240.6.152:5432   HEALTHY   OK              outbound|5432||postgres-pooler.prod.svc.cluster.local
```
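
A compact way to see the drift between the two sidecars is to diff their views of the cluster (cluster name taken from the output above; the `--cluster` filter narrows the output to this one service):

```
$ diff \
    <(istioctl proxy-config endpoint -n prod deployment-7f78785784-xgckw \
        --cluster "outbound|5432||postgres-pooler.prod.svc.cluster.local") \
    <(istioctl proxy-config endpoint -n prod deployment-788b6d8c6d-dhjdj \
        --cluster "outbound|5432||postgres-pooler.prod.svc.cluster.local")
```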
`10.240.4.229` and `10.240.6.152` correspond to pods that no longer exist.
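
For example, a quick check along these lines turns up no running pod with either IP:

```
$ kubectl get pods -A -o wide | grep -E '10\.240\.4\.229|10\.240\.6\.152'
```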
Does anyone have any ideas about how to debug this further or correct our setup? Any help would be greatly appreciated.