(Cross-posted from my question on Stack Overflow)
I have two different clusters (EKS, v1.18), each with its own Istio mesh (v1.9.0).
I have a Thanos deployment on cluster A and a Prometheus deployment on cluster B (with the Thanos sidecar running too). The goal is to have Thanos query the sidecars in remote clusters, proxying queries to each cluster via an internal load balancer (classic ELB). Block persistence using S3 or similar is out of scope for this issue.
The Gateway, VirtualService and Service resources are in place in cluster B. When I run Thanos locally while connected to the network, I can connect to the sidecars in cluster B successfully using gRPC.
The ServiceEntry for the FQDN from cluster B has been created in cluster A, resolution works, routing is correct, but the deployment in cluster A can’t connect to cluster B.
The Istio sidecar of the source workload (Thanos, in cluster A) shows that the connection is being reset:
[2021-02-26T14:41:03.509Z] "POST /thanos.Store/Info HTTP/2" 0 - http2.remote_reset - "-" 5 0 4998 - "-" "grpc-go/1.29.1" "50912787-d528-994f-b8ad-78dd42081fea" "thanos.dev.integrations.internal.fqdn:10901" "-" - - 172.20.65.175:10901 172.30.9.174:37594 - default
I don’t see the incoming request in cluster B’s ingress gateways (I have a public one and a private one, and I checked both just to be sure).
I have tried:
- Forcing an upgrade from HTTP/1.1 to HTTP/2 using a DestinationRule
- Forcing TLS to be disabled using a DestinationRule
- Excluding the private LB's CIDR range so that traffic bypasses the proxy
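For reference, the first two attempts can be sketched roughly as a single DestinationRule in cluster A. This is an illustrative reconstruction, not the exact manifest that was applied: the resource name is an assumption, and the two settings may well have been tried separately.

```yaml
# Hypothetical sketch: metadata.name is assumed, not taken from the real setup
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: thanos-integrations-dev
  namespace: thanos
spec:
  host: thanos.dev.integrations.internal.fqdn
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE   # force HTTP/1.1 -> HTTP/2 upgrade
    tls:
      mode: DISABLE                # do not originate TLS to the remote host
```

The third attempt corresponds to the sidecar annotation below on the Thanos pod template. The CIDR value is a placeholder for the private LB's range, not a real value.

```yaml
# Hypothetical fragment of the Thanos Deployment's pod template
metadata:
  annotations:
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "<private-LB-CIDR>"
```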
Resources (Cluster A)
ServiceEntry:
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: thanos-integrations-dev
  namespace: thanos
spec:
  hosts:
  - thanos.dev.integrations.internal.fqdn
  location: MESH_EXTERNAL
  ports:
  - name: grpc-thanos-int-dev
    number: 10901
    protocol: GRPC
  resolution: DNS
Resources (Cluster B)
Gateway:
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  annotations:
    meta.helm.sh/release-name: istio-routing-layer
    meta.helm.sh/release-namespace: istio-system
  creationTimestamp: "2021-02-25T11:37:49Z"
  generation: 3
  labels:
    app.kubernetes.io/instance: istio-routing-layer
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: istio-routing-layer
    app.kubernetes.io/version: 0.0.1
    helm.sh/chart: istio-routing-layer-0.0.1
  name: thanos
  namespace: istio-system
spec:
  selector:
    istio: internal-ingressgateway
  servers:
  - hosts:
    - thanos.dev.integrations.internal.fqdn
    port:
      name: grpc-thanos
      number: 10901
VirtualService:
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
    meta.helm.sh/release-name: istio-routing-layer
    meta.helm.sh/release-namespace: istio-system
  creationTimestamp: "2021-02-25T11:37:49Z"
  generation: 3
  labels:
    app.kubernetes.io/instance: istio-routing-layer
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: istio-routing-layer
    app.kubernetes.io/version: 0.0.1
    helm.sh/chart: istio-routing-layer-0.0.1
spec:
  gateways:
  - thanos
  hosts:
  - thanos.dev.integrations.internal.fqdn
  http:
  - route:
    - destination:
        host: thanos-sidecar.prometheus.svc.cluster.local
        port:
          number: 10901
Service:
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-thanos-istio
    meta.helm.sh/release-namespace: prometheus
  creationTimestamp: "2021-02-25T14:31:02Z"
  labels:
    app.kubernetes.io/instance: prometheus-thanos-istio
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: prometheus-thanos-istio
    app.kubernetes.io/version: 0.0.1
    helm.sh/chart: prometheus-thanos-istio-0.0.1
spec:
  clusterIP: None
  ports:
  - name: grpc-thanos
    port: 10901
    protocol: TCP
    targetPort: grpc
  selector:
    app: prometheus
    component: server
  sessionAffinity: None
  type: ClusterIP