Ingress gateway SNI proxy sending RST after Client Hello

We’re attempting to set up multicluster (replicated control plane), but are hitting an issue with the SNI proxy (tcp/15443) in the ingress gateway.

The client istio-proxy connects to ingress and sends a TLS Client Hello (with SNI), and ingress ACKs the Client Hello. But instead of then sending back a Server Hello with a certificate, ingress issues a TCP RST. The client proxy then tries another two times before failing and returning a 503.

k8s: v1.18.3
Istio: v1.7.0
Rancher: v2.4.5

We’re following https://istio.io/latest/docs/setup/install/multicluster/gateways (including the sample cacerts). The manifests were installed using istioctl with values-istio-multicluster-gateways.yaml.

External access is via NodePort:
Port: tls 15443/TCP
TargetPort: 15443/TCP
NodePort: tls 30163/TCP
Endpoints: 10.42.0.72:15443

Ingress appears to be pulling the CA certificates from SDS correctly (I’ve confirmed the data decodes back to the sample cacerts):
2020-09-05T01:18:57.866787Z info GET https://10.43.0.1:443/api/v1/namespaces/istio-system/secrets?limit=500&resourceVersion=0 200 OK in 4 milliseconds
2020-09-05T01:18:57.866803Z info Response Headers:
2020-09-05T01:18:57.866806Z info Audit-Id: 677db825-6cd6-4e9b-83f5-e7a4b4d4312f
2020-09-05T01:18:57.866809Z info Cache-Control: no-cache, private
2020-09-05T01:18:57.866811Z info Content-Type: application/json
2020-09-05T01:18:57.866813Z info Date: Sat, 05 Sep 2020 01:18:57 GMT
2020-09-05T01:18:57.867035Z info Response Body: {"kind":"SecretList",

2020-09-05T01:18:57.868317Z warn secretfetcher failed load server cert/key pair from secret cacerts: server cert or private key is empty
2020-09-05T01:18:57.868896Z info GET https://10.43.0.1:443/api/v1/namespaces/istio-system/secrets?allowWatchBookmarks=true&resourceVersion=2497786&timeout=5m19s&timeoutSeconds=319&watch=true 200 OK in 0 milliseconds
2020-09-05T01:18:57.868905Z info Response Headers:
2020-09-05T01:18:57.868908Z info Cache-Control: no-cache, private
2020-09-05T01:18:57.868910Z info Content-Type: application/json
2020-09-05T01:18:57.868912Z info Date: Sat, 05 Sep 2020 01:18:57 GMT
2020-09-05T01:18:57.961725Z debug caches populated
2020-09-05T01:18:57.962040Z info sds SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"
2020-09-05T01:18:57.962120Z info sds SDS gRPC server for gateway controller starts, listening on "./var/run/ingress_gateway/sds"
2020-09-05T01:18:57.962162Z info Starting proxy agent
2020-09-05T01:18:57.962127Z info sds Start SDS grpc server
2020-09-05T01:18:57.962202Z info Opening status port 15020
2020-09-05T01:18:57.962294Z info Received new config, creating new Envoy epoch 0
2020-09-05T01:18:57.962351Z info Epoch 0 starting
2020-09-05T01:18:57.962367Z info sds Start SDS grpc server for ingress gateway proxy
2020-09-05T01:18:57.967501Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~10.42.0.72~istio-ingressgateway-85d8dd7994-28gbc.istio-system~istio-system.svc.cluster.local --local-address-ip-version v4 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error]
2020-09-05T01:18:58.004271Z warning envoy runtime Unable to use runtime singleton for feature envoy.reloadable_features.activate_fds_next_event_loop
2020-09-05T01:18:58.031737Z warning envoy config StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2020-09-05T01:18:58.031767Z warning envoy config Unable to establish new stream
2020-09-05T01:18:58.039132Z info sds resource:default new connection
2020-09-05T01:18:58.039201Z info sds Skipping waiting for gateway secret
2020-09-05T01:18:58.219285Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-09-05T01:18:58.219308Z info cache GenerateSecret default
2020-09-05T01:18:58.219764Z info sds resource:default pushed key/cert pair to proxy
2020-09-05T01:18:58.221865Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2020-09-05T01:18:58.604333Z info sds resource:ROOTCA new connection
2020-09-05T01:18:58.604417Z info sds Skipping waiting for gateway secret
2020-09-05T01:18:58.604442Z info cache Loaded root cert from certificate ROOTCA
2020-09-05T01:18:58.604601Z info sds resource:ROOTCA pushed root cert to proxy
2020-09-05T01:18:59.039710Z info Envoy proxy is ready

The client shows the 503 when connecting to one of the cluster nodes (172.25.22.20):
[2020-09-05T01:54:45.307Z] "HEAD /headers HTTP/1.1" 503 UF,URX "-" "-" 0 0 18 - "-" "curl/7.69.1" "3f4dd68f-c346-4ade-a69f-7a7771a5d7f8" "httpbin.bar.global:8000" "172.25.22.20:30163" outbound|8000||httpbin.bar.global - 240.0.0.2:8000 10.42.2.52:59116 - default

Ingress shows the connections being received, but with the NR response flag:
2020-09-05T01:54:43.037907Z trace envoy connection [C1096] write ready
[2020-09-05T01:54:45.309Z] "- - -" 0 NR "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.42.0.72:15443 172.25.22.20:40152 - -
[2020-09-05T01:54:45.322Z] "- - -" 0 NR "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.42.0.72:15443 172.25.22.20:11885 - -
[2020-09-05T01:54:45.325Z] "- - -" 0 NR "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.42.0.72:15443 172.25.22.20:58423 - -
2020-09-05T01:54:53.038229Z trace envoy http [C4] message complete
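
For reference, here is a small sketch that decodes the response flags seen in the two access logs above (UF,URX on the client, NR on ingress). The descriptions are paraphrased from the Envoy access-log documentation as I understand it; the table and the helper name explain_flags are my own and cover only the flags in this post.

```python
from typing import List

# Paraphrased meanings of the Envoy %RESPONSE_FLAGS% values that appear in
# this post. This is not the full list from the Envoy documentation.
RESPONSE_FLAGS = {
    "UF": "upstream connection failure",
    "URX": "upstream retry limit (HTTP) or maximum connect attempts (TCP) reached",
    "NR": "no route configured, or no matching filter chain for the downstream connection",
}


def explain_flags(field: str) -> List[str]:
    """Split a comma-separated response-flags field and describe each flag."""
    if field == "-":  # Envoy logs "-" when no flag is set
        return []
    return [
        f"{flag}: {RESPONSE_FLAGS.get(flag, 'not in this table')}"
        for flag in field.split(",")
    ]


if __name__ == "__main__":
    for line in explain_flags("UF,URX") + explain_flags("NR"):
        print(line)
```

If that reading of NR is right, then on a plain TCP listener such as 15443 it would point at the listener not finding a filter chain for the incoming SNI, rather than at a certificate problem.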

Wireshark shows the SNI being set as outbound_.8000_._.httpbin.bar.global
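
To make the structure of that SNI value explicit, here is a minimal sketch that splits it into its fields. It assumes the usual Istio convention of direction, port, subset and host joined by "_."; the names SniParts and parse_istio_sni are hypothetical helpers, not part of any Istio tooling.

```python
from typing import NamedTuple


class SniParts(NamedTuple):
    direction: str
    port: int
    subset: str  # empty string when no subset is set
    host: str


def parse_istio_sni(sni: str) -> SniParts:
    """Split an SNI of the form <direction>_.<port>_.<subset>_.<host>."""
    parts = sni.split("_.", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected SNI format: {sni!r}")
    direction, port, subset, host = parts
    return SniParts(direction, int(port), subset, host)


if __name__ == "__main__":
    print(parse_istio_sni("outbound_.8000_._.httpbin.bar.global"))
```

For the captured value this gives direction "outbound", port 8000, an empty subset, and host "httpbin.bar.global", so the client is sending the .global name unchanged.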

Ingress shows what look like correct paths
SERVICE FQDN PORT SUBSET DIRECTION TYPE DESTINATION RULE
BlackHoleCluster - - - STATIC
agent - - - STATIC
default-http-backend.ingress-nginx.svc.cluster.local 80 - outbound EDS
httpbin.bar.svc.cluster.local 8000 - outbound EDS
httpbin.unite-core-stage1.svc.cluster.local 8000 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 80 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 443 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 15443 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 80 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 443 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 15021 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 15443 - outbound EDS
istiocoredns.istio-system.svc.cluster.local 53 - outbound EDS
istiod.istio-system.svc.cluster.local 443 - outbound EDS
istiod.istio-system.svc.cluster.local 853 - outbound EDS
istiod.istio-system.svc.cluster.local 15010 - outbound EDS
istiod.istio-system.svc.cluster.local 15012 - outbound EDS
istiod.istio-system.svc.cluster.local 15014 - outbound EDS
kube-dns.kube-system.svc.cluster.local 53 - outbound EDS
kube-dns.kube-system.svc.cluster.local 9153 - outbound EDS
kubernetes.default.svc.cluster.local 443 - outbound EDS
metrics-server.kube-system.svc.cluster.local 443 - outbound EDS
outbound_.15010_._.istiod.istio-system.svc.cluster.local - - - EDS
outbound_.15012_._.istiod.istio-system.svc.cluster.local - - - EDS
outbound_.15014_._.istiod.istio-system.svc.cluster.local - - - EDS
outbound_.15021_._.istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.15443_._.istio-egressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.15443_._.istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.443_._.istio-egressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.443_._.istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.443_._.istiod.istio-system.svc.cluster.local - - - EDS
outbound_.443_._.kubernetes.default.svc.cluster.local - - - EDS
outbound_.443_._.metrics-server.kube-system.svc.cluster.local - - - EDS
outbound_.53_._.istiocoredns.istio-system.svc.cluster.local - - - EDS
outbound_.53_._.kube-dns.kube-system.svc.cluster.local - - - EDS
outbound_.8000_._.httpbin.bar.svc.cluster.local - - - EDS
outbound_.8000_._.httpbin.unite-core-stage1.svc.cluster.local - - - EDS
outbound_.80_._.default-http-backend.ingress-nginx.svc.cluster.local - - - EDS
outbound_.80_._.istio-egressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.80_._.istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound_.80_._.sleep.unite-core-stage1.svc.cluster.local - - - EDS
outbound_.853_._.istiod.istio-system.svc.cluster.local - - - EDS
outbound_.9153_._.kube-dns.kube-system.svc.cluster.local - - - EDS
prometheus_stats - - - STATIC
sds-grpc - - - STATIC
sleep.unite-core-stage1.svc.cluster.local 80 - outbound EDS
xds-grpc - - - STRICT_DNS
zipkin - - - STRICT_DNS

I must be missing something obvious here. I would guess one of the following:

  1. The SNI gateway is not actually running in TLS mode
  2. The client is meant to be mapping the .global hostname to svc.cluster.local when populating the SNI field in the Client Hello
  3. The ingress is meant to have a route for the .global path (I’ve been assuming it’s doing automatic translation)
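
To make guesses 2 and 3 concrete, here is a small sketch comparing the SNI from the capture with a few of the SNI-style cluster names from the ingress listing (written with the `_._` separators spelled out). The suffix rewrite in rewrite_global is purely my assumption about what a translation might look like; it is not taken from Istio’s code or docs, and the function name is hypothetical.

```python
# A few SNI-style cluster names from the ingress cluster listing in this post.
SNI_CLUSTERS = {
    "outbound_.8000_._.httpbin.bar.svc.cluster.local",
    "outbound_.8000_._.httpbin.unite-core-stage1.svc.cluster.local",
    "outbound_.80_._.sleep.unite-core-stage1.svc.cluster.local",
}

CLIENT_SNI = "outbound_.8000_._.httpbin.bar.global"


def rewrite_global(sni: str) -> str:
    """Hypothetical translation: swap a trailing .global for .svc.cluster.local."""
    suffix = ".global"
    if sni.endswith(suffix):
        return sni[: -len(suffix)] + ".svc.cluster.local"
    return sni


if __name__ == "__main__":
    print("exact match:    ", CLIENT_SNI in SNI_CLUSTERS)
    print("after rewrite:  ", rewrite_global(CLIENT_SNI) in SNI_CLUSTERS)
```

The SNI as sent has no exact match among those names, while the rewritten form does. So either something is expected to do that translation (on the client or at ingress), or ingress needs its own entry for the .global name.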

Any ideas on how to troubleshoot this further?

-Neil