Ingress gateway SNI proxy sending RST after Client Hello

ngarratt · September 5, 2020, 3:14am

We’re attempting to setup multicluster (replicated control plane), but hitting an issue with the SNI proxy (tcp/15443) in the ingress gateway.

The client istio-proxy connects to ingress, sends TLS Client Hello (with SNI), and ingress send an ACKs for the Client Hello. But instead of ingress then sending back Server Hello with a certificate, it issues a TCP RST. Client proxy then tries another two times before failing and returning a 503

k8s: v1.18.3
Istio: v1.7.0
Rancher: v2.4.5

We’re following https://istio.io/latest/docs/setup/install/multicluster/gateways (including sample cacerts). Manifests have been installed using istioctl against values-istio-multicluster-gateways.yaml

External access is via NodePort
Port: tls 15443/TCP
TargetPort: 15443/TCP
NodePort: tls 30163/TCP
Endpoints: 10.42.0.72:15443

Ingress appears to 2020-09-05T01:18:57.866787Z 2020-09-05T01:18:57.866803Z 2020-09-05T01:18:57.866806Z 2020-09-05T01:18:57.866809Z 2020-09-05T01:18:57.866811Z 2020-09-05T01:18:57.866813Z 2020-09-05T01:18:57.867035Z …
2020-09-05T01:18:57.868317Z 2020-09-05T01:18:57.868896Z 2020-09-05T01:18:57.868905Z 2020-09-05T01:18:57.868908Z 2020-09-05T01:18:57.868910Z 2020-09-05T01:18:57.868912Z 2020-09-05T01:18:57.961725Z 2020-09-05T01:18:57.962040Z 2020-09-05T01:18:57.962120Z 2020-09-05T01:18:57.962162Z 2020-09-05T01:18:57.962127Z 2020-09-05T01:18:57.962202Z 2020-09-05T01:18:57.962294Z 2020-09-05T01:18:57.962351Z 2020-09-05T01:18:57.962367Z 2020-09-05T01:18:57.967501Z 2020-09-05T01:18:58.004271Z 2020-09-05T01:18:58.031737Z 2020-09-05T01:18:58.031767Z 2020-09-05T01:18:58.039132Z 2020-09-05T01:18:58.039201Z 2020-09-05T01:18:58.219285Z 2020-09-05T01:18:58.219308Z 2020-09-05T01:18:58.219764Z 2020-09-05T01:18:58.221865Z 2020-09-05T01:18:58.604333Z 2020-09-05T01:18:58.604417Z 2020-09-05T01:18:58.604442Z 2020-09-05T01:18:58.604601Z 2020-09-05T01:18:59.039710Z be pulling CA certificates from SDS correctly (and confirmed the data decodes back to the sample cacerts)
info GET https://10.43.0.1:443/api/v1/namespaces/istio-system/secrets?limit=500&resourceVersion=0 200 OK in 4 milliseconds
info Response Headers:
info Audit-Id: 677db825-6cd6-4e9b-83f5-e7a4b4d4312f
info Cache-Control: no-cache, private
info Content-Type: application/json
info Date: Sat, 05 Sep 2020 01:18:57 GMT
info Response Body: {“kind”:“SecretList”,
warn secretfetcher failed load server cert/key pair from secret cacerts: server cert or private key is empty
info GET https://10.43.0.1:443/api/v1/namespaces/istio-system/secrets?allowWatchBookmarks=true&resourceVersion=2497786&timeout=5m19s&timeoutSeconds=319&watch=true 200 OK in 0 milliseconds
info Response Headers:
info Cache-Control: no-cache, private
info Content-Type: application/json
info Date: Sat, 05 Sep 2020 01:18:57 GMT
debug caches populated
info sds SDS gRPC server for workload UDS starts, listening on “./etc/istio/proxy/SDS”
info sds SDS gRPC server for gateway controller starts, listening on “./var/run/ingress_gateway/sds”
info Starting proxy agent
info sds Start SDS grpc server
info Opening status port 15020
info Received new config, creating new Envoy epoch 0
info Epoch 0 starting
info sds Start SDS grpc server for ingress gateway proxy
info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~10.42.0.72~istio-ingressgateway-85d8dd7994-28gbc.istio-system~istio-system.svc.cluster.local --local-address-ip-version v4 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error]
warning envoy runtime Unable to use runtime singleton for feature envoy.reloadable_features.activate_fds_next_event_loop
warning envoy config StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
warning envoy config Unable to establish new stream
info sds resource:default new connection
info sds Skipping waiting for gateway secret
info cache Root cert has changed, start rotating root cert for SDS clients
info cache GenerateSecret default
info sds resource:default pushed key/cert pair to proxy
warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
info sds resource:ROOTCA new connection
info sds Skipping waiting for gateway secret
info cache Loaded root cert from certificate ROOTCA
info sds resource:ROOTCA pushed root cert to proxy
info Envoy proxy is ready

Client shows the 503 to one of the cluster nodes (172.25.22.20)
[2020-09-05T01:54:45.307Z] "HEAD /headers HTTP/1.1" 503 UF,URX "-" "-" 0 0 18 - "-" "curl/7.69.1" "3f4dd68f-c346-4ade-a69f-7a7771a5d7f8" "httpbin.bar.global:8000" "172.25.22.20:30163" outbound|8000||httpbin.bar.global - 240.0.0.2:8000 10.42.2.52:59116 - default

Ingress is showing the connection received, but NR state
2020-09-05T01:54:43.037907Z trace envoy connection [C1096] write ready
[2020-09-05T01:54:45.309Z] “- - -” 0 NR “-” “-” 0 0 0 - “-” “-” “-” “-” “-” - - 10.42.0.72:15443 172.25.22.20:40152 - -
[2020-09-05T01:54:45.322Z] “- - -” 0 NR “-” “-” 0 0 0 - “-” “-” “-” “-” “-” - - 10.42.0.72:15443 172.25.22.20:11885 - -
[2020-09-05T01:54:45.325Z] “- - -” 0 NR “-” “-” 0 0 0 - “-” “-” “-” “-” “-” - - 10.42.0.72:15443 172.25.22.20:58423 - -
2020-09-05T01:54:53.038229Z trace envoy http [C4] message complete

Wireshark shows the SNI being set as outbound_.8000_._.httpbin.bar.global

Ingress shows what look like correct paths
SERVICE FQDN PORT SUBSET DIRECTION TYPE DESTINATION RULE
BlackHoleCluster - - - STATIC
agent - - - STATIC
default-http-backend.ingress-nginx.svc.cluster.local 80 - outbound EDS
httpbin.bar.svc.cluster.local 8000 - outbound EDS
httpbin.unite-core-stage1.svc.cluster.local 8000 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 80 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 443 - outbound EDS
istio-egressgateway.istio-system.svc.cluster.local 15443 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 80 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 443 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 15021 - outbound EDS
istio-ingressgateway.istio-system.svc.cluster.local 15443 - outbound EDS
istiocoredns.istio-system.svc.cluster.local 53 - outbound EDS
istiod.istio-system.svc.cluster.local 443 - outbound EDS
istiod.istio-system.svc.cluster.local 853 - outbound EDS
istiod.istio-system.svc.cluster.local 15010 - outbound EDS
istiod.istio-system.svc.cluster.local 15012 - outbound EDS
istiod.istio-system.svc.cluster.local 15014 - outbound EDS
kube-dns.kube-system.svc.cluster.local 53 - outbound EDS
kube-dns.kube-system.svc.cluster.local 9153 - outbound EDS
kubernetes.default.svc.cluster.local 443 - outbound EDS
metrics-server.kube-system.svc.cluster.local 443 - outbound EDS
outbound_.15010_..istiod.istio-system.svc.cluster.local - - - EDS
outbound.15012_..istiod.istio-system.svc.cluster.local - - - EDS
outbound.15014_..istiod.istio-system.svc.cluster.local - - - EDS
outbound.15021_..istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound.15443_..istio-egressgateway.istio-system.svc.cluster.local - - - EDS
outbound.15443_..istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound.443_..istio-egressgateway.istio-system.svc.cluster.local - - - EDS
outbound.443_..istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound.443_..istiod.istio-system.svc.cluster.local - - - EDS
outbound.443_..kubernetes.default.svc.cluster.local - - - EDS
outbound.443_..metrics-server.kube-system.svc.cluster.local - - - EDS
outbound.53_..istiocoredns.istio-system.svc.cluster.local - - - EDS
outbound.53_..kube-dns.kube-system.svc.cluster.local - - - EDS
outbound.8000_..httpbin.bar.svc.cluster.local - - - EDS
outbound.8000_..httpbin.unite-core-stage1.svc.cluster.local - - - EDS
outbound.80_..default-http-backend.ingress-nginx.svc.cluster.local - - - EDS
outbound.80_..istio-egressgateway.istio-system.svc.cluster.local - - - EDS
outbound.80_..istio-ingressgateway.istio-system.svc.cluster.local - - - EDS
outbound.80_..sleep.unite-core-stage1.svc.cluster.local - - - EDS
outbound.853_..istiod.istio-system.svc.cluster.local - - - EDS
outbound.9153_._.kube-dns.kube-system.svc.cluster.local - - - EDS
prometheus_stats - - - STATIC
sds-grpc - - - STATIC
sleep.unite-core-stage1.svc.cluster.local 80 - outbound EDS
xds-grpc - - - STRICT_DNS
zipkin - - - STRICT_DNS

I must be missing something obvious here. I would guess one of the following:

The SNI gateway is not actually running in TLS mode
The client is meant to be mapping the .global hostname to svc.cluster.local when populating the SNI field in the Client Hello
The ingress is meant to have a route for the .global path (I’ve been assuming it’s doing automatic translation)

Any ideas on how to troubleshoot this further?

-Neil

blackenz · October 15, 2020, 5:06pm

Hi!
Have you managed to fix your problem?
I am having similar issues, after upgrading my clusters from istio 1.5.x to 1.7.3. Right before the upgrade, everything was running smoothly with our own CA’s.

I have managed to run ksniff on the cluster, and the behaviour is exactly the same as yours… After the Client Hello we see a TCP ACK from the IngressGateway and right after a RST,ACK

Thanks

ngarratt · October 15, 2020, 8:11pm

Our issue was a mistake within the Rancher-provided helm chart for Istio (which was itself caused by an incorrect documentation update on the Istio side for v1.5). The end result was that the environment variable ISTIO_META_ROUTER_MODE was being set to ‘standard’ instead of ‘sni-dnat’ inside the ingress.

The behavior of the ingress is that it will reset the socket if it doesn’t have a listener for the SNI being presented. I haven’t looked through the code, but assume ‘sni-dnat’ is responsible for translating the synthetic .global address into .svc.cluster.local before the listener and route are evaluated.

We added “gateways.istio-ingressgateway.env.ISTIO_META_ROUTER_MODE=sni-dnat” to helm variables to fix it.

blackenz · October 15, 2020, 8:58pm

Unfortunately that doesn’t seem to be our case…
I’ve checked the environment variables in the pod and they seem to be correct.
Have any other clues where the issue might be?

We even tried to purge all istio configurations and cleanly reinstall 1.7.3, but the issue prevails…

I wished istio could be more “debug” friendly than it actually is…

Thank You

sands6 · December 8, 2020, 1:29am

@blackenz Have you managed to solve your problem with SNI proxy port(15443)?

sands6 · December 8, 2020, 3:37am

@ngarratt Can you share listener config for SNI port(15443) on Istio Gateway? I am running into exact issue on v1.6.9

Topic		Replies	Views
Istio ingress gateway support tls without SNI Security	6	3185	September 18, 2023
istio-Multi Cluster issue with ingress gateway	8	4196	February 17, 2020
Istio ingress gateway with tls mode PASSTHROUGH Networking	1	2095	March 30, 2021
Istio multi cluster ISTIO_MUTUAL fails on client cert validate	1	1478	October 10, 2023
I have tried to use tls passthrough with istio controller and k8s ingress , it does not work but with Gateway and VirtualServce it works. ere is the ingress YAML Networking	0	455	December 28, 2021

Ingress gateway SNI proxy sending RST after Client Hello

Related topics