We’re attempting to set up a multicluster mesh (replicated control planes) but are hitting an issue with the SNI proxy (TCP/15443) on the ingress gateway.
The client istio-proxy connects to the ingress gateway and sends a TLS Client Hello (with SNI), and the gateway ACKs the Client Hello. But instead of then sending back a Server Hello with a certificate, the gateway responds with a TCP RST. The client proxy retries two more times before giving up and returning a 503.
k8s: v1.18.3
Istio: v1.7.0
Rancher: v2.4.5
We’re following https://istio.io/latest/docs/setup/install/multicluster/gateways (including the sample cacerts). The manifests were installed with istioctl using values-istio-multicluster-gateways.yaml.
External access to the ingress gateway is via NodePort:
Port: tls 15443/TCP
TargetPort: 15443/TCP
NodePort: tls 30163/TCP
Endpoints: 10.42.0.72:15443
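To reproduce the RST without involving the client sidecar, a Client Hello carrying the same SNI can be sent straight at the NodePort. Below is a minimal sketch using only Python’s standard library; `probe_sni` is a hypothetical helper, and certificate verification is switched off on purpose because the only question is whether anything answers the Client Hello. A `reset` result would match the packet capture, while `handshake-ok` or `tls-error` would mean something did answer.

```python
import socket
import ssl


def probe_sni(host: str, port: int, sni: str, timeout: float = 5.0) -> str:
    """Send a TLS Client Hello carrying `sni` and classify the server's reaction."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Only interested in whether anything answers, so skip certificate checks.
    ctx.check_hostname = False  # must be cleared before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # server_hostname becomes the SNI extension in the Client Hello.
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                return f"handshake-ok ({tls.version()})"
    except ConnectionResetError:
        return "reset"  # TCP RST from the peer
    except ssl.SSLEOFError:
        return "closed"  # peer closed the connection mid-handshake
    except ssl.SSLError as exc:
        return f"tls-error: {exc.reason}"  # peer answered with a TLS alert/error
    except OSError as exc:
        return f"error: {exc}"  # refused, timed out, unreachable, ...
```

For example, `probe_sni("172.25.22.20", 30163, "outbound_.8000_._.httpbin.bar.global")` targets the node and NodePort seen in the client’s access log.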
The ingress gateway appears to be pulling the CA certificates from SDS correctly (and I confirmed the data decodes back to the sample cacerts):
2020-09-05T01:18:57.866787Z info GET https://10.43.0.1:443/api/v1/namespaces/istio-system/secrets?limit=500&resourceVersion=0 200 OK in 4 milliseconds
2020-09-05T01:18:57.866803Z info Response Headers:
2020-09-05T01:18:57.866806Z info Audit-Id: 677db825-6cd6-4e9b-83f5-e7a4b4d4312f
2020-09-05T01:18:57.866809Z info Cache-Control: no-cache, private
2020-09-05T01:18:57.866811Z info Content-Type: application/json
2020-09-05T01:18:57.866813Z info Date: Sat, 05 Sep 2020 01:18:57 GMT
2020-09-05T01:18:57.867035Z info Response Body: {"kind":"SecretList",
…
2020-09-05T01:18:57.868317Z warn secretfetcher failed load server cert/key pair from secret cacerts: server cert or private key is empty
2020-09-05T01:18:57.868896Z info GET https://10.43.0.1:443/api/v1/namespaces/istio-system/secrets?allowWatchBookmarks=true&resourceVersion=2497786&timeout=5m19s&timeoutSeconds=319&watch=true 200 OK in 0 milliseconds
2020-09-05T01:18:57.868905Z info Response Headers:
2020-09-05T01:18:57.868908Z info Cache-Control: no-cache, private
2020-09-05T01:18:57.868910Z info Content-Type: application/json
2020-09-05T01:18:57.868912Z info Date: Sat, 05 Sep 2020 01:18:57 GMT
2020-09-05T01:18:57.961725Z debug caches populated
2020-09-05T01:18:57.962040Z info sds SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"
2020-09-05T01:18:57.962120Z info sds SDS gRPC server for gateway controller starts, listening on "./var/run/ingress_gateway/sds"
2020-09-05T01:18:57.962162Z info Starting proxy agent
2020-09-05T01:18:57.962127Z info sds Start SDS grpc server
2020-09-05T01:18:57.962202Z info Opening status port 15020
2020-09-05T01:18:57.962294Z info Received new config, creating new Envoy epoch 0
2020-09-05T01:18:57.962351Z info Epoch 0 starting
2020-09-05T01:18:57.962367Z info sds Start SDS grpc server for ingress gateway proxy
2020-09-05T01:18:57.967501Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~10.42.0.72~istio-ingressgateway-85d8dd7994-28gbc.istio-system~istio-system.svc.cluster.local --local-address-ip-version v4 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error]
2020-09-05T01:18:58.004271Z warning envoy runtime Unable to use runtime singleton for feature envoy.reloadable_features.activate_fds_next_event_loop
2020-09-05T01:18:58.031737Z warning envoy config StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2020-09-05T01:18:58.031767Z warning envoy config Unable to establish new stream
2020-09-05T01:18:58.039132Z info sds resource:default new connection
2020-09-05T01:18:58.039201Z info sds Skipping waiting for gateway secret
2020-09-05T01:18:58.219285Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-09-05T01:18:58.219308Z info cache GenerateSecret default
2020-09-05T01:18:58.219764Z info sds resource:default pushed key/cert pair to proxy
2020-09-05T01:18:58.221865Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2020-09-05T01:18:58.604333Z info sds resource:ROOTCA new connection
2020-09-05T01:18:58.604417Z info sds Skipping waiting for gateway secret
2020-09-05T01:18:58.604442Z info cache Loaded root cert from certificate ROOTCA
2020-09-05T01:18:58.604601Z info sds resource:ROOTCA pushed root cert to proxy
2020-09-05T01:18:59.039710Z info Envoy proxy is ready
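The check that the `cacerts` data decodes back to the sample files can be scripted, which is handy next to the `secretfetcher` warning above. A minimal sketch, assuming the input is the parsed JSON from `kubectl get secret cacerts -n istio-system -o json` and the four key names used in Istio’s plug-in CA certificate instructions; `check_cacerts` and `EXPECTED_KEYS` are hypothetical names. It only confirms each key is present and PEM-encoded; it says nothing about which keys the gateway’s secret fetcher reads.

```python
import base64

# Key names from Istio's plug-in CA instructions (kubectl create secret generic cacerts ...).
EXPECTED_KEYS = ("ca-cert.pem", "ca-key.pem", "root-cert.pem", "cert-chain.pem")


def check_cacerts(secret: dict) -> dict:
    """Return {key: problem} for a Secret dict; an empty dict means all keys look fine."""
    data = secret.get("data", {})
    problems = {}
    for key in EXPECTED_KEYS:
        if key not in data:
            problems[key] = "missing"
            continue
        # Secret values are base64-encoded; PEM text starts with a BEGIN line.
        pem = base64.b64decode(data[key]).decode("ascii", errors="replace")
        if not pem.lstrip().startswith("-----BEGIN "):
            problems[key] = "not PEM"
    return problems
```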
The client sidecar’s access log shows the 503 for a request sent to one of the cluster nodes (172.25.22.20):
[2020-09-05T01:54:45.307Z] "HEAD /headers HTTP/1.1" 503 UF,URX "-" "-" 0 0 18 - "-" "curl/7.69.1" "3f4dd68f-c346-4ade-a69f-7a7771a5d7f8" "httpbin.bar.global:8000" "172.25.22.20:30163" outbound|8000||httpbin.bar.global - 240.0.0.2:8000 10.42.2.52:59116 - default
The ingress gateway logs the incoming connections, but with the NR (no route) response flag:
2020-09-05T01:54:43.037907Z trace envoy connection [C1096] write ready
[2020-09-05T01:54:45.309Z] "- - -" 0 NR "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.42.0.72:15443 172.25.22.20:40152 - -
[2020-09-05T01:54:45.322Z] "- - -" 0 NR "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.42.0.72:15443 172.25.22.20:11885 - -
[2020-09-05T01:54:45.325Z] "- - -" 0 NR "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.42.0.72:15443 172.25.22.20:58423 - -
2020-09-05T01:54:53.038229Z trace envoy http [C4] message complete
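The flags in the two access logs above (UF,URX on the client, NR on the gateway) are Envoy response flags. A small decoder makes them easier to read; this is a hypothetical helper covering only a handful of flags, with meanings paraphrased from Envoy’s access-log documentation, and its regex assumes the log layout quoted in this post (timestamp, quoted request field, status code, flags).

```python
import re
from typing import List

# A handful of Envoy access-log response flags, paraphrased from Envoy's docs.
RESPONSE_FLAGS = {
    "NR": "no route configured, or no matching filter chain for the connection",
    "UF": "upstream connection failure",
    "URX": "upstream retry limit (HTTP) or max connect attempts (TCP) reached",
    "UH": "no healthy upstream hosts",
    "UO": "upstream overflow (circuit breaking)",
}

# Timestamp, quoted request field, status code, then the response flags.
_LOG_RE = re.compile(r'^\S+ "(?P<request>[^"]*)" (?P<code>\d+) (?P<flags>\S+)')


def decode_flags(line: str) -> List[str]:
    """Return 'FLAG: meaning' for each response flag in an access-log line."""
    match = _LOG_RE.match(line)
    if match is None or match["flags"] == "-":
        return []  # not a recognised line, or no flags set
    return [f"{flag}: {RESPONSE_FLAGS.get(flag, 'not in this table')}"
            for flag in match["flags"].split(",")]
```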
Wireshark shows the SNI set to outbound_.8000_._.httpbin.bar.global.
The ingress gateway’s clusters look correct:
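The SNI appears to use the same `direction_.port_.subset_.host` layout as the `outbound_.`-prefixed cluster names in the gateway listing, here with an empty subset. Splitting it into fields makes the comparison with those clusters explicit. A minimal sketch, assuming that layout (inferred from the names in this post, not taken from Istio’s source); `parse_sni` and `SniKey` are hypothetical names.

```python
from typing import NamedTuple


class SniKey(NamedTuple):
    direction: str
    port: int
    subset: str  # empty string when no subset is selected
    host: str


def parse_sni(sni: str) -> SniKey:
    """Split 'direction_.port_.subset_.host' into its four fields."""
    parts = sni.split(".", 3)  # the host itself may contain further dots
    if len(parts) != 4 or not all(p.endswith("_") for p in parts[:3]):
        raise ValueError(f"not in direction_.port_.subset_.host form: {sni!r}")
    direction, port, subset = (p[:-1] for p in parts[:3])  # drop trailing '_'
    return SniKey(direction, int(port), subset, parts[3])
```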
SERVICE FQDN                                                            PORT   SUBSET  DIRECTION  TYPE        DESTINATION RULE
BlackHoleCluster                                                        -      -       -          STATIC
agent                                                                   -      -       -          STATIC
default-http-backend.ingress-nginx.svc.cluster.local                    80     -       outbound   EDS
httpbin.bar.svc.cluster.local                                           8000   -       outbound   EDS
httpbin.unite-core-stage1.svc.cluster.local                             8000   -       outbound   EDS
istio-egressgateway.istio-system.svc.cluster.local                      80     -       outbound   EDS
istio-egressgateway.istio-system.svc.cluster.local                      443    -       outbound   EDS
istio-egressgateway.istio-system.svc.cluster.local                      15443  -       outbound   EDS
istio-ingressgateway.istio-system.svc.cluster.local                     80     -       outbound   EDS
istio-ingressgateway.istio-system.svc.cluster.local                     443    -       outbound   EDS
istio-ingressgateway.istio-system.svc.cluster.local                     15021  -       outbound   EDS
istio-ingressgateway.istio-system.svc.cluster.local                     15443  -       outbound   EDS
istiocoredns.istio-system.svc.cluster.local                             53     -       outbound   EDS
istiod.istio-system.svc.cluster.local                                   443    -       outbound   EDS
istiod.istio-system.svc.cluster.local                                   853    -       outbound   EDS
istiod.istio-system.svc.cluster.local                                   15010  -       outbound   EDS
istiod.istio-system.svc.cluster.local                                   15012  -       outbound   EDS
istiod.istio-system.svc.cluster.local                                   15014  -       outbound   EDS
kube-dns.kube-system.svc.cluster.local                                  53     -       outbound   EDS
kube-dns.kube-system.svc.cluster.local                                  9153   -       outbound   EDS
kubernetes.default.svc.cluster.local                                    443    -       outbound   EDS
metrics-server.kube-system.svc.cluster.local                            443    -       outbound   EDS
outbound_.15010_._.istiod.istio-system.svc.cluster.local                -      -       -          EDS
outbound_.15012_._.istiod.istio-system.svc.cluster.local                -      -       -          EDS
outbound_.15014_._.istiod.istio-system.svc.cluster.local                -      -       -          EDS
outbound_.15021_._.istio-ingressgateway.istio-system.svc.cluster.local  -      -       -          EDS
outbound_.15443_._.istio-egressgateway.istio-system.svc.cluster.local   -      -       -          EDS
outbound_.15443_._.istio-ingressgateway.istio-system.svc.cluster.local  -      -       -          EDS
outbound_.443_._.istio-egressgateway.istio-system.svc.cluster.local     -      -       -          EDS
outbound_.443_._.istio-ingressgateway.istio-system.svc.cluster.local    -      -       -          EDS
outbound_.443_._.istiod.istio-system.svc.cluster.local                  -      -       -          EDS
outbound_.443_._.kubernetes.default.svc.cluster.local                   -      -       -          EDS
outbound_.443_._.metrics-server.kube-system.svc.cluster.local           -      -       -          EDS
outbound_.53_._.istiocoredns.istio-system.svc.cluster.local             -      -       -          EDS
outbound_.53_._.kube-dns.kube-system.svc.cluster.local                  -      -       -          EDS
outbound_.8000_._.httpbin.bar.svc.cluster.local                         -      -       -          EDS
outbound_.8000_._.httpbin.unite-core-stage1.svc.cluster.local           -      -       -          EDS
outbound_.80_._.default-http-backend.ingress-nginx.svc.cluster.local    -      -       -          EDS
outbound_.80_._.istio-egressgateway.istio-system.svc.cluster.local      -      -       -          EDS
outbound_.80_._.istio-ingressgateway.istio-system.svc.cluster.local     -      -       -          EDS
outbound_.80_._.sleep.unite-core-stage1.svc.cluster.local               -      -       -          EDS
outbound_.853_._.istiod.istio-system.svc.cluster.local                  -      -       -          EDS
outbound_.9153_._.kube-dns.kube-system.svc.cluster.local                -      -       -          EDS
prometheus_stats                                                        -      -       -          STATIC
sds-grpc                                                                -      -       -          STATIC
sleep.unite-core-stage1.svc.cluster.local                               80     -       outbound   EDS
xds-grpc                                                                -      -       -          STRICT_DNS
zipkin                                                                  -      -       -          STRICT_DNS
I must be missing something obvious here. I would guess one of the following:
- The SNI gateway is not actually running in TLS mode
- The client is meant to map the .global hostname to .svc.cluster.local when populating the SNI field in the Client Hello
- The ingress gateway is meant to have a route for the .global hostname (I’ve been assuming it translates .global to .svc.cluster.local automatically)
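The third guess can be checked against the gateway’s cluster listing: look the SNI up among the cluster names as-is, and again after replacing `.global` with `.svc.cluster.local`. This sketch assumes, for illustration only, that the gateway routes an SNI connection to the cluster with exactly that name; `find_cluster` and `GATEWAY_CLUSTERS` are hypothetical, and the suffix rewrite models the translation in question rather than anything Istio is confirmed here to do.

```python
from typing import Iterable, Optional, Tuple

# A few cluster names from the ingress gateway listing above.
GATEWAY_CLUSTERS = {
    "outbound_.8000_._.httpbin.bar.svc.cluster.local",
    "outbound_.8000_._.httpbin.unite-core-stage1.svc.cluster.local",
    "outbound_.80_._.sleep.unite-core-stage1.svc.cluster.local",
}


def find_cluster(sni: str, clusters: Iterable[str],
                 rewrite: Optional[Tuple[str, str]] = None) -> Optional[str]:
    """Return the cluster named exactly like `sni` (after an optional rewrite), or None.

    `rewrite` is a hypothetical (old_suffix, new_suffix) pair applied to the SNI
    first, e.g. (".global", ".svc.cluster.local").
    """
    name = sni
    if rewrite is not None and name.endswith(rewrite[0]):
        name = name[: -len(rewrite[0])] + rewrite[1]
    return name if name in set(clusters) else None


sni = "outbound_.8000_._.httpbin.bar.global"
print(find_cluster(sni, GATEWAY_CLUSTERS))  # None
print(find_cluster(sni, GATEWAY_CLUSTERS, (".global", ".svc.cluster.local")))
# outbound_.8000_._.httpbin.bar.svc.cluster.local
```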
Any ideas on how to troubleshoot this further?
-Neil