Multi-cluster communication with custom hostnames and HTTPS requests fails
We are migrating a legacy system that uses Consul DNS to an Istio service mesh on Kubernetes. Services within the mesh have to be addressed in the Consul format, e.g. https://app.service.consul. Making such requests within an Istio service mesh poses some challenges. We managed to make it work with service entries and a custom Envoy filter. However, we are now stuck making it work in an Istio multi-cluster setup when calling the service in the external cluster. Below is a simplified diagram of how the service is called for reference.
A request from the client app service to the service in the external cluster fails:
curl https://static-server-https.service.consul
OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to static-server-https.service.consul:443
# but should be
hello world from cluster-b
Verbose output:
curl https://static-server-https.service.consul
* Trying 240.240.0.1:443...
* Connected to static-server-https.service.consul (240.240.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/certs/ca/ca-bundle.pem
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to static-server-https.service.consul:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to static-server-https.service.consul:443
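The verbose output shows curl connecting to 240.240.0.1, which is neither a pod IP nor an NLB address. This sketch assumes Istio DNS proxying with address auto-allocation is enabled. In that mode, Istio hands out ServiceEntry VIPs from the class E block 240.240.0.0/16. The TLS client hello therefore goes to the client's own sidecar, not directly to the remote cluster. The snippet below classifies the addresses seen in this question against that assumed range:

```python
import ipaddress

# Assumption: Istio DNS proxying with address auto-allocation, which allocates
# ServiceEntry VIPs from the class E block 240.240.0.0/16.
ISTIO_AUTO_ALLOCATED_RANGE = ipaddress.ip_network("240.240.0.0/16")


def is_istio_auto_allocated(address: str) -> bool:
    """Return True if the address lies in the assumed auto-allocation range."""
    return ipaddress.ip_address(address) in ISTIO_AUTO_ALLOCATED_RANGE


# Addresses taken from the curl output and the proxy logs in this question.
for addr in ["240.240.0.1", "240.240.0.2", "35.157.89.251", "100.70.235.145"]:
    kind = "sidecar-captured VIP" if is_istio_auto_allocated(addr) else "other address"
    print(f"{addr}: {kind}")
```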
Istio proxy outbound log for the request to the external cluster:
[2022-01-25T10:09:59.177Z] "- - -" 0 UF,URX - - "-" 0 0 10000 - "-" "-" "-" "-" "35.157.89.251:443" outbound|443||static-server-https.service.consul - 240.240.0.2:443 100.96.7.214:47428 - -
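The failing entry can be decoded field by field. This sketch assumes Istio's default access log format; a custom `meshConfig.accessLogFormat` would change the field order. It splits the line into named fields and translates the response flags. Per Envoy's access log documentation, `UF` means an upstream connection failure. `URX` means the upstream retry limit (HTTP) or the maximum connect attempts (TCP) was reached. The 10000 ms duration is consistent with Istio's default 10 s connect timeout. Taken together, the fields suggest the sidecar never established a TCP connection to `35.157.89.251:443`.

```python
import shlex

# Field order of Istio's default proxy access log format (assumed unchanged).
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "response_code_details", "connection_termination_details",
    "upstream_transport_failure_reason", "bytes_received", "bytes_sent",
    "duration_ms", "upstream_service_time", "x_forwarded_for", "user_agent",
    "request_id", "authority", "upstream_host", "upstream_cluster",
    "upstream_local_address", "downstream_local_address",
    "downstream_remote_address", "requested_server_name", "route_name",
]

# Meanings of the flags seen in the failing log, per Envoy's documentation.
RESPONSE_FLAGS = {
    "UF": "upstream connection failure",
    "URX": "upstream retry limit (HTTP) or maximum connect attempts (TCP) reached",
}


def parse_access_log(line: str) -> dict:
    """Split a default-format access log line into named fields."""
    tokens = shlex.split(line)  # honours the double-quoted fields
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    entry = dict(zip(FIELDS, tokens))
    entry["start_time"] = entry["start_time"].strip("[]")
    return entry


def explain_flags(flags: str) -> list:
    """Translate a comma-separated RESPONSE_FLAGS value into descriptions."""
    if flags == "-":
        return []
    return [RESPONSE_FLAGS.get(f, f"unknown flag {f}") for f in flags.split(",")]


failing = ('[2022-01-25T10:09:59.177Z] "- - -" 0 UF,URX - - "-" 0 0 10000 - "-" "-" "-" "-" '
           '"35.157.89.251:443" outbound|443||static-server-https.service.consul - '
           '240.240.0.2:443 100.96.7.214:47428 - -')

entry = parse_access_log(failing)
print(entry["upstream_host"], entry["duration_ms"], explain_flags(entry["response_flags"]))
```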
A request from the client app service to the service in the internal cluster succeeds:
curl https://static-server-https.service.consul
hello world from cluster-a
Istio proxy outbound log for the request to the internal cluster:
[2022-01-25T09:45:54.381Z] "- - -" 0 - - - "-" 102 191 8 - "-" "-" "-" "-" "100.70.235.145:443" outbound|443||static-server-https.service.consul 100.96.7.214:48320 240.240.0.2:443 100.96.7.214:55200 - -
Notes on Istio proxy outbound logs
The Istio proxy outbound logs look quite different from those produced by Istio's out-of-the-box multi-cluster communication. The following is an example log from the standard configuration, for a request to http://static-server-http.default.svc.cluster.local:8080 that returned response code 200:
[2022-01-25T10:02:40.334Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 29 7 6 "-" "curl/7.81.0-DEV" "7a46868c-7d22-9f79-89e5-b4b04835bb35" "static-server-http:8080" "18.195.240.125:15443" outbound|8080||static-server-http.default.svc.cluster.local 100.96.7.214:35468 100.96.7.223:8080 100.96.7.214:44864 - default
Compare this to the log of the successful request to https://static-server-https.service.consul:
[2022-01-25T09:45:54.381Z] "- - -" 0 - - - "-" 102 191 8 - "-" "-" "-" "-" "100.70.235.145:443" outbound|443||static-server-https.service.consul 100.96.7.214:48320 240.240.0.2:443 100.96.7.214:55200 - -
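Several fields make the difference between the logs explicit. The standard request is logged as HTTP (`GET / HTTP/1.1`, status 200) and sent to the remote east-west gateway on port 15443. Both Consul entries are logged as plain TCP connections (`- - -`, response code 0). The failing Consul entry targets the NLB address on port 443, not 15443. The TCP-level logging is most likely because the port is declared with `protocol: TLS`, so the sidecar proxies it as opaque bytes. This may also be why Kiali draws that edge as tcp. The sketch below assumes Istio's default access log format and hard-codes the token positions it relies on:

```python
import shlex


def summarize(line: str) -> dict:
    """Extract protocol layer, response code and upstream target from a log line.

    Assumed token positions in the default format: 1 = request line,
    2 = response code, 15 = upstream host, 16 = upstream cluster.
    """
    tokens = shlex.split(line)
    request, code, upstream_host, cluster = tokens[1], tokens[2], tokens[15], tokens[16]
    return {
        # Envoy logs "- - -" instead of "METHOD PATH PROTOCOL" for TCP proxying.
        "layer": "tcp" if request == "- - -" else "http",
        "response_code": int(code),
        "upstream_port": int(upstream_host.rsplit(":", 1)[1]),
        "upstream_cluster": cluster,
    }


standard = ('[2022-01-25T10:02:40.334Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 29 7 6 "-" '
            '"curl/7.81.0-DEV" "7a46868c-7d22-9f79-89e5-b4b04835bb35" "static-server-http:8080" '
            '"18.195.240.125:15443" outbound|8080||static-server-http.default.svc.cluster.local '
            '100.96.7.214:35468 100.96.7.223:8080 100.96.7.214:44864 - default')
consul_ok = ('[2022-01-25T09:45:54.381Z] "- - -" 0 - - - "-" 102 191 8 - "-" "-" "-" "-" '
             '"100.70.235.145:443" outbound|443||static-server-https.service.consul '
             '100.96.7.214:48320 240.240.0.2:443 100.96.7.214:55200 - -')
consul_failed = ('[2022-01-25T10:09:59.177Z] "- - -" 0 UF,URX - - "-" 0 0 10000 - "-" "-" "-" "-" '
                 '"35.157.89.251:443" outbound|443||static-server-https.service.consul - '
                 '240.240.0.2:443 100.96.7.214:47428 - -')

for name, line in [("standard", standard), ("consul ok", consul_ok), ("consul failed", consul_failed)]:
    print(name, summarize(line))
```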
My guess is that this is due to the custom Envoy filter, but at this point I am not sure how to interpret the difference.
Notes on Kiali service representation
We also found that Kiali renders the HTTPS-configured connection as tcp instead of http in its graph. We do not yet know why this is the case or whether it is relevant.
Notes on the setup
The setup is based on the multi-primary setup from the official documentation. Note also that our setup runs on AWS infrastructure, which requires supplying the Network Load Balancer IP addresses instead of its hostnames.
The east-west gateway for cross-cluster communication is modified to also allow traffic for the Consul hostnames:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"
        - "*.service.consul"
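With `AUTO_PASSTHROUGH`, the east-west gateway does not terminate TLS. It reads the SNI that the client sidecar sets when it originates Istio mutual TLS, and it forwards to the cluster that SNI names. The destination is encoded as `outbound_.<port>_.<subset>_.<hostname>`. The sketch below builds that SNI for both services in this question. It then checks the hostname against the `hosts` list above. The matching is a simplifying assumption: `*.` is treated as a plain suffix match, which is narrower than Istio's real host-matching logic.

```python
def auto_passthrough_sni(hostname: str, port: int, subset: str = "") -> str:
    """Build the SNI an Istio sidecar sends with ISTIO_MUTUAL toward a gateway
    in AUTO_PASSTHROUGH mode: outbound_.<port>_.<subset>_.<hostname>."""
    return f"outbound_.{port}_.{subset}_.{hostname}"


def gateway_host_matches(hostname: str, patterns: list) -> bool:
    """Simplified host matching (assumption): '*' matches everything,
    '*.suffix' matches any name ending in '.suffix', otherwise exact match."""
    for pattern in patterns:
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            if hostname.endswith(pattern[1:]):
                return True
        elif hostname == pattern:
            return True
    return False


GATEWAY_HOSTS = ["*.local", "*.service.consul"]

for host, port in [("static-server-http.default.svc.cluster.local", 8080),
                   ("static-server-https.service.consul", 443)]:
    print(auto_passthrough_sni(host, port), gateway_host_matches(host, GATEWAY_HOSTS))
```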
The target service is deployed with a service entry that configures the AWS NLB IP addresses as endpoints. We also have a custom Envoy filter that allows the use of HTTPS in our requests.
apiVersion: v1
kind: Service
metadata:
  name: static-server-https
  namespace: default
spec:
  selector:
    app: static-server-https
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: static-server-https
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-server-https
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: static-server-https
  template:
    metadata:
      name: static-server-https
      labels:
        app: static-server-https
      annotations:
        # the certificates for the custom envoy https filter
        sidecar.istio.io/userVolumeMount: '[{"name":"consul-external-cert", "mountPath":"/etc/certs/consul-external", "readonly":true},{"name":"root-ca", "mountPath":"/etc/certs/ca", "readonly":true}]'
        sidecar.istio.io/userVolume: '[{"name":"consul-external-cert", "secret":{"secretName":"consul-wildcard"}},{"name":"root-ca", "secret":{"secretName":"ca-bundle"}}]'
    spec:
      containers:
        - name: static-server-https
          image: hashicorp/http-echo:latest
          args:
            - -text="hello world from {{.Values.mesh.cluster.this.name}}"
            - -listen=:8080
          ports:
            - containerPort: 8080
              name: http
      serviceAccountName: static-server-https
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: static-server-https
  namespace: default
spec:
  hosts:
    - static-server-https.service.consul
  location: MESH_INTERNAL
  ports:
    - number: 443
      name: https
      protocol: TLS
  resolution: DNS
  endpoints:
    # the AWS NLB addresses to reach the app in the external cluster
    - address: {{.Values.mesh.cluster.external.gateway.address1}}
      ports:
        http: 15443
    - address: {{.Values.mesh.cluster.external.gateway.address2}}
      ports:
        http: 15443
    - address: {{.Values.mesh.cluster.external.gateway.address3}}
      ports:
        http: 15443
    # the app endpoint for the local cluster
    - address: static-server-https.default
  subjectAltNames:
    - "spiffe://cluster.local/ns/default/sa/static-server-https"
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: static-server-https
spec:
  host: static-server-https.service.consul
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: custom-envoy-filter
spec:
  configPatches:
    - applyTo: FILTER_CHAIN
      match:
        context: SIDECAR_OUTBOUND
        listener:
          portNumber: 443
          filterChain:
            filter:
              name: "*.service.consul"
      patch:
        operation: MERGE
        value:
          transport_socket:
            name: tls
            typed_config:
              "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext"
              common_tls_context:
                tls_certificates:
                  - certificate_chain:
                      filename: /etc/certs/consul-external/tls.crt
                    private_key:
                      filename: /etc/certs/consul-external/tls.key
                validation_context:
                  trusted_ca:
                    filename: /etc/certs/ca/ca.crt
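Istio's ServiceEntry reference documents that an endpoint's `ports` map is keyed by the *name* of a port declared in the ServiceEntry. The consistency check below flags endpoint port keys that do not name any declared port. It is a sketch: the ServiceEntry is copied into a Python dict, and the rendered Helm NLB addresses are replaced by hypothetical placeholders (`nlb-address-1` and so on). Applied to the ServiceEntry in this question, it reports the three `http` keys, because the only declared port is named `https`. That would be consistent with the failing log's upstream host using port 443 on the NLB address instead of 15443. The check is offered as something worth verifying, not as a confirmed root cause.

```python
# ServiceEntry spec from this question as a plain dict; the NLB addresses are
# hypothetical placeholders for the rendered Helm values.
service_entry = {
    "hosts": ["static-server-https.service.consul"],
    "ports": [{"number": 443, "name": "https", "protocol": "TLS"}],
    "endpoints": [
        {"address": "nlb-address-1", "ports": {"http": 15443}},
        {"address": "nlb-address-2", "ports": {"http": 15443}},
        {"address": "nlb-address-3", "ports": {"http": 15443}},
        {"address": "static-server-https.default"},
    ],
}


def unmatched_endpoint_ports(spec: dict) -> list:
    """Return (address, port_key) pairs whose key is not a declared port name."""
    declared = {port["name"] for port in spec.get("ports", [])}
    problems = []
    for endpoint in spec.get("endpoints", []):
        for name in endpoint.get("ports", {}):
            if name not in declared:
                problems.append((endpoint["address"], name))
    return problems


for address, name in unmatched_endpoint_ports(service_entry):
    print(f"endpoint {address}: port key '{name}' does not name a declared port")
```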