TCP traffic to AWS MSK (Kafka) through egress gateway

Dear all

I’m trying to set up TCP communication from the Istio proxy sidecar to AWS MSK via Istio’s egress gateway. We successfully set up this basic flow for HTTP/HTTPS traffic to www.istio.io as a test, but numerous attempts to do the same for connections to the ZooKeeper endpoints of AWS MSK are failing, and I was hoping to get some assistance from the community.

  • Istio version: 1.4.3
  • Kubernetes version: 1.15.0
  • We have a private AWS EKS cluster (e.g. 10.106.20.0/24)
  • We have a private AWS MSK cluster (e.g. 10.106.11.0/24)
  • We have a network policy that only allows DNS lookup and communication to Istio’s components (and temporarily access to 10.106.0.0/16)
  • We have set the global.outboundTrafficPolicy to REGISTRY_ONLY
  • We configure a service entry, virtual service and gateway resource (example below). We don’t use destination rules / subsets, as I assume they are not required if you are not routing to multiple versions or splitting traffic by weight
  • Some of our attempts involved adding port 2181 (ZooKeeper) as a listener port to the istio-egressgateway component (pod/service), because by default it only listens on 80, 443 and 15443. This would allow us to keep the same port throughout every component
  • We have 3 AWS MSK ZooKeeper connection strings but I will only list 1 as an example

Test case 1
We’ve tried to use port 2181 throughout the full chain (app–>sidecar–>egressgateway) by adding port 2181 to the egress gateway. We see traffic flowing from the sidecar to the egress gateway, where it fails with No healthy upstream, as if the final destination route in the virtual host doesn’t work

istio-egressgateway-6cb6c46b9-64p96 istio-proxy [2020-04-14T17:55:01.625Z] "- - -" 0 UH "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.106.20.209:2181 10.106.21.56:44626 - -
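For anyone reading that log line, here is a minimal sketch of how the relevant fields can be pulled out of it. It assumes Envoy's default access log layout, in which the response code and the response flags directly follow the quoted request line. The `parse_access_log` helper and the short flag table are only an illustration (hypothetical names, not an Istio or Envoy tool), and the flag descriptions are paraphrased from the Envoy access log documentation.

```python
import shlex

# A few Envoy response flags, paraphrased; this table is deliberately incomplete.
RESPONSE_FLAGS = {
    "UH": "no healthy upstream hosts in the upstream cluster",
    "UF": "upstream connection failure",
    "NR": "no route configured, or no matching filter chain",
    "UO": "upstream overflow (circuit breaking)",
}

LOG_LINE = (
    'istio-egressgateway-6cb6c46b9-64p96 istio-proxy [2020-04-14T17:55:01.625Z] '
    '"- - -" 0 UH "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - '
    '10.106.20.209:2181 10.106.21.56:44626 - -'
)


def parse_access_log(line):
    """Extract response code, response flags and ip:port tokens from one log line."""
    tokens = shlex.split(line)  # keeps quoted fields such as "- - -" together
    # The bracketed timestamp marks the start of the Envoy-formatted part.
    start = next(i for i, t in enumerate(tokens)
                 if t.startswith("[") and t.endswith("]"))
    # Layout assumed: [timestamp] "request line" code flags ...
    code = tokens[start + 2]
    flags = tokens[start + 3].split(",")
    addresses = [t for t in tokens[start + 4:] if ":" in t]
    return {"response_code": code, "response_flags": flags, "addresses": addresses}


parsed = parse_access_log(LOG_LINE)
for flag in parsed["response_flags"]:
    print(flag, "=", RESPONSE_FLAGS.get(flag, "unknown flag"))
print("addresses seen:", parsed["addresses"])
```

For this line the only flag is UH, and the upstream host and upstream cluster fields are both "-", which is consistent with the gateway accepting the connection on port 2181 but having no usable endpoint to forward it to.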

We got to this point by using the following resources, and I feel this is the closest I have come to a working solution

---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: msk-se-z1
  namespace: msk
spec:
  hosts:
  - z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
  addresses:
  - 10.106.11.0/24
  ports:
  - name: tcp
    number: 2181
    protocol: TCP
  exportTo:
  - "*"
  location: MESH_EXTERNAL
  resolution: NONE
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: msk-egressgateway
  namespace: msk
spec:
  selector:
    app: istio-egressgateway
  servers:
  - port:
      number: 2181
      name: tcp-2181
      protocol: TCP
    hosts:
    - z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: msk-vs-z1
  namespace: msk
spec:
  exportTo:
  - "*"
  hosts:
  - z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
  gateways:
  - mesh
  - msk-egressgateway
  tcp:
  - match:
    - gateways:
      - mesh
      destinationSubnets:
      - 10.106.11.0/24
      port: 2181
    route:
    - destination:
        host: istio-egressgateway.istio-system.svc.cluster.local
        port:
          number: 2181
  - match:
    - gateways:
      - msk-egressgateway
      port: 2181
    route:
    - destination:
        host: z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
        port:
          number: 2181
      weight: 100
---

Test case 2
We’ve tried the same as above, but instead of using the newly added port 2181 on the egress gateway, we tried to use a default port, port 80. This time our request is never processed by the egress gateway; instead we’re seeing the following error in the application: Packet len1213486160 is out of range!

The resources are almost identical to those above, only this time we used port 80
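One observation about the number in that error, offered as a sketch rather than a confirmed diagnosis. My understanding is that the ZooKeeper client reads the first four bytes of every reply as a big-endian length prefix. If that assumption holds, the value 1213486160 can be turned back into the four bytes the client actually received. The helper name `decode_length_prefix` is my own.

```python
import struct


def decode_length_prefix(value):
    """Return the 4 bytes that, read as a big-endian signed 32-bit int, give `value`."""
    return struct.pack(">i", value)


received = decode_length_prefix(1213486160)
print(received)  # b'HTTP'

# Round trip: the ASCII bytes "HTTP" read as a big-endian integer.
print(int.from_bytes(b"HTTP", "big"))  # 1213486160
```

So the bytes are the ASCII text "HTTP", which suggests the client received the start of an HTTP response instead of a ZooKeeper reply. That would fit port 80 being handled as HTTP somewhere along the path, although I have not verified which component sent that response.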

---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: msk-se-z1
  namespace: msk
spec:
  hosts:
  - z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
  addresses:
  - 10.106.11.0/24
  ports:
  - name: tcp
    number: 2181
    protocol: TCP
  exportTo:
  - "*"
  location: MESH_EXTERNAL
  resolution: NONE
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: msk-egressgateway
  namespace: msk
spec:
  selector:
    app: istio-egressgateway
  servers:
  - port:
      number: 80
      name: tcp-80
      protocol: TCP
    hosts:
    - z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: msk-vs-z1
  namespace: msk
spec:
  exportTo:
  - "*"
  hosts:
  - z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
  gateways:
  - mesh
  - msk-egressgateway
  tcp:
  - match:
    - gateways:
      - mesh
      destinationSubnets:
      - 10.106.11.0/24
      port: 2181
    route:
    - destination:
        host: istio-egressgateway.istio-system.svc.cluster.local
        port:
          number: 80
  - match:
    - gateways:
      - msk-egressgateway
      port: 80
    route:
    - destination:
        host: z-1.msk-dev-cluster.9xhpez.c2.kafka.eu-west-1.amazonaws.com
        port:
          number: 2181
      weight: 100
---

We’ve tried so many things and it feels like there is something fundamental I’m not seeing or am misunderstanding. For example: is it allowed to create all of our resources (service entry, virtual service and gateway) in the namespace of the application, which is different from the Istio egress gateway namespace?

PS: I can also add that if we delete our network policy and delete the virtual services and gateways created above, we are able to access our Kafka cluster, only this time the sidecar sends the request directly to the destination, bypassing the egress gateway, which is the expected behaviour

Many thanks in advance for pointing me in the right direction, or for sharing your own config if you’re connecting to external ZooKeeper services over TCP

Hi, I am working on a similar problem accessing MSK and I am trying to get the HTTP/HTTPS traffic flow working. Do you have an HTTP/HTTPS traffic flow that works?

It depends. I’m under the impression that the TCP protocol (instead of HTTP) has to be configured for this flow (to MSK) to work. I’m still not able to route the traffic through an egress gateway, so we’re sending the traffic out directly via the sidecar. If SSL is enabled on your MSK cluster, you should be able to use the HTTPS protocol to establish that connection (directly from the sidecar as well as via an egress gateway), but I haven’t tried it yet

Thanks @mrtnfchs for the reply. That helps. I have SSL enabled in the cluster and am trying to figure out the flow. I hope to reproduce the issue you are facing in my setup. Will keep you posted.

Thanks a ton! I’m looking forward to your findings!

Did you resolve the problem?

Hey @mrtnfchs,

Were you able to solve the problem? I was stuck on a similar one and was able to find a solution. Please let us know if you arrived at one too, and then we can discuss more here.