Truncated TCP connection on AWS

I have a k8s multi cluster setup with shared control plane mode on AWS and I have a persistent TCP connection from primary cluster to the remote clusters.

The problem is that these TCP connections are truncated after a few minutes when idle, but from the container that starts the connection, it is still on ESTABLISHED status, so it can’t be recreated at application level.

It looks like the connection between the two envoys is truncated and it is not recreated.

We tried with the following destination rule to set better tcpKeepalive:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: testrule
spec:
  host: remote-service.test.svc.cluster.local
  trafficPolicy:
	connectionPool:
	  tcp:
		maxConnections: 10000
		connectTimeout: 20s
		tcpKeepalive:
		  time: 20s
		  interval: 75s

This workaround seems to solve the problem, but is this the best way to fix it?

If you’re using NLB in the path, that has a hardcoded 350ms timeout on idle… keep alive is the right way to get around that