Our web apps and backend server are deployed with the istio-proxy sidecar, and I am seeing reliability issues with WebSocket connections.
The services are exposed via the GCP Kubernetes Ingress load balancer, which has an idle timeout of 30 minutes. Our web apps make a WebSocket connection upgrade request to the backend server, and the first one succeeds. That connection stays up for about 30 minutes, as configured, and is then terminated by the load balancer.
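For reference, the 30-minute idle timeout on our setup is configured on the GCE backend service via a BackendConfig attached to the Kubernetes Service. A sketch of that configuration (names here are illustrative, not our actual resource names):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: backend-timeout-config   # hypothetical name
spec:
  timeoutSec: 1800               # 30-minute backend service timeout
---
apiVersion: v1
kind: Service
metadata:
  name: backend-svc              # hypothetical name
  annotations:
    # Associates the BackendConfig above with this Service's ports
    cloud.google.com/backend-config: '{"default": "backend-timeout-config"}'
spec:
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080
```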
The web app then makes a new WebSocket request, which is immediately terminated, and this repeats for several tens of subsequent requests. Eventually a new HTTP OPTIONS request is made, and the WebSocket connection upgrade request that follows it succeeds and stays up for another 30 minutes. This cycle repeats roughly every 30 minutes, after each idle timeout.
One thing I observed in all the responses, including the successful ones, is a "content-length: 0" header, so I suspected the load balancer may be terminating connections non-deterministically after the idle timeout. As a hack, I added a custom EnvoyFilter to rewrite the header to a fake length of 1 ("content-length: 1"), but this did not help.
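For completeness, the hack was roughly along these lines: an EnvoyFilter injecting a Lua filter on the sidecar's inbound listener that overwrites the response header. This is a sketch only; the selector labels and namespace are placeholders, and the typed_config URL matches the Envoy v2 API used by Istio 1.6:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: fake-content-length      # hypothetical name
  namespace: default             # placeholder namespace
spec:
  workloadSelector:
    labels:
      app: backend               # placeholder workload label
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
            subFilter:
              name: envoy.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.lua
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.http.lua.v2.Lua
          inline_code: |
            function envoy_on_response(response_handle)
              -- Hack: overwrite content-length on every response
              response_handle:headers():replace("content-length", "1")
            end
```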
I didn't get much insight from the access logs on the istio-proxy or the load balancer. Any help or pointers would be much appreciated.
Istio version 1.6.3.
Note: the Ingress hits the Kubernetes Service directly and is not configured to go through an Istio Gateway or VirtualService.