We have a downstream server that runs Node v8 and connects to mesh-external hosts via TLS. We are using the keep-alive (KA) agent in Node with a 60s initial delay (keepAliveMsecs). We are trying to figure out how to maintain an HTTP keep-alive connection, and so avoid the overhead of opening new TLS sockets, with Istio in the mix.
Our hope was that this would be transparent: we could remove the KA agent from our code and hand the responsibility for connection pooling to the proxy. We have found that this might not be possible, because Envoy does not pool TLS TCP sockets the way it does HTTP TCP sockets.
We are running Istio 1.4.6.
Here is what we have tried and noticed:
No ServiceEntry (se)/DestinationRule (dr)/VirtualService (vs) defined:
Downstream <-- TCP KA Socket --> Envoy <-- Socket --> Upstream (external)
The defaults apply, so there is a 60min timeout on the Socket, which will close the downstream TCP KA Socket. There are times when the upstream resets the connection, and we then get an ECONNRESET back at the downstream (an increase in their frequency is what triggered us to start digging into this).
se (MESH_EXTERNAL/protocol: TLS) + dr (connectionPool.http.idleTimeout < 60min):
Downstream <-- TCP KA Socket --> Envoy <-- Socket --> Upstream (external)
So this gives the same result as with no Istio components defined. I expected the socket to time out earlier, but it times out after 60min again, so I am not sure whether the idleTimeout is being passed through to the Envoy TCP connection. We get the same frequency of ECONNRESET.
se (MESH_EXTERNAL/protocol: TLS) + dr (connectionPool.http.idleTimeout < 60min, tcp.tcpKeepalive.time/interval set to match the app KA agent):
Downstream <-- TCP KA Socket --> Envoy <-- TCP KA Socket --> Upstream (external)
This seems to be what we want? The idle timeout is still not respected, so the socket will time out after 60min and close the downstream socket correctly, but the KA settings should bring us back to our regular connection topology and reduce the number of ECONNRESET errors. We are testing this theory now.
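For reference, the se + dr combination from this last attempt looks roughly like the sketch below. The host api.example.com, the resource names, port 443, resolution: DNS, and the 30m idle timeout are placeholders for illustration and not our real values; the 60s keepalive values are assumed to mirror the app's keepAliveMsecs.

```yaml
# ServiceEntry for the external TLS host (host and names are placeholders)
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.example.com
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: DNS   # assumption: the host is resolved via DNS
---
# DestinationRule carrying the idle timeout and TCP keepalive settings
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: external-api
spec:
  host: api.example.com
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30m   # placeholder; any value under 60min
      tcp:
        tcpKeepalive:
          time: 60s        # assumed to match the app KA agent's 60s delay
          interval: 60s    # assumed; set to match the app KA agent
```

The second attempt above is the same pair of resources without the tcp.tcpKeepalive block.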
We have also tried removing the app KA agent to let Envoy manage the connections, but Envoy then closed the connection after each request, for the reason described above (it does not pool TLS TCP sockets).
Any feedback on best practice in this scenario would be appreciated.