We have a Node.js (8.x.x) application that connects to an external service on :443 with a 60s keep-alive timeout. We frequently receive “socket hang up” (ECONNRESET) errors when talking to that service. I’ve been digging through Envoy and Istio GitHub issues and have tried the configurations below, but the errors do not go away. If anything, the changes make them more common:
- No ServiceEntry (passthrough): least frequent ECONNRESET
- ServiceEntry w/ DestinationRule, http/connectionPool/idleTimeout: 60s: frequent ECONNRESET
- ServiceEntry w/ DestinationRule, http/connectionPool/maxRequestsPerConnection: 1: frequent ECONNRESET (seems like every hour)
I have not tried tweaking the TCP keepalive settings, and it is unclear to me whether TCP keepalive is even enabled (I don’t see any packets in tcpdump while the connection is idle). As far as I can tell, we don’t have any mesh-level tcpKeepalive settings, but I have not found a way to verify the effective settings other than:
istioctl proxy-config cluster <pod> --fqdn '<fqdn>' -ojson
which only shows the following (with the third configuration above, maxRequestsPerConnection: 1):
[
  {
    "name": "outbound|443||<fqdn>",
    "type": "ORIGINAL_DST",
    "connectTimeout": "10s",
    "lbPolicy": "CLUSTER_PROVIDED",
    "maxRequestsPerConnection": 1,
    "circuitBreakers": {
      "thresholds": [
        {
          "maxConnections": 4294967295,
          "maxPendingRequests": 4294967295,
          "maxRequests": 4294967295,
          "maxRetries": 4294967295
        }
      ]
    },
    "metadata": {
      "filterMetadata": {
        "istio": {
          "config": "/apis/networking/v1alpha3/namespaces/<ns>/destination-rule/<drName>"
        }
      }
    }
  }
]
The question is: should I turn off application-level keep-alive in Node.js and leave connection reuse to the underlying pool, or is this supposed to “just work”?
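For reference, here is a minimal sketch of the two application-level options I am weighing, using only the core `https.Agent`. The host name `external.example.com` and the 30s value are placeholders I made up for illustration, not our real settings. Note that `keepAliveMsecs` only sets the initial delay for TCP keepalive probes on pooled sockets; it is not an idle timeout.

```javascript
const https = require('https');

// Option 1: no application-level keep-alive. The agent does not keep
// idle sockets around for later requests.
const noReuseAgent = new https.Agent({ keepAlive: false });

// Option 2: application-level keep-alive. Idle sockets stay in the
// agent's pool and are reused by later requests.
const reuseAgent = new https.Agent({
  keepAlive: true,
  keepAliveMsecs: 30000, // initial delay for TCP keepalive probes (assumed value)
});

// Hypothetical helper: build request options for the external service
// with whichever agent is being tested.
function externalRequestOptions(agent) {
  return {
    host: 'external.example.com', // placeholder host
    port: 443,
    path: '/',
    method: 'GET',
    agent: agent,
  };
}
```

Passing `externalRequestOptions(noReuseAgent)` or `externalRequestOptions(reuseAgent)` to `https.request` would let the two behaviours be compared against the same DestinationRule.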
I realize that we can create a VirtualService with retryOn settings, but is that required for every external service? (We already have those settings on our internal services.)
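The application-side alternative to mesh retries would be something like the sketch below. `retryOnReset` is a hypothetical helper name, not something we run today, and it assumes the wrapped call is idempotent and returns a promise. It relies on Node reporting “socket hang up” with `err.code === 'ECONNRESET'`.

```javascript
// Hypothetical helper: retry a promise-returning call only when the
// failure is a connection reset; any other error is rethrown at once.
async function retryOnReset(makeRequest, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await makeRequest();
    } catch (err) {
      if (err.code !== 'ECONNRESET') {
        throw err; // not a reset, do not retry
      }
      lastError = err; // reset: try again until attempts run out
    }
  }
  throw lastError;
}
```

This only masks the resets rather than explaining them, which is why I would still like to understand what the sidecar is doing with idle connections.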