Help regarding "unresolved" requests

Hey everyone!

This is my first post here, so I hope I’m not missing anything. Thanks in advance to everyone jumping in and reading this.

I’m currently running Istio 1.0.5 with mTLS on AKS, and I’m using Ambassador as an API gateway. I’ve configured a few services that serve static files (a node service and an nginx service). The system has been running fine for a while, but occasionally a few requests fail and I can’t seem to find out why. I’m using Jaeger to trace those requests and I see two different (maybe unrelated) patterns.

  1. Within a short period of time, I see a few requests hitting Ambassador, getting forwarded to the responsible service, and resolving in about 10ms. Occasionally, one of those requests hits Ambassador (I see two spans) and fails with a 503 upstream request timeout within <2ms. There is no trace of the targeted service.

  2. Similar to the first pattern, I notice that sometimes a request hits Ambassador (two spans) and then has one span on the targeted service that shows an HTTP status code of 0. I see this request in the logs of the sidecar but not in the logs of the service itself.

In both cases, I’m not able to see the request on the container. Requests before and after a failed request also work without any issues (even when the same file is requested). The containers run with health probes and don’t seem to have any issues overall.

I’m trying to figure out where the system goes wrong, but I can’t seem to find the issue. I’d highly appreciate any suggestions and tips. Thanks a lot!

Short update on what I think was / is the issue:
It seems the issue was that our node service responded with Connection: keep-alive and Envoy reuses connections. In some cases, it reused a connection that was just about to time out, which resulted in the 503 UC error. For now, we’ve added a response header (Connection: close) in the service, which seems to avoid this issue, though we’re not sure about the performance implications or whether this is actually the best-practice solution.
It somehow feels like this should be managed by Envoy / Istio, or at least be documented in a guide.
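
For anyone running into the same thing, here is a minimal sketch of the workaround, assuming a plain Node http server. The names handleRequest and createServer, the response body, and the port in the note below are made up for illustration; this is not our actual service code.

```typescript
import * as http from "node:http";

// Hypothetical handler standing in for the static-file service.
export function handleRequest(
  _req: http.IncomingMessage,
  res: http.ServerResponse,
): void {
  // Tell the caller (here: the Envoy sidecar) not to reuse this connection.
  // Node closes the socket after the response when this header is set.
  res.setHeader("Connection", "close");
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("ok\n");
}

// Builds the server without starting it, so the caller decides when to listen.
export function createServer(): http.Server {
  return http.createServer(handleRequest);
}
```

To try it, call createServer().listen(8080) and check the response headers with curl -i. The trade-off is that every request then needs a new connection between the sidecar and the service, which is the performance question mentioned above.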