Istio-ingressgateway tuning for TLS termination

#1

Are there any performance tuning guidelines for terminating TLS with Istio ingress?

A bit of background:
Out of the box, we’re seeing that istio-ingressgateway pods run extremely hot when terminating TLS. Under load, the ingress gateways become a major bottleneck for HTTPS traffic, and we haven’t had any luck tuning them to relieve the problem. For example, in a load test that ramps up to 100 concurrent users pinging a no-op healthcheck route, we see latency climb from roughly 60ms at the start to 500ms+.

We’ve tried both bumping up the CPU resource requests (e.g. 1000m) and raising the maximum number of ingress gateway pods in the HPA (e.g. to 150), but haven’t found any configuration that makes a meaningful difference.
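For reference, the two knobs described above look roughly like this in manifest form. This is only a sketch: the names match a stock Istio install (`istio-system` / `istio-ingressgateway`), and the numeric values are the examples from this post, not recommendations.

```yaml
# Sketch: raised HPA ceiling for the ingress gateway (example values)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: istio-ingressgateway
  minReplicas: 3
  maxReplicas: 150            # raised ceiling; the gateways still hit it
  targetCPUUtilizationPercentage: 80
---
# Sketch: matching CPU request bump on the gateway container
# (fragment of the istio-ingressgateway Deployment's container spec)
resources:
  requests:
    cpu: 1000m
    memory: 256Mi
```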

(As a workaround, we’re currently looking at moving TLS termination to our ELBs. We’re on EKS with Istio 1.0.2. We tried Istio 1.1 as well, but it didn’t help.)
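The ELB workaround can be wired up with the standard AWS load balancer annotations on the gateway’s Service, so the ELB decrypts and forwards plain HTTP to the pods. A sketch, assuming a classic ELB on EKS; the ACM certificate ARN is a placeholder:

```yaml
# Hypothetical Service fragment: terminate TLS at the ELB instead of
# at istio-ingressgateway. Certificate ARN is a placeholder.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/example
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
  type: LoadBalancer
  ports:
  - name: https
    port: 443
    targetPort: 80   # ELB decrypts, forwards plain HTTP to the gateway
```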


#3

This is a concern for me too. We haven’t yet reached the point where we can run a full-fledged load test; we’re still struggling with one last piece.

But we found a workaround that let us move forward with a small load test, and the results were not satisfactory at all.


#4

Could you validate that the issue reproduces with just vanilla Envoy on a VM or test box? If this is an Envoy-level problem or regression, it helps to know, so that we can profile and troubleshoot it without the pain of debugging on Kubernetes.
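To make the ask concrete: a standalone reproduction only needs Envoy terminating TLS in front of a trivial local backend. A minimal sketch in the v2 bootstrap style of the Envoy builds that shipped with Istio 1.0.x (field names changed in later Envoy releases, and the certificate paths and backend port are placeholders):

```yaml
# Minimal standalone Envoy TLS termination, for profiling outside k8s.
# Cert/key paths and the 127.0.0.1:8080 backend are placeholders.
static_resources:
  listeners:
  - address:
      socket_address: { address: 0.0.0.0, port_value: 8443 }
    filter_chains:
    - tls_context:
        common_tls_context:
          tls_certificates:
          - certificate_chain: { filename: /etc/certs/server.crt }
            private_key: { filename: /etc/certs/server.key }
      filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_https
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: local_service }
          http_filters:
          - name: envoy.router
  clusters:
  - name: local_service
    connect_timeout: 1s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address: { address: 127.0.0.1, port_value: 8080 }
```

Running the same load test against this listener on a VM would isolate whether the hotspot is Envoy’s TLS path or something in the Istio/Kubernetes deployment.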


#5

I’m not really set up to do that. I’m really just an Istio consumer; I’ve never configured or deployed Envoy by itself.

I do have an update, though. I’ve been able to determine that the performance problem I was seeing was partly an issue with the SaaS load-testing tool I was using. After switching to another tool, I was able to run my simple load test (100 concurrent “users” hammering a no-op endpoint for 10 minutes) successfully with my istio-ingressgateway pods configured to request 100m CPU. With that allocation, the gateway handled the traffic without scaling up to 150 pods (the max I set, which it was hitting previously).
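For anyone reproducing this, the test shape is simple enough to drive with any self-hosted HTTP load generator rather than a SaaS tool; for example, with `hey` (the endpoint URL is a placeholder):

```
# 100 concurrent workers hammering a no-op endpoint for 10 minutes
hey -z 10m -c 100 https://gateway.example.com/healthz
```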
