Istio-ingressgateway tuning for TLS termination

#1

Are there any performance tuning guidelines for terminating TLS with Istio ingress?

A bit of background:
Out of the box, we’re seeing that istio-ingressgateway pods run extremely hot when terminating TLS. Under load, the ingress gateways become a major bottleneck for HTTPS traffic, and we haven’t had any luck tuning them to relieve the problem. For example, in a load test that ramps up to 100 concurrent users pinging a no-op healthcheck route, we see latency climb from roughly 60ms at the start to 500ms+.

We’ve tried both bumping up the CPU resource requests (e.g. 1000m) and raising the maximum number of ingress gateway pods in the HPA (e.g. to 150), but haven’t found any configuration that makes a meaningful difference.
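For reference, the two knobs described above look roughly like this in manifest form. This is only a sketch: the names match a stock Istio install (`istio-system` / `istio-ingressgateway`), and the numeric values are the examples from this post, not recommendations.

```yaml
# Sketch: raised HPA ceiling for the ingress gateway (example values)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: istio-ingressgateway
  minReplicas: 3
  maxReplicas: 150            # raised ceiling; the gateways still hit it
  targetCPUUtilizationPercentage: 80
---
# Sketch: matching CPU request bump on the gateway container
# (fragment of the istio-ingressgateway Deployment's container spec)
resources:
  requests:
    cpu: 1000m
    memory: 256Mi
```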

(As a workaround, we’re currently looking at moving TLS termination to our ELBs. We’re on EKS with Istio 1.0.2. We tried Istio 1.1 as well, but it didn’t help.)
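The ELB workaround can be wired up with the standard AWS load balancer annotations on the gateway’s Service, so the ELB decrypts and forwards plain HTTP to the pods. A sketch, assuming a classic ELB on EKS; the ACM certificate ARN is a placeholder:

```yaml
# Hypothetical Service fragment: terminate TLS at the ELB instead of
# at istio-ingressgateway. Certificate ARN is a placeholder.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/example
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
  type: LoadBalancer
  ports:
  - name: https
    port: 443
    targetPort: 80   # ELB decrypts, forwards plain HTTP to the gateway
```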


#3

This is a concern for me too. We haven’t yet reached the point where we can run a full-fledged load test; we’re still struggling with one last piece.

But we found a workaround that let us move forward with a small load test, and the results were not satisfactory at all.


#4

Could you validate that the issue reproduces with just vanilla Envoy on a VM or test box? If this is an Envoy-level problem or regression, it helps to know, so that we can profile and troubleshoot it without the pain of debugging on Kubernetes.
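To make the ask concrete: a standalone reproduction only needs Envoy terminating TLS in front of a trivial local backend. A minimal sketch in the v2 bootstrap style of the Envoy builds that shipped with Istio 1.0.x (field names changed in later Envoy releases, and the certificate paths and backend port are placeholders):

```yaml
# Minimal standalone Envoy TLS termination, for profiling outside k8s.
# Cert/key paths and the 127.0.0.1:8080 backend are placeholders.
static_resources:
  listeners:
  - address:
      socket_address: { address: 0.0.0.0, port_value: 8443 }
    filter_chains:
    - tls_context:
        common_tls_context:
          tls_certificates:
          - certificate_chain: { filename: /etc/certs/server.crt }
            private_key: { filename: /etc/certs/server.key }
      filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_https
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: local_service }
          http_filters:
          - name: envoy.router
  clusters:
  - name: local_service
    connect_timeout: 1s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address: { address: 127.0.0.1, port_value: 8080 }
```

Running the same load test against this listener on a VM would isolate whether the hotspot is Envoy’s TLS path or something in the Istio/Kubernetes deployment.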


#5

I’m not really set up to do that. I’m really just an Istio consumer; I’ve never configured or deployed Envoy by itself.

I do have an update, though. I’ve been able to determine that the performance problem I was seeing was partly an issue with the SaaS load-testing tool I was using. After switching to another tool, I was able to run my simple load test (100 concurrent “users” hammering a no-op endpoint for 10 minutes) successfully with my istio-ingressgateway pods configured to request 100m CPU. With that allocation, the gateway handled the traffic without scaling up to 150 pods (the max I set, which it was hitting previously).
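For anyone reproducing this, the test shape is simple enough to drive with any self-hosted HTTP load generator rather than a SaaS tool; for example, with `hey` (the endpoint URL is a placeholder):

```
# 100 concurrent workers hammering a no-op endpoint for 10 minutes
hey -z 10m -c 100 https://gateway.example.com/healthz
```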
