How to make sure that gRPC requests are distributed evenly among Ingress Gateway pods

We use Istio Ingress Gateway to load balance our gRPC services. While requests to the gRPC backends are evenly distributed across their pods, they are not evenly distributed across the Istio Ingress Gateway pods: gRPC connections are persistent (a single long-lived HTTP/2 connection carries many requests), and the ingress gateway is exposed through a Kubernetes Service, which balances at L4 — per connection, not per request.

Normally this isn’t an issue, but under extremely high load we observe a measurable impact on end-to-end latency.

What would be the best approach to mitigate this issue?

Client-side reconnecting is how we approached this at high load, to keep gRPC's long-lived connections from pinning traffic to a few gateway pods. It also worked well when auto-scaling up, since newly added pods start receiving connections quickly instead of waiting for existing clients to go away.
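A minimal sketch of the client-side reconnect idea, in Python. The `RecyclingChannel` class and its `dial`/`max_age_s` parameters are hypothetical names, not part of any gRPC API: the wrapper just re-dials after a fixed connection age, so each fresh connection goes back through the Kubernetes Service and may land on a different gateway pod.

```python
import time


class RecyclingChannel:
    """Hypothetical wrapper: re-dials once the connection exceeds max_age_s.

    `dial` is any zero-argument factory (e.g. a function that calls
    grpc.insecure_channel(...)); the returned object only needs a close().
    """

    def __init__(self, dial, max_age_s):
        self._dial = dial
        self._max_age = max_age_s
        self._channel = None
        self._established = 0.0

    def get(self):
        """Return the current channel, replacing it once it is too old.

        Re-dialing goes back through the L4 Service, so the new
        connection may land on a different ingress gateway pod.
        """
        now = time.monotonic()
        if self._channel is None or now - self._established >= self._max_age:
            if self._channel is not None:
                # A production client would drain in-flight RPCs first.
                self._channel.close()
            self._channel = self._dial()
            self._established = now
        return self._channel
```

In practice you would also jitter the max age per client, so a fleet of clients does not reconnect in lockstep and stampede the gateways at the same instant.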