Our team is hosting a big Kubernetes (using AWS/EKS) cluster with Istio to serve traffic, and AWS Load Balancer is used to route our traffic to our cluster. Recently, we found that, seems like we always see a spike of target reset from NLB, along with CPU and IO spike in istiod and ingress gateway, indicating connection reset at Istio level (ingress gateway).
Not so familiar with how everything works internally, so any guidance/ideas would be appreciated:
- Usually under what kind of situation we will see connection reset from ingress gateway?
- And any suggestion on how to further debug this issue?
Refs:
[1] AWS load balancer: GitHub - kubernetes-sigs/aws-load-balancer-controller: A Kubernetes controller for Elastic Load Balancers
[2] Target reset count metrics reported by NLB: CloudWatch metrics for your Network Load Balancer - Elastic Load Balancing