3 Nodes with Masters+Workers in CentOS 7.5
Istio version: 1.1.6
Kubernetes version: 14.4
Cloud being used: bare-metal
Installation method: Kubeadm
Host OS: CentOS 7.5
CNI and version: Flannel 0.11.0-amd64
CRI and version: Docker 18.06.1-ce, build e68fc7a
Hello, I have a 3-node HA Kubernetes cluster where all nodes are masters. I am facing a behavior that I would like to avoid.
During an HA test, I removed the NIC interface from one of the nodes and saw that the whole cluster normally takes up to 2 minutes to realize the node is down. In some cases it took much longer, around 18 minutes.
During this time the Service keeps sending traffic to the Pod located on the dead host. Since the Pod is replicated on the other two nodes, I receive about 66% successful responses and 33% failures. Once the cluster realizes the node is down, and after the pod eviction timeout, it terminates the Pod running on the dead node and traffic to that Pod stops.
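The 66% / 33% split follows from the route weights in the VirtualService further down in this post. A minimal sketch of that arithmetic, assuming subset `c0` is the one on the dead node (which subset is affected is only an assumption for illustration) and ignoring retries and outlier ejection:

```python
def expected_failure_rate(weights, dead):
    """Fraction of requests routed to subsets whose Pod is unreachable."""
    total = sum(weights.values())
    return sum(w for name, w in weights.items() if name in dead) / total


# Route weights taken from the VirtualService in this post.
weights = {"c0": 33, "c1": 33, "c2": 34}

# Assumption: the Pod behind subset c0 sits on the disconnected node.
rate = expected_failure_rate(weights, dead={"c0"})
print(f"expected failures: {rate:.0%}")
```

With one of three roughly equally weighted subsets unreachable, about a third of the requests fail until that Pod is removed from the routing pool.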
Is there a way, through custom configuration or a custom solution, to make Kubernetes stop sending requests to a Pod as soon as the Node it runs on goes down?
By the way, I am using the Istio Ingress Gateway and Envoy proxies to route requests to the Pod, but I could not achieve this with circuit breaking using the configuration below.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ccc-service-core
spec:
  host: ccc-service-core
  subsets:
  - labels:
      statefulset.kubernetes.io/pod-name: ccc-service-core-0
    name: c0
    trafficPolicy:
      outlierDetection:
        baseEjectionTime: 2m
        consecutiveErrors: 3
        interval: 2s
        maxEjectionPercent: 100
  - labels:
      statefulset.kubernetes.io/pod-name: ccc-service-core-1
    name: c1
    trafficPolicy:
      outlierDetection:
        baseEjectionTime: 2m
        consecutiveErrors: 3
        interval: 2s
        maxEjectionPercent: 100
  - labels:
      statefulset.kubernetes.io/pod-name: ccc-service-core-2
    name: c2
    trafficPolicy:
      outlierDetection:
        baseEjectionTime: 2m
        consecutiveErrors: 3
        interval: 2s
        maxEjectionPercent: 100
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ccc-service-core
spec:
  gateways:
  - default-ingressgateway.default.svc.cluster.local
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /service-core/
    - uri:
        prefix: /service-core
    retries:
      attempts: 3
      perTryTimeout: 1s
    rewrite:
      uri: /
    route:
    - destination:
        host: ccc-service-core
        port:
          number: 17101
        subset: c0
      weight: 33
    - destination:
        host: ccc-service-core
        port:
          number: 17101
        subset: c1
      weight: 33
    - destination:
        host: ccc-service-core
        port:
          number: 17101
        subset: c2
      weight: 34
    timeout: 4s
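For reference, a rough sketch of how the `retries` and `timeout` values above interact. It assumes `attempts` counts retries on top of the initial try (my reading of the Istio documentation) and ignores the short back-off Envoy inserts between tries. The function name is hypothetical.

```python
def worst_case_seconds(retries, per_try_timeout, overall_timeout):
    """Upper bound on how long a caller waits when every try times out."""
    tries = 1 + retries  # assumption: initial attempt plus the configured retries
    return min(tries * per_try_timeout, overall_timeout)


# Values from the VirtualService above: attempts: 3, perTryTimeout: 1s, timeout: 4s
print(worst_case_seconds(retries=3, per_try_timeout=1.0, overall_timeout=4.0))
```

Under these assumptions a request that keeps landing on the unreachable Pod can occupy the caller for the full 4s before it fails.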