Hi all,
I would appreciate any help in understanding this problem and getting my cluster working again.
I have an on-premise K8s 1.15.7 cluster with about 120 workloads and about 10 CronJobs, each starting at intervals of 5 to 30 minutes. I have been running Istio since 0.8, periodically migrating to new versions; the last working version was 1.5.1.
I decided to migrate to 1.6.1: I removed 1.5.1 (which was installed with Helm), dropped all Istio CRDs, and installed 1.6.1 with istioctl. 1.6.1 was installed with the default profile plus these minor changes:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
    accessLogFile: "/dev/stdout"
  components:
    pilot:
      k8s:
        replicaCount: 2
        hpaSpec:
          minReplicas: 2
    proxy:
      k8s:
        resources:
          requests:
            cpu: 10m
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        env:
        - name: ISTIO_META_ROUTER_MODE
          value: "sni-dnat"
        service:
          type: NodePort
          ports:
          - port: 15021
            targetPort: 15021
            name: status-port
          - port: 80
            targetPort: 8080
            name: http2
          - port: 443
            targetPort: 8443
            name: https
            nodePort: 31390
          - port: 15443
            targetPort: 15443
            name: tls
        hpaSpec:
          maxReplicas: 5
          minReplicas: 2
Since the upgrade I have a problem with service-to-service connectivity. A small number of workloads cannot connect to other services. Some of them start working after 4-5 minutes, but others still cannot connect after 1 hour and multiple restarts of the primary container.
For example:
On startup, the service logs this call: Sending HTTP request "POST" http://platform-auth-sts.dmz.svc.cluster.local/connect/token
Then Envoy in this pod logs: "- - -" 0 UH "-" "-" 0 0 0 - "-" "-" "-" "-" "-" - - 10.105.219.37:80 10.244.4.1:40756 - -
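To make that log line easier to read, here is a minimal Python sketch that splits it into fields and looks up the response flag. The field positions are an assumption: they follow what I understand to be Istio's default access-log format with the leading timestamp left out, as in the line above. The helper name `parse_access_log` and the small flag table are illustrative only; the flag descriptions are taken from the Envoy access-log documentation.

```python
import shlex

# Small subset of Envoy response flags (see the Envoy access-log docs).
RESPONSE_FLAGS = {
    "UH": "no healthy upstream hosts in the upstream cluster",
    "UF": "upstream connection failure",
    "UO": "upstream overflow (circuit breaking)",
    "NR": "no route configured for the request",
}


def parse_access_log(line):
    """Split one sidecar access-log line and pick out a few fields.

    Assumption: the line starts at the quoted request field (no timestamp)
    and uses the default field order, so the indexes below are positional.
    """
    # Forum software often turns straight quotes into curly ones; undo that.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    fields = shlex.split(line)
    flags = fields[2]
    return {
        "request": fields[0],
        "response_code": fields[1],
        "response_flags": flags,
        "flag_meaning": RESPONSE_FLAGS.get(flags, "unknown or no flag"),
        "downstream_local_address": fields[16],
        "downstream_remote_address": fields[17],
    }


line = ('"- - -" 0 UH "-" "-" 0 0 0 - "-" "-" "-" "-" "-" '
        '- - 10.105.219.37:80 10.244.4.1:40756 - -')
entry = parse_access_log(line)
print(entry["response_flags"], "=", entry["flag_meaning"])
print("destination seen by the sidecar:", entry["downstream_local_address"])
```

Read this way, the sidecar accepted a connection addressed to 10.105.219.37:80 (the ClusterIP of the target Service) and rejected it with UH, meaning it had no healthy upstream host for that cluster at that moment.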
Other services, and two of our CronJobs that start every 5 minutes, communicate with this endpoint without problems.
Target Service definition and Endpoints from K8s:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2019-01-22T09:20:22Z"
  labels:
    app: platform-auth-sts
    chart: platform-auth-sts-0.4.20
    heritage: Tiller
    release: platform-auth-sts
  name: platform-auth-sts
  namespace: dmz
  resourceVersion: "132316717"
  selfLink: /api/v1/namespaces/dmz/services/platform-auth-sts
  uid: f1a80a1f-1e26-11e9-8395-000c29cb8c62
spec:
  clusterIP: 10.105.219.37
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: platform-auth-sts
    release: platform-auth-sts
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2020-06-11T10:36:26Z"
  creationTimestamp: "2019-01-22T09:20:22Z"
  labels:
    app: platform-auth-sts
    chart: platform-auth-sts-0.4.20
    heritage: Tiller
    release: platform-auth-sts
  name: platform-auth-sts
  namespace: dmz
  resourceVersion: "134288719"
  selfLink: /api/v1/namespaces/dmz/endpoints/platform-auth-sts
  uid: f1a9b57d-1e26-11e9-8395-000c29cb8c62
subsets:
- addresses:
  - ip: 10.244.4.35
    nodeName: k8s-node1.abc
    targetRef:
      kind: Pod
      name: platform-auth-sts-677fcf79db-jlzhh
      namespace: dmz
      resourceVersion: "134288717"
      uid: 8b574c2b-8dbd-4463-a8a7-a2275a05130d
  - ip: 10.244.5.54
    nodeName: k8s-node3.abc
    targetRef:
      kind: Pod
      name: platform-auth-sts-677fcf79db-cj8h7
      namespace: dmz
      resourceVersion: "134288520"
      uid: 2f599029-c0a5-42aa-93c1-87062dfa6626
  ports:
  - name: http
    port: 80
    protocol: TCP
Output from istioctl for the calling pod:
Endpoints:
10.244.4.35:80 HEALTHY OK outbound|80||platform-auth-sts.dmz.svc.cluster.local
10.244.5.54:80 HEALTHY OK outbound|80||platform-auth-sts.dmz.svc.cluster.local
[
  {
    "name": "outbound|80||platform-auth-sts.dmz.svc.cluster.local",
    "addedViaApi": true,
    "hostStatuses": [
      {
        "address": {
          "socketAddress": {
            "address": "10.244.4.35",
            "portValue": 80
          }
        },
        "stats": [
          {
            "name": "cx_connect_fail"
          },
          {
            "name": "cx_total"
          },
          {
            "name": "rq_error"
          },
          {
            "name": "rq_success"
          },
          {
            "name": "rq_timeout"
          },
          {
            "name": "rq_total"
          },
          {
            "type": "GAUGE",
            "name": "cx_active"
          },
          {
            "type": "GAUGE",
            "name": "rq_active"
          }
        ],
        "healthStatus": {
          "edsHealthStatus": "HEALTHY"
        },
        "weight": 1,
        "locality": {}
      },
      {
        "address": {
          "socketAddress": {
            "address": "10.244.5.54",
            "portValue": 80
          }
        },
        "stats": [
          {
            "name": "cx_connect_fail"
          },
          {
            "name": "cx_total"
          },
          {
            "name": "rq_error"
          },
          {
            "name": "rq_success"
          },
          {
            "name": "rq_timeout"
          },
          {
            "name": "rq_total"
          },
          {
            "type": "GAUGE",
            "name": "cx_active"
          },
          {
            "type": "GAUGE",
            "name": "rq_active"
          }
        ],
        "healthStatus": {
          "edsHealthStatus": "HEALTHY"
        },
        "weight": 1,
        "locality": {}
      }
    ]
  }
]
Clusters:
SERVICE FQDN                               PORT   SUBSET   DIRECTION   TYPE
platform-auth-sts.dmz.svc.cluster.local    80     -        outbound    EDS
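For anyone who wants to repeat the comparison between what K8s reports and what the sidecar has loaded, here is a minimal Python sketch that diffs the two. It assumes both inputs are already parsed JSON: a Kubernetes Endpoints object and a list of Envoy cluster entries with the same shape as the JSON shown above. The function names are hypothetical helpers, not part of istioctl or of any Kubernetes client, and the sample data simply repeats the addresses from this post.

```python
def k8s_endpoint_addrs(endpoints):
    """Return the set of "ip:port" strings for ready addresses of an Endpoints object."""
    addrs = set()
    for subset in endpoints.get("subsets", []):
        for addr in subset.get("addresses", []):
            for port in subset.get("ports", []):
                addrs.add(f'{addr["ip"]}:{port["port"]}')
    return addrs


def envoy_endpoint_addrs(clusters, cluster_name):
    """Return the set of "ip:port" strings Envoy reports as HEALTHY for one cluster."""
    addrs = set()
    for cluster in clusters:
        if cluster.get("name") != cluster_name:
            continue
        for host in cluster.get("hostStatuses", []):
            status = host.get("healthStatus", {}).get("edsHealthStatus")
            if status != "HEALTHY":
                continue
            sock = host["address"]["socketAddress"]
            addrs.add(f'{sock["address"]}:{sock["portValue"]}')
    return addrs


# Sample data trimmed down from the outputs in this post.
endpoints = {
    "subsets": [{
        "addresses": [{"ip": "10.244.4.35"}, {"ip": "10.244.5.54"}],
        "ports": [{"name": "http", "port": 80, "protocol": "TCP"}],
    }]
}
clusters = [{
    "name": "outbound|80||platform-auth-sts.dmz.svc.cluster.local",
    "hostStatuses": [
        {"address": {"socketAddress": {"address": "10.244.4.35", "portValue": 80}},
         "healthStatus": {"edsHealthStatus": "HEALTHY"}},
        {"address": {"socketAddress": {"address": "10.244.5.54", "portValue": 80}},
         "healthStatus": {"edsHealthStatus": "HEALTHY"}},
    ],
}]

in_k8s = k8s_endpoint_addrs(endpoints)
in_envoy = envoy_endpoint_addrs(
    clusters, "outbound|80||platform-auth-sts.dmz.svc.cluster.local")
print("missing from Envoy:", sorted(in_k8s - in_envoy))
print("stale in Envoy:", sorted(in_envoy - in_k8s))
```

With the data from this post both differences are empty, which matches what I see: at the time the outputs were captured, the sidecar knew about both pods and reported them as healthy.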