Hi Everyone,
I really hope you can help me with a matter I have been struggling with for quite some time.
Istio Version
client version: 1.14.1
control plane version: 1.14.1
data plane version: 1.14.1 (130 proxies)
Kubectl Version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17+rke2r1", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-28T21:40:04Z", GoVersion:"go1.19.6 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
I have Istio installed and sidecar containers injected throughout my namespace (via the usual namespace label, shown below).
There are various components which work out of the box over an mTLS connection.
Istio is installed either from the official Helm chart repo, via IstioOperator with istioctl, or taken from the Kubeflow Helm chart on GitHub.
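For completeness, injection is enabled the standard way via the namespace label (the namespace name here is a placeholder):
kubectl label namespace my-namespace istio-injection=enabled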
Everything works when it comes to a few core components that run checks via an initContainer over HTTP against a Kubernetes service; once those checks complete, the system starts.
initContainer
- command:
  - sh
  - -c
  - until curl -X GET "elasticsearch:9200/_cluster/health?wait_for_status=yellow&timeout=50s&pretty";
    do echo waiting for elasticsearch; sleep 2; done
When I enforce STRICT peer authentication, everything collapses and I am hit by connection failures. The snippet below is an example of that:
curl: (56) Recv failure: Connection reset by peer waiting for elasticsearch
I can also see similar behavior for Database instances.
"- - -" 0 NR filter_chain_not_found - "-" 0 0 0 - "-" "-" "-" "-" "-" - -
As far as I can tell from Istio's docs, the above issue is related to a missing DestinationRule.
The same goes for Jobs that try to install something on a Pod.
command: ["sh", "-c", "sleep 10; curl -XPUT -H 'Content-Type: application/json' -T /mnt/license/license.bin 'http://elasticsearch:9200/_siren/license'"]
The license just never gets applied, and Elasticsearch keeps failing as a result.
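For context, that command runs from a plain Job pod, roughly like this (image, names and the license volume are placeholders; a simplified sketch, not the exact manifest):
apiVersion: batch/v1
kind: Job
metadata:
  name: es-license-install            # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: license-install
        image: curlimages/curl:8.1.2  # placeholder image
        command: ["sh", "-c", "sleep 10; curl -XPUT -H 'Content-Type: application/json' -T /mnt/license/license.bin 'http://elasticsearch:9200/_siren/license'"]
        volumeMounts:
        - name: license
          mountPath: /mnt/license
      volumes:
      - name: license
        secret:
          secretName: es-license      # placeholder secret holding license.bin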
Another example: if I try to curl a Kubernetes service from the istio-proxy container on any pod:
istio-proxy@index-upload-ab6055a0aa6c3477b41a-np8k9:/$ curl -v -k http://index-es-http:9200
* Trying 10.43.121.43:9200...
* TCP_NODELAY set
* Connected to index-es-http (10.43.121.43) port 9200 (#0)
> GET / HTTP/1.1
> Host: index-es-http:9200
> User-Agent: curl/7.68.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
istio-proxy@index-es-mdi-0:/$ curl http://k8s-service:9091
curl: (56) Recv failure: Connection reset by peer
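If it helps, I can pull more detail from the sidecars; I have been poking at them with commands along these lines (the namespace is a placeholder):
# which listeners / filter chains the sidecar actually has
istioctl proxy-config listeners index-es-mdi-0 -n my-namespace
# Istio's own summary of the config that applies to the pod
istioctl x describe pod index-es-mdi-0 -n my-namespace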
I have also tried using cert-manager as the CA, roughly via a CA Issuer like the sketch below. That didn't solve the matter either.
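(Names and the CA secret below are placeholders; this is a simplified sketch of the Issuer, not my exact manifest.)
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: istio-ca                  # placeholder name
  namespace: istio-system
spec:
  ca:
    secretName: istio-ca-root     # placeholder secret holding the CA key pair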
Here are my PeerAuthentication policy, DestinationRule, and AuthorizationPolicy, all placed in the istio-system namespace:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: mtls
  namespace: istio-system
spec:
  host: '*.cluster.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: mtls-policy
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-all
  namespace: istio-system
spec:
  rules:
  - {}
I have also tried creating Gateways to serve everything on port 80 (roughly the sketch below), and tried adding cluster-local-gateway as a component.
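The Gateways were along these lines (simplified sketch; the name is a placeholder and hosts are wildcards here):
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: default-gateway           # placeholder name
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"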
What am I missing here? Should every failing service have its own VirtualService, i.e. something like the sketch below? But then again, why can't I curl ANY of the pods inside the mesh when the STRICT policy is enabled?
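(This is what I mean by a per-service VirtualService; the namespace is a placeholder, elasticsearch is one of the real services from above.)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: elasticsearch
  namespace: my-namespace         # placeholder namespace
spec:
  hosts:
  - elasticsearch                 # the Kubernetes service name
  http:
  - route:
    - destination:
        host: elasticsearch
        port:
          number: 9200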
The STRICT policy is also affecting my internal ingress controllers: they cannot serve calls, and I either get a 502 Bad Gateway or nothing at all.
Since I am fairly new to this, the information above might be insufficient, so ask me for whatever else you need.