Istio mtls connection issues

mbchristoff · October 25, 2019, 11:50am

Hi guys,

I’ve been using Istio for a few weeks in dev environments and now want to roll it out to acc/prod.
We want to enable mesh-wide mTLS on our clusters, but we keep running into pods losing their connections to other services.

I’m using Istio 1.3.1 on k8s v1.14.6 (dev) and v1.15.4-k3s.1 (local-dev) with Rancher 2.3.1.

Mesh policy

apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"authentication.istio.io/v1alpha1","kind":"MeshPolicy","metadata":{"annotations":{},"name":"default"},"spec":{"peers":[{"mtls":{"mode":"PERMISSIVE"}}]}}
  creationTimestamp: "2019-10-23T11:47:34Z"
  generation: 10
  name: default
  resourceVersion: "61710"
  selfLink: /apis/authentication.istio.io/v1alpha1/meshpolicies/default
  uid: 524fab75-6bf3-4c8a-af49-0ee8149d6e32
spec:
  peers:
  - mtls:
      mode: STRICT

Destination rule

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"DestinationRule","metadata":{"annotations":{},"name":"default","namespace":"istio-system"},"spec":{"host":"*.local","trafficPolicy":{"tls":{"mode":"ISTIO_MUTUAL"}}}}
  creationTimestamp: "2019-10-24T14:07:38Z"
  generation: 1
  name: default
  namespace: istio-system
  resourceVersion: "61722"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/istio-system/destinationrules/default
  uid: e679dbd2-7b29-40ea-aeda-2304f6511b62
spec:
  host: '*.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
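The `host: '*.local'` rule only applies to services whose fully qualified name ends in `.local`, which holds for the default `cluster.local` cluster domain. The sketch below is a simplified model of that wildcard matching, assuming the default cluster domain; `host_matches` is a hypothetical helper, not an Istio API, and real Istio host resolution has more rules (short names, namespace scoping, `exportTo`).

```python
# Simplified model (assumption) of how a DestinationRule host pattern such as
# '*.local' is matched against a fully qualified service name. Not Istio code.


def host_matches(pattern: str, host: str) -> bool:
    """Return True if `host` is covered by the DestinationRule `pattern`."""
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        # A leading wildcard matches any host ending in the remaining suffix,
        # including hosts with several extra labels (x.ns.svc.cluster.local).
        return host.endswith(pattern[1:])
    # Without a wildcard, this model only accepts an exact match.
    return host == pattern


if __name__ == "__main__":
    fqdn = "hello-world-server-grpc.hello-world-grpc.svc.cluster.local"
    print(host_matches("*.local", fqdn))           # True
    print(host_matches("*.local", "example.com"))  # False
```

If a cluster were configured with a custom domain that does not end in `.local`, this rule would silently not apply to its services.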

I am able to deploy the httpbin and sleep containers from the sample files without issues; however, our own deployments cannot connect once mTLS is enabled.

To test our deployments I made a simple gRPC client and server to check whether the mesh works as intended. Without mTLS, the client connects to the server through the mesh and requests are load-balanced across the server pods.

When I enable mTLS by setting the mesh policy to STRICT and adding the destination rule, the client starts returning 503 errors:
“upstream connect error or disconnect/reset before headers. reset reason: connection termination”
Kiali still sees traffic flowing over the mesh, but the istio-proxy sidecar doesn’t seem to hand the request over to the application container. Wireshark confirms this: the request flows from the client IP to the server IP, but there is no traffic on 127.0.0.1 from the sidecar to the application.
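With a STRICT mesh policy, a 503 with "connection termination" typically means the client and server sides disagree about whether mTLS is in use, for example a plaintext client calling a STRICT server, or ISTIO_MUTUAL traffic sent to a pod without a working sidecar. The sketch below is a simplified model of that decision for Istio 1.3-style policies, under the assumption that only these modes are in play; `connection_ok` and its parameters are hypothetical names, not Istio APIs.

```python
# Simplified model (assumption) of Istio 1.3-era mTLS negotiation between a
# client sidecar and the server side. Names are illustrative only.


def connection_ok(client_tls_mode: str, server_mtls_mode: str,
                  server_has_sidecar: bool = True) -> bool:
    """Predict whether a request reaches the server application.

    client_tls_mode: DestinationRule trafficPolicy.tls.mode used by the
        client sidecar ("ISTIO_MUTUAL" or "DISABLE").
    server_mtls_mode: authentication policy mode for the server
        ("STRICT", "PERMISSIVE" or "DISABLE").
    """
    sends_mtls = client_tls_mode == "ISTIO_MUTUAL"
    if not server_has_sidecar:
        # Nothing on the server side can terminate Istio mTLS.
        return not sends_mtls
    if server_mtls_mode == "STRICT":
        return sends_mtls
    if server_mtls_mode == "PERMISSIVE":
        return True  # accepts both mTLS and plaintext
    return not sends_mtls  # mTLS disabled on the server: plaintext only


if __name__ == "__main__":
    print(connection_ok("ISTIO_MUTUAL", "STRICT"))  # True
    print(connection_ok("DISABLE", "STRICT"))       # False
```

PERMISSIVE mode is the usual stepping stone here: it lets both kinds of client through while the destination rules are rolled out.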

gRPC test deployment

apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      field.cattle.io/ipAddresses: "null"
      field.cattle.io/targetDnsRecordIds: "null"
      field.cattle.io/targetWorkloadIds: "null"
      kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"hello-world-server-grpc"},"name":"hello-world-server-grpc","namespace":"hello-world-grpc"},"spec":{"ports":[{"name":"http","port":50051,"targetPort":50051}],"selector":{"app":"hello-world-server-grpc"}}}'
    creationTimestamp: "2019-10-24T15:43:26Z"
    labels:
      app: hello-world-server-grpc
    name: hello-world-server-grpc
    namespace: hello-world-grpc
    resourceVersion: "73338"
    selfLink: /api/v1/namespaces/hello-world-grpc/services/hello-world-server-grpc
    uid: 0fa4c9ef-fe5c-4946-b62b-a45a54619dfe
  spec:
    clusterIP: 10.43.37.7
    ports:
    - name: http2
      port: 50051
      protocol: TCP
      targetPort: 50051
    selector:
      app: hello-world-server-grpc
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
      kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"hello-world-server-grpc","namespace":"hello-world-grpc"},"spec":{"replicas":1,"template":{"metadata":{"labels":{"app":"hello-world-server-grpc","version":"v1"}},"spec":{"containers":[{"image":"docker.io/kennethreitz/hello-world-server-grpc","imagePullPolicy":"IfNotPresent","name":"hello-world-server-grpc","ports":[{"containerPort":50051}]}]}}}}'
    creationTimestamp: "2019-10-24T15:43:26Z"
    generation: 5
    labels:
      app: hello-world-server-grpc
      version: v1
    name: hello-world-server-grpc
    namespace: hello-world-grpc
    resourceVersion: "88254"
    selfLink: /apis/apps/v1/namespaces/hello-world-grpc/deployments/hello-world-server-grpc
    uid: 6c9506b7-d732-44b9-86d3-6445770a90bc
  spec:
    progressDeadlineSeconds: 2147483647
    replicas: 0
    revisionHistoryLimit: 2147483647
    selector:
      matchLabels:
        app: hello-world-server-grpc
        version: v1
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        annotations:
          cattle.io/timestamp: "2019-10-24T16:01:15Z"
          field.cattle.io/ports: '[[{"containerPort":50051,"dnsName":"hello-world-server-grpc-","name":"http2","protocol":"TCP"}]]'
        creationTimestamp: null
        labels:
          app: hello-world-server-grpc
          version: v1
      spec:
        containers:
        - image: mbchristoff/grpc-hello-world
          imagePullPolicy: IfNotPresent
          name: hello-world-server-grpc
          ports:
          - containerPort: 50051
            name: http2
            protocol: TCP
          resources: {}
          securityContext:
            capabilities: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2019-10-24T15:43:26Z"
      lastUpdateTime: "2019-10-24T15:43:26Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    observedGeneration: 5
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
      kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"hello-world-client-grpc","namespace":"hello-world-grpc"},"spec":{"replicas":1,"template":{"metadata":{"labels":{"app":"hello-world-client-grpc"}},"spec":{"containers":[{"image":"mbchristoff/grpc-hello-world-client","imagePullPolicy":"IfNotPresent","name":"hello-world-client-grpc"}],"serviceAccountName":"hello-world-client-grpc"}}}}'
    creationTimestamp: "2019-10-24T15:46:09Z"
    generation: 5
    labels:
      app: hello-world-client-grpc
    name: hello-world-client-grpc
    namespace: hello-world-grpc
    resourceVersion: "88271"
    selfLink: /apis/apps/v1/namespaces/hello-world-grpc/deployments/hello-world-client-grpc
    uid: 39965d99-f455-4e06-b3fa-0d508626fad8
  spec:
    progressDeadlineSeconds: 2147483647
    replicas: 0
    revisionHistoryLimit: 2147483647
    selector:
      matchLabels:
        app: hello-world-client-grpc
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        annotations:
          cattle.io/timestamp: "2019-10-24T16:01:59Z"
        creationTimestamp: null
        labels:
          app: hello-world-client-grpc
      spec:
        containers:
        - env:
          - name: ADDRESS
            value: hello-world-server-grpc
          - name: PORT
            value: "50051"
          image: mbchristoff/grpc-hello-world-client
          imagePullPolicy: IfNotPresent
          name: hello-world-client-grpc
          resources: {}
          securityContext:
            capabilities: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: hello-world-client-grpc
        serviceAccountName: hello-world-client-grpc
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2019-10-24T15:46:09Z"
      lastUpdateTime: "2019-10-24T15:46:09Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    observedGeneration: 5

Besides my gRPC test, a simple nginx container and a curl container also fail to communicate while mTLS is active.
Unless I copy the sample line by line, changing only the names and image, I always receive the 503 error.
The only difference I noticed between my working and non-working deployments is the selector Rancher adds to the deployment and service:

  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: deployment-test-nginx
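An extra Rancher selector label only matters if the Service selector and the pod template labels still line up. The sketch below models Kubernetes equality-based selector matching for a Service; `selects` is a hypothetical helper, and the label values are taken from the manifests in this post, with the Rancher selector used as an illustrative mismatch.

```python
# Minimal sketch: check whether a Service selector picks up the pods created
# from a Deployment template. Label sets are plain dicts here; in practice
# they would come from `kubectl get ... -o json`.


def selects(selector: dict, labels: dict) -> bool:
    """Equality-based selection: every selector key/value pair must be
    present in the pod labels (extra pod labels are ignored)."""
    if not selector:
        # A Service without a selector selects no pods (manual endpoints).
        return False
    return all(labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    template_labels = {"app": "hello-world-server-grpc", "version": "v1"}
    service_selector = {"app": "hello-world-server-grpc"}
    # Illustrative: a Rancher-style selector the pod template does not carry.
    rancher_selector = {
        "workload.user.cattle.io/workloadselector": "deployment-test-nginx"
    }
    print(selects(service_selector, template_labels))  # True
    print(selects(rancher_selector, template_labels))  # False
```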

I feel like I’m missing something pretty trivial, but I haven’t found a clear solution after a week of trial and error and lots of googling.

Any help would be much appreciated.

Michael

mbchristoff · October 29, 2019, 9:35pm

It appears to be tied to namespaces.
Every time a namespace is created using Rancher, mTLS doesn’t get enabled correctly.
When the namespace is created using kubectl, the test setup works like a charm.

I haven’t been able to figure out what exactly differs between namespaces created by Rancher and by kubectl, but I’ll post an update if I find the problem.

incfly · November 19, 2019, 7:01am

Is this related to how your workloads are deployed? We have seen issues where pods/workloads that are not associated with a service end up with incorrect Istio mTLS configuration.

Do Rancher-created namespaces affect this in any way?
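In the test manifests posted above, only the server has a Service; the client Deployment has none. A minimal sketch for spotting pods that no Service selects, assuming pod labels and Service selectors have been extracted into plain dicts (for example from `kubectl get pods,svc -n <namespace> -o json`). `pods_without_service` and the pod names are hypothetical, and whether this explains the failure is not established in this thread.

```python
# Minimal sketch: list pods that no Service in the namespace selects.
# Input is hypothetical sample data shaped like parsed kubectl output.


def pods_without_service(pods: dict, service_selectors: list) -> list:
    """pods maps pod name -> labels; service_selectors holds each Service's
    spec.selector. Returns the sorted names of pods no Service selects."""
    orphans = []
    for name, labels in pods.items():
        covered = any(
            # Subset test on dict items; an empty selector selects nothing.
            selector and selector.items() <= labels.items()
            for selector in service_selectors
        )
        if not covered:
            orphans.append(name)
    return sorted(orphans)


if __name__ == "__main__":
    pods = {
        "hello-world-server-grpc-1": {"app": "hello-world-server-grpc",
                                      "version": "v1"},
        "hello-world-client-grpc-1": {"app": "hello-world-client-grpc"},
    }
    selectors = [{"app": "hello-world-server-grpc"}]
    print(pods_without_service(pods, selectors))
    # ['hello-world-client-grpc-1']
```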

mbchristoff · November 19, 2019, 8:45am

I’ve tried different combinations of deploying through the Rancher GUI and via the sample YAML files.

Every deployment in the kubectl-created namespaces worked fine, even more basic deployments set up using the Rancher GUI.
I haven’t been able to get anything to work in the Rancher-created namespace.

Digging through the namespaces with kubectl and diffing the YAML output, I noticed no major differences; the few that exist, such as annotations added by Rancher, had no effect when I mimicked them.
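A namespace comparison like this can be scripted so that no label or annotation is overlooked. A minimal sketch, assuming both namespaces were exported with `kubectl get ns <name> -o json` and parsed into dicts; `diff_metadata` and the sample values are hypothetical. In Istio 1.3 the automatic sidecar injector looks for the `istio-injection=enabled` namespace label, so that key is worth checking first.

```python
# Minimal sketch: diff the labels and annotations of two namespace objects,
# e.g. one created by Rancher and one created by kubectl.


def diff_metadata(ns_a: dict, ns_b: dict) -> dict:
    """Return {"labels": {...}, "annotations": {...}} where each entry maps a
    key to (value_in_a, value_in_b) for keys whose values differ."""
    result = {}
    for field in ("labels", "annotations"):
        a = ns_a.get("metadata", {}).get(field) or {}
        b = ns_b.get("metadata", {}).get(field) or {}
        result[field] = {
            key: (a.get(key), b.get(key))
            for key in sorted(a.keys() | b.keys())
            if a.get(key) != b.get(key)
        }
    return result


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the clusters in this thread.
    rancher_ns = {"metadata": {"labels": {"example.com/created-by": "rancher"}}}
    kubectl_ns = {"metadata": {"labels": {"istio-injection": "enabled"}}}
    print(diff_metadata(rancher_ns, kubectl_ns))
```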

mbchristoff · January 3, 2020, 1:20pm

I’m not sure what exactly caused this issue, but it was resolved after upgrading to Kubernetes v1.16.x on both my dev and local-dev clusters (RKE 1.0 on dev, k3s on local-dev).
