Istio mTLS connection issues

Hi guys,

I’ve been using Istio for a few weeks now in dev environments and want to move on towards acc/prod.
We want to enable global mTLS on our clusters, but we keep bumping into issues with pods losing their connection to other services.

I’m using Istio 1.3.1 on k8s v1.14.6 (dev) and v1.15.4-k3s.1 (local-dev) with Rancher 2.3.1.

Mesh policy
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"authentication.istio.io/v1alpha1","kind":"MeshPolicy","metadata":{"annotations":{},"name":"default"},"spec":{"peers":[{"mtls":{"mode":"PERMISSIVE"}}]}}
  creationTimestamp: "2019-10-23T11:47:34Z"
  generation: 10
  name: default
  resourceVersion: "61710"
  selfLink: /apis/authentication.istio.io/v1alpha1/meshpolicies/default
  uid: 524fab75-6bf3-4c8a-af49-0ee8149d6e32
spec:
  peers:
  - mtls:
      mode: STRICT
Destination rule
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"DestinationRule","metadata":{"annotations":{},"name":"default","namespace":"istio-system"},"spec":{"host":"*.local","trafficPolicy":{"tls":{"mode":"ISTIO_MUTUAL"}}}}
  creationTimestamp: "2019-10-24T14:07:38Z"
  generation: 1
  name: default
  namespace: istio-system
  resourceVersion: "61722"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/istio-system/destinationrules/default
  uid: e679dbd2-7b29-40ea-aeda-2304f6511b62
spec:
  host: '*.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
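
One thing worth noting: if I read the global mTLS task in the docs correctly, it also pairs these two resources with a DestinationRule that keeps TLS disabled towards the Kubernetes API server, since it has no sidecar. It’s not in my dump above; roughly it would look like this:

api-server destination rule (sketch)
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api-server
  namespace: istio-system
spec:
  host: kubernetes.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE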

I am able to deploy the httpbin and sleep containers from the sample files without issues; however, our own deployments are not able to connect once mTLS is enabled.

To test our deployments I made a simple gRPC client and server to check whether the mesh works as intended. Using the Istio mesh without mTLS, the client can reach the server and load balancing works as expected.

When I enable mTLS by setting the mesh policy to STRICT and adding the destination rule, the client starts returning 503 errors:
“upstream connect error or disconnect/reset before headers. reset reason: connection termination”
Kiali still sees traffic flowing over the mesh, but the istio-proxy sidecar doesn’t seem to hand the request over to the pod. Wireshark confirmed this behaviour: the request does flow from the client IP to the server IP, but there is no internal traffic on 127.0.0.1 from the sidecar to the application container.
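
For debugging, a namespace-scoped Policy should let a single namespace fall back to PERMISSIVE while the MeshPolicy stays STRICT, which at least isolates whether the mesh-wide setting itself is the problem. A rough sketch, using my test namespace name:

namespace policy (sketch)
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: default
  namespace: hello-world-grpc
spec:
  peers:
  - mtls:
      mode: PERMISSIVE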

gRPC test deployment
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      field.cattle.io/ipAddresses: "null"
      field.cattle.io/targetDnsRecordIds: "null"
      field.cattle.io/targetWorkloadIds: "null"
      kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"hello-world-server-grpc"},"name":"hello-world-server-grpc","namespace":"hello-world-grpc"},"spec":{"ports":[{"name":"http","port":50051,"targetPort":50051}],"selector":{"app":"hello-world-server-grpc"}}}'
    creationTimestamp: "2019-10-24T15:43:26Z"
    labels:
      app: hello-world-server-grpc
    name: hello-world-server-grpc
    namespace: hello-world-grpc
    resourceVersion: "73338"
    selfLink: /api/v1/namespaces/hello-world-grpc/services/hello-world-server-grpc
    uid: 0fa4c9ef-fe5c-4946-b62b-a45a54619dfe
  spec:
    clusterIP: 10.43.37.7
    ports:
    - name: http2
      port: 50051
      protocol: TCP
      targetPort: 50051
    selector:
      app: hello-world-server-grpc
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
      kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"hello-world-server-grpc","namespace":"hello-world-grpc"},"spec":{"replicas":1,"template":{"metadata":{"labels":{"app":"hello-world-server-grpc","version":"v1"}},"spec":{"containers":[{"image":"docker.io/kennethreitz/hello-world-server-grpc","imagePullPolicy":"IfNotPresent","name":"hello-world-server-grpc","ports":[{"containerPort":50051}]}]}}}}'
    creationTimestamp: "2019-10-24T15:43:26Z"
    generation: 5
    labels:
      app: hello-world-server-grpc
      version: v1
    name: hello-world-server-grpc
    namespace: hello-world-grpc
    resourceVersion: "88254"
    selfLink: /apis/apps/v1/namespaces/hello-world-grpc/deployments/hello-world-server-grpc
    uid: 6c9506b7-d732-44b9-86d3-6445770a90bc
  spec:
    progressDeadlineSeconds: 2147483647
    replicas: 0
    revisionHistoryLimit: 2147483647
    selector:
      matchLabels:
        app: hello-world-server-grpc
        version: v1
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        annotations:
          cattle.io/timestamp: "2019-10-24T16:01:15Z"
          field.cattle.io/ports: '[[{"containerPort":50051,"dnsName":"hello-world-server-grpc-","name":"http2","protocol":"TCP"}]]'
        creationTimestamp: null
        labels:
          app: hello-world-server-grpc
          version: v1
      spec:
        containers:
        - image: mbchristoff/grpc-hello-world
          imagePullPolicy: IfNotPresent
          name: hello-world-server-grpc
          ports:
          - containerPort: 50051
            name: http2
            protocol: TCP
          resources: {}
          securityContext:
            capabilities: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2019-10-24T15:43:26Z"
      lastUpdateTime: "2019-10-24T15:43:26Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    observedGeneration: 5
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
      kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"hello-world-client-grpc","namespace":"hello-world-grpc"},"spec":{"replicas":1,"template":{"metadata":{"labels":{"app":"hello-world-client-grpc"}},"spec":{"containers":[{"image":"mbchristoff/grpc-hello-world-client","imagePullPolicy":"IfNotPresent","name":"hello-world-client-grpc"}],"serviceAccountName":"hello-world-client-grpc"}}}}'
    creationTimestamp: "2019-10-24T15:46:09Z"
    generation: 5
    labels:
      app: hello-world-client-grpc
    name: hello-world-client-grpc
    namespace: hello-world-grpc
    resourceVersion: "88271"
    selfLink: /apis/apps/v1/namespaces/hello-world-grpc/deployments/hello-world-client-grpc
    uid: 39965d99-f455-4e06-b3fa-0d508626fad8
  spec:
    progressDeadlineSeconds: 2147483647
    replicas: 0
    revisionHistoryLimit: 2147483647
    selector:
      matchLabels:
        app: hello-world-client-grpc
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        annotations:
          cattle.io/timestamp: "2019-10-24T16:01:59Z"
        creationTimestamp: null
        labels:
          app: hello-world-client-grpc
      spec:
        containers:
        - env:
          - name: ADDRESS
            value: hello-world-server-grpc
          - name: PORT
            value: "50051"
          image: mbchristoff/grpc-hello-world-client
          imagePullPolicy: IfNotPresent
          name: hello-world-client-grpc
          resources: {}
          securityContext:
            capabilities: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: hello-world-client-grpc
        serviceAccountName: hello-world-client-grpc
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2019-10-24T15:46:09Z"
      lastUpdateTime: "2019-10-24T15:46:09Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    observedGeneration: 5

Apart from my gRPC test, a simple nginx and curl container also fail to communicate with mTLS active.
Unless I copy the sample YAML line by line, only altering the names and image, I always receive the 503 error.
The only difference I noticed when comparing my working and non-working deployments is the selector Rancher adds to the deployment and service:

  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: deployment-test-nginx
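
For comparison, the Service that Rancher generates for such a workload also selects on that label rather than on app, roughly like this from memory (names and port are guesses based on the label above):

rancher-generated service (sketch)
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: test
spec:
  ports:
  - name: http
    port: 80
    targetPort: 80
  selector:
    workload.user.cattle.io/workloadselector: deployment-test-nginx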

I feel like I’m missing something pretty trivial, but I haven’t found a clear solution after a week of trial and error and lots of googling.

Any help would be much appreciated.

Michael

It appears to be tied to namespaces.
Every time a namespace is created through Rancher, mTLS just doesn’t enable correctly.
When the namespace is created using kubectl, the test setup works like a charm.
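
The kubectl-created namespace is nothing special; assuming sidecar injection is switched on with the standard label, it boils down to this (sketch):

kubectl-created namespace (sketch)
apiVersion: v1
kind: Namespace
metadata:
  name: hello-world-grpc
  labels:
    # assumption: sidecar injection enabled via the standard label
    istio-injection: enabled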

I have not been able to figure out what exactly differs between namespaces created by Rancher and by kubectl, but I’ll post an update if I figure out what the problem is.

Is this related to how your workloads are deployed? We see some issues where the Istio mTLS configuration can end up wrong when pods/workloads are not associated with a Service.
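
As a rule of thumb, each workload’s pods should be selectable by a Service with a protocol-prefixed port name, e.g. something minimal like this (placeholder names):

minimal service (sketch)
apiVersion: v1
kind: Service
metadata:
  name: my-grpc-server
  namespace: my-namespace
spec:
  selector:
    app: my-grpc-server
  ports:
  - name: grpc-hello  # protocol prefix tells Istio how to treat the traffic
    port: 50051
    targetPort: 50051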

Does the Rancher-created namespace have an effect in that regard?

I’ve tried different combinations of deploying workloads through the Rancher GUI and via the sample YAML files.

Every deployment in the kubectl-created namespaces worked fine, even basic deployments set up through the Rancher GUI.
I’ve not been able to get anything to work in the Rancher-created namespace.

Digging through the namespaces with kubectl and diffing the YAML output, I noticed no major differences; the few things Rancher adds, such as annotations, did not change anything after I mimicked them.
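
For what it’s worth, the Rancher extras I mimicked were just project bookkeeping, roughly of this shape if I recall correctly (placeholder values, not copied from my cluster):

rancher namespace metadata (sketch)
apiVersion: v1
kind: Namespace
metadata:
  name: hello-world-grpc
  annotations:
    cattle.io/status: '...'                      # Rancher lifecycle status blob (placeholder)
    field.cattle.io/projectId: c-xxxxx:p-xxxxx   # cluster:project the namespace belongs to
  labels:
    field.cattle.io/projectId: p-xxxxx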

I’m not sure what exactly caused this issue, but it was resolved after upgrading to Kubernetes v1.16.x on both my dev and local-dev clusters, using RKE 1.0 on dev and k3s on local-dev.