mTLS works only between some services even though tls-check shows STATUS OK

I am trying to enable mTLS in a mesh that already works with Istio’s sidecars.
The problem is that connections work only up to a certain point in the call chain, and then they fail.

This is how the services are currently set up with my failing mTLS configuration (simplified):

Istio IngressGateway -> NGINX pod -> API Gateway -> Service A -> [ Database ] -> Service B

The first thing to note is that I was already using an NGINX pod as a load balancer that uses proxy_pass to forward requests to my API Gateway or to my frontend page. I tried keeping that setup without the Istio IngressGateway, but I wasn’t able to make it work. Then I tried using the Istio IngressGateway to connect directly to the API Gateway with a VirtualService, but that also failed. So I’m leaving it like this for the moment, because it is the only way my requests reach the API Gateway successfully.

Another thing to note is that Service A first connects to a Database outside the mesh and then makes a request to Service B, which is inside the mesh with mTLS enabled.

NGINX, API Gateway, Service A and Service B are all within the mesh with mTLS enabled, and “istioctl authn tls-check” shows STATUS OK.

NGINX and API Gateway are in a namespace called “gateway”, the Database is in “auth”, and Service A and Service B are in a third one called “api”.

Istio IngressGateway is in namespace “istio-system” right now.

So the problem is that everything works if I set STRICT mode on the gateway namespace and PERMISSIVE on api. As soon as I set api to STRICT, the request still reaches Service A, but Service A then fails to send its request to Service B and the call ends with a 500.
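
One way to reason about why a hop that worked under PERMISSIVE can break under STRICT is the basic compatibility rule between what the calling sidecar sends and what the receiving sidecar accepts. The calling side is set by the DestinationRule's tls mode, and the receiving side by the Policy's peer mtls mode. STRICT accepts only Istio mTLS. PERMISSIVE accepts both mTLS and plaintext. A workload without a sidecar understands only plaintext. The Python below is a toy model of just that rule. The enum and function names are made up for illustration, and the model ignores details such as ports excluded from sidecar interception.

```python
from enum import Enum


class ClientTLS(Enum):
    """What the calling sidecar sends, per the DestinationRule's trafficPolicy.tls.mode."""
    ISTIO_MUTUAL = "ISTIO_MUTUAL"
    DISABLE = "DISABLE"


class PeerMode(Enum):
    """What the receiving sidecar accepts, per the authentication Policy's peers mtls mode."""
    STRICT = "STRICT"
    PERMISSIVE = "PERMISSIVE"


def connection_accepted(client: ClientTLS, server: PeerMode, server_has_sidecar: bool = True) -> bool:
    """Toy model (hypothetical helper): is this client TLS setting accepted by the server side?"""
    if not server_has_sidecar:
        # No sidecar to terminate Istio mTLS, so only plaintext can be understood.
        return client is ClientTLS.DISABLE
    if server is PeerMode.STRICT:
        # STRICT only accepts Istio mutual TLS.
        return client is ClientTLS.ISTIO_MUTUAL
    # PERMISSIVE accepts both mutual TLS and plaintext.
    return True


if __name__ == "__main__":
    for server in PeerMode:
        for client in ClientTLS:
            verdict = "accepted" if connection_accepted(client, server) else "rejected"
            print(f"{client.value:<13} -> {server.value:<10} {verdict}")
```

Applied to the api namespace, the model only says that once it is STRICT, every caller of Service B has to send ISTIO_MUTUAL. It does not tell you which DestinationRule a given caller actually ends up with.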

This is the output I see in the istio-proxy container of the Service A pod when it fails:

api/serviceA[istio-proxy]: [2019-09-02T12:59:55.366Z] "- - -" 0 - "-" "-" 1939 0 2 - "-" "-" "-" "-" "10.20.208.248:4567" outbound|4567||database.auth.svc.cluster.local 10.20.128.44:35366 10.20.208.248:4567 10.20.128.44:35364 -
api/serviceA[istio-proxy]: [2019-09-02T12:59:55.326Z] "POST /api/my-call HTTP/1.1" 500 - "-" "-" 74 90 60 24 "10.90.0.22, 127.0.0.1, 127.0.0.1" "PostmanRuntime/7.15.0" "14d93a85-192d-4aa7-aa45-1501a71d4924" "serviceA.api.svc.cluster.local:9090" "127.0.0.1:9090" inbound|9090|http-serviceA|serviceA.api.svc.cluster.local - 10.20.128.44:9090 127.0.0.1:0 outbound_.9090_._.serviceA.api.svc.cluster.local

There are no messages in Service B, though.
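
The two proxy lines are easier to compare field by field. The Python sketch below splits them into named fields. The field names are an assumption: they follow my understanding of Istio's default Envoy access-log format from around the 1.2/1.3 releases, and they were matched to these lines by counting fields. Check the accessLogFormat configured in your own mesh before relying on the mapping. The function and constant names are hypothetical.

```python
import re

# Field names for Istio's default Envoy access-log format, ASSUMED from roughly the
# Istio 1.2/1.3 releases and matched to these lines by field count. Verify against
# the accessLogFormat actually configured in your mesh.
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "mixer_status", "upstream_transport_failure_reason",
    "bytes_received", "bytes_sent", "duration_ms", "upstream_service_time",
    "x_forwarded_for", "user_agent", "request_id", "authority",
    "upstream_host", "upstream_cluster", "upstream_local_address",
    "downstream_local_address", "downstream_remote_address", "requested_server_name",
]

# Prefix such as "api/serviceA[istio-proxy]: " added by the log-tailing tool.
PREFIX = re.compile(r"^[\w.-]+/[\w.-]+\[[\w.-]+\]:\s*")
# A token is a [bracketed timestamp], a "quoted string", or a bare word.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_access_log(line: str) -> dict:
    """Split one access-log line into named fields, removing brackets and quotes."""
    body = PREFIX.sub("", line.strip())
    tokens = [t[1:-1] if t[:1] in ('[', '"') else t for t in TOKEN.findall(body)]
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))


DB_LINE = ('api/serviceA[istio-proxy]: [2019-09-02T12:59:55.366Z] "- - -" 0 - "-" "-" 1939 0 2 - '
           '"-" "-" "-" "-" "10.20.208.248:4567" outbound|4567||database.auth.svc.cluster.local '
           '10.20.128.44:35366 10.20.208.248:4567 10.20.128.44:35364 -')
INBOUND_LINE = ('api/serviceA[istio-proxy]: [2019-09-02T12:59:55.326Z] "POST /api/my-call HTTP/1.1" 500 - '
                '"-" "-" 74 90 60 24 "10.90.0.22, 127.0.0.1, 127.0.0.1" "PostmanRuntime/7.15.0" '
                '"14d93a85-192d-4aa7-aa45-1501a71d4924" "serviceA.api.svc.cluster.local:9090" '
                '"127.0.0.1:9090" inbound|9090|http-serviceA|serviceA.api.svc.cluster.local - '
                '10.20.128.44:9090 127.0.0.1:0 outbound_.9090_._.serviceA.api.svc.cluster.local')

if __name__ == "__main__":
    for line in (DB_LINE, INBOUND_LINE):
        entry = parse_access_log(line)
        print(entry["upstream_cluster"], entry["response_code"],
              entry["bytes_received"], entry["bytes_sent"], entry["upstream_host"])
```

If that mapping is right, the database line is a TCP connection that recorded 1939 bytes received from Service A and 0 bytes sent back. The inbound line's 500 came back from the local Service A container at 127.0.0.1:9090.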

Currently I do not have a global MeshPolicy; instead, I set a Policy and a DestinationRule per namespace.

Policy:

apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "default"
  namespace: gateway
spec:
  peers:
    - mtls:
        mode: STRICT

---
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "default"
  namespace: auth
spec:
  peers:
    - mtls:
        mode: STRICT


---
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "default"
  namespace: api
spec:
  peers:
    - mtls:
        mode: STRICT

DestinationRule:

apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "mutual-gateway"
  namespace: "gateway"
spec:
  host: "*.gateway.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "mutual-api"
  namespace: "api"
spec:
  host: "*.api.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "mutual-auth"
  namespace: "auth"
spec:
  host: "*.auth.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

I also have DestinationRules that disable mTLS for the Database (I have other services in the same namespace that I do want on mTLS) and for the Kubernetes API server:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: "my-database"
  namespace: "auth"
spec:
  host: "database.auth.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: "k8s-api-server"
  namespace: default
spec:
  host: "kubernetes.default.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
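
The host database.auth.svc.cluster.local is matched both by the wildcard rule for *.auth.svc.cluster.local and by the exact rule. The intent here therefore depends on the exact rule winning. The Python sketch below models that, assuming the most specific matching host takes precedence. That is my understanding of how Istio resolves overlapping hosts, but it is worth verifying for your Istio version. The sketch deliberately leaves out which namespaces' DestinationRules are visible to the calling sidecar, which also affects the outcome. The function names are hypothetical.

```python
from typing import Dict, Optional

# DestinationRule hosts and tls modes copied from the rules above
# (ASSUMED to all be visible to the calling sidecar).
RULES: Dict[str, str] = {
    "*.gateway.svc.cluster.local": "ISTIO_MUTUAL",
    "*.api.svc.cluster.local": "ISTIO_MUTUAL",
    "*.auth.svc.cluster.local": "ISTIO_MUTUAL",
    "database.auth.svc.cluster.local": "DISABLE",
    "kubernetes.default.svc.cluster.local": "DISABLE",
}


def host_matches(pattern: str, host: str) -> bool:
    """A leading '*' matches any host ending with the rest of the pattern; otherwise exact match."""
    if pattern.startswith("*"):
        return host.endswith(pattern[1:])
    return pattern == host


def effective_tls_mode(rules: Dict[str, str], host: str) -> Optional[str]:
    """ASSUMED resolution: the most specific matching host wins.

    An exact host beats any wildcard; among wildcards the longer suffix wins.
    Returns None when no rule matches.
    """
    matches = [p for p in rules if host_matches(p, host)]
    if not matches:
        return None
    best = max(matches, key=lambda p: (not p.startswith("*"), len(p)))
    return rules[best]


if __name__ == "__main__":
    for host in ("database.auth.svc.cluster.local",
                 "serviceB.api.svc.cluster.local",
                 "api-gateway.gateway.svc.cluster.local"):
        print(host, "->", effective_tls_mode(RULES, host))
```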

Then I have my IngressGateway like so:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ingress-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway # use istio default ingress gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - my-api.example.com
      tls:
        httpsRedirect: true # sends 301 redirect for http requests
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
        privateKey: /etc/istio/ingressgateway-certs/tls.key
      hosts:
        - my-api.example.com

And lastly, my VirtualServices:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ingress-nginx
  namespace: gateway
spec:
  hosts:
    - my-api.example.com
  gateways:
    - ingress-gateway.istio-system
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            port:
              number: 80
            host: ingress.gateway.svc.cluster.local      # this is NGINX pod
      corsPolicy:
        allowOrigin:
          - my-api.example.com
        allowMethods:
          - POST
          - GET
          - DELETE
          - PATCH
          - OPTIONS
        allowCredentials: true
        allowHeaders:
          - "*"
        maxAge: "24h"

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-gateway
  namespace: gateway
spec:
  hosts:
    - my-api.example.com
    - api-gateway.gateway.svc.cluster.local
  gateways:
    - mesh
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            port:
              number: 80
            host: api-gateway.gateway.svc.cluster.local
      corsPolicy:
        allowOrigin:
          - my-api.example.com
        allowMethods:
          - POST
          - GET
          - DELETE
          - PATCH
          - OPTIONS
        allowCredentials: true
        allowHeaders:
          - "*"
        maxAge: "24h"

One thing I don’t understand is why I have to create a VirtualService for my API Gateway at all, and why I have to use “mesh” in its gateways block. If I remove this block, my requests never reach the API Gateway. With it in place, they do, and they even get to the next service (Service A), but not to the one after that (Service B).
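
On the “mesh” question, the Istio VirtualService reference describes the gateways field in three parts. First, mesh is a reserved name meaning all sidecars in the mesh. Second, omitting the gateways field defaults to mesh. Third, listing only gateway names restricts the routes to those gateways, so you list mesh as well to cover both. The Python toy model below encodes just that rule. The names are hypothetical, and gateway references are compared as plain strings, whereas Istio also accepts other forms such as namespace/name.

```python
from typing import List, Optional

MESH = "mesh"  # reserved gateway name meaning "all sidecars in the mesh"


def applies_to(gateways: Optional[List[str]], source: str) -> bool:
    """Would a VirtualService with this gateways list be applied by `source`?

    `source` is MESH for a sidecar, or a gateway reference string.
    Omitting gateways (None or empty) is treated as [MESH], per the documented default.
    Plain string comparison is a simplification of real gateway references.
    """
    effective = gateways or [MESH]
    return source in effective


# The two VirtualServices from the question, reduced to their gateways lists.
INGRESS_NGINX = ["ingress-gateway.istio-system"]
API_GATEWAY = [MESH]

if __name__ == "__main__":
    cases = (("ingress-nginx", INGRESS_NGINX), ("api-gateway", API_GATEWAY),
             ("api-gateway, gateways omitted", None))
    for name, gws in cases:
        for source in (MESH, "ingress-gateway.istio-system"):
            print(f"{name:<30} via {source:<30} {applies_to(gws, source)}")
```

In this model, the api-gateway VirtualService with gateways set to mesh behaves the same as one with the field omitted. The ingress-nginx VirtualService is applied only by the ingress gateway, not by sidecars.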

Thanks for the help. I am really stuck with this.