Istio mTLS strange behavior (bug?)

m0ps · August 28, 2020, 11:19am

Currently, we are working on implementing a new GKE setup for our application. It includes GKE (1.16) with Workload Identity and Istio OSS (1.6.8). We are trying to apply a STRICT mTLS policy, and it works fine for most microservices, except for a few that call another microservice during startup. The caller receives a TCP RST, because it tries to connect without mTLS (plain-text HTTP):

debug envoy connection [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:226] [C77289] TLS error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST

I thought we were affected by this bug: https://github.com/istio/istio/issues/11130, but that's not the case, because I'm able to make the same call via curl successfully (for test purposes, I added a curl command before java -jar app.jar in the entrypoint). Moreover, if I disable STRICT mTLS mode during startup and re-apply the STRICT policy after the application has initialized successfully, it is able to make the same call to the same destination microservice, which is totally weird.

Both microservices are Java 11 Spring Boot applications; ReactorNetty/0.9.5.RELEASE is used for making the call.

We have an old cluster (GKE 1.14 without Workload Identity) with Istio 1.1.17, and there it works fine.

P.S. I've been struggling with this issue without any significant progress for the whole of last week, so any ideas/advice are welcome.

m0ps · August 28, 2020, 11:39am

I've tried to look into the x_forwarded_client_cert headers (only successful connections are logged in the envoy logs), and the picture is as follows:
"x_forwarded_client_cert": "By=spiffe://cluster.local/ns/dst-qa/sa/iden-dst-qa-us-west1;Hash=2c63516e6f040e774e5d6b4ca42016f587dc831a3b59bcd031ae5661c00fb2b2;Subject="";URI=spiffe://cluster.local/ns/src-qa/sa/iden-src-qa-us-west1",

The connection from another source pod to the same destination:

"x_forwarded_client_cert": "By=spiffe://cluster.local/ns/dst-qa/sa/iden-dst-qa-us-west1;Hash=307a06d8fbbff3a23cce3fafa71890fdbb56e5557acdf64811b24916fe320956;Subject="";URI=spiffe://cluster.local/ns/src-dev/sa/iden-src-dev-us-west1",

The By SPIFFE URI (the destination identity) is the same, while the hash and the client URI differ, so I assume there should not be an issue here. Each connection uses its own certificate(?)

m0ps · August 28, 2020, 12:14pm

The most interesting thing is tcpdump output. During the failed connection, we have the following picture:

So it sends the packet twice. The problem is with packet number 2078: it was sent as plain text, but the target expects mTLS. At the same time, there is packet number 2074, which I guess should be the valid one, after which communication between the pods should be established.

m0ps · August 28, 2020, 12:15pm

The situation with the successful connection is also not entirely clear to me:

From my understanding, the mTLS communication is performed within packets 288-301, but why can I see a plaintext response (packet number 302)?

P.S. HTTP 400 is the valid response that I'm expecting as a result of my request.

jtrbs · September 1, 2020, 11:27pm

Hi, may I know whether you have applied both a STRICT mTLS policy and a DestinationRule? Could you show a sample of the config you have applied for mTLS?

m0ps · September 2, 2020, 9:30am

Hi @jtrbs,
According to the documentation, the default mTLS policy is managed with the following PeerAuthentication manifest:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT

and a DestinationRule is no longer needed starting from 1.4.

JimmyChen · September 2, 2020, 11:39pm

Yes, Istio 1.5+ uses auto mTLS by default. If both the client and the server have sidecars, they will establish an mTLS connection. If one of them does not have a sidecar, then it falls back to plain text.

m0ps · September 3, 2020, 10:28am

But… in our case, we have sidecars for both microservices in place. This issue looks very similar to https://github.com/istio/istio/issues/16391, but it was fixed almost a year ago.

m0ps · September 15, 2020, 8:52am

The issue is resolved now.
After examining the debug logs from the caller pod, the following message was found:

[2020-09-11T17:50:48.061Z] "- - -" 0 - "-" "-" 402 0 3885 - "-" "-" "-" "-" "10.160.12.136:80" PassthroughCluster 10.160.162.112:46154 10.160.12.136:80 10.160.162.112:46146 -

The key is PassthroughCluster. It means that the connection was not handled by any of the envoy routes and was instead handled by the default PassthroughCluster virtual cluster.

Disabling PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND solves the issue, but a potential downside of this solution is that some of the telemetry metrics on the client side can be lost.
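
For reference, a rough sketch of how this variable could be turned off. It assumes Istio was installed through an IstioOperator manifest and that the variable is passed to istiod via the pilot component's environment; the exact placement may differ for other install methods, so treat this as an illustration rather than the configuration we actually applied:

```yaml
# Sketch only: assumes an IstioOperator-based install (not confirmed in this thread).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        env:
          # Turns off protocol sniffing for outbound traffic in istiod.
          - name: PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND
            value: "false"
```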

A more correct solution was to increase protocolDetectionTimeout to 5s (the default is 100ms). Now the apps operate correctly in every phase.
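
A minimal sketch of the timeout change, again assuming an IstioOperator-based install where mesh-wide settings live under meshConfig. If the mesh config is managed some other way (for example, by editing the mesh ConfigMap directly), the same protocolDetectionTimeout field would need to be set there instead:

```yaml
# Sketch only: assumes an IstioOperator-based install (not confirmed in this thread).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # Raised from the 100ms default mentioned above, giving slow-starting
    # clients more time before Envoy gives up on protocol detection.
    protocolDetectionTimeout: 5s
```

The trade-off is that connections whose protocol cannot be detected will wait up to this long before falling back, so server-first protocols on unnamed ports may see added latency.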

Unfortunately, the root cause is still not clear.

P.S. Solution was provided by GCP Support Team.

m0ps · September 15, 2020, 9:02am

Related github issue: https://github.com/istio/istio/issues/16581
