Connection returns TLS error after running for a few days

voanhduy1512 · June 6, 2019, 7:07am

Hi, I am having a problem with Istio in my current production setup and would appreciate your help troubleshooting it.

Background:

I am running Istio 1.1.7 in all our environments on Kubernetes (Amazon EKS) 1.12.7, with mTLS enabled on the application namespace and SDS enabled for both the ingress gateway and the sidecars.
There is no circuit breaker and no custom root CA for Citadel.

Problem

At first, all services in the cluster work fine: connections from the ingress controller reach the services and return correctly.

But after a while (days or weeks; I haven't been able to find a pattern), all connections from the ingress to the services return 503 with response flags UF,URX.
There are log entries in the istio-proxy container of the ingress pod, but none in the upstream service's istio-proxy container.

An example log entry (sorry for the format, I pulled it out of Elasticsearch):

"stream_name": "istio-ingressgateway-76749b4bb4-z6n78",
"istio_policy_status": "-",
"bytes_sent": "91",
"upstream_cluster": "outbound|8080||frontend.services.svc.cluster.local",
"downstream_remote_address": "172.23.24.174:30690",
"path": "/user",
"authority": "prod.example.com",
"protocol": "HTTP/1.1",
"upstream_service_time": "-",
"upstream_local_address": "-",
"duration": "69",
"downstream_local_address": "172.23.24.189:443",
"response_code": "503",
"user_agent": "Mozilla/5.0 (Linux; Android 8.0.0) ...",
"response_flags": "UF,URX",
"start_time": "2019-06-03T13:26:06.617Z",
"method": "GET",
"request_id": "320037db-601b-9c52-861f-bwoeifwoiegi",
"upstream_host": "172.23.24.143:80",
"x_forwarded_for": "218.186.146.112,172.23.24.174",
"requested_server_name": "prod.example.com",
"bytes_received": "0",
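
For readers decoding the `response_flags` field in the entry above: per the Envoy access log documentation, `UF` marks an upstream connection failure and `URX` marks the upstream retry limit being reached. The following is a minimal sketch of a helper that expands such a value. The function name `explain_response_flags` is hypothetical, and the table covers only a handful of flags.

```shell
# Print a short description for each flag in a comma-separated
# response_flags value such as "UF,URX".
# The function name is hypothetical; only a few flags are covered.
explain_response_flags() {
  printf '%s\n' "$1" | tr ',' '\n' | while read -r flag; do
    case "$flag" in
      UF)  echo "UF: upstream connection failure" ;;
      URX) echo "URX: upstream retry limit exceeded" ;;
      UH)  echo "UH: no healthy upstream hosts" ;;
      UO)  echo "UO: upstream overflow (circuit breaking)" ;;
      NR)  echo "NR: no route configured" ;;
      -)   echo "-: no flag set" ;;
      *)   echo "$flag: not covered by this sketch" ;;
    esac
  done
}
```

Calling `explain_response_flags "UF,URX"` prints one line per flag, which fits the picture of the gateway failing to connect upstream and then exhausting its retries.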

I tried enabling debug logging in the proxy sidecar with

curl -XPOST localhost:15000/logging?connection=debug

then I found this in the istio-proxy container of the ingress controller:

[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:644] [C79846] connecting to 172.23.14.229:80
[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:517] [C79846] connected
[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:653] [C79846] connection in progress
[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 2
[2019-05-21 08:18:36.883][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 2
[2019-05-21 08:18:36.883][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 2
[2019-05-21 08:18:36.885][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 1
[2019-05-21 08:18:36.885][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:175] [C79846] TLS error: 268436501:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_EXPIRED
[2019-05-21 08:18:36.885][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C79846] closing socket: 0
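
When the debug log is long, it can help to pull out only the TLS error lines and see which reasons dominate. The sketch below assumes the log format shown above (`[C<id>] TLS error: <reason>`); the function names `tls_errors` and `tls_error_counts` are hypothetical. As a side note, in BoringSSL's naming an `SSLV3_ALERT_*` reason generally denotes an alert received from the peer, so this entry suggests the remote side rejected the certificate it was shown, rather than the local proxy rejecting the remote one.

```shell
# Read an Envoy debug log on stdin and print "<connection id> <reason>"
# for every line that reports a TLS error.
tls_errors() {
  sed -n 's/.*\[\(C[0-9][0-9]*\)] TLS error: \(.*\)$/\1 \2/p'
}

# Read an Envoy debug log on stdin and print each distinct TLS error
# reason prefixed by the number of times it occurred.
tls_error_counts() {
  tls_errors | cut -d' ' -f2- | sort | uniq -c
}

# Example usage (the pod name is a placeholder, not taken from the post):
#   kubectl logs <ingress-pod> -c istio-proxy -n istio-system | tls_error_counts
```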

So it looks like there is some problem with the TLS certificate. The certs in istio-ca-secret and istio.istio-ingressgateway-service-account look correct and have not expired yet. The same goes for the internal certificates of my upstream services.
As far as I can tell, this only happens when the service pods run for a few days without being restarted or redeployed with a new version.
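
With SDS, certificates are pushed to Envoy directly, so the certificate a long-running proxy actually holds may differ from what the Kubernetes secrets show. One way to look is the Envoy admin interface's `/certs` endpoint, assuming it is served on the same port 15000 used for `/logging` and is available in this Envoy build. The sketch below filters the expiry-related fields from that JSON; the function name `cert_expiry_fields` is hypothetical, and the field names `path`, `days_until_expiration`, and `expiration_time` are assumptions based on current Envoy documentation that may differ in older releases.

```shell
# Read the JSON printed by Envoy's admin /certs endpoint on stdin and keep
# only the fields that show where each certificate came from and when it
# expires. Field names are assumed from current Envoy docs.
cert_expiry_fields() {
  grep -oE '"(path|days_until_expiration|expiration_time)": *"[^"]*"'
}

# Example usage, run inside the istio-proxy container:
#   curl -s localhost:15000/certs | cert_expiry_fields
```

If the proxy reports a certificate that is already past its expiration time while the secrets look valid, that would point at rotation inside the proxy rather than at Citadel's stored certificates.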

I also saw another instance of the problem, but these logs were found inside the upstream service’s istio-proxy container, and the TLS error is different from the one in the ingress controller:

[2019-06-04 01:18:58.029][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C400] handshake error: 2
[2019-06-04 01:18:58.029][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C400] handshake error: 2
[2019-06-04 01:18:58.031][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C400] handshake error: 1
[2019-06-04 01:18:58.031][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:175] [C400] TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2019-06-04 01:18:58.031][32][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C400] closing socket: 0

I am not sure what actually happened here. The Citadel logs, the node agent logs, and the rest looked normal at that point in time.

Please let me know if there are any other logs/config you need to troubleshoot the problem.

Thanks

Update 1: clarified the mTLS config.

Steven_Hespelt · July 21, 2020, 3:03pm

Hi - Any chance you found a resolution to this? I’m also experiencing this exact situation using Istio 1.4.3, also on EKS. MTIA.
-steve

eabassey · September 7, 2021, 12:13am

Hi All, I don’t know if anyone has had a solution for this issue, but I am also experiencing it right now. Any help?

eabassey · September 7, 2021, 12:37am

It worked for me when I restarted the deployment by running
kubectl rollout restart deployment --namespace default
