Calling the same service entry multiple times, sometimes we met error "upstream connect error or disconnect/reset before headers. reset reason: connection termination"

We created a service entry for an external destination. And one of our applications ( say application A) will call this service entry. I noticed that although most of the calls succeeded , sometimes it fails with “upstream connect error or disconnect/reset before headers. reset reason: connection termination”.

I want to investigate this issue but not so clear where to start. So as an initial step, I checked the client envoy proxy log , as following:

2021-01-11T06:56:48.434611Z info FLAG: --binaryPath=“/usr/local/bin/envoy”
2021-01-11T06:56:48.434635Z info FLAG: --concurrency=“2”
2021-01-11T06:56:48.434640Z info FLAG: --configPath=“/etc/istio/proxy”
2021-01-11T06:56:48.434646Z info FLAG: --connectTimeout=“10s”
2021-01-11T06:56:48.434654Z info FLAG: --controlPlaneAuthPolicy=“NONE”
2021-01-11T06:56:48.434658Z info FLAG: --controlPlaneBootstrap=“false”
2021-01-11T06:56:48.434662Z info FLAG: --customConfigFile=“”
2021-01-11T06:56:48.434665Z info FLAG: --datadogAgentAddress=“”
2021-01-11T06:56:48.434669Z info FLAG: --disableInternalTelemetry=“false”
2021-01-11T06:56:48.434674Z info FLAG: --discoveryAddress=“istiod.istio-system.svc:15012”
2021-01-11T06:56:48.434678Z info FLAG: --dnsRefreshRate=“300s”
2021-01-11T06:56:48.434683Z info FLAG: --domain=“test.svc.cluster.local”
2021-01-11T06:56:48.434687Z info FLAG: --drainDuration=“45s”
2021-01-11T06:56:48.434692Z info FLAG: --envoyAccessLogService=“”
2021-01-11T06:56:48.434696Z info FLAG: --envoyMetricsService=“”
2021-01-11T06:56:48.434700Z info FLAG: --help=“false”
2021-01-11T06:56:48.434704Z info FLAG: --id=“”
2021-01-11T06:56:48.434708Z info FLAG: --ip=“”
2021-01-11T06:56:48.434711Z info FLAG: --lightstepAccessToken=“”
2021-01-11T06:56:48.434715Z info FLAG: --lightstepAddress=“”
2021-01-11T06:56:48.434719Z info FLAG: --lightstepCacertPath=“”
2021-01-11T06:56:48.434723Z info FLAG: --lightstepSecure=“false”
2021-01-11T06:56:48.434727Z info FLAG: --log_as_json=“false”
2021-01-11T06:56:48.434731Z info FLAG: --log_caller=“”
2021-01-11T06:56:48.434735Z info FLAG: --log_output_level=“default:info”
2021-01-11T06:56:48.434739Z info FLAG: --log_rotate=“”
2021-01-11T06:56:48.434743Z info FLAG: --log_rotate_max_age=“30”
2021-01-11T06:56:48.434746Z info FLAG: --log_rotate_max_backups=“1000”
2021-01-11T06:56:48.434751Z info FLAG: --log_rotate_max_size=“104857600”
2021-01-11T06:56:48.434755Z info FLAG: --log_stacktrace_level=“default:none”
2021-01-11T06:56:48.434763Z info FLAG: --log_target=“[stdout]”
2021-01-11T06:56:48.434767Z info FLAG: --mixerIdentity=“”
2021-01-11T06:56:48.434770Z info FLAG: --outlierLogPath=“”
2021-01-11T06:56:48.434774Z info FLAG: --parentShutdownDuration=“1m0s”
2021-01-11T06:56:48.434779Z info FLAG: --pilotIdentity=“”
2021-01-11T06:56:48.434783Z info FLAG: --proxyAdminPort=“15000”
2021-01-11T06:56:48.434788Z info FLAG: --proxyComponentLogLevel=“misc:error”
2021-01-11T06:56:48.434800Z info FLAG: --proxyLogLevel=“warning”
2021-01-11T06:56:48.434804Z info FLAG: --serviceCluster=“helloworld.test”
2021-01-11T06:56:48.434808Z info FLAG: --serviceregistry=“Kubernetes”
2021-01-11T06:56:48.434811Z info FLAG: --statsdUdpAddress=“”
2021-01-11T06:56:48.434816Z info FLAG: --statusPort=“15020”
2021-01-11T06:56:48.434821Z info FLAG: --stsPort=“0”
2021-01-11T06:56:48.434825Z info FLAG: --templateFile=“”
2021-01-11T06:56:48.434829Z info FLAG: --tokenManagerPlugin=“GoogleTokenExchange”
2021-01-11T06:56:48.434834Z info FLAG: --trust-domain=“cluster.local”
2021-01-11T06:56:48.434838Z info FLAG: --zipkinAddress=“zipkin.istio-system:9411”
2021-01-11T06:56:48.434860Z info Version 1.5.4-f27639c7369f8a6ef144ed8768f8acb27566d999-Clean
2021-01-11T06:56:48.434982Z info Obtained private IP [100.96.0.136]
2021-01-11T06:56:48.435015Z info Proxy role: &model.Proxy{ClusterID:“”, Type:“sidecar”, IPAddresses:string{“100.96.0.136”, “100.96.0.136”}, ID:“helloworld-v1-5b75657f75-8jbxd.test”, Locality:(*envoy_api_v2_core.Locality)(nil), DNSDomain:“test.svc.cluster.local”, ConfigNamespace:“”, Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:*model.ServiceInstance(nil), WorkloadLabels:labels.Collection(nil), IstioVersion:(*model.IstioVersion)(nil)}
2021-01-11T06:56:48.435022Z info PilotSAN string(nil)
2021-01-11T06:56:48.435026Z info MixerSAN string(nil)
2021-01-11T06:56:48.435358Z info Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: helloworld.test
statNameLength: 189
tracing:
zipkin:
address: zipkin.istio-system:9411

2021-01-11T06:56:48.435369Z info JWT policy is third-party-jwt
2021-01-11T06:56:48.435413Z info Using user-configured CA istio-pilot.istio-system.svc:15012
2021-01-11T06:56:48.435420Z info istiod uses self-issued certificate
2021-01-11T06:56:48.435462Z info the CA cert of istiod is: -----BEGIN CERTIFICATE-----
MIIC3TCCAcWgAwIBAgIQebQj20/UWaZaRLOEw1+WxzANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIwMTEyNTEwMTMyMVoXDTMwMTEy
MzEwMTMyMVowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAMugAO5BNhnBBbrlnDgp+wTJTmvPJtQljW/klueR
6iNgEVmAJxqLrnEN2Vbc8h0cdXJuLWj0XIFpfj2BaXTBVO7lunOdZDMyL0t6o/d1
A9ll0/cZV3vqFKcb+Z+YkOuqgBnjOk7spJkeq5I6HY1oAfKsHasFDUhM3KhvdPMK
3PKy3g89S+eMjnGCYwv5G8CoRnyU/Ur2RHOEwGZx1fEi0wmApw5PUWtygXpuvHOp
fptzccqnfT8Fw7Ll4q91kcKz+X9xhWEXkmoGW09gaZPU//qKLz0JwfiTQ8qVuTEZ
P9Bsndz3HGG3HGCJbKwjZtA8hEr2l4o73oVIfj2hphbSxzECAwEAAaMjMCEwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEB
AArjNVP53fW5jvp3SukO4fPWb6gGb0WjWMamtd1JFqf1ep72Vltepe580gu58ldY
AIdsxDgZd4rhhmbZ8fsI70zTyWWdO1/m7AYFSTwhjdecgoo+iCdpIEvLx1VIfB14
jblJIT1uXo4x3OI/Kggd4m34xBtx2tT3vjNJdUg+2w81N+iWhhnkaYTzF1bqZXqo
KHssVMFESFr7TmIrFsNJGl/jpfFWWVbOF9O12qADHcp3NuCqXtw/Kl59NSpx0yHU
RZ93rC6RHbwiclp2cpYMNvvafqY/pFzOa6fgni1bg9u/lfY6e5gWt8yuXo44AUB/
ZRYXlXmHmMF0CYWhOjFe7/U=
-----END CERTIFICATE-----

2021-01-11T06:56:48.435574Z info parsed scheme: “”
2021-01-11T06:56:48.435585Z info scheme “” not registered, fallback to default scheme
2021-01-11T06:56:48.435603Z info ccResolverWrapper: sending update to cc: {[{istio-pilot.istio-system.svc:15012 0 }] }
2021-01-11T06:56:48.435613Z info ClientConn switching balancer to “pick_first”
2021-01-11T06:56:48.435758Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {CONNECTING }
2021-01-11T06:56:48.440865Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {READY }
2021-01-11T06:56:48.469744Z info sds SDS gRPC server for workload UDS starts, listening on “/etc/istio/proxy/SDS”

2021-01-11T06:56:48.469784Z info PilotSAN string{“istiod.istio-system.svc”}
2021-01-11T06:56:48.469796Z info Starting proxy agent
2021-01-11T06:56:48.469823Z info sds Start SDS grpc server
2021-01-11T06:56:48.469847Z info Opening status port 15020

2021-01-11T06:56:48.469866Z info Received new config, creating new Envoy epoch 0
2021-01-11T06:56:48.469903Z info Epoch 0 starting
2021-01-11T06:56:48.471992Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helloworld.test --service-node sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l warning --component-log-level misc:error --concurrency 2]
[Envoy (Epoch 0)] [2021-01-11 06:56:48.511][33][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2021-01-11 06:56:48.511][33][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2021-01-11T06:56:49.354477Z info sds node:sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local-1 resource:ROOTCA new connection
2021-01-11T06:56:49.354508Z info sds node:sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local-2 resource:default new connection
2021-01-11T06:56:49.442612Z info cache Root cert has changed, start rotating root cert for SDS clients
2021-01-11T06:56:49.442771Z info sds node:sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local-2 resource:default pushed key/cert pair to proxy
2021-01-11T06:56:49.442785Z info sds node:sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local-2 resource:default pushed secret
2021-01-11T06:56:49.554764Z info sds node:sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local-1 resource:ROOTCA pushed root cert to proxy
2021-01-11T06:56:49.554788Z info sds node:sidecar~100.96.0.136~helloworld-v1-5b75657f75-8jbxd.test~test.svc.cluster.local-1 resource:ROOTCA pushed secret
2021-01-11T06:56:50.777418Z info Envoy proxy is ready
2021-01-11T07:25:54.120353Z info transport: loopyWriter.run returning. connection error: desc = “transport is closing”
2021-01-11T07:25:54.120389Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {CONNECTING }
2021-01-11T07:25:54.124463Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {READY }
[Envoy (Epoch 0)] [2021-01-11 07:27:48.453][33][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2021-01-11T07:55:12.360908Z info transport: loopyWriter.run returning. connection error: desc = “transport is closing”
2021-01-11T07:55:12.360916Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {CONNECTING }
2021-01-11T07:55:12.365231Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {READY }
[Envoy (Epoch 0)] [2021-01-11 07:59:22.400][33][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2021-01-11T08:28:07.337702Z info transport: loopyWriter.run returning. connection error: desc = “transport is closing”
2021-01-11T08:28:07.337749Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {CONNECTING }
2021-01-11T08:28:07.341835Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000702290, {READY }
[Envoy (Epoch 0)] [2021-01-11 08:30:17.205][33][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,

Seems I did not find something unusual.
Then the next step I am planning to check the server side log. But the problem is , as a service entry, I did not find its envoy proxy log as it is actually an external service.
And I am sure the call did not reach the external service when such error shows.

Any hint will be appreciated.

Best regards,
James You

1 Like

did you solve it? I have same problem.