"gRPC config stream closed: 13" error in Envoy proxies?

Hi,

We’ve installed Istio 1.4.3 on our cluster in a kind of “soft” way: we don’t use the Istio gateway yet, and traffic still comes in through the Nginx ingress controller that was there before we installed Istio.

We used the following command to create the manifest:

istioctl manifest generate \
  --set values.gateways.istio-ingressgateway.sds.enabled=true \
  --set values.global.k8sIngress.enabled=true \
  --set values.global.k8sIngress.enableHttps=false \
  --set values.global.k8sIngress.gatewayName=ingressgateway \
  --set values.grafana.enabled=true \
  --set values.kiali.enabled=true \
  --set values.tracing.enabled=true

Everything works fine for HTTP traffic, but we are unable to get gRPC calls to work. We’re trying a simple service-2-service connection within the cluster using insecure mode (so no TLS yet).

What we do see is many many errors in the Envoy output:

[Envoy (Epoch 0)] [2020-01-21 18:06:00.646][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13, 

These occur very frequently, sometimes every minute, sometimes somewhat longer in between.

Does anyone with more in-depth knowledge have a clue as to what could be wrong? I’m guessing our config is not quite correct after all? It’s certainly interesting that HTTP traffic doesn’t seem to be affected and works just fine – but then again, this traffic is coming from the outside and is not going service-2-service.

Any help or input would be appreciated!

Update on this issue: we proved gRPC communication does actually work (debugged using grpcurl).

But the error remains and is flooding our logs :-1:

So how can we get rid of this? Still smells like a bad config somewhere…

Hi, :slight_smile:

I’m no advanced user and we do not use gRPC anywhere except Istio, but have you followed the port naming conventions? https://istio.io/docs/ops/configuration/traffic-management/protocol-selection/

Yes we did, two ports are defined and named http and grpc.

Nice. :slightly_smiling_face: This log line occurs when a gRPC connection is closed. For us it happens after 30m, when the connection between proxy and Pilot is closed. Could it be that those streams are being closed by an application?
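
In case it helps to look at the log line itself: the number after “closed:” is a gRPC status code, and 13 is INTERNAL in the standard gRPC numbering. Below is a minimal Python sketch, assuming the log format quoted earlier in this thread, that pulls the timestamp and status code out of such a line. The names `parse_stream_closed`, `LINE_RE` and `GRPC_STATUS` are made up for illustration and are not part of Envoy or Istio.

```python
import re
from datetime import datetime

# Subset of the standard gRPC status codes; 13 is the one seen in this thread.
GRPC_STATUS = {
    0: "OK",
    1: "CANCELLED",
    2: "UNKNOWN",
    4: "DEADLINE_EXCEEDED",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}

# Assumes the line layout quoted in this thread: a bracketed timestamp
# followed later by "gRPC config stream closed: <code>".
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]"
    r".*gRPC config stream closed: (?P<code>\d+)"
)


def parse_stream_closed(line):
    """Return (timestamp, code, status name), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    code = int(match.group("code"))
    return ts, code, GRPC_STATUS.get(code, "UNKNOWN_CODE")
```

Running this over the sidecar logs gives a list of timestamps per pod, which makes it easier to see whether the closures line up with a fixed interval such as 30m.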

Interesting point, but I don’t immediately see how – unless our way of dealing with the gRPC client is wrong. We’ll investigate, thanks for the input!

EDIT: you are talking about the connection between proxy and pilot – but that’s something our application certainly won’t be messing with I think.

Actually, we just realized those errors are simply occurring all the time, even when there are no gRPC calls happening at all. So it’s unlikely it’s our application’s gRPC client then :slight_smile:

An interesting observation is that the error seems to occur less on pods that are receiving gRPC traffic…

Could it be that the 30 min timeout is configured incorrectly in our setup?

So I double-checked and our timeout is also set to the default of 30m:

keepaliveMaxServerConnectionAge: 30m
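
One way to check whether the closures match that setting is to compare the gaps between consecutive log timestamps with the configured age. This is only a rough Python sketch: `parse_duration`, `early_closures` and the one-minute `tolerance` are hypothetical helpers and values chosen for illustration, and it assumes that gaps clearly shorter than the configured age point to something other than the keepalive.

```python
from datetime import datetime, timedelta


def parse_duration(text):
    """Parse simple durations such as "30m", "45s" or "1h" (single unit only)."""
    units = {"s": "seconds", "m": "minutes", "h": "hours"}
    return timedelta(**{units[text[-1]]: int(text[:-1])})


def early_closures(timestamps, max_age, tolerance=timedelta(minutes=1)):
    """Return the gaps between consecutive closures that are shorter than max_age.

    tolerance is an arbitrary allowance for jitter, not an Istio setting.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return [gap for gap in gaps if gap < max_age - tolerance]


# Example: closures at 18:00, 18:01 and 18:31 against a 30m connection age.
closures = [
    datetime(2020, 1, 21, 18, 0),
    datetime(2020, 1, 21, 18, 1),
    datetime(2020, 1, 21, 18, 31),
]
suspicious = early_closures(closures, parse_duration("30m"))
```

In this example only the one-minute gap is flagged; the 30-minute gap is consistent with the configured connection age.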

I was talking about pilot-proxy connection because we don’t use gRPC in our stack and I hoped that it could help you out. :slightly_smiling_face:

If the only issue is that it is flooding your logs, then you could change the log severity of the sidecar to warn or something similar. :slight_smile: