Supporting gRPC in Telemetry

gRPC uses HTTP/2 as its transport, so there’s a distinction between the transport-level protocol and response and the logical (gRPC-level) protocol and response. Notably, a failed gRPC request typically still has an HTTP status of 200; the actual outcome is carried separately in the grpc-status trailer. Istio’s default configuration handles this inconsistently between metrics and logs, and it does not report any useful metrics for failed gRPC requests.
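To illustrate the transport vs. logical split, here is a minimal sketch (not Istio code) of how a consumer would have to look past the HTTP status to tell whether a gRPC request failed. The function names and the plain-dict representation of trailers are assumptions for illustration only; the `grpc-status` trailer and the meaning of `0` as OK come from the gRPC-over-HTTP/2 protocol.

```python
from typing import Mapping, Optional

GRPC_OK = 0  # gRPC status code for success


def grpc_status_from_trailers(trailers: Mapping[str, str]) -> Optional[int]:
    """Return the numeric gRPC status from the trailers, or None if absent/invalid."""
    raw = trailers.get("grpc-status")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def request_failed(http_status: int, trailers: Mapping[str, str]) -> bool:
    """Hypothetical helper: treat either a transport or a logical error as failure."""
    if http_status >= 400:
        # Transport-level failure: the HTTP layer itself reported an error.
        return True
    status = grpc_status_from_trailers(trailers)
    # Logical failure: HTTP says 200, but gRPC reports a non-OK status.
    return status is not None and status != GRPC_OK
```

With this, `request_failed(200, {"grpc-status": "14"})` is true (UNAVAILABLE) even though the HTTP status is 200, which is exactly the case that a metric keyed only on the HTTP response code misses.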

I have a proposal here: https://docs.google.com/document/d/15c4plpRh-TFREGLl0Z5RHGy9zyo5M2dxuUqcZSBfEXk/edit

There’s an open question about the right way to represent response codes in metrics for a protocol like gRPC that uses HTTP as a transport mechanism. I hope to discuss this at the next P&T WG meeting.

(I’ll be offline on vacation so I won’t reply to anything for the next week, but can respond after.)

We discussed this at the P&T WG meeting on 6/19, and I chatted further with @douglas-reid.

I updated my proposal: https://tinyurl.com/yy2cwmv5

The new proposal is to add a grpc_response_status dimension to metrics.
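As a rough sketch of the idea (not the actual Istio or Mixer configuration), a request counter with the extra dimension could look like the following. Only `grpc_response_status` comes from the proposal; `response_code`, `requests_total`, `record_request`, and the empty-string default for non-gRPC traffic are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical request counter keyed by (response_code, grpc_response_status).
requests_total: Counter = Counter()


def record_request(response_code: int, grpc_response_status: str = "") -> None:
    """Count one request; non-gRPC traffic leaves grpc_response_status empty."""
    requests_total[(str(response_code), grpc_response_status)] += 1


# Example traffic: every gRPC call below has HTTP status 200.
record_request(200, "0")    # gRPC OK
record_request(200, "14")   # gRPC UNAVAILABLE
record_request(200, "14")   # gRPC UNAVAILABLE
record_request(404)         # plain HTTP request, no gRPC status
```

Keeping the HTTP response code and the gRPC status as separate dimensions means failed gRPC requests become visible in the metric without changing what the response code means for plain HTTP traffic.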