We are using Istio in production and weighting traffic across Destination Rule subsets to release newer versions in a canary-like fashion. We are also using the Stackdriver adapter. We set the weight of a new version to 100% and set all other versions to 0%.
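For context, a weight shift of this kind is typically expressed as in the sketch below. This is an illustration, not our actual manifests: the service name `my-service` and the subset names `v1` and `v2` are placeholders. In Istio the weights sit on the VirtualService routes, and the Destination Rule defines the subsets those routes point to.

```yaml
# Illustrative only: "my-service", "v1", and "v2" are placeholder names.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
  - name: v1            # old version
    labels:
      version: v1
  - name: v2            # new version
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v2
      weight: 100       # all traffic goes to the new version
    - destination:
        host: my-service
        subset: v1
      weight: 0         # old version should receive nothing
```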
Viewing the logs of each version, we confirmed that no traffic was being routed to the old versions and that all traffic was correctly served by the new version at 100%. But when we reviewed Stackdriver, the metrics did not seem to reflect this change (this was true of every metric we looked at). We made the distribution change at approximately 2 PM, and the old versions' metrics in Stackdriver didn’t trail off until ~7 PM.
The following Stackdriver graph shows the Istio Server Request Count (istio.io/service/server/request_count) for each version, filtered by source=istio-ingressgateway and destination=(our service name).
We’re curious about these observations and have struggled to come up with a good rationale for them. The graph suggests a five-hour trail-off for the old versions, and it still looks like staggered traffic rather than a straight drop to 0, even though our application logging confirmed that traffic was being routed as expected.