Production level telemetry

Telemetry encompasses a large number of items / subsystems across Istio. Use of these items is highly customizable and depends on individual use cases. I can provide a brief overview here, but I’d suggest starting with a read-through of the concept guides (telemetry concept) on the website, especially if you are unsure about the components, their roles, and how they work.

Istio telemetry covers:

  • proxy-generated stats and traces (no/limited configurability)
  • Mixer-generated telemetry for mesh traffic, based on operator-supplied config: logs, metrics, traces, edges, custom (highly configurable, optional)
  • Istio component metrics (for monitoring Pilot, for instance)

By default (standard install), Mixer takes part in tracing purely as an application in the request path, like any other traced service. If a request is sampled and Mixer is called as part of that request (istio-policy check()), then Mixer will contribute spans related to its own processing of the request (including various adapter outcomes in release-1.1).

However, it is also possible to use Mixer to generate trace spans for the entire mesh (as opposed to having the proxies send trace spans directly to a tracing backend). This gives operators control over the content and structure of the trace span data. Mostly this is used to customize the tags, etc., that get attached to spans.

This is possible because for each request that flows through a proxy in Istio, metadata is collected about the handling of that request. That metadata is batched across a number of requests and then asynchronously sent to Mixer for processing (istio-telemetry report()s).
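To make the batching idea concrete, here is a minimal Python sketch of the pattern. It is a toy model, not how the proxy is actually implemented: `ReportBatcher`, `max_batch`, and the `send` callback are hypothetical names, and the flush here is synchronous for simplicity, whereas the real reports are sent asynchronously.

```python
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]


class ReportBatcher:
    """Toy stand-in for a proxy that buffers per-request metadata."""

    def __init__(self, send: Callable[[List[Attributes]], None], max_batch: int = 3):
        self._send = send            # hypothetical callback standing in for a report() call
        self._max_batch = max_batch  # assumed batch size, chosen only for the example
        self._pending: List[Attributes] = []

    def record(self, attrs: Attributes) -> None:
        """Buffer one request's metadata and flush once the batch is full."""
        self._pending.append(attrs)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as a single batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)


# Collect the batches in a list instead of sending them over the network.
sent_batches: List[List[Attributes]] = []
batcher = ReportBatcher(send=sent_batches.append, max_batch=3)

for i in range(7):
    batcher.record({"request.path": f"/item/{i}", "response.code": 200})
batcher.flush()  # drain the final partial batch
```

The point is only that the receiving side sees groups of request metadata rather than one call per request.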

Because Mixer is sent metadata about each request by each proxy within the mesh, it can generate telemetry of any shape or format desired, even custom formats. Tracing is just one example of the kind of telemetry that can be generated based on Mixer configuration (templates and instances).

Mixer generation is controlled by rules. Rules encode logic about which conditions should trigger their execution, what data to generate when triggered, and where to dispatch the generated data. By crafting these rules and the expressions that are used in data generation, operators are able to precisely control when and how telemetry is generated.
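As a rough mental model of those three parts (when to trigger, what to generate, where to dispatch), here is a small Python sketch. It is not Mixer's config syntax or its implementation: `Rule`, `process_report`, and `metrics_backend` are hypothetical names, and the attribute keys are used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
Instance = Dict[str, Any]


@dataclass
class Rule:
    match: Callable[[Attributes], bool]               # which conditions trigger the rule
    build_instance: Callable[[Attributes], Instance]  # what data to generate
    handler: Callable[[Instance], None]               # where to dispatch the data


def process_report(rules: List[Rule], attrs: Attributes) -> None:
    """Run every rule whose match condition holds for this request's metadata."""
    for rule in rules:
        if rule.match(attrs):
            rule.handler(rule.build_instance(attrs))


# A list stands in for a metrics backend.
metrics_backend: List[Instance] = []

request_count = Rule(
    match=lambda a: a.get("context.protocol") == "http",
    build_instance=lambda a: {
        "name": "requestcount",
        "value": 1,
        "destination": a.get("destination.service", "unknown"),
        "response_code": a.get("response.code", 0),
    },
    handler=metrics_backend.append,
)

# The HTTP request matches the rule; the TCP one does not.
process_report(
    [request_count],
    {"context.protocol": "http", "destination.service": "reviews", "response.code": 200},
)
process_report(
    [request_count],
    {"context.protocol": "tcp", "destination.service": "mongodb"},
)
```

Only the first call produces a metric instance, because the match condition filters out the TCP traffic before any data is generated.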

Common use cases are to exclude services or request types (think health checks) from telemetry generation as well as to customize the labels generated for metrics to include custom headers, etc.
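Both use cases come down to a match condition plus a label expression with a fallback value. The Python sketch below illustrates the logic only; it is not Mixer configuration. The `/healthz` path, the `x-tenant-id` header, and the function names are assumptions made for the example.

```python
from typing import Any, Dict, Optional

Attributes = Dict[str, Any]


def is_health_check(attrs: Attributes) -> bool:
    """Treat an assumed health endpoint or a kubelet probe as a health check."""
    path = str(attrs.get("request.path", ""))
    user_agent = str(attrs.get("request.useragent", ""))
    return path == "/healthz" or user_agent.startswith("kube-probe")


def metric_labels(attrs: Attributes) -> Optional[Dict[str, str]]:
    """Return labels for a request metric, or None when the request is excluded."""
    if is_health_check(attrs):
        return None
    headers = attrs.get("request.headers") or {}
    return {
        "destination": str(attrs.get("destination.service", "unknown")),
        # Custom header promoted to a label, with a default when it is absent.
        "tenant": str(headers.get("x-tenant-id", "unknown")),
    }


probe = {"request.path": "/healthz", "request.useragent": "kube-probe/1.13"}
user_request = {
    "request.path": "/api/orders",
    "destination.service": "orders",
    "request.headers": {"x-tenant-id": "acme"},
}

excluded = metric_labels(probe)        # None: no telemetry for the probe
labels = metric_labels(user_request)   # labels including the custom header value
```

Supplying a default for the missing header keeps the label set the same shape for every request, which most metrics backends expect.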

I hope that helps,
Doug.
