Production level telemetry


#1

The telemetry Tasks on Istio’s official website do not seem suitable for general production environments. I would like to know how to deploy telemetry at a production level or, if I deploy Jaeger, EFK, etc. on my own, how to make them work with Istio. Thank you.


#2

You are right that the general add-ons to Istio are not generally configured for high-reliability production usage. The intent of those add-ons is really to provide basic functionality / demo features through other open systems. Istio itself is trying to avoid owning the production-ization of the add-ons.

That being said, there is work planned to ease the use of things like the Prometheus Operator and the Jaeger Operator, and to add persistent volumes for Grafana. We publish the dashboards we use for Grafana (though this needs to happen more regularly), and our Prometheus config is consumable.

If you have specific questions about how to access the various components with your own deployments, I can try to provide some guidance. I think you’ll see more documentation around this concern in the next few months, as well.


#3

I think for starters it would be better to specify exactly what telemetry comprises. Is it data plane metrics? Is it tracing data? What exactly is Mixer doing? Until reading this FAQ I had no idea Mixer was involved in tracing, which isn’t exactly obvious.

I also want to know more about this statement:

“In this way, operators can precisely control when and how trace data is generated and perhaps remove certain services entirely from a trace or provide more detailed information for certain namespaces.”

The FAQ just brushes past that, and I am still clueless about how to do this because I don’t understand how Mixer works.


#4

Telemetry encompasses a large number of items / subsystems across Istio. Use of these items is highly customizable and depends on individual use cases. I can provide a brief overview here, but I’d suggest starting with a read-through of the concept guides (telemetry concept) on the website, especially if you are confused about components, their roles, and how they work.

Istio telemetry covers:

  • proxy-generated stats and traces (no/limited configurability)
  • Mixer-generated telemetry for mesh traffic based on operator-supplied config (logs, metrics, traces, edges, custom) - (highly configurable, optional)
  • Istio component metrics (for monitoring Pilot, for instance)

By default (standard install), Mixer’s involvement in tracing is purely as an application. If a request is sampled and Mixer is called as part of that request (istio-policy check()), then Mixer will contribute spans related to its own processing of the request (including various adapter outcomes in release-1.1).

However, it is also possible to use Mixer to generate trace spans for the entire mesh (as opposed to having the proxies send trace spans directly to a tracing backend). This gives operators control over the content and structure of the trace span data. Mostly this is used to customize the tags, etc., that get attached to spans.

This is possible because for each request that flows through a proxy in Istio, metadata is collected about the handling of that request. That metadata is batched across a number of requests and then asynchronously sent to Mixer for processing (istio-telemetry report()s).
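
To make the batching idea concrete, here is a minimal Python sketch. It is not Istio code: `BatchingReporter`, `max_batch`, and the attribute names are hypothetical, and the flush happens inline rather than on a background thread, so it only illustrates collecting per-request metadata and sending it in groups.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass
class BatchingReporter:
    """Hypothetical stand-in for a proxy-side report buffer."""

    send: Callable[[List[Attributes]], None]  # delivers one batch to the backend
    max_batch: int = 3                        # assumed batch size, for illustration
    _pending: List[Attributes] = field(default_factory=list)

    def record(self, attrs: Attributes) -> None:
        # Buffer the metadata for one request; flush once the batch is full.
        self._pending.append(attrs)
        if len(self._pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as a single batch, then start a new one.
        if self._pending:
            batch, self._pending = self._pending, []
            self.send(batch)
```

A caller would invoke `record()` once per request and `flush()` on a timer or at shutdown, so no request triggers its own report call.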

Because Mixer is sent metadata about each request by each proxy within the mesh, it can generate telemetry of any shape or format desired, even custom formats. Tracing is just one example of the kind of telemetry that can be generated based on Mixer configuration (templates and instances).

Mixer generation is controlled by rules. Rules encode logic about which conditions should trigger their execution, what data to generate when triggered, and where to dispatch the generated data. By crafting these rules and the expressions that are used in data generation, operators are able to precisely control when and how telemetry is generated.

Common use cases are to exclude services or request types (think health checks) from telemetry generation as well as to customize the labels generated for metrics to include custom headers, etc.
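
As a rough illustration of the rule model described above (a match condition, the data to generate, and where to dispatch it), here is a small Python sketch. It is a conceptual model only, not Mixer's actual configuration format or API: the `Rule` class, the `dispatch` function, the attribute names, and the `x-tenant` header are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass
class Rule:
    match: Callable[[Attributes], bool]               # when the rule fires
    instance: Callable[[Attributes], Dict[str, str]]  # what data to generate
    handler: Callable[[Dict[str, str]], None]         # where to send it


def dispatch(rules: List[Rule], attrs: Attributes) -> None:
    # Evaluate every rule against the metadata reported for one request.
    for rule in rules:
        if rule.match(attrs):
            rule.handler(rule.instance(attrs))


metrics: List[Dict[str, str]] = []

# Hypothetical rule: skip health checks and add a label from a custom header.
request_count = Rule(
    match=lambda a: a.get("request.path") != "/healthz",
    instance=lambda a: {
        "destination": a.get("destination.service", "unknown"),
        "tenant": a.get("x-tenant", "none"),
    },
    handler=metrics.append,
)

dispatch([request_count], {"request.path": "/healthz",
                           "destination.service": "reviews"})
dispatch([request_count], {"request.path": "/reviews/1",
                           "destination.service": "reviews",
                           "x-tenant": "acme"})
```

After both calls, `metrics` holds a single entry for the `/reviews/1` request; the health check produced nothing because the match condition excluded it.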

I hope that helps,
Doug.


#5

Thank you for your answer. I have now deployed Jaeger on my own, with three components (agent, collector, and query), but I have not found a way to integrate it with Istio. Jaeger and Istio are deployed in the same K8s cluster. I deployed Jaeger by following this article. There is currently no data in Jaeger’s UI, which means the trace data is not being sent to Elasticsearch, and I don’t know how to configure the trace data to be sent to a specific Elasticsearch instance.


#6

Have you pointed the mesh at your service? There is a Helm install option for this, global.tracer.zipkin.address, IIRC.

If that is working as expected and you are still having issues with Jaeger itself, I suggest reaching out to the Jaeger team for further guidance on issues such as Elasticsearch configuration, etc.


#7

@Owen-Chen You need to have a service called “zipkin” in the istio-system namespace. All Istio components use this service address to push their traces to Jaeger. It is explained in detail here: https://github.com/istio/istio/issues/8893
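
For illustration, a Service like that could be generated with a short Python script such as the one below. This is a sketch under assumptions, not a tested setup: the selector labels are hypothetical and must match your own Jaeger collector pods, the collector is assumed to run in istio-system (a Service selector only matches pods in its own namespace) and to accept Zipkin-format spans on port 9411, and `zipkin_alias_service` is a helper name made up for this example.

```python
import json
from typing import Dict


def zipkin_alias_service(selector: Dict[str, str], port: int = 9411) -> dict:
    """Build a Service manifest named "zipkin" in the istio-system namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "zipkin", "namespace": "istio-system"},
        "spec": {
            # Hypothetical labels: replace with the labels on your collector pods.
            "selector": dict(selector),
            "ports": [
                {"name": "http-zipkin", "port": port, "targetPort": port}
            ],
        },
    }


manifest = zipkin_alias_service({"app": "jaeger", "component": "collector"})
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.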