Multi-cluster destination_service labels

Hi,

I have a multi-cluster setup using the multiple control plane approach as described in https://istio.io/docs/setup/kubernetes/install/multicluster/gateways/.

I played around a little with how to set up the ServiceEntries and Deployments, and ended up with this:

In “ClusterA”:

  • A traffic generator deployment making requests to httpbin.bar.global. This deployment is in the “bookinfo” namespace (nothing related to the bookinfo sample, I just reused the namespace).
  • A ServiceEntry that captures requests to httpbin.bar.global and routes traffic to a remote cluster (let’s call it “ClusterB”). This ServiceEntry is created in the “test” namespace (see the sketch after this list).
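
For reference, the ServiceEntry is essentially the one from the doc linked above, just created in the “test” namespace. A trimmed sketch (the ClusterB gateway address is a placeholder):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-bar
  namespace: test        # the ServiceEntry lives in "test", not "bar"
spec:
  hosts:
  - httpbin.bar.global
  location: MESH_INTERNAL
  ports:
  - name: http1
    number: 8000
    protocol: http
  resolution: DNS
  addresses:
  - 240.0.0.2
  endpoints:
  - address: <ClusterB ingress gateway address>
    ports:
      http1: 15443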

In ClusterB:

  • I created the httpbin service using the samples/httpbin/httpbin.yaml file, applied to the “bar” namespace (trimmed sketch below).
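
The relevant bits of that sample look roughly like this (trimmed; the namespace comes from applying the file with -n bar, and the exact ports/image depend on the Istio release):

apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: bar         # set by applying the sample to the "bar" namespace
  labels:
    app: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin

The Deployment in that file carries app: httpbin / version: v1 labels, which is what shows up as destination_app / destination_version below. Nothing in ClusterB mentions “test” anywhere.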

Everything works fine and traffic flows correctly, but I have a question about the telemetry. In ClusterB, time series are being recorded like this:

istio_requests_total{
  connection_security_policy="mutual_tls",
  destination_app="httpbin",
  destination_principal="cluster.local/ns/bar/sa/default",
  destination_service="httpbin.bar.global",
  destination_service_name="httpbin.bar.global",
  destination_service_namespace="test",
  destination_version="v1",
  destination_workload="httpbin",
  destination_workload_namespace="bar",
  instance="172.17.0.18:42422",
  job="istio-mesh",
  permissive_response_code="none",
  permissive_response_policyid="none",
  reporter="destination",
  request_protocol="http",
  response_code="200",
  response_flags="-",
  source_app="unknown",
  source_principal="cluster.local/ns/bookinfo/sa/default",
  source_version="unknown",
  source_workload="unknown",
  source_workload_namespace="unknown"}

I want to emphasize that this is telemetry at ClusterB (the destination of the traffic). What I find strange is that the destination_service_namespace label has the value “test”. Is this the intended behavior?

It seems odd because this is telemetry at ClusterB. I was expecting to see values from ClusterB’s point of view, and since traffic is hitting the “httpbin” service in the “bar” namespace, I was expecting a destination_service_namespace=“bar” label. I know that “test” is the namespace of the ServiceEntry, but that ServiceEntry lives in ClusterA.

Telemetry at ClusterA looks consistent.

This is the expected behavior.

The destination.service.* attributes are actually added by the client-side proxy and forwarded to the server-side proxy for use in policy/telemetry. This is a bit counter-intuitive, but it reflects the fact that the destination workload has no information about the service through which it was invoked.

In your configuration, you defined the ServiceEntry in the “test” namespace and then called it from ClusterA. For the client, this is equivalent to calling a service in the “test” namespace.
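
So, for example, if you created that same ServiceEntry in a “bar” namespace on ClusterA, the client-side proxy should report destination_service_namespace=“bar” instead, and the two clusters would line up. Only the namespace changes (sketch, trimmed):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-bar
  namespace: bar    # the namespace of the ServiceEntry on the client side is what gets reported
spec:
  hosts:
  - httpbin.bar.global
  # ...rest of the spec identical to your current ServiceEntry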

The destination.workload.namespace attribute, on the other hand, reflects the running location of the workload that actually received the request (here, bar).

All that being said, improving telemetry around ServiceEntries, especially with regard to multi-cluster scenarios, is something we want to take a closer look at in the next cycle. One thing we could consider is generating attributes like these (using the example you linked above):

destination.service.host: httpbin.bar.global
destination.service.name: httpbin-bar
destination.service.namespace: test

The key difference is that instead of using the hostname for both the destination.service.host and destination.service.name attributes, we would use the ServiceEntry name as destination.service.name. That might be more straightforward, but we probably want to think about it a bit more.

This would make a great topic for the next P&T working group meeting, if you would like to raise it there.