Mixer - telemetry throttling behavior

@douglas-reid I have a very basic setup where HTTP requests come in through istio-ingressgateway and are routed to a service backed by a replica set, amazing-product-caller. Each pod in that replica set then fans out every incoming request 10x by making 10 requests to another replica set, amazing-product-callee.
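
Roughly, the fan-out in amazing-product-caller looks like this (a simplified sketch; the handler, port, and callee URL are illustrative, not the real code):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// calleeURL is illustrative; the real service address differs.
const calleeURL = "http://amazing-product-callee:8080/work"

// handler fans out every incoming request into 10 outbound requests
// to amazing-product-callee.
func handler(w http.ResponseWriter, r *http.Request) {
	for i := 0; i < 10; i++ {
		resp, err := http.Get(calleeURL)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	fmt.Fprintln(w, "done")
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```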

There is also a custom out-of-process mixer adapter subscribed to tracespans; it logs/plots every trace span it receives from mixer and is configured to receive every span, with no limitations.
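
The adapter itself is a plain gRPC server implementing the generated tracespan handler; conceptually it does something like this (a rough sketch based on the generated bindings from istio.io/istio/mixer/template/tracespan; the actual logging/plotting is more involved, and the port is illustrative):

```go
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	model "istio.io/api/mixer/adapter/model/v1beta1"
	"istio.io/istio/mixer/template/tracespan"
)

// traceSpanAdapter logs every trace span instance mixer dispatches to it.
type traceSpanAdapter struct{}

// HandleTraceSpan is called on mixer's Report() path; a single call can
// carry a batch of instances.
func (a *traceSpanAdapter) HandleTraceSpan(ctx context.Context, req *tracespan.HandleTraceSpanRequest) (*model.ReportResult, error) {
	for _, i := range req.Instances {
		log.Printf("span %q: trace=%s span=%s parent=%s start=%v end=%v",
			i.SpanName, i.TraceId, i.SpanId, i.ParentSpanId, i.StartTime, i.EndTime)
	}
	return &model.ReportResult{}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":9090") // port is illustrative
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	tracespan.RegisterHandleTraceSpanServiceServer(s, &traceSpanAdapter{})
	log.Fatal(s.Serve(lis))
}
```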

This is what it looks like after a single request has been issued (mixer/pilot pods are intentionally omitted, even though Check() and Report() trace spans are also being received):
[image: trace graph produced by the adapter after a single request]

The above picture is correct and expected.

At higher request loads, however, only a fraction of the trace spans is being reported, even with load shedding disabled in mixer. I’ve scaled everything out to make sure nothing is dropped due to insufficient resources, but the gap remains: my load-testing suite reports a given number of successful requests, yet the adapter only sees a portion of the corresponding spans.

In this load test, for instance, the cluster has been under a constant load of about 50 requests per second for 15 minutes. The actual number of successful requests issued against the gateway is roughly 45,000 (consistent with 50 req/s × 900 s).

But the adapter has received only about 22,000 spans:

[image: adapter-side view showing roughly 22,000 trace spans received during the load test]

My guess is that the spans must be getting sampled or throttled on the proxy side; otherwise it doesn’t make sense that the mixer instances (and the custom adapter) are nowhere near their resource quotas, yet trace spans are still somehow being dropped.
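
To try to confirm that, I was planning to dump the sidecar proxy’s Envoy stats from inside one of the pods and look at the mixer filter counters; something along these lines (the admin port 15000 and the assumption that the mixer filter’s report counters show up under a "mixer"-prefixed stat name are mine, not verified):

```go
package main

// Quick check from inside an application pod: dump the sidecar's Envoy
// stats and print anything mixer-related, to see how many Report() calls
// the proxy thinks it has sent. Admin port and stat naming are assumptions.
import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:15000/stats")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "mixer") {
			fmt.Println(line)
		}
	}
}
```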

Any idea what to look for here?