Metric expiry in TelemetryV2 proxies

Community,

Istio Mixer used to have a useful feature called metricsExpirationPolicy, which made Mixer stop exporting a metric once it had not been updated for a configured period. This feature has not been implemented in Telemetry V2 yet.

In our environments, where new workloads are periodically created and destroyed, Envoy proxies keep accumulating a huge number of time series (hundreds of thousands) that reference workloads that are long gone. This increases memory pressure on the Prometheus side, and Prometheus eventually gets OOM-killed.

One solution proposed by the Istio community was to drop or normalize some labels to decrease the cardinality. That is not suitable for us, since we want to keep the labels as they are. We just want Envoy proxies to stop exposing time series that have been inactive for some time. Restarting the Envoy proxies fixes the problem, but is obviously out of the question in production environments.
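
To make the behavior we are after concrete, here is a minimal sketch in Python. It is not Envoy or Istio code: the `ExpiringCounters` class, its methods, and the 60-second TTL are hypothetical names and values chosen only to illustrate the idea of dropping a series once it has been idle for longer than a configured period.

```python
import time


class ExpiringCounters:
    """Counters keyed by label set; a series is dropped after `ttl_seconds` idle."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._series = {}   # label tuple -> [value, time of last update]

    def inc(self, labels, amount=1):
        """Increment the counter for this label set and refresh its last-update time."""
        key = tuple(sorted(labels.items()))
        entry = self._series.setdefault(key, [0, 0.0])
        entry[0] += amount
        entry[1] = self.clock()

    def expose(self):
        """Return the live series, forgetting any that were idle longer than the TTL."""
        now = self.clock()
        expired = [
            key
            for key, (_, last_update) in self._series.items()
            if now - last_update > self.ttl_seconds
        ]
        for key in expired:
            del self._series[key]
        return {key: value for key, (value, _) in self._series.items()}


if __name__ == "__main__":
    store = ExpiringCounters(ttl_seconds=60)
    store.inc({"destination_workload": "job-a"})
    print(store.expose())
```

With something like this in the proxy, a series for a destroyed workload would disappear from the scrape output one TTL after its last update, while the labels of active series stay untouched.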

Does anyone have an idea how to work around this issue?