Pilot produces non-deterministic Envoy listeners when service endpoints overlap

Problem: Long-lived connections to the tcp-echo service are periodically reset without end-user configuration changes.

Analysis: The test deployment defines two TCP services that select the tcp-echo port: one service per pod in the StatefulSet, and one service load balanced across all pods. As a result, every pod/port tuple belongs to two services. The listener config for inbound port 9000 alternates between these two services (example listener change), and the existing connections are drained each time the listener config changes. We also observed the “Conflicting inbound listener” message in the Pilot logs. This is a simplified deployment we used to reproduce the issue we first saw with our Redis deployment.
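The flapping is consistent with the conflict being resolved by whichever service happens to be seen first. The sketch below is only a hypothesis, not taken from Pilot's source: it assumes the candidate services for a pod/port tuple are held in a Go map. Go leaves map iteration order unspecified and the runtime deliberately randomizes it, so a "first one wins" rule over a map can pick a different service on each listener rebuild. The `firstWins` helper and the per-pod service name `tcp-echo-0` are hypothetical.

```go
package main

import "fmt"

// firstWins mimics a hypothetical "first service seen wins" rule for a
// single pod/port tuple. Ranging over a Go map visits keys in an
// unspecified, randomized order, so the returned name can vary per call.
func firstWins(candidates map[string]int) string {
	for name := range candidates {
		return name
	}
	return ""
}

func main() {
	// Two hypothetical services that both claim port 9000 on the same pod.
	candidates := map[string]int{
		"tcp-echo":   9000, // load-balanced service spanning all pods
		"tcp-echo-0": 9000, // per-pod service for the first StatefulSet pod
	}

	// Simulate repeated listener rebuilds and count which service wins.
	wins := map[string]int{}
	for i := 0; i < 1000; i++ {
		wins[firstWins(candidates)]++
	}
	fmt.Println(wins) // counts differ from run to run
}
```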

Steps to Reproduce

  • Tested with Istio 1.3.0
  • Deploy tcp-echo services with overlapping endpoints (example)
  • Monitor the state of a pod’s listener config using while true; do date; istioctl proxy-config listeners $pod -n $namespace -o=json | sha256sum; done
  • Expected behaviour: The listener config remains constant
  • Observed behaviour: The listener config periodically changes. See the example diff for one such change.
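The shell loop prints a hash on every iteration, so a change has to be spotted by eye. A minimal Go sketch that runs the same `istioctl proxy-config listeners` command and prints a timestamp only when the SHA-256 of the output changes. The program name `listenerwatch`, the `changeDetector` type, and the one-second poll interval are assumptions for illustration; the original loop has no delay.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// changeDetector remembers the digest of the last listener dump it saw.
type changeDetector struct {
	last string
}

// Observe hashes a listener dump and reports whether it differs from the
// previous one. The first observation is never reported as a change.
func (c *changeDetector) Observe(dump []byte) (digest string, changed bool) {
	sum := sha256.Sum256(dump)
	digest = hex.EncodeToString(sum[:])
	changed = c.last != "" && c.last != digest
	c.last = digest
	return digest, changed
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: listenerwatch <pod> <namespace>")
		return
	}
	pod, namespace := os.Args[1], os.Args[2]

	var det changeDetector
	for {
		// Same istioctl invocation as the reproduction step, without the shell pipeline.
		out, err := exec.Command("istioctl", "proxy-config", "listeners", pod,
			"-n", namespace, "-o=json").Output()
		if err != nil {
			fmt.Fprintln(os.Stderr, "istioctl failed:", err)
		} else if digest, changed := det.Observe(out); changed {
			fmt.Printf("%s listener config changed: %s\n",
				time.Now().Format(time.RFC3339), digest)
		}
		time.Sleep(time.Second) // assumed poll interval
	}
}
```

Run it as `go run . <pod> <namespace>`; with the expected behaviour it should print nothing.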

Questions / Potential Solutions:

  • Sort the services in Pilot before assigning them to a listener. This way the same inbound service deterministically wins the pod/port contention.

  • Instead of creating two k8s services, create a VirtualService on a different port (9001) and somehow map it to the real port (9000). We haven’t found a working configuration for this approach yet. Is there a configuration we’re missing that supports overlapping listeners?

  • Replace the tcp-echo service that spans all pods with a single VirtualService that covers each pod. This causes a “blackhole” when one pod is out of service.

  • Create a TLS SNI filter chain match to multiplex multiple services on the same pod/port. This will not work for TCP services without TLS.

  • Update the istio requirements to state that a pod/port tuple cannot belong to more than one service.