Non-deterministic LDS updates causing connection draining

We have a deployment of Istio using multiple pilots on both 1.2.4 and 1.3. A given pilot will periodically take over sending listener discovery service (LDS) updates to downstream istio-proxy containers. Unfortunately, these updates do not appear to be deterministically serialized and have no canonical representation. This is problematic for filter chains containing multiple role-based access control (RBAC) policies. For example, one pilot might serialize the LDS as "RBAC entities X and Y and Z are allowed" … while another might serialize it as "entities Y and X and Z are allowed". (Note: I have captures from istioctl proxy-config listeners and istio-proxy/Envoy traces that appear to point to this.) As a result, whenever an istio-proxy receives such an LDS update, it applies the new listener configuration and begins draining all previously established connections held on the old listener. The TCP sessions on the old listener are then disconnected after the drain interval.

Any RBAC change will also have this effect, regardless of fixes to deterministic serialization … just by changing the filter chain, the LDS push will trigger connection draining.

Is this a known issue? Are there mitigations or implementations being considered to address this problem and allow long-lived connections to stay established?

The only short-term mitigation we have right now is to operate with a single pilot so that the serialization is deterministic, but this is not great given the lack of redundancy … and we will still eat the connection reset for any RBAC update.
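Beyond that, the only other knob I can think of is the drain interval itself; lengthening it only postpones the reset rather than preventing it. As a rough sketch, assuming the ProxyConfig field name below (which lives under defaultConfig in the istio mesh ConfigMap in istio-system) is still accurate for these versions:

# Sketch only, not a fix: a longer drain window just delays when old
# connections are closed after a listener is replaced. Verify the field
# name against the installed version before applying.
defaultConfig:
  drainDuration: 45s   # drain window Envoy applies to listeners replaced by an LDS update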

I’ve seen discussion about adding a filter chain discovery service (FCDS) to Envoy to prevent this kind of connection draining from filter chain updates:

Are these the current plans for Istio pilot/proxy to move to? Any other ideas or suggestions to support longer-lived sessions are welcome.

Is it non-deterministic because it connects to a 1.2.4 pilot then a 1.3.0 pilot? Or is it not deterministic within the same version?

Same version. Each cluster had 3 pilots running. The clusters were completely separate and isolated. One cluster was running 1.3 and the other (larger cluster) was running 1.2.4.

On the 1.2.4 cluster, I was seeing what appeared to be non-deterministic LDS serializations of the RBAC policy. This was confirmed by monitoring the connection state of a long-lived TCP session from one pod to another and noting its termination time. I grabbed the listener config using istioctl proxy-config listeners on the server that received the LDS update, both before and after. The only perceived difference was the order of the RBAC policy.

I cannot say for certain that this was at play on the 1.3 cluster. I did not have sufficient logging at the time on the 1.3 cluster.

A colleague of mine referred me to:


which sounds relevant.

The 1.2.4 deployment had the following environment variables defined:
POD_NAME: istio-pilot-58bb688bd8-t252x (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
GODEBUG: gctrace=1
PILOT_PUSH_THROTTLE: 100
PILOT_TRACE_SAMPLING: 1
PILOT_DISABLE_XDS_MARSHALING_TO_ANY: 1

The 1.3 had the following environment variables defined:
POD_NAME: istio-pilot-7d7969b47f-lbnpv (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
GODEBUG: gctrace=1
PILOT_PUSH_THROTTLE: 100
PILOT_TRACE_SAMPLING: 1
PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND: true
PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_INBOUND: false

Thanks for the details! I suspect that issue is not related here, as it is turned off in 1.2 - it could be impacting your 1.3 cluster though. Could you provide your RBAC config so I can take a look at what we generate? It may be as simple as needing to sort a map or something.

Sure thing! Thanks again for the help … very much appreciated!

Here are the relevant ServiceRoleBindings, with redactions. Each binding grants access to a single service, and the bindings grant that access to subjects in different namespaces.

apiVersion: v1
items:
- apiVersion: rbac.istio.io/v1alpha1
  kind: ServiceRoleBinding
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"rbac.istio.io/v1alpha1","kind":"ServiceRoleBinding","metadata":{"annotations":{},"name":"tcp-echo-access-sleep-c-test-ns","namespace":"a-test-ns"},"spec":{"roleRef":{"kind":"ServiceRole","name":"tcp-echo-access"},"subjects":[{"user":"redacted.trustdomain.com/clustername/ns/c-test-ns/sa/test-sleep-sa"}]}}
    creationTimestamp: "2019-09-20T18:56:51Z"
    generation: 1
    name: tcp-echo-access-sleep-c-test-ns
    namespace: a-test-ns
    resourceVersion: "6454183"
    selfLink: /apis/rbac.istio.io/v1alpha1/namespaces/a-test-ns/servicerolebindings/tcp-echo-access-sleep-c-test-ns
    uid: 681b37e4-dbd8-11e9-a557-a0423f35e8da
  spec:
    roleRef:
      kind: ServiceRole
      name: tcp-echo-access
    subjects:
    - user: redacted.trustdomain.com/clustername/ns/c-test-ns/sa/test-sleep-sa
- apiVersion: rbac.istio.io/v1alpha1
  kind: ServiceRoleBinding
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"rbac.istio.io/v1alpha1","kind":"ServiceRoleBinding","metadata":{"annotations":{},"name":"tcp-echo-access-sleep-a-test-ns","namespace":"a-test-ns"},"spec":{"roleRef":{"kind":"ServiceRole","name":"tcp-echo-access"},"subjects":[{"user":"redacted.trustdomain.com/clustername/ns/a-test-ns/sa/test-sleep-sa"}]}}
    creationTimestamp: "2019-09-20T18:56:53Z"
    generation: 1
    name: tcp-echo-access-sleep-a-test-ns
    namespace: a-test-ns
    resourceVersion: "6454190"
    selfLink: /apis/rbac.istio.io/v1alpha1/namespaces/a-test-ns/servicerolebindings/tcp-echo-access-sleep-a-test-ns
    uid: 68ffb653-dbd8-11e9-b5af-a0423f37743c
  spec:
    roleRef:
      kind: ServiceRole
      name: tcp-echo-access
    subjects:
    - user: redacted.trustdomain.com/clustername/ns/a-test-ns/sa/test-sleep-sa
- apiVersion: rbac.istio.io/v1alpha1
  kind: ServiceRoleBinding
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"rbac.istio.io/v1alpha1","kind":"ServiceRoleBinding","metadata":{"annotations":{},"name":"tcp-echo-access-sleep-b-test-ns","namespace":"a-test-ns"},"spec":{"roleRef":{"kind":"ServiceRole","name":"tcp-echo-access"},"subjects":[{"user":"redacted.trustdomain.com/clustername/ns/b-test-ns/sa/test-sleep-sa"}]}}
    creationTimestamp: "2019-09-20T18:56:50Z"
    generation: 1
    name: tcp-echo-access-sleep-b-test-ns
    namespace: a-test-ns
    resourceVersion: "6454173"
    selfLink: /apis/rbac.istio.io/v1alpha1/namespaces/a-test-ns/servicerolebindings/tcp-echo-access-sleep-b-test-ns
    uid: 673856cf-dbd8-11e9-957f-a0423f35ead2
  spec:
    roleRef:
      kind: ServiceRole
      name: tcp-echo-access
    subjects:
    - user: redacted.trustdomain.com/clustername/ns/b-test-ns/sa/test-sleep-sa
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The resulting filter chain found in the LDS update on the tcp-echo’s istio-proxy shows:

"filters": [
    {
        "name": "envoy.filters.network.rbac",
        "config": {
            "rules": {
                "policies": {
                    "tcp-echo-access": {
                        "permissions": [
                            {
                                "and_rules": {
                                    "rules": [
                                        {
                                            "any": true
                                        }
                                    ]
                                }
                            }
                        ],
                        "principals": [
                            {
                                "and_ids": {
                                    "ids": [
                                        {
                                            "authenticated": {
                                                "principal_name": {
                                                    "exact": "spiffe://redacted.trustdomain.com/clustername/ns/b-test-ns/sa/test-sleep-sa"
                                                }
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "and_ids": {
                                    "ids": [
                                        {
                                            "authenticated": {
                                                "principal_name": {
                                                    "exact": "spiffe://redacted.trustdomain.com/clustername/ns/c-test-ns/sa/test-sleep-sa"
                                                }
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "and_ids": {
                                    "ids": [
                                        {
                                            "authenticated": {
                                                "principal_name": {
                                                    "exact": "spiffe://redacted.trustdomain.com/clustername/ns/a-test-ns/sa/test-sleep-sa"
                                                }
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            },
            "stat_prefix": "tcp."
        }
    },
    {
        "name": "envoy.tcp_proxy",
        "config": {
            "cluster": "inbound|9000|tcp|tcp-echo.a-test-ns.svc.cluster.local",
            "stat_prefix": "inbound|9000|tcp|tcp-echo.a-test-ns.svc.cluster.local"
        }
    }
]

The order of the and_ids entries under principals is what appears to be non-deterministic.
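For reference, here is a minimal Go sketch of how that kind of reordering can happen; it is purely illustrative (not Pilot's actual code), and the subjects map and its values are hypothetical. If the principals are emitted by ranging over a map without sorting the keys, the order can differ from run to run and therefore from pilot to pilot:

package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hypothetical set of subjects collected from the ServiceRoleBindings above.
	subjects := map[string]bool{
		"spiffe://redacted.trustdomain.com/clustername/ns/a-test-ns/sa/test-sleep-sa": true,
		"spiffe://redacted.trustdomain.com/clustername/ns/b-test-ns/sa/test-sleep-sa": true,
		"spiffe://redacted.trustdomain.com/clustername/ns/c-test-ns/sa/test-sleep-sa": true,
	}

	// Non-deterministic: Go randomizes map iteration order, so the resulting
	// principals list (the and_ids entries above) can differ between pilots.
	var unsorted []string
	for s := range subjects {
		unsorted = append(unsorted, s)
	}
	fmt.Println("unsorted:", unsorted)

	// Deterministic: sort the keys before building the principals list, giving
	// every pilot the same canonical order.
	sorted := make([]string, 0, len(subjects))
	for s := range subjects {
		sorted = append(sorted, s)
	}
	sort.Strings(sorted)
	fmt.Println("sorted:  ", sorted)
}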

Thanks for all the details, you made it really easy to find what I think is the root cause. I opened an issue: https://github.com/istio/istio/issues/17347. Hopefully we can get this fixed ASAP in 1.3 and 1.2.

Awesome! Thank you again!

I guess the follow-up to this is … are there any plans to prevent, or more selectively control, connection draining as RBAC policy updates are made?

If I understand this issue correctly, once any RBAC policy for a service is updated (i.e. a ServiceRoleBinding is added or removed, or any policy is changed), all downstream listeners will get drained due to the modified filter chain in the LDS update.

The ideal behavior would be that existing, still-authorized connections are not drained, while connections that are no longer authorized are either terminated immediately or allowed to drain.

… this sounds much more complicated to address.

We do have plans to make listener drains more isolated - that is the FCDS work you linked above. I don't know of any plans to go much beyond that, though. I do see the use case, but it may be very complicated - you would need to keep an arbitrary amount of configuration in memory in Envoy.

Isolating listener drains using FCDS sounds like a great approach and goes a long way without much added complexity. Thank you again for the help!

Just to clarify, FCDS will not solve your problem. FCDS addresses a different problem, wherein we use filter chains instead of original-destination listeners. Your problem is one where you want the filter itself to reload its configuration dynamically and re-authorize existing connections without impacting the listener. This is unfortunately not supported in the Istio RBAC filter today.

You would have to write a custom filter that dynamically fetches configuration from a backend and applies the authorization to existing and new connections. But Envoy's architecture does not lend itself easily to this kind of thing. A filter chain is newly instantiated for every connection; it gets a read-only copy of its config during init and does not look back. So, even if a custom filter were to ask the control plane for updated config, every instance of this filter in every listener, for every connection, would end up asking the control plane for the configuration. Such a model would put very high load on the control plane and make it a single point of failure, creating availability issues.

One alternative is to create a single persistent config-fetcher thread that fetches configuration and notifies all instances of this filter in Envoy (across all worker threads), causing those filter instances to reload their state and re-evaluate. We need to look into the complexity cost of such an implementation, though.
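To make the shape of that alternative concrete, here is a rough Go sketch of the pattern; Envoy itself is C++ and none of these names exist there, so this only illustrates the idea: one fetcher goroutine publishes the latest policy, and per-connection filter instances consult the shared snapshot on every authorization check instead of caching it at connection setup.

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Policy is a stand-in for the RBAC rules a filter instance would evaluate.
type Policy struct {
	AllowedPrincipals map[string]bool
}

// current holds the latest *Policy published by the fetcher.
var current atomic.Value

// fetchLoop is the single persistent fetcher: it pulls new config from the
// control plane (stubbed out here) and swaps it in for all readers at once.
func fetchLoop(fetch func() *Policy, interval time.Duration) {
	for {
		current.Store(fetch())
		time.Sleep(interval)
	}
}

// authorize is what a per-connection filter instance would call on each check;
// it always sees the latest published policy, so existing connections could be
// re-evaluated without tearing down the listener.
func authorize(principal string) bool {
	p, _ := current.Load().(*Policy)
	return p != nil && p.AllowedPrincipals[principal]
}

func main() {
	// Hypothetical fetcher that would normally talk to the control plane.
	fetch := func() *Policy {
		return &Policy{AllowedPrincipals: map[string]bool{
			"spiffe://redacted.trustdomain.com/clustername/ns/a-test-ns/sa/test-sleep-sa": true,
		}}
	}
	current.Store(fetch()) // seed before any reads
	go fetchLoop(fetch, 30*time.Second)

	fmt.Println(authorize("spiffe://redacted.trustdomain.com/clustername/ns/a-test-ns/sa/test-sleep-sa")) // true
	fmt.Println(authorize("spiffe://redacted.trustdomain.com/clustername/ns/b-test-ns/sa/test-sleep-sa")) // false
}

The hard parts, and where the complexity cost mentioned above comes in, would be fanning the notification out to every worker and deciding what to do with connections whose principal is no longer allowed.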