Server authentication for TCP services relies on DNS

Take the scenario from the security concepts page:

Suppose the legitimate servers that run the service datastore only use the infra-team identity. A malicious user has the certificate and key for the test-team identity. The malicious user intends to impersonate the service to inspect the data sent from the clients. The malicious user deploys a forged server with the certificate and key for the test-team identity. Suppose the malicious user successfully hacked the discovery service or DNS to map the datastore service name to the forged server.

When a client calls the datastore service, it extracts the test-team identity from the server’s certificate, and checks whether test-team is allowed to run datastore with the secure naming information. The client detects that test-team is not allowed to run the datastore service and the authentication fails.

It appears that Envoy builds the list of allowed SANs for a TCP server from the listener config, which is selected using the destination IP address of the outbound packet. There is not much else to go on in TCP. So, in the example scenario, the client (Envoy) would extract the test-team identity from the server’s certificate and check it against the list of valid names associated with the malicious destination IP address returned from DNS. In effect, Envoy is asking: is test-team allowed to run the service at the malicious IP address? The answer is yes, and Envoy will make the connection, contrary to what the documentation states.
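To make the failure mode concrete, here is a toy Python model of the check as I understand it. It is not Envoy code. The IP addresses, the `test-svc` service, and every function name are hypothetical, and I am assuming the forged server's IP is one the mesh already associates with the test-team identity.

```python
# Toy model of the TCP secure naming check; not Envoy code.
# All IPs, service names, and function names are hypothetical.

# Secure naming information: service name -> identities allowed to run it.
SECURE_NAMING = {
    "datastore": {"infra-team"},
    "test-svc": {"test-team"},
}

# For plain TCP the proxy only sees a destination IP, so the allowed
# identities are effectively keyed by IP rather than by the name dialed.
IP_TO_SERVICE = {
    "10.0.0.1": "datastore",   # legitimate datastore server
    "10.0.0.66": "test-svc",   # forged server, known to the mesh as test-team
}

# Identity each server presents in its certificate.
SERVER_IDENTITY = {
    "10.0.0.1": "infra-team",
    "10.0.0.66": "test-team",
}

HONEST_DNS = {"datastore": "10.0.0.1"}
SPOOFED_DNS = {"datastore": "10.0.0.66"}


def allowed_identities_for_ip(dest_ip):
    """Allowed SANs selected by destination IP, as in the TCP case."""
    service = IP_TO_SERVICE.get(dest_ip)
    return SECURE_NAMING.get(service, set())


def connect(name, dns):
    """Resolve the name, then run the IP-keyed identity check."""
    dest_ip = dns[name]  # the lookup happens before the proxy is involved
    return SERVER_IDENTITY[dest_ip] in allowed_identities_for_ip(dest_ip)


def name_keyed_check(name, server_identity):
    """The check the docs describe, keyed by the name the client intended."""
    return server_identity in SECURE_NAMING.get(name, set())


if __name__ == "__main__":
    print(connect("datastore", HONEST_DNS))           # legitimate path
    print(connect("datastore", SPOOFED_DNS))          # spoofed path
    print(name_keyed_check("datastore", "test-team")) # intent-keyed check
```

In this model both `connect` calls succeed, because the spoofed path is checked against the identities valid for the forged IP. Only the name-keyed check rejects test-team, and that check needs the intended name, which the proxy does not have for plain TCP.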

Did I miss something that mitigates this attack?

@pbohman, I think your analysis is spot on here. Would you be willing to suggest some alternate language for the docs that clarifies this point?

I believe we would be resilient to malicious manipulation of DNS responses in the case of HTTP, where we can examine the host header to see where the user was attempting to dial. But, as you say, for TCP we do depend on getting correct DNS responses for the secure naming to work.
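A rough sketch of why the HTTP case differs, as a toy model rather than Envoy's actual logic: the check can be keyed by the Host header, which carries the name the application dialed, so a spoofed IP does not change which identities are acceptable. The function name and the way the service name is taken from the header are simplifying assumptions.

```python
# Toy model only; names and the host parsing are hypothetical simplifications.

# Secure naming information: service name -> identities allowed to run it.
SECURE_NAMING = {"datastore": {"infra-team"}}


def http_connect_allowed(host_header, server_identity):
    """Key the identity check by the Host header instead of the destination IP."""
    # Drop an optional port, then keep the first DNS label as the service name.
    service = host_header.split(":")[0].split(".")[0]
    return server_identity in SECURE_NAMING.get(service, set())


if __name__ == "__main__":
    # Whatever IP DNS returned, the forged server's identity is rejected.
    print(http_connect_allowed("datastore.default.svc.cluster.local:8080", "test-team"))
    print(http_connect_allowed("datastore", "infra-team"))
```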

@pbohman, thank you for catching this. I agree with Spike that we should update the doc to call out this caveat and make it more accurate about what secure naming is used for. We will send out a PR; please help by commenting on it.

Secure naming is for protecting against network hijacking attacks in general. DNS spoofing is one example; other means of hijacking the network include BGP/route hijacking, ARP spoofing, etc. I doubt there is much we can do to protect against DNS spoofing for TCP services as you described, because the DNS lookup happens before the client-side Envoy even sees the traffic. But secure naming is still useful for protecting against other types of hijacking attacks. @Tao_Li

Thanks for the quick response, WG discussion, and document updates. I look forward to discussing solutions.

Here is the PR, feel free to comment

As the simplest solution, could we defend against DNS hacks for TCP traffic by assigning a different port to each service?

We could do that, but it would no longer be “transparent” to the application. How would the application learn which port it should use to reach a particular service?

We cannot use DNS SRV records, as that would leave us with the same problem as we have now if DNS is compromised.

I agree it’s not transparent to the applications. I was just thinking about possible approaches to distinguishing services at the TCP layer. :)

Yep. What I would say is that at the pure TCP level, to distinguish services we really only have

  • IP address
  • port

or a combination of the two. The issue we’re talking about above isn’t so much that we can’t distinguish services per se, it’s that we don’t have a secure way to go from application intent (typically a name) to the thing that we use to distinguish (IP/port). Switching from IP to port or vice versa doesn’t solve that fundamental issue of how to reliably capture the application intent.

One thing that would allow us to capture intent is to use named Unix Domain Sockets. For example, mount a shared volume and provide sockets like /var/run/envoy/services/my-svc.default.cluster.local

That wouldn’t be transparent, as applications would need to be modified to dial domain sockets instead of TCP sockets, but at least it does solve the problem of understanding user intent.
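A minimal Python sketch of the idea, assuming a Unix-like system: the proxy side listens on a socket whose file name is the service name, so the name the application dials is known without any DNS lookup. The helper names are hypothetical, and a temporary directory stands in for the shared volume.

```python
import os
import socket
import tempfile
import threading


def service_for_socket(path):
    """The socket's file name is the service the application intends to reach."""
    return os.path.basename(path)


def serve_once(listener, path, seen):
    """Proxy side: accept one connection and record the intended service."""
    conn, _ = listener.accept()
    with conn:
        seen.append(service_for_socket(path))
        conn.sendall(b"ok")


def dial(path):
    """Application side: dial the named domain socket instead of a TCP address."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        return client.recv(16)


def demo():
    # A temporary directory stands in for the shared volume.
    with tempfile.TemporaryDirectory() as base:
        path = os.path.join(base, "my-svc.default.cluster.local")
        seen = []
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            worker = threading.Thread(target=serve_once, args=(listener, path, seen))
            worker.start()
            reply = dial(path)
            worker.join()
        return seen[0], reply


if __name__ == "__main__":
    print(demo())
```

The listener learns the intended service from which socket was dialed, so the secure naming check could then be keyed by that name rather than by an IP address that DNS supplied.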