How to coordinate warm-up with Kubernetes rolling updates?

Background

Istio has provided a warm-up feature in DestinationRule (slow start mode in Envoy) since v1.14.

It is a great feature for JVM-based services, and we are planning to use it to resolve the slow-start issue of our Java service. A small number of requests at startup helps the JVM do JIT compilation without eating up all CPU resources.
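
For reference, a minimal sketch of what enabling warm-up might look like. The resource name, host, and the 180s duration are placeholders, not values from our setup. `warmupDurationSecs` sits under the load balancer settings and, as far as I know, only applies to the ROUND_ROBIN and LEAST_REQUEST balancers.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-java-service            # hypothetical name
spec:
  host: my-java-service.default.svc.cluster.local   # hypothetical host
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
      # New endpoints get a gradually increasing share of traffic
      # for this long after they are created (placeholder value).
      warmupDurationSecs: 180s
```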

However, we found that it is hard to combine this feature with Kubernetes rolling updates.
When an instance is in slow start mode (warm-up), it needs to receive traffic, so it has to be ready in Kubernetes for its endpoint to be available. At the same time, during a rolling update, the controller also regards it as ready and moves on to update the next instance. This can lead to a situation where all available instances of the service are in slow start mode, which makes the feature ineffective.

Question

So, my question is: how can we elegantly coordinate warm-up and Kubernetes rolling updates? Ideally, a newly started instance would enter slow start mode and receive traffic, while the old instance keeps receiving traffic until the new instance finishes warming up.

Some approaches I thought about:

  1. Add a shutdown hook to the pod to make the old instance wait some time before shutting down.
    This does not work, because once the shutdown hook is triggered the pod no longer receives new traffic.
  2. Add a wait time between the steps of the rolling update.
    This should work, but I found no way to do it without changing the source code.
  3. Change the way Kubernetes evaluates the available pod count during a rolling update (regard pods in warm-up mode as unavailable).
    I found no way to do this without changing the source code.
  4. Change the implementation of EDS in Istio Pilot.
    This also requires changing the source code and is fairly complex.
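
For reference, approach 1 would look roughly like the Deployment fragment below. The container name and the 120-second sleep are placeholders, and it assumes the image ships a `sleep` binary. It delays termination, but the pod is taken out of the Service endpoints once it is marked as terminating, so the old instance stops receiving new traffic anyway.

```yaml
# Fragment of a Deployment spec (hypothetical values)
spec:
  template:
    spec:
      # Must be longer than the preStop sleep, otherwise the
      # container is killed before the hook finishes.
      terminationGracePeriodSeconds: 150
      containers:
        - name: app                # hypothetical container name
          lifecycle:
            preStop:
              exec:
                # Keeps the old pod alive for 120s after termination starts.
                command: ["sleep", "120"]
```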

Did you try using these flags to control how many new pods should come up? See "Rolling Updates with Kubernetes Deployments | Kubernetes".

Hi!
I read the Kubernetes rolling update docs before, and I don't think maxSurge can do the job.
It only affects the number of new instances coming up concurrently. It doesn't change the fact that all instances of the service will be replaced and will still be in warm-up mode if the warm-up time is relatively long (a few minutes) and the start time (time to become ready) is short.
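
For reference, here is a sketch of the settings discussed here, plus `minReadySeconds`, which might be worth testing for this case: a new pod has to stay ready for that many seconds before the Deployment controller counts it as available, while it already receives traffic as a ready endpoint. All names, the image, the probe path, and the numbers are placeholders; 180 is assumed to match the warm-up duration configured in the DestinationRule.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-service            # hypothetical name
spec:
  replicas: 4
  # A new pod counts as "available" only after being ready this long
  # (assumed here to equal the Istio warm-up duration).
  minReadySeconds: 180
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # at most one extra pod during the rollout
      maxUnavailable: 0            # never drop below the desired available count
  selector:
    matchLabels:
      app: my-java-service
  template:
    metadata:
      labels:
        app: my-java-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-java-service:1.0.0   # placeholder image
          readinessProbe:
            httpGet:
              path: /healthz       # placeholder probe path
              port: 8080
```

If this behaves as documented, with `maxUnavailable: 0` an old pod should only be scaled down after a new pod has been ready for the full `minReadySeconds`. Note that `progressDeadlineSeconds` needs to be larger than `minReadySeconds`.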