How to design a deployment strategy for Istio applications that use traffic routing

I am trying to figure out a good Istio-based build/deploy pipeline for an application. The application supports a number of distinct environments (e.g., pre-staging, staging, and production), as well as feature branch deployments (e.g., feature-jp12, bugfix-dp34). The environments will be routed based on the hostname, and feature branches based on a header value.

I’ve found a bunch of documentation and blog posts about how to set up Istio’s VirtualService, DestinationRule, and so on, but I’m struggling with how to actually implement it in a CI/CD pipeline.

The application is built from source code, and the image name and tag combine to specify the environment and feature branch (if any). This information is plugged into a kustomize pipeline to correctly build all the definitions.

The problem that I’m having is that I want builds to be independent. So when a branch is built, it produces a new k8s Deployment, and it should then be deployed to the k8s cluster. However, other branches have already been built, so there is already a VirtualService/DestinationRule for the application in the cluster! I can write code or use kustomize to get the currently running VirtualService/DestinationRule and update it to add the routing to the Deployment that was just built and deployed, but this has a number of problems.

First, there is a timing problem, as there is no global lock, so two deployments could occur at the same time, with one deployment overwriting the changes in the VirtualService that the other made. I can work around this by building a lock in the CI/CD system, so this seems solvable. Are there better approaches?

If I go with the approach above, I have the problem of garbage collection. I don’t want every version ever deployed to be on the cluster, so I need some way to clean up the VirtualService and DestinationRules. In a perfect world, I would delete the Deployment and the parts of the VirtualService/DestinationRule that pointed to that deployment would be removed. However, it is my understanding that this isn’t possible.

What are other people doing in this space? Is the solution to either hand edit/maintain the VirtualService/DestinationRule, adding and removing versions of the application? Or more custom tooling to do this automatically? I feel like I must be missing something.

You can have multiple VirtualServices for the same host (at the gateway only); they get merged. The same goes for DestinationRules: define each DestinationRule with just its subset and nothing else.
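One way to read this suggestion: each branch build emits its own small VirtualService and DestinationRule, so no build ever edits a shared object, and cleanup is just deleting that branch's two resources. Below is a minimal Python sketch that renders such a pair as JSON (which `kubectl apply -f -` accepts). The header name `x-feature-branch`, the `<service>-<branch>` naming scheme, and the example service, host, and gateway names are assumptions for illustration, not something from this thread. Since the evaluation order across merged VirtualServices is not guaranteed, each fragment's match should be mutually exclusive with the others.

```python
import json


def branch_resources(service, host, gateway, branch, namespace="default"):
    """Render one VirtualService and one DestinationRule owned by a single branch."""
    name = f"{service}-{branch}"  # hypothetical naming scheme
    virtual_service = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hosts": [host],
            "gateways": [gateway],  # merging applies to gateway-bound VirtualServices
            "http": [{
                # Hypothetical header; an exact match keeps branch fragments disjoint.
                "match": [{"headers": {"x-feature-branch": {"exact": branch}}}],
                "route": [{"destination": {"host": service, "subset": branch}}],
            }],
        },
    }
    destination_rule = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "host": service,
            # Only the subset, no top-level trafficPolicy.
            "subsets": [{"name": branch, "labels": {"version": branch}}],
        },
    }
    return [virtual_service, destination_rule]


def to_kubectl_json(resources):
    """Wrap resources in a v1 List so they can be piped to `kubectl apply -f -`."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": resources}, indent=2)
```

With this layout, a branch is removed with something like `kubectl delete virtualservice,destinationrule myapp-feature-jp12`, without touching any other branch's routing.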

Thanks, that makes sense for deploying new subsets.

How does it work for deletes/cleanup?

It is the same thing. A structure I think works well is to keep service A v1 in a branch of repo A, and the latest v2 in master of repo A. The route config and gateway config should be tracked separately in a different repo X. Then, as you deploy v1/v2 onto your cluster, you make changes to repo X to push through the desired gateway or route config changes.

Does this make sense?

That does make sense in terms of workflow, but I’m still trying to figure out how to automate the management of the routing. In your example it’s the “make changes to repo X to push through the desired gateway or route config changes” part.

Let’s say I have one service with three versions: v1, v2, v3. The versions are not feature branches, just versions. v1 and v2 are deployed, and the routing configured for 100% going to v2. We then build and deploy v3, and the following workflow is started:

  1. Configure routing so 1% of traffic is sent to v3
  2. If everything looks good, route 100% of traffic to v3
  3. If there are errors, route 100% of traffic to v2.
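The three steps above differ only in the weights on a single route. A minimal Python sketch, assuming subsets named v2 and v3 are already defined in a DestinationRule and using a hypothetical service host `myapp`:

```python
def weighted_route(host, weights):
    """Build the `route` list of an HTTP route from a {subset: percent} mapping."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return [
        {"destination": {"host": host, "subset": subset}, "weight": percent}
        for subset, percent in weights.items()
        if percent > 0  # a subset at 0% is simply left out of the route
    ]


canary = weighted_route("myapp", {"v2": 99, "v3": 1})     # step 1
promote = weighted_route("myapp", {"v2": 0, "v3": 100})   # step 2
rollback = weighted_route("myapp", {"v2": 100, "v3": 0})  # step 3
```

Each result would go under `spec.http[].route` of the VirtualService; the rest of the resource stays the same between steps.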

v1 acts as the “previous version” of v2. If v3 is good, and it is handling 100% of traffic, then v1 is no longer needed, and can be removed (both Deployment and traffic routing). It is this removal of v1 (or v2 later) that is tripping me up. I can’t delete the VirtualService (as it’s still in use) and k8s merges the yaml, so how do I remove the references to v1 in the VirtualService/Deployment?

The best I can think of right now is to write a script that looks for old versions, but I’d really like to avoid parsing yaml or json and directly modifying it, as this feels brittle and highly dependent on Istio data structures, and I’m not even sure if it’s possible (given the merge behavior of k8s).


Sorry for the slow reply, but perhaps I’m missing something simple here. If you want to remove references of v1 in VS/deployment because v2 or v3 is good, you may simply modify your virtual service to not route any traffic to v1 then apply it to your mesh. Once that is done you may delete the v1 deployment.
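The ordering matters here: the route change has to be applied before the Deployment is deleted. As a sketch, a small Python guard could list which subsets a VirtualService still sends traffic to, given the resource as a parsed dict (for example from `kubectl get virtualservice <name> -o json`). The function names are hypothetical, and only `http` routes are inspected (not `tcp`, `tls`, or mirror targets).

```python
def subsets_in_use(virtual_service):
    """Return the subsets that any HTTP route in the VirtualService still references."""
    used = set()
    for http_route in virtual_service.get("spec", {}).get("http", []):
        for target in http_route.get("route", []):
            # A destination explicitly listed with weight 0 receives no traffic.
            if target.get("weight", 100) == 0:
                continue
            subset = target.get("destination", {}).get("subset")
            if subset:
                used.add(subset)
    return used


def safe_to_delete(virtual_service, subset):
    """True if no HTTP route sends traffic to `subset` any more."""
    return subset not in subsets_in_use(virtual_service)
```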

Note: you always want to store your VS resource yaml in git and roll out these changes via a git workflow like GitOps.

Sorry for the delay in my reply.

The thing that I am trying to figure out is how to work with the Virtual Service when there is more than one process editing it. Each branch’s build/deploy is independent and automated. So the deploy would need to do something like the following:

  1. Download current Virtual Service
  2. Add newly deployed version to the Virtual Service
  3. Save the changed Virtual Service

This has a host of problems including concurrency and garbage collection of old versions. We’re trying to not require a human to go in and modify the Virtual Service - this should be done by a tool.
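For the concurrency half of this, Kubernetes already offers optimistic locking: every object carries a `metadata.resourceVersion`, and an update (an API PUT, or `kubectl replace` with the version included in the manifest) made with a stale value is rejected with a conflict. A download/modify/save loop can lean on that instead of a CI-level lock. The sketch below is plain Python with the API calls injected as functions; `get`, `replace`, `mutate`, and `Conflict` are hypothetical stand-ins for whatever client the pipeline uses, not a real library API.

```python
class Conflict(Exception):
    """Stand-in for the 409 the API server returns on a stale resourceVersion."""


def update_with_retry(get, replace, mutate, attempts=5):
    """Download, modify, and save an object, retrying if someone else saved first."""
    for _ in range(attempts):
        current = get()            # 1. download (includes metadata.resourceVersion)
        desired = mutate(current)  # 2. add the newly deployed version
        try:
            return replace(desired)  # 3. save; rejected if resourceVersion is stale
        except Conflict:
            continue  # another deploy won the race: re-read and reapply the change
    raise RuntimeError("gave up after repeated conflicts")
```

This only addresses lost updates. It does not decide when an old version should leave the Virtual Service.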

GitOps does solve one of the problems (concurrent editing of Virtual Service), but doesn’t really solve any others (e.g., when does a version get removed from the Virtual Service?)

Hello Erick,
I am currently facing the same questions (i.e. how to add / remove special routes to canary versions of pods without risking corrupting the single VirtualService object).

Our ideas at the moment are:

  • use JSON PATCH to add / remove the canary routing rules
  • use an external script that would regenerate the VirtualService from the existing DestinationRules (that can be deployed independently, per subset)
  • or even go all the way with the Operator pattern to control the VirtualService and update it from a CRD plus the DestinationRules; this sounds like a lot of work…
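The second idea can stay quite small if the VirtualService is treated as a pure function of the deployed subsets: each build applies or deletes only its own DestinationRule, and one job rebuilds the whole VirtualService from whatever DestinationRules exist. Removing a version then means deleting its DestinationRule and regenerating. A Python sketch under those assumptions follows; the `x-version` header, the function name, and the idea of a single "stable" subset that receives all unmatched traffic are hypothetical choices for illustration.

```python
def virtual_service_from_rules(name, host, service, destination_rules, stable):
    """Rebuild a VirtualService from the subsets declared in DestinationRules."""
    subsets = sorted({
        subset["name"]
        for rule in destination_rules
        if rule["spec"]["host"] == service
        for subset in rule["spec"].get("subsets", [])
    })
    if stable not in subsets:
        raise ValueError(f"stable subset {stable!r} has no DestinationRule")

    def route(subset):
        return [{"destination": {"host": service, "subset": subset}}]

    # Header-matched routes first, catch-all last: rules within one
    # VirtualService are evaluated in order.
    http = [
        {"match": [{"headers": {"x-version": {"exact": s}}}], "route": route(s)}
        for s in subsets if s != stable
    ]
    http.append({"route": route(stable)})
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {"hosts": [host], "http": http},
    }
```

Because the output is regenerated in full each time, there is no need to patch individual list entries by index.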

Have you eventually found a good way to do it?

Thanks,

Alexis

I’m also facing this issue, mainly the garbage collection mentioned by @erick-thompson and how not to corrupt the single VirtualService object, as @alexis2b stated.

Is there a good practice to follow here? @linsun @rshriram

I’m trying to build a whole CI/CD pipeline using Istio, but these “dynamic URLs” created by the CI are kind of hard to deal with if the yaml doesn’t accept variables (there’s a bunch of “sed” substitution to be done). Using Knative could be a solution, but I have deployments with PVCs, which are not supported by Knative.

Hi,

I would also be interested in what the best practice would be…

I’m currently facing problems because I deploy distinct VirtualServices and DestinationRules for each version of our services (with prefix routing), as you can see in the thread “Multiple DestinationRules for the same host”.

If I need to combine VirtualServices or DestinationRules, I’m afraid it will get chaotic quite fast… and removal of old versions will need surgical skills.

I’m using envtpl to substitute variables.
just piping the yaml through envtpl before piping to kubectl…
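For readers without envtpl, the same substitute-then-apply step can be approximated with Python's standard library. This is a swapped-in technique, not envtpl itself: `string.Template` happens to use the same `${NAME}` placeholder syntax seen in the manifests in this thread. The helper name is hypothetical.

```python
import os
from string import Template


def render_manifest(text, env=None):
    """Replace ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(text).substitute(os.environ if env is None else env)
```

For example, `render_manifest("- gesellschafter-neo.${K8SDOMAIN}", {"K8SDOMAIN": "mydomain.com"})` returns `- gesellschafter-neo.mydomain.com`. Failing on a missing variable is a deliberate choice here, so that a half-rendered manifest never reaches kubectl.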

This is my current solution, maybe it helps someone…

My CI/CD renders yaml files via helm template so i just end up with a bunch of standard kubernetes yaml files and not using any other helm stuff like chart repo…

I end up with a directory structure like this, for every branch/tag in my git repository

[image: directory structure]

those get added to an environment repository from where they get deployed via kubectl

So with multiple versions it looks like this (branches demo,live,pilot)

All of those VirtualServices use the same host (demo and live version as an example here):

demo

---
# Source: gesellschafter-demo/templates/virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: gesellschafter-demo
  namespace: neo
spec:
  hosts:
  - gesellschafter-neo.${K8SDOMAIN}
  - gesellschafter-demo.neo.svc.cluster.local
  gateways:
  - neo-gateway
  - mesh
  http:
  - match:
    - uri:
        prefix: "/demo/"
    - uri:
        prefix: "/demo"
    rewrite:
      uri: "/"
    route:
    - destination:
        host: gesellschafter-demo
        port:
          number: 8888
        subset: demo

live

---
# Source: gesellschafter-live/templates/virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: gesellschafter-live
  namespace: neo
spec:
  hosts:
  - gesellschafter-neo.${K8SDOMAIN}
  - gesellschafter-live.neo.svc.cluster.local
  gateways:
  - neo-gateway
  - mesh
  http:
  - match:
    - uri:
        prefix: "/live/"
    - uri:
        prefix: "/live"
    rewrite:
      uri: "/"
    route:
    - destination:
        host: gesellschafter-live
        port:
          number: 8888
        subset: live

but they need dedicated DestinationRules because merging caused problems here…

demo

---
# Source: gesellschafter-demo/templates/destinationrule.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: gesellschafter-demo-destinationrule
  namespace: neo
spec:
  host: gesellschafter-demo.neo.svc.cluster.local
  subsets:
  - name: demo
    labels:
      version: demo

pilot

---
# Source: gesellschafter-pilot/templates/destinationrule.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: gesellschafter-pilot-destinationrule
  namespace: neo
spec:
  host: gesellschafter-pilot.neo.svc.cluster.local
  subsets:
  - name: pilot
    labels:
      version: pilot

This all works great when coming in through the ingress gateway, and I can do this to reach my services:

https://gesellschafter-neo.mydomain.com/demo
https://gesellschafter-neo.mydomain.com/live

But there is a limitation that i still struggle with…
As my services are attached to the mesh gateway, they should be able to communicate directly… and they can… but only by using distinct names such as the internal gesellschafter-pilot.neo.svc.cluster.local. With mesh, the VirtualHost config is applied to all the sidecars, and host merging is not supported there for some reason…
It would be great if this were added as a feature!

If anyone is interested, I can share more on how we deploy our services.