"service "istiod" not found" during canary upgrade 1.5 -> 1.6 -> 1.7

Hello
After upgrading Istio from 1.5 to 1.6 and then to 1.7 in canary mode, I have three Istio versions in my cluster.
But then I got the following error when running the istioctl 1.7 install again:

And I see that I now have 3 services in the cluster:
istiod, istiod-1-6-x and istiod-1-7-x
To bypass the above error, I tried deleting the “legacy” istiod service from the cluster (keeping only the 1.6 and 1.7 revisions), but then the istioctl 1.7 install gave me this error:

2020-12-04T14:07:42.892106Z error installer Internal error occurred: failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: service "istiod" not found

Can you explain to me why this “legacy” istiod service is still needed when installing the new revision, please?

I believe you are getting blocked by the Istio validatingwebhookconfiguration. Take a look at them with kubectl get validatingwebhookconfiguration -o yaml; if there are multiple, you might need to delete the old ones as well.
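To see at a glance which Service each validating webhook calls, without reading the full YAML, a sketch like the following may help. It assumes kubectl access to the cluster; the function name list_webhook_services is only illustrative.

```shell
# Print every ValidatingWebhookConfiguration together with the Service name(s)
# its webhooks call. A webhook that points at a Service which no longer exists
# fails with "service ... not found" when its failurePolicy is Fail.
list_webhook_services() {
  kubectl get validatingwebhookconfiguration \
    -o custom-columns='NAME:.metadata.name,SERVICE:.webhooks[*].clientConfig.service.name'
}
```

Calling list_webhook_services prints one row per configuration, which makes any entry still pointing at a deleted or legacy Service easy to spot.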

Hello @nick_tetrate and thank you for your help.
Yes it is linked to the validatingwebhook.
Here is the output :

$ kubectl get validatingwebhookconfiguration -o yaml istiod-istio-system
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    app: istiod
    install.operator.istio.io/owning-resource: installed-state-1-7-4
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio: istiod
    istio.io/rev: 1-7-4
    operator.istio.io/component: Base
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.7.4
    release: istio
  name: istiod-istio-system
webhooks:
- admissionReviewVersions:
  - v1beta1
  - v1
  clientConfig:
    caBundle: LS0tL...Cg==
    service:
      name: istiod
      namespace: istio-system
      path: /validate
      port: 443
  failurePolicy: Fail
  matchPolicy: Exact
  name: validation.istio.io
  rules:
  - apiGroups:
    - config.istio.io
    - security.istio.io
    - authentication.istio.io
    - networking.istio.io
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    resources:
    - '*'
    scope: '*'
  sideEffects: None
  timeoutSeconds: 30

FYI: I managed to work around the issue by editing the istiod service selector to point to the istio.io/rev: 1-7-4 revision instead of istio: pilot.

$ kubectl get svc -o yaml istiod
apiVersion: v1
kind: Service
metadata:
  labels:
    app: istiod
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.5.4
    release: istio
  name: istiod
  namespace: istio-system
spec:
  clusterIP: 10.97.243.237
  ports:
  - name: https-dns
    port: 15012
  - name: https-webhook
    port: 443
    targetPort: 15017
  selector:
    app: istiod
    istio.io/rev: 1-7-4

This seems to be an issue with the legacy istiod service when deploying canary revisions, no?

Since both are just validating the CRDs, I would recommend deleting the older one and only validating configuration with the Istio 1.7 istiod.

By “deleting the older one”, you mean deleting the old validatingwebhook right ?

Yes, delete the old webhook validator.

OK, I deleted the old one (kubectl delete validatingwebhookconfiguration istio-galley) and then deployed again: no error.
But if I relaunch a deployment, I get the error again:

istio-1.7.4/bin/istioctl install -n istio-system --set 'revision=1-7-4' -f -

! Components.Telemetry.Enabled is deprecated. Mixer is deprecated and will be removed from Istio with the 1.8 release. Please consult our docs on the replacement.

- Processing resources for Istio core.2021-01-28T15:41:56.872252Z	error	installer	Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io "istiod-istio-system": the object has been modified; please apply your changes to the latest version and try again

✘ Istio core encountered an error: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io "istiod-istio-system": the object has been modified; please apply your changes to the latest version and try again

- Processing resources for Istiod.
✔ Istiod installed
- Processing resources for Ingress gateways, Telemetry.
✔ Ingress gateways installed
- Processing resources for Telemetry.
✔ Telemetry installed
- Pruning removed resourcesError: failed to install manifests: errors occurred during operation
script returned exit code 1

even though I now only have these 2 validating webhooks:

$ kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io
NAME                   WEBHOOKS   AGE
cert-manager-webhook   1          184d
istiod-istio-system    1          94m

I tried deleting the one mentioned in the error:

kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io istiod-istio-system

When I deployed again, there was no error and the webhook was recreated by the Istio installation.
But if I launch the script another time, the error appears again…
I do not understand the logic here :sweat_smile:

@nick_tetrate I can confirm that I get this error on all my clusters, and the only solution I have found is to edit the istiod service and change:

selector:
    app: istiod
    istio: pilot

to

selector:
    app: istiod
    istio.io/rev: 1-7-4

Note: I found this because when I look at kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io istiod-istio-system -o yaml, I see that it refers to the istiod svc and not the istiod-1-7-4 svc.
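A quick way to confirm this kind of mismatch is to read the Service reference out of the webhook and check whether that Service actually exists. The sketch below assumes the configuration is named istiod-istio-system, as in the output above, and only inspects the first webhook entry; the function name check_webhook_target is hypothetical.

```shell
# Read the Service referenced by the first webhook of a
# ValidatingWebhookConfiguration and report whether that Service exists.
check_webhook_target() {
  cfg=${1:-istiod-istio-system}
  ns=$(kubectl get validatingwebhookconfiguration "$cfg" \
        -o jsonpath='{.webhooks[0].clientConfig.service.namespace}')
  name=$(kubectl get validatingwebhookconfiguration "$cfg" \
        -o jsonpath='{.webhooks[0].clientConfig.service.name}')
  if kubectl -n "$ns" get service "$name" >/dev/null 2>&1; then
    echo "ok: $ns/$name exists"
  else
    echo "missing: $ns/$name"
    return 1
  fi
}
```

If check_webhook_target istiod-istio-system reports the Service as missing, any create or update of an Istio config resource would be expected to fail the same way as the install did, since the webhook's failurePolicy is Fail.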

Do you think there is a problem in the upgrade process? Is it normal that this validating webhook is still connected to the old istiod service and not to the one from the newest revision?

Update: I did a fresh install of Istio 1.7.4 on a brand new cluster, and I do not have any istiod service, only an istiod-1-7-4 svc. But the validating webhook still points to: :sweat_smile:

service:
      name: istiod
      namespace: istio-system
      path: /validate
      port: 443