"service "istiod" not found" during canary upgrade 1.5 -> 1.6 -> 1.7

Hello
After upgrading Istio from 1.5 to 1.6 and then 1.7 in canary mode, I have all 3 Istio versions in my cluster.
But then I got the following error when running the istioctl 1.7 install again:

And I see that I now have 3 services in the cluster:
istiod, istiod-1-6-x and istiod-1-7-x
To bypass the above error, I tried deleting the “legacy” istiod service from the cluster (to keep only the 1.6 and 1.7 revisions), but then the istioctl 1.7 install gave me this error:

2020-12-04T14:07:42.892106Z error installer Internal error occurred: failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: service "istiod" not found

Can you explain to me why this “legacy” istiod service is still needed when installing the new revision, please?

I believe you are getting blocked by the Istio validatingwebhookconfiguration. Take a look at them with the command below; if there are multiple, you might need to delete the old ones as well:

kubectl get validatingwebhookconfiguration -o yaml
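To spot which configurations are affected, one option is to compare the service each webhook targets with the services that actually exist. Here is a minimal Python sketch, assuming the JSON from `kubectl get validatingwebhookconfiguration -o json` and `kubectl get svc -A -o json` has already been loaded into dicts. The function name `dangling_webhook_targets` is hypothetical and not part of any Istio or Kubernetes tooling:

```python
def dangling_webhook_targets(webhook_list, service_list):
    """Return (config name, webhook name, "namespace/service") for every
    webhook whose clientConfig.service is absent from service_list."""
    # Index the existing services by (namespace, name).
    existing = {
        (svc["metadata"]["namespace"], svc["metadata"]["name"])
        for svc in service_list.get("items", [])
    }
    missing = []
    for cfg in webhook_list.get("items", []):
        for hook in cfg.get("webhooks", []):
            # URL-based webhooks have no "service" entry; skip them.
            target = hook.get("clientConfig", {}).get("service")
            if not target:
                continue
            key = (target["namespace"], target["name"])
            if key not in existing:
                missing.append(
                    (cfg["metadata"]["name"], hook["name"], "/".join(key))
                )
    return missing
```

Any entry it returns is a webhook the API server cannot reach, which is the situation the "service not found" error describes.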

Hello @nick_tetrate and thank you for your help.
Yes it is linked to the validatingwebhook.
Here is the output :

$ kubectl get validatingwebhookconfiguration -o yaml istiod-istio-system
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    app: istiod
    install.operator.istio.io/owning-resource: installed-state-1-7-4
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio: istiod
    istio.io/rev: 1-7-4
    operator.istio.io/component: Base
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.7.4
    release: istio
  name: istiod-istio-system
webhooks:
- admissionReviewVersions:
  - v1beta1
  - v1
  clientConfig:
    caBundle: LS0tL...Cg==
    service:
      name: istiod
      namespace: istio-system
      path: /validate
      port: 443
  failurePolicy: Fail
  matchPolicy: Exact
  name: validation.istio.io
  rules:
  - apiGroups:
    - config.istio.io
    - security.istio.io
    - authentication.istio.io
    - networking.istio.io
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    resources:
    - '*'
    scope: '*'
  sideEffects: None
  timeoutSeconds: 30

FYI: I managed to work around the issue by editing the istiod service selector to point to the istio.io/rev: 1-7-4 revision instead of istio: pilot.

$ kubectl get svc -o yaml istiod
apiVersion: v1
kind: Service
metadata:
  labels:
    app: istiod
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.5.4
    release: istio
  name: istiod
  namespace: istio-system
spec:
  clusterIP: 10.97.243.237
  ports:
  - name: https-dns
    port: 15012
  - name: https-webhook
    port: 443
    targetPort: 15017
  selector:
    app: istiod
    istio.io/rev: 1-7-4
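
For reference, the selector edit described above can also be written as a JSON merge patch rather than an interactive edit. Here is a minimal Python sketch that only builds the patch body. `selector_patch` is a hypothetical helper, and the presence of the `istio: pilot` key on the original selector is taken from this thread:

```python
import json


def selector_patch(revision):
    """Build a JSON merge patch that drops the `istio` selector key and
    selects pods by the `istio.io/rev` label instead."""
    return {
        "spec": {
            "selector": {
                "istio": None,             # null removes the key in a merge patch
                "istio.io/rev": revision,  # e.g. "1-7-4"
            }
        }
    }


# Print the patch body as JSON; keys not mentioned (such as `app`) are untouched.
print(json.dumps(selector_patch("1-7-4")))
```

The printed JSON could then be passed to `kubectl patch svc istiod -n istio-system --type merge -p '<json>'`.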

This seems to be an issue with the legacy istiod service when deploying canary revisions, no?

Since both are just validating the CRDs, I would recommend deleting the older one and only validating configuration with the Istio 1.7 istiod.

By “deleting the older one”, you mean deleting the old validatingwebhook right ?

Yes, delete the old validating webhook.

OK, I deleted the old one (kubectl delete validatingwebhookconfiguration istio-galley) and then deployed again: no error.
But if I run the deployment again, the error comes back:

istio-1.7.4/bin/istioctl install -n istio-system --set 'revision=1-7-4' -f -

! Components.Telemetry.Enabled is deprecated. Mixer is deprecated and will be removed from Istio with the 1.8 release. Please consult our docs on the replacement.

- Processing resources for Istio core.
2021-01-28T15:41:56.872252Z	error	installer	Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io "istiod-istio-system": the object has been modified; please apply your changes to the latest version and try again

✘ Istio core encountered an error: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io "istiod-istio-system": the object has been modified; please apply your changes to the latest version and try again

- Processing resources for Istiod.
✔ Istiod installed
- Processing resources for Ingress gateways, Telemetry.
✔ Ingress gateways installed
- Processing resources for Telemetry.
✔ Telemetry installed
- Pruning removed resources
Error: failed to install manifests: errors occurred during operation
script returned exit code 1

even though I now only have these 2 validating webhooks:

$ kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io
NAME                   WEBHOOKS   AGE
cert-manager-webhook   1          184d
istiod-istio-system    1          94m

I tried deleting the one mentioned in the error:

kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io istiod-istio-system

When I deployed again, there was no error and the webhook was recreated by the Istio installation.
But if I launch the script another time, the error appears again…
I do not understand the logic here :sweat_smile:

@nick_tetrate I can confirm that I have this error on all my clusters, and the only fix I have found is to edit the istiod service and change:

selector:
    app: istiod
    istio: pilot

to

selector:
    app: istiod
    istio.io/rev: 1-7-4

Note: I found this because in the output of kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io istiod-istio-system -o yaml I see that it refers to the istiod svc and not the istiod-1-7-4 svc.

Do you think there is a problem in the upgrade process? Is it normal that this validating webhook is still connected to the old istiod service and not to the one from the newest revision?

Update: I did a fresh install of Istio 1.7.4 on a brand new cluster, and I do not have any istiod service, only an istiod-1-7-4 svc. But the validating webhook still points to :sweat_smile:

service:
      name: istiod
      namespace: istio-system
      path: /validate
      port: 443

@yogeek Did you experience any issues with the validating webhook still pointing to the old service (which does not exist)?

The instructions for uninstalling the old control plane are not very clear. Deleting only the iop is not sufficient: I still have the old deployments and services running. I’m not sure whether I should delete them, or whether the new ones (e.g. svc istiod-1-7-8) are being used correctly.
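
One way to see which leftover resources belong to which control plane is to group them by their `istio.io/rev` label. Here is a minimal Python sketch, assuming the JSON list from `kubectl get deploy,svc -n istio-system -o json` has been loaded into a dict. `group_by_revision` is a hypothetical helper, and resources without the label (the 1.5 istiod Service shown earlier in this thread has none) are grouped separately:

```python
from collections import defaultdict


def group_by_revision(resource_list):
    """Map each istio.io/rev label value to the "Kind/name" entries carrying it.
    Resources without the label go under "<no revision label>"."""
    groups = defaultdict(list)
    for item in resource_list.get("items", []):
        labels = item["metadata"].get("labels") or {}
        rev = labels.get("istio.io/rev", "<no revision label>")
        groups[rev].append(f'{item["kind"]}/{item["metadata"]["name"]}')
    return dict(groups)
```

This only reports what is present per revision; it does not decide what is safe to delete.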