Upgrading Istio 1.4.3 to 1.6.0

We have Istio 1.4.3 running in both production and staging, each on a GKE cluster. It was installed using Helm. We have one external load balancer (LB) and one internal LB, and installing both with Helm was pretty straightforward.

But the Helm install method is now deprecated. Can anyone tell me the correct way to upgrade to 1.6.0 with minimal or no downtime? We also want to keep using both the internal and external LBs. Help much appreciated.

Can I upgrade from 1.4.3 directly to 1.6.4? Or should I upgrade from 1.4.x to 1.5.x and then from 1.5.x to 1.6.x?

I still couldn’t find any docs or references that say otherwise. Any input or pointers will be highly appreciated.

@laurentiuspurba you should upgrade from 1.4.x to 1.5.x and then from 1.5.x to 1.6.x

@liptan AFAIK you should use the experimental istioctl upgrade to update from 1.4.x to 1.5.x, and then istioctl upgrade to get to the latest 1.6.x.
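The rule of thumb here is never to skip a minor release. Below is a minimal POSIX shell sketch of that check. It assumes plain MAJOR.MINOR.PATCH version strings that share the same major version. skips_minor is a hypothetical helper, not an istioctl feature:

```shell
# Hypothetical helper (not part of istioctl): succeeds (exit 0) when going
# from version $1 to version $2 jumps over at least one minor release.
# Assumes MAJOR.MINOR.PATCH strings with the same major version.
skips_minor() {
  from_minor=$(echo "$1" | cut -d. -f2)
  to_minor=$(echo "$2" | cut -d. -f2)
  [ $((to_minor - from_minor)) -gt 1 ]
}
```

Examples:
- skips_minor 1.4.3 1.6.4 succeeds, meaning that hop should go through a 1.5.x release first.
- skips_minor 1.4.10 1.5.8 fails, so that hop stays within one minor step.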

@jt97 Thanks for the info.

This is what I am doing right now.

@jt97 I just upgraded my Istio version from 1.4.3 to 1.4.10 before moving on to 1.5.x. The upgrade succeeded, and I didn’t see any errors.

But when I checked using istioctl (version 1.4.10), I still had 4 proxies on version 1.4.3; see below:

▶ ./istioctl version
client version: 1.4.10
control plane version: 1.4.10
data plane version: 1.4.3 (4 proxies), 1.4.10 (18 proxies)

Any thoughts on this?

Thank you.

@laurentiuspurba Maybe those are just older pods that were deployed before the upgrade and haven’t been restarted yet?
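Sidecars are injected when a pod is created. Pods that were not restarted after the upgrade therefore keep the old istio-proxy. The sketch below counts those proxies from the data plane line of istioctl version, assuming the output format shown above. stale_proxies is a hypothetical helper:

```shell
# Hypothetical helper: reads `istioctl version` output on stdin and prints how
# many data-plane proxies report a version other than $1.
# Assumes the 1.4/1.5-style summary line, e.g.:
#   data plane version: 1.4.3 (4 proxies), 1.4.10 (18 proxies)
stale_proxies() {
  awk -v target="$1" '
    /^data plane version:/ {
      sub(/^data plane version: */, "")
      n = split($0, parts, ", *")      # one entry per "VERSION (N proxies)"
      for (i = 1; i <= n; i++) {
        split(parts[i], f, "[ (]+")    # f[1] = version, f[2] = proxy count
        if (f[1] != target) total += f[2]
      }
    }
    END { print total + 0 }            # prints 0 when nothing is stale
  '
}
```

For the output above, istioctl version | stale_proxies 1.4.10 prints 4. Restarting the affected workloads (for example with kubectl rollout restart) should recreate their pods with the current sidecar version.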

@jt97 I was able to upgrade from 1.4.3 -> 1.4.10 successfully. Then I upgraded from 1.4.10 to 1.5.8; I had to use the --force flag to get the upgrade to 1.5.8 through:
istioctl upgrade -f helm-charts/istio/istio-operator.yaml --force

But when I ran kubectl get pods -n istio-system, all pods were READY and Running except istio-egressgateway and istio-ingressgateway:

NAME                                    READY   STATUS    RESTARTS   AGE
istio-egressgateway-5b99ffccdd-d8nf5    0/1     Running   0          64m
istio-egressgateway-5b99ffccdd-hfc4d    0/1     Running   0          4h5m
istio-ingressgateway-75db45b458-jktzq   0/1     Running   0          3h56m
istio-ingressgateway-75db45b458-qp62p   0/1     Running   0          3h57m
istio-tracing-7cf5f46848-v9vm9          1/1     Running   0          3h47m
istiocoredns-5f7546c6f4-5fcsq           2/2     Running   0          65m
istiocoredns-5f7546c6f4-gn995           2/2     Running   0          8h
istiod-678b7fb6dc-r4dt4                 1/1     Running   0          23m
kiali-b4b5b4fb8-wm2gf                   1/1     Running   0          3h46m
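To spot pods that are Running but not Ready without eyeballing the table, you can compare the READY column numerically. Below is a minimal sketch. It assumes the default kubectl get pods output with its header line (so not --no-headers). not_ready_pods is a hypothetical helper:

```shell
# Hypothetical helper: reads `kubectl get pods` output (with its header line)
# on stdin and prints pods whose READY column shows fewer ready containers
# than the total, e.g. "0/1".
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                  # "0/1" -> r[1] = 0, r[2] = 1
    if (r[1] + 0 < r[2] + 0) print $1
  }'
}
```

kubectl get pods -n istio-system | not_ready_pods lists the four gateway pods from the table above. Running kubectl describe pod on each of them then shows the readiness probe events.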

I checked the logs in istio-ingressgateway, and this is what I got:
[Envoy (Epoch 0)] [2020-07-15 01:53:44.760][20][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure
Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected

while the istiod logs show the following:
validationController Not ready to switch validation to fail-closed: dummy invalid rejected for the wrong reason: Timeout: request did not complete within requested timeout 30s
2020-07-15T01:49:57.916082Z info validationController validatingwebhookconfiguration istiod-istio-system (failurePolicy=Ignore, resourceVersion=129696691) is up-to-date. No change required.
2020-07-15T01:49:57.916136Z info validationController Reconcile(enter): retry dry-run creation of invalid config
2020-07-15T01:49:59.420370Z info grpc: Server.Serve failed to complete security handshake from "10.88.3.28:38798": EOF
2020-07-15T01:50:01.040067Z info http: TLS handshake error from 10.88.20.19:41018: remote error: tls: unknown certificate authority

I’m still looking through the Istio GitHub repository as well as other resources.

Thank you.

@laurentiuspurba I still have to try upgrading our setup; it has more than one ingress gateway. BTW, have you disabled validation as described here? https://istio.io/latest/docs/setup/upgrade/#upgrading-from-1-4

Or maybe you won’t need it at all, since you are not doing a canary upgrade.

@liptan Thank you, sir. I did disable validation as you mentioned while upgrading from 1.4.3 to 1.4.10, but I forgot to do it when upgrading from 1.4.10 to 1.5.8.

I couldn’t find istio-galley anymore in Istio 1.5.8. I tried deleting that ValidatingWebhookConfiguration and re-ran istioctl upgrade -f helm-charts/istio/istio-operator.yaml, and got the ✔ Installation complete message, but the istio-ingressgateway pods are still not READY.

Checking which versions the control plane and data plane are on, I see the following:

istio/istio-1.5.8/bin
▶ ./istioctl version
client version: 1.5.8
control plane version: 1.5.8
data plane version: none

I’m still looking for a solution to this issue. I might set the ingress gateway to false, run the upgrade again, and then set it back to true. I’ll let you know once I have something.

Thank you.

I found this in the Istio 1.5 Upgrade Notes (link), which basically says the following:

kubectl delete policies.authentication.istio.io --all-namespaces --all
kubectl delete meshpolicies.authentication.istio.io --all

I could not find policies.authentication.istio.io, but I did find meshpolicies.authentication.istio.io. I don’t know whether I should delete this object or not.

So I finally decided to upgrade from 1.5.8 to 1.6.4. When I ran istioctl upgrade -f helm-charts/istio/istio-operator-1-6-4.yaml --dry-run, I got a few warnings:

  • found 6 CRD of unsupported v1alpha1 security policy.
  • found 1 unsupported security policy

So I deleted all those policies as documented on the Istio website.

I re-ran the upgrade with --dry-run and got a success message:

Confirm to proceed [y/N]? y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Addons installed
✔ Installation complete 

But when I ran the actual upgrade (without --dry-run), I got the following error:
Ingress gateway error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition Deployment/istio-system/istio-ingressgateway

And kubectl get pods -n istio-system shows 2 sets of istio-ingressgateway pods on two versions: 1.5.8 (75db45b458) and 1.6.4 (68c4b7698c).

NAME                                    READY   STATUS    RESTARTS   AGE
istio-ingressgateway-68c4b7698c-d5pjx   0/1     Running   0          83m
istio-ingressgateway-68c4b7698c-l5jnc   0/1     Running   0          83m
istio-ingressgateway-75db45b458-dbz6j   0/1     Running   0          5h14m
istio-ingressgateway-75db45b458-w2gbn   0/1     Running   0          3h42m
istio-tracing-7cf5f46848-8sbnm          1/1     Running   0          8h
istiocoredns-5f7546c6f4-9bmxf           2/2     Running   0          4h15m
istiocoredns-5f7546c6f4-qh7rb           2/2     Running   0          14m
istiod-75855f9565-7str5                 1/1     Running   0          12m
istiod-75855f9565-pvb5g                 1/1     Running   0          12m
kiali-7fcc47db9f-v87vw                  1/1     Running   0          83m

Here is some of what I found in the istiod and istio-ingressgateway logs:

istio-ingressgateway
istio-ingressgateway-68c4b7698c-d5pjx istio-proxy 2020-07-15T18:50:43.836660Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure

istiod

istiod-75855f9565-7str5 discovery 2020-07-15T19:13:14.360572Z info grpc: Server.Serve failed to complete security handshake from "10.88.6.9:43150": EOF

Any suggestions are really appreciated.

================================================================================

I went from 1.4.3 -> 1.4.10 -> 1.5.8 -> 1.6.4. So far, only the 1.4.3 -> 1.4.10 upgrade has succeeded.

According to the Istio docs, Istio 1.6 has been tested with Kubernetes 1.15, 1.16, 1.17 and 1.18, while our cluster runs 1.14.
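A quick way to check this before attempting a hop is to compare the cluster's Kubernetes minor version against the tested range. Below is a sketch, assuming the 1.15 to 1.18 range quoted above for Istio 1.6. k8s_minor_in_range is a hypothetical helper. The bounds are passed as arguments so they can be changed for other Istio releases:

```shell
# Hypothetical helper: succeeds if the Kubernetes version in $1 (e.g. "1.14"
# or "v1.18.3") has a minor version between $2 and $3 inclusive.
# The minor is taken as the second dot-separated field, so a leading "v" is fine.
k8s_minor_in_range() {
  minor=$(echo "$1" | cut -d. -f2)
  [ "$minor" -ge "$2" ] && [ "$minor" -le "$3" ]
}
```

Pass it the server version reported by kubectl version. For example, k8s_minor_in_range 1.14 15 18 fails, flagging a 1.14 cluster as outside the tested range for Istio 1.6.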

So I rolled back from 1.6.4 -> 1.5.8 -> 1.4.10, and I had to use a --force upgrade to get back to 1.4.10. Istio 1.4.10 is now running as expected, with all pods READY and Running.

I’m going to retry the upgrade to 1.5.8, and hopefully this time it will succeed.

You need to patch the 1.4 gateway to use SDS. Run the command below when you upgrade from 1.4 to 1.5, and the gateway should then move to a running state:

kubectl -n istio-system \
  patch gateway istio-autogenerated-k8s-ingress --type=json \
  -p='[{"op": "replace", "path": "/spec/servers/1/tls", "value": {"credentialName": "ingress-cert", "mode": "SIMPLE", "privateKey": "sds", "serverCertificate": "sds"}}]'
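The servers/1 index in that patch assumes the TLS server is the second entry in the gateway's spec.servers list, which may not hold for every setup. The sketch below builds the same JSON patch with the index and the credential name as parameters. sds_tls_patch is a hypothetical helper, and ingress-cert is simply the value used in the command above:

```shell
# Hypothetical helper: prints the JSON patch from the command above, taking the
# server index ($1) and SDS credential name ($2) as parameters instead of
# hard-coding servers/1 and ingress-cert.
sds_tls_patch() {
  printf '[{"op": "replace", "path": "/spec/servers/%s/tls", "value": {"credentialName": "%s", "mode": "SIMPLE", "privateKey": "sds", "serverCertificate": "sds"}}]' "$1" "$2"
}
```

To use it:
1. Check the server order first, for example with kubectl -n istio-system get gateway istio-autogenerated-k8s-ingress -o yaml.
2. Apply the patch with kubectl -n istio-system patch gateway istio-autogenerated-k8s-ingress --type=json -p="$(sds_tls_patch 1 ingress-cert)".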

Hi @deepak_deore, I appreciate your comment on this.

I patched the ingress gateway after the upgrade. I could see the ingressgateway pod reach READY state with RUNNING status, but then it went back to NOT READY.

NAME                                    READY   STATUS    RESTARTS   AGE
istio-egressgateway-7769479cf4-qwnn7    1/1     Running   0          114m
istio-egressgateway-7769479cf4-sxrhk    1/1     Running   0          37m
istio-ingressgateway-75db45b458-l5m26   0/1     Running   0          21m
istio-ingressgateway-7774566b48-6xddh   1/1     Running   0          12h
istio-tracing-7cf5f46848-wtvqq          1/1     Running   0          21m
istiocoredns-5f7546c6f4-9x9vn           2/2     Running   0          113m
istiocoredns-5f7546c6f4-kj4vb           2/2     Running   0          8h
istiod-678b7fb6dc-vsrsq                 1/1     Running   0          21m
kiali-b4b5b4fb8-gcv2c                   1/1     Running   0          8h

Checking the ingressgateway pod, I got the following:
Readiness probe failed: HTTP probe failed with statuscode: 503

So I restarted all the deployments and checked the installation:

▶ ./istioctl version
client version: 1.5.8
egressgateway version: 1.4.10
egressgateway version: 1.4.10
ingressgateway version: 1.5.8
ingressgateway version: 1.4.10
pilot version: 1.5.8
data plane version: 1.4.10 (14 proxies), 1.4.3 (2 proxies)
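The output above mixes 1.5.8 and 1.4.10 gateway pods. The sketch below prints every component line whose version differs from the client version. It assumes the "client version:" line comes first, as in the output above. version_mismatches is a hypothetical helper:

```shell
# Hypothetical helper: reads `istioctl version` output on stdin and prints each
# "<component> version: <v>" line whose version differs from the client version.
# Assumes the "client version:" line comes first; the data plane summary is skipped.
version_mismatches() {
  awk -F': ' '
    /^client version:/         { client = $2; next }
    /^data plane version:/     { next }
    /version:/ && $2 != client { print }
  '
}
```

For the output above, it prints the two egressgateway lines and the 1.4.10 ingressgateway line.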

My IstioOperator has egressGateways enabled set to false, and I did not see any error messages during the upgrade.

In istiod:
2020-07-16T13:51:18.216503Z info grpc: Server.Serve failed to complete security handshake from "10.88.5.20:33034": EOF

I checked that IP address and found that it belongs to the pod that was re-created when I did a rollout restart of its deployment. The new pod runs the latest istio-proxy 1.5.8, but it is in NOT READY state.

I need to figure out what else I can do to troubleshoot this.

============================================

So I toggled egressGateway between true and false in the IstioOperator and re-ran the upgrade. Now the istio-egressgateway pods are gone, but the istio-ingressgateway pod is still NOT READY.

============================================

I can see there are a few istio-ingressgateway-xxxx ReplicaSets defined. Looking into this.

▶ kubectl get rs
NAME                              DESIRED   CURRENT   READY   AGE
istio-ingressgateway-68c4b7698c   0         0         0       21h
istio-ingressgateway-75db45b458   1         1         0       25h
istio-ingressgateway-7774566b48   1         1         1       14h
istio-tracing-7cf5f46848          1         1         1       42h
istio-tracing-cd67ddf8            0         0         0       93d
istiocoredns-5f7546c6f4           2         2         2       93d
istiod-678b7fb6dc                 1         1         1       86m
kiali-7964898d8c                  0         0         0       93d
kiali-7fcc47db9f                  0         0         0       21h
kiali-b4b5b4fb8                   1         1         1       46h

Deleting the ReplicaSets that had 0 in all three columns (DESIRED, CURRENT, READY) fixed the issue of having multiple istio-ingressgateway versions.
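Finding those ReplicaSets can be scripted. Below is a sketch, assuming the default kubectl get rs output with its header line. empty_replicasets is a hypothetical helper:

```shell
# Hypothetical helper: reads `kubectl get rs` output (with its header line) on
# stdin and prints ReplicaSets whose DESIRED, CURRENT and READY are all 0.
empty_replicasets() {
  awk 'NR > 1 && $2 == 0 && $3 == 0 && $4 == 0 { print $1 }'
}
```

kubectl get rs -n istio-system | empty_replicasets lists the candidates. Review them before deleting with kubectl delete rs -n istio-system <name>.

A Deployment intentionally keeps scaled-down ReplicaSets as rollout history, up to spec.revisionHistoryLimit. Deleting them removes the option to kubectl rollout undo to those revisions.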

But the istio-ingressgateway pods are still NOT READY, although they are RUNNING.

Can anybody help me with this? I increased the CPU and memory for the istio-ingressgateway pods, but they are still NOT READY.

I’d appreciate any suggestions on what else I should look into.

I was able to upgrade from 1.4.3 -> 1.4.10 -> 1.5.8 -> 1.6.4. I could not have done it without help from the Istio community (Prune and Vito).

Before upgrading to another version, make sure to add a trailing newline character to ca-cert.pem, as described here.
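A small sketch for that step. It assumes the plug-in CA files (ca-cert.pem and the others) are local files you load into the CA secret. ensure_trailing_newline is a hypothetical helper that appends a newline only when the file does not already end with one:

```shell
# Hypothetical helper: appends "\n" to file $1 unless it is empty or already
# ends with a newline. $(tail -c 1 ...) is empty when the last byte is a
# newline, because command substitution strips trailing newlines.
ensure_trailing_newline() {
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
}
```

Run ensure_trailing_newline ca-cert.pem before recreating the secret the CA files are loaded from (cacerts in istio-system, in Istio's plug-in CA setup). Running it twice is harmless.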

  • 1.4.3 -> 1.4.10: I did the upgrade using istioctl experimental upgrade -f IstioControlPlane.yaml
    • You might need to use --force
  • 1.4.10 -> 1.5.8:
    • First, convert IstioControlPlane.yaml to IstioOperator.yaml
    • Before the upgrade
      kubectl -n istio-system delete service/istio-galley deployment.apps/istio-galley
      kubectl delete validatingwebhookconfiguration.admissionregistration.k8s.io/istio-galley
    • Use istioctl upgrade -f IstioOperator.yaml to upgrade (--force might be required)
    • After the upgrade
      kubectl -n istio-system delete deployment istio-citadel istio-galley istio-pilot istio-policy istio-sidecar-injector istio-telemetry
      kubectl -n istio-system delete service istio-citadel istio-policy istio-sidecar-injector istio-telemetry
      kubectl -n istio-system delete horizontalpodautoscaler.autoscaling/istio-pilot horizontalpodautoscaler.autoscaling/istio-telemetry
      kubectl -n istio-system delete pdb istio-citadel istio-galley istio-pilot istio-policy istio-sidecar-injector istio-telemetry
      kubectl -n istio-system delete deployment istiocoredns
      kubectl -n istio-system delete service istiocoredns
  • 1.5.8 -> 1.6.4:
    • Make the necessary changes to the 1.5.8 IstioOperator.yaml so that it supports 1.6.4
    • Use istioctl upgrade -f IstioOperator.yaml to upgrade (--force might be required)
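The 1.4.10 -> 1.5.8 post-upgrade cleanup deletes a lot at once, so it can help to print the commands before running them. The sketch below wraps the commands from the list above in a dry-run switch, and printing is the default. run, legacy_cleanup and DRY_RUN are hypothetical names:

```shell
# Hypothetical wrapper: prints the command while DRY_RUN is unset or 1 (the
# default); set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The "After the upgrade" cleanup from the 1.4.10 -> 1.5.8 step above:
# removes the pre-istiod control plane components.
legacy_cleanup() {
  ns=istio-system
  run kubectl -n "$ns" delete deployment istio-citadel istio-galley istio-pilot istio-policy istio-sidecar-injector istio-telemetry
  run kubectl -n "$ns" delete service istio-citadel istio-policy istio-sidecar-injector istio-telemetry
  run kubectl -n "$ns" delete horizontalpodautoscaler.autoscaling/istio-pilot horizontalpodautoscaler.autoscaling/istio-telemetry
  run kubectl -n "$ns" delete pdb istio-citadel istio-galley istio-pilot istio-policy istio-sidecar-injector istio-telemetry
  run kubectl -n "$ns" delete deployment istiocoredns
  run kubectl -n "$ns" delete service istiocoredns
}
```

legacy_cleanup prints the six commands for review. DRY_RUN=0 legacy_cleanup executes them.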