Error Reinstalling Kiali Operator v1.16.0

I'm running into issues reinstalling Kiali operator v1.16.0. We recently upgraded our EKS cluster to 1.15.x, so I'm not sure whether that is the cause. Our script calls deploy-kiali-operator.sh with the following parameters:
deploy-kiali-operator.sh -kcr kiali-cr.yaml -oiv v1.16.0

This is the output I get:

Using 'kubectl' located here: /usr/local/bin/kubectl
=== UNINSTALL SETTINGS ===
UNINSTALL_EXISTING_KIALI=
UNINSTALL_EXISTING_OPERATOR=
UNINSTALL_MODE=
=== UNINSTALL SETTINGS ===
envsubst is here: /usr/local/bin/envsubst
IMPORTANT! The Kiali operator will be given permission to create cluster roles and
cluster role bindings in order to grant Kiali access to all namespaces in the cluster.
=== OPERATOR SETTINGS ===
OPERATOR_IMAGE_NAME=quay.io/kiali/kiali-operator
OPERATOR_IMAGE_PULL_POLICY=IfNotPresent
OPERATOR_IMAGE_VERSION=v1.16.0
OPERATOR_INSTALL_KIALI=true
OPERATOR_NAMESPACE=kiali-operator
OPERATOR_SKIP_WAIT=false
OPERATOR_VERSION_LABEL=v1.16.0
OPERATOR_WATCH_NAMESPACE=kiali-operator
OPERATOR_ROLE_CLUSTERROLES=- clusterroles
OPERATOR_ROLE_CLUSTERROLEBINDINGS=- clusterrolebindings
OPERATOR_ROLE_CREATE=- create
OPERATOR_ROLE_DELETE=- delete
OPERATOR_ROLE_PATCH=- patch
=== OPERATOR SETTINGS ===
Deploying Kiali operator to namespace [kiali-operator]
Using downloader: wget -q -O -
Applying yaml from URL via: [wget -q -O - https://raw.githubusercontent.com/kiali/kiali/v1.16.0/operator/deploy/namespace.yaml] to namespace [kiali-operator]
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
namespace/kiali-operator configured
Applying yaml from URL via: [wget -q -O - https://raw.githubusercontent.com/kiali/kiali/v1.16.0/operator/deploy/crd.yaml] to namespace [kiali-operator]
customresourcedefinition.apiextensions.k8s.io/monitoringdashboards.monitoring.kiali.io created
customresourcedefinition.apiextensions.k8s.io/kialis.kiali.io created
Applying yaml from URL via: [wget -q -O - https://raw.githubusercontent.com/kiali/kiali/v1.16.0/operator/deploy/service_account.yaml] to namespace [kiali-operator]
serviceaccount/kiali-operator created
Applying yaml from URL via: [wget -q -O - https://raw.githubusercontent.com/kiali/kiali/v1.16.0/operator/deploy/role.yaml] to namespace [kiali-operator]
clusterrole.rbac.authorization.k8s.io/kiali-operator created
Applying yaml from URL via: [wget -q -O - https://raw.githubusercontent.com/kiali/kiali/v1.16.0/operator/deploy/role_binding.yaml] to namespace [kiali-operator]
clusterrolebinding.rbac.authorization.k8s.io/kiali-operator created
Applying yaml file [/Users/e62062/dev/platform-istio/kubernetes/operators/operator.yaml] to namespace [kiali-operator]
deployment.apps/kiali-operator created
Waiting for the operator to start…
ERROR: The Kiali operator is not running yet. Please make sure it was deployed successfully.

If I check the kiali-operator namespace, a kiali-operator pod is running, but there is no Kiali pod in the istio-system namespace. Any ideas what could be causing this? Are there known compatibility issues between Kiali v1.16.0 and EKS 1.15.x?

ERROR: The Kiali operator is not running yet. Please make sure it was deployed successfully.

Something is preventing the operator pod from starting up. You'll have to find out why it didn't start (the Kubernetes events are a good place to begin). The script won't install the Kiali CR until it sees the operator come up. See:
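The point here is that the script only creates the Kiali CR after a readiness check passes. A minimal bash sketch of that wait-then-act pattern is below; wait_for is a hypothetical helper, and this is not how deploy-kiali-operator.sh itself implements its wait.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Re-runs CMD until it exits 0 or TIMEOUT_SECS have elapsed.
# Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local start=$SECONDS              # bash built-in: seconds since shell start
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS - start >= timeout )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, `wait_for 300 5 kubectl rollout status deployment/kiali-operator -n kiali-operator --timeout=10s` waits up to five minutes for the operator Deployment created in the log above, and `kubectl get events -n kiali-operator --sort-by=.lastTimestamp` lists that namespace's recent events.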

The Kiali operator pod appears to start just fine. My deployment log hangs for a long time while deploying the operator, but the operator pod itself seems to have started right away.

NAME                              READY   STATUS    RESTARTS   AGE
kiali-operator-749bd695c8-fzk8s   2/2     Running   0          8m1s
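The READY value 2/2 above means both of the pod's containers report ready. If a script needs to make that check from `kubectl get pods` text, a minimal awk-based sketch looks like this; all_pods_ready is a hypothetical helper, and `kubectl wait --for=condition=Ready pod --all -n kiali-operator` is the sturdier built-in alternative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# exits 0 only if every pod row has READY of the form N/N (all containers
# ready) and STATUS "Running". Exits 1 if any row fails or there are no rows.
all_pods_ready() {
  awk '
    NR == 1 { next }                   # skip the header row
    NF == 0 { next }                   # ignore blank lines
    {
      rows++
      split($2, r, "/")                # READY column, e.g. "2/2"
      if (r[1] != r[2] || $3 != "Running") bad++
    }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

Usage: `kubectl get pods -n kiali-operator | all_pods_ready && echo ready`. It assumes the default header row is present, so it should not be combined with `--no-headers`.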

I checked the operator pod's events and don't see anything out of the ordinary:

Events:
  Type    Reason     Age    From                                   Message
  Normal  Scheduled  9m50s  default-scheduler                      Successfully assigned kiali-operator/kiali-operator-749bd695c8-fzk8s to ip-xx-xx-93-xxx.ec2.internal
  Normal  Pulling    9m48s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Pulling image "quay.io/kiali/kiali-operator:v1.16.0"
  Normal  Pulled     9m34s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Successfully pulled image "quay.io/kiali/kiali-operator:v1.16.0"
  Normal  Created    9m30s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Created container ansible
  Normal  Started    9m30s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Started container ansible
  Normal  Pulled     9m30s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Container image "quay.io/kiali/kiali-operator:v1.16.0" already present on machine
  Normal  Created    9m30s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Created container operator
  Normal  Started    9m30s  kubelet, ip-xx-xx-93-xxx.ec2.internal  Started container operator

Still not seeing the Kiali pod created in the istio-system namespace:

kubectl get deployment -n istio-system
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
istio-citadel            1/1     1            1           16m
istio-galley             1/1     1            1           16m
istio-ingressgateway     1/1     1            1           16m
istio-pilot              1/1     1            1           16m
istio-policy             1/1     1            1           16m
istio-sidecar-injector   1/1     1            1           16m
istio-telemetry          1/1     1            1           16m
prometheus               1/1     1            1           16m

If I manually apply kiali-cr.yaml, I finally see the Kiali pod in the istio-system namespace.
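To script that manual step, one option is to apply the CR yourself and then check for the Deployment the operator creates from it. A minimal sketch, assuming the CR belongs in the operator's watch namespace (kiali-operator, per OPERATOR_WATCH_NAMESPACE in the log above) and that the resulting Deployment is named kiali; has_deployment is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get deployment` output on stdin and
# exits 0 if some row's NAME column equals $1 exactly (so "kiali" does not
# match "kiali-operator"). Exits 1 otherwise.
has_deployment() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

Usage: `kubectl apply -n kiali-operator -f kiali-cr.yaml`, then `kubectl get deployment -n istio-system | has_deployment kiali`. Against the istio-system listing above it exits non-zero, since no kiali row is present. If kiali-cr.yaml sets its own metadata.namespace, drop the `-n` flag from the apply.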

It's possible the script's wait logic is misbehaving in your environment, so it never detects that the operator pod is up. You can pass the --operator-skip-wait true option so the script doesn't wait for the operator and creates the Kiali CR immediately. See: https://github.com/kiali/kiali-operator/blob/c7871d368fb3317133ab8d99395e8df384dd1402/deploy/deploy-kiali-operator.sh#L389-L391
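A sketch of that workaround: the same invocation as in the original post plus --operator-skip-wait true, followed by an independent rollout check on the operator Deployment. The DEPLOY_SCRIPT path and the 300s timeout are assumptions, and install_kiali_skip_wait is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Assumed location of the deploy script; override DEPLOY_SCRIPT to point at your copy.
DEPLOY_SCRIPT="${DEPLOY_SCRIPT:-./deploy-kiali-operator.sh}"

# Hypothetical wrapper: deploy the operator and create the Kiali CR right away,
# then confirm the operator Deployment rolled out using standard kubectl.
install_kiali_skip_wait() {
  # Same parameters as the original invocation, plus --operator-skip-wait true.
  "$DEPLOY_SCRIPT" -kcr kiali-cr.yaml -oiv v1.16.0 --operator-skip-wait true || return 1
  # Independent readiness check; exits non-zero if the rollout does not finish in time.
  kubectl rollout status deployment/kiali-operator -n kiali-operator --timeout=300s
}
```

Skipping the script's wait should be harmless here: the Kiali CR stays in the cluster, so the operator should reconcile it once its controller starts.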

FWIW, that deployment script has been deprecated; for the current release we ask people to use OLM or the operator Helm chart to install the operator instead. See: https://kiali.io/documentation/latest/installation-guide/#_helm_chart

Thanks for the reply. I'll give --operator-skip-wait a try. The last lines I see in the operator logs are:

{"level":"info","ts":1597336665.7485445,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"kiali-controller"}
{"level":"info","ts":1597336665.7485728,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"kiali-controller","worker count":1}
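These two controller-runtime log lines are consistent with the operator's kiali-controller running and idle, waiting for a Kiali CR to reconcile. A minimal sketch for checking for the "Starting workers" line from a script; it is based only on these sample lines, not on any documented readiness signal, and controller_started is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads operator log lines (one JSON object per line) on
# stdin and exits 0 if some line has both msg "Starting workers" and
# controller "kiali-controller". Exits 1 otherwise.
controller_started() {
  grep '"msg":"Starting workers"' | grep -q '"controller":"kiali-controller"'
}
```

Usage: `kubectl logs -n kiali-operator deployment/kiali-operator -c operator | controller_started`, where the container name operator comes from the pod events above.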

We will also look at changing the deployment script in the near future, but for now we would like to get this working again. Thanks so much for your help.