Galley crashing on Brand New Installation


#1

I’m getting an error from Galley when installing Istio 1.1.0-snapshot5.

2019-02-08T19:04:52.862581Z	info	kube	zipkins.config.istio.io/v1alpha2 resource type not found
2019-02-08T19:04:52.864994Z	info	kube	zipkins.config.istio.io/v1alpha2 resource type not found
2019-02-08T19:04:52.865065Z	fatal	Unable to initialize Galley Server: timed out waiting for the condition: the following resource type(s) were not found: [zipkins]
2019-02-08T19:04:52.865108Z	info	Istio Galley: root@49aebe88-2b82-11e9-9903-0a580a2c0404-docker.io/istio-master-20190208-09-16-92f3d9716f1bf6f35befc07fd5deb1974b381f16-dirty-Modified
Starting gRPC server on tcp://0.0.0.0:9901
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a79962]

goroutine 12 [running]:
istio.io/istio/galley/pkg/server.(*Server).Run(0x0)
	/workspace/go/src/istio.io/istio/galley/pkg/server/server.go:217 +0x22
istio.io/istio/galley/pkg/server.RunServer(0xc4204989c0, 0x21fac60, 0xc4204161e0, 0x21fac60, 0xc420416230)
	/workspace/go/src/istio.io/istio/galley/pkg/server/server.go:277 +0x1b3
created by istio.io/istio/galley/cmd/galley/cmd.GetRootCmd.func2
	/workspace/go/src/istio.io/istio/galley/cmd/galley/cmd/root.go:88 +0x3c8

Can anyone shed any light on this? Could it be an RBAC problem? I see another error earlier in the log:

2019-02-08T19:04:00.704096Z	error	validation	istio-galley validatingwebhookconfiguration update failed: validatingwebhookconfigurations.admissionregistration.k8s.io "istio-galley" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: User "system:serviceaccount:istio-system:istio-galley-service-account" cannot update deployments/finalizers.extensions at the cluster scope, <nil>
2019-02-08T19:04:00.704131Z	error	validation	Error creating monitoring context for reportValidationConfigUpdateError: invalid value: only ASCII characters accepted; max length must be 255 characters

Thanks,
Marcelo


#2

Hi,

The crash itself is due to a bug that was recently fixed in the release 1.1 branch.

The underlying cause is a missing CRD:

2019-02-08T19:04:52.864994Z	info	kube	zipkins.config.istio.io/v1alpha2 resource type not found

The zipkin CRD is not available. (You’d normally observe this as Galley quitting).
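If it helps, here is a quick sketch (plain shell, no cluster needed) for pulling the missing resource type(s) out of that fatal log line, so you know which CRDs to go apply. The log line is copied from the output above; the `sed` expression just captures whatever is inside the trailing brackets.

```shell
# Extract the missing resource type(s) from Galley's fatal message.
log='2019-02-08T19:04:52.865065Z fatal Unable to initialize Galley Server: timed out waiting for the condition: the following resource type(s) were not found: [zipkins]'
missing=$(printf '%s\n' "$log" | sed -n 's/.*were not found: \[\([^]]*\)\].*/\1/p')
printf '%s\n' "$missing"   # prints: zipkins
```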


#3

I traced the code for https://github.com/istio/istio/blob/f1eab0a25dd6f37be03951698be0e84b120ec18f/galley/pkg/server/server.go#L217 and could not discover the root cause for the traceback.

The “invalid value: only ASCII characters accepted; max length must be 255 characters” message comes from go.opencensus.io/tag/validate.go and is probably caused by the previous message being longer than 255 characters (it’s 376 characters). I am not sure why istio-galley-service-account cannot set finalizers; I don’t see this message on my deployment.
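You can double-check the length claim in plain shell: `${#var}` expands to the string length. The message below is copied from the log in the first post, minus the trailing `, <nil>`.

```shell
# Length check for the webhook error message quoted above; the opencensus
# tag validator rejects values longer than 255 characters.
msg="istio-galley validatingwebhookconfiguration update failed: validatingwebhookconfigurations.admissionregistration.k8s.io \"istio-galley\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: User \"system:serviceaccount:istio-system:istio-galley-service-account\" cannot update deployments/finalizers.extensions at the cluster scope"
echo "${#msg}"   # well over the 255-character limit
```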


#4

And the error about the ownerReference is a permissions issue; the Galley cluster role needs the following rule added. I think there may already be a PR for this:

- apiGroups: ["extensions"]
  resources: ["deployments/finalizers"]
  resourceNames: ["istio-galley"]
  verbs: ["update"]
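For context, a minimal sketch of where that rule would sit in a ClusterRole manifest. The role name here is an assumption; match it to whatever your Galley ClusterRole is actually called in your cluster.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: istio-galley-istio-system  # assumed name; check `kubectl get clusterrole` for the real one
rules:
# ...existing Galley rules...
- apiGroups: ["extensions"]
  resources: ["deployments/finalizers"]
  resourceNames: ["istio-galley"]
  verbs: ["update"]
```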


#5

There wasn’t, but there is now :) https://github.com/istio/istio/pull/11631/files


#6

This was fixed in master yesterday. The corresponding PR for 1.1 is https://github.com/istio/istio/pull/11631.


#7

@ayj will this be available in any kind of nightly build on the Istio Release downloads page under 1.1.0-snapshot5?

@kconner I’ll try to add those by hand, thanks!

@ozevren where is this CRD defined? I’ve applied all the CRDs needed. Can you point me to where it is defined, or give the details, so I can apply it by hand? Also, the fix you mentioned was merged into 1.1; will it be available in any snapshot pre-release on the Istio Release downloads page?

Thanks guys, you’re really fast!


#8

@feitnomore, this should be available in the latest daily release 1.1 build (e.g. https://gcsweb.istio.io/gcs/istio-prerelease/daily-build/release-1.1-20190211-09-16/)


#9

Folks, what about the segmentation fault? Is the fix available? I’ve got the latest version @ayj mentioned, but it’s still crashing:

2019-02-12T10:52:02.917147Z	fatal	Unable to initialize Galley Server: timed out waiting for the condition: the following resource type(s) were not found: [servicecontrolreports servicecontrols]
2019-02-12T10:52:02.917166Z	info	Istio Galley: root@9bdf77d0-2d14-11e9-9903-0a580a2c0404-docker.io/istio-master-20190210-09-16-9be6b898893b2668b207f6e4f6450f6334ae9d4a-dirty-Modified
Starting gRPC server on tcp://0.0.0.0:9901
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a79942]

goroutine 51 [running]:
istio.io/istio/galley/pkg/server.(*Server).Run(0x0)
	/workspace/go/src/istio.io/istio/galley/pkg/server/server.go:217 +0x22
istio.io/istio/galley/pkg/server.RunServer(0xc4201200c0, 0x21fac60, 0xc4200e2e60, 0x21fac60, 0xc4200e2eb0)
	/workspace/go/src/istio.io/istio/galley/pkg/server/server.go:277 +0x1b3
created by istio.io/istio/galley/cmd/galley/cmd.GetRootCmd.func2
	/workspace/go/src/istio.io/istio/galley/cmd/galley/cmd/root.go:88 +0x3c8

Thanks,
Marcelo


#10

OK, it looks like something else is missing. I needed to apply these by hand to fix the problem:

kind: CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1beta1
metadata:
  name: servicecontrols.config.istio.io
  annotations:
    "helm.sh/hook": crd-install
  labels:
    app: mixer
    package: servicecontrol
    istio: mixer-adapter
spec:
  group: config.istio.io
  names:
    kind: servicecontrol
    plural: servicecontrols
    singular: servicecontrol
    categories:
    - istio-io
    - policy-istio-io
  scope: Namespaced
  version: v1alpha2

---

kind: CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1beta1
metadata:
  name: servicecontrolreports.config.istio.io
  annotations:
    "helm.sh/hook": crd-install
  labels:
    app: mixer
    package: servicecontrolreport
    istio: mixer-instance
spec:
  group: config.istio.io
  names:
    kind: servicecontrolreport
    plural: servicecontrolreports
    singular: servicecontrolreport
    categories:
    - istio-io
    - policy-istio-io
  scope: Namespaced
  version: v1alpha2

Thanks,
Marcelo


#11

Marcelo,

Can you give me exact repro steps? I’ve tried installing the build @ayj was pointing at, and I didn’t hit any crashes with a stock installation, or when I removed some of the CRDs.


#12

Are you deleting CRDs by hand?


#13
cd install/kubernetes/helm/
tar -xvzf charts/istio-1.1.0.tgz
tar -xvzf charts/istio-init-1.1.0.tgz
cd /root/istio-1.1.0/

/root/helm-linux-amd64/helm template install/kubernetes/helm/istio \
  --name istio --namespace istio-system \
  --set tracing.enabled=true \
  --set ingress.enabled=true \
  --set gateways.enabled=true \
  --set gateways.istio-ingressgateway.enabled=true \
  --set gateways.istio-egressgateway.enabled=true \
  --set sidecarInjectorWebhook.enabled=true \
  --set galley.enabled=true \
  --set mixer.enabled=true \
  --set mixer.istio-policy.autoscaleEnabled=true \
  --set mixer.istio-telemetry.autoscaleEnabled=true \
  --set pilot.enabled=true \
  --set telemetry-gateway.grafanaEnabled=true \
  --set telemetry-gateway.prometheusEnabled=true \
  --set grafana.enabled=true \
  --set prometheus.enabled=true \
  --set servicegraph.enabled=true \
  --set tracing.ingress.enabled=true \
  --set kiali.enabled=true \
  --set global.proxy.privileged=true \
  > /root/istio.yaml

for i in install/kubernetes/helm/istio-init/files/crd*yaml; do kubectl apply -f $i; done

kubectl label namespace istio-system istio-injection=disabled

After that I applied istio.yaml and saw the crash, so I added the two CRDs by hand.

Thanks,
Marcelo


#14

@ozevren I’ve tried the latest build again, and those CRDs were missing.

/root/helm-linux-amd64/helm template install/kubernetes/helm/istio-init/ --name istio-init --namespace istio-system > /root/istio-init.yaml
/root/helm-linux-amd64/helm template install/kubernetes/helm/istio \
  --name istio --namespace istio-system \
  --set tracing.enabled=true \
  --set ingress.enabled=true \
  --set gateways.enabled=true \
  --set gateways.istio-ingressgateway.enabled=true \
  --set gateways.istio-egressgateway.enabled=true \
  --set sidecarInjectorWebhook.enabled=true \
  --set galley.enabled=true \
  --set mixer.enabled=true \
  --set mixer.istio-policy.autoscaleEnabled=true \
  --set mixer.istio-telemetry.autoscaleEnabled=true \
  --set pilot.enabled=true \
  --set telemetry-gateway.grafanaEnabled=true \
  --set telemetry-gateway.prometheusEnabled=true \
  --set grafana.enabled=true \
  --set prometheus.enabled=true \
  --set servicegraph.enabled=true \
  --set tracing.ingress.enabled=true \
  --set kiali.enabled=true \
  --set global.proxy.privileged=true \
  > /root/istio.yaml
kubectl apply -f /root/istio-init.yaml
kubectl apply -f /root/istio.yaml

Also, I’m getting some strange messages from the sidecar injector:

2019-02-13T11:39:27.135724Z warn istio.io/istio/pilot/cmd/sidecar-injector/main.go:171: watch of *v1beta1.MutatingWebhookConfiguration ended with: The resourceVersion for the provided watch is too old.
2019-02-13T11:46:39.181696Z warn istio.io/istio/pilot/cmd/sidecar-injector/main.go:171: watch of *v1beta1.MutatingWebhookConfiguration ended with: The resourceVersion for the provided watch is too old.
2019-02-13T11:55:06.206616Z warn istio.io/istio/pilot/cmd/sidecar-injector/main.go:171: watch of *v1beta1.MutatingWebhookConfiguration ended with: The resourceVersion for the provided watch is too old.

My pods are not getting the sidecar injected automatically even though the namespace is labeled. Not sure if this warning is causing it. I’m seeing some other messages on Galley as well:

2019-02-13T12:12:15.250733Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[kind:Status apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired code:410]}
2019-02-13T12:12:34.394328Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status apiVersion:v1]}
2019-02-13T12:12:39.304314Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status]}
2019-02-13T12:13:01.148306Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure]}
2019-02-13T12:13:10.493186Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status]}
2019-02-13T12:13:29.342294Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure]}
2019-02-13T12:14:02.928756Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure]}
2019-02-13T12:14:14.559491Z warn istio.io/istio/galley/pkg/source/kube/dynamic/source.go:131: watch of *unstructured.Unstructured ended with: unexpected object: &{map[apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired code:410 kind:Status]}

Thanks,
Marcelo


#15

It was a role problem. Sorry to keep bothering you guys. Seems like the control plane is working fine now.