Init:CrashLoopBackOff when using external CA key/cert

jaid · February 19, 2020, 12:25am

Hey guys,

I’ve seen this happen before, too. It happened again today. After leaving my cluster alone for a few days, the two helloworld pods that I have are now in some weird state. They show as this:

NAME                               READY   STATUS                  RESTARTS   AGE
helloworld-v1.0-54655d76f-c2fbp    0/2     Init:CrashLoopBackOff   46         4d21h
helloworld-v1.0-54655d76f-fskpw    0/2     Init:CrashLoopBackOff   46         4d21h

Notice how it has restarted 46 times already.

However, I can still hit the helloworld service with curl and get a proper response. And the logs do show those requests with a 200 status code:

127.0.0.1 - - [18/Feb/2020 22:13:43] "GET /hello HTTP/1.1" 200 -
127.0.0.1 - - [18/Feb/2020 22:22:48] "GET /hello HTTP/1.1" 200 -

The log for istio-init does not show anything that jumps out besides “Chain already exists”, which, from what I’ve read, is to be expected.

The log for istio-proxy has a bunch of this line:

[Envoy (Epoch 0)] [2020-02-18 23:54:16.184][20][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,

The log for helloworld does not have anything besides those GET ones I pasted above.

When I describe the pod, I see this for istio-init:

State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Tue, 18 Feb 2020 15:56:01 -0800
  Finished:     Tue, 18 Feb 2020 15:56:01 -0800
Ready:          False
Restart Count:  46

helloworld and istio-proxy both look fine.

helloworld:

State:          Running
  Started:      Thu, 06 Feb 2020 12:33:31 -0800
Ready:          True

istio-proxy:

State:          Running
  Started:      Thu, 06 Feb 2020 12:33:31 -0800
Ready:          True

Notice how helloworld and istio-proxy’s start dates are 2/6, while istio-init’s is 2/18, which leads me to believe that istio-init is really the only one that keeps dying.

Any idea what’s going on here? I don’t think I’ve seen this before, when Citadel was still using self-signed certs, so it’s gotta be the external CA key/cert that’s causing the issue somehow.

Thanks for any inputs.

jaid · February 20, 2020, 9:48pm

Update on this that I’m hoping will give someone ideas what could be going on…

I left my pods running overnight again…

This morning, they’d already had about 140 restarts (I forget the exact number, sorry). And the istio-proxy container’s logs had 151 of these, starting at 12 midnight PST:

[Envoy (Epoch 5)] [2020-02-20 08:00:38.704][585][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,

I don’t see any other logs in the other containers (istio-init and helloworld).

Then, every 5 minutes, I would get another log entry like that. And another restart. They’re not happening at the same time though. They’re a couple of minutes apart, it seems. So I don’t know if they’re correlated at all.

But any ideas given the above? What happens at midnight that could potentially trigger those gRPC errors to start happening every 5 minutes (I’m not even using gRPC) and/or the restart of the pods to start happening, also every 5 minutes?

Thanks,
Jaid

Tomas_Kohout · February 21, 2020, 6:20am

Hello @jaid,

those gRPC logs that you see are related to communication with control plane. I’m not sure why does it happen every 5 minutes, but maybe it’s because you’re using citadel which I don’t use yet. (I see those logs every 30 minutes.)

Now to your init container problem. This is pretty strange. Init containers run only once before normal containers so what are you seeing and describing doesn’t make sense at all. Are you sure that there are only those pods that you see?

I can’t help you with the external cacert, but have you read this? https://istio.io/docs/tasks/security/citadel-config/plugin-ca-cert/

You must disable selfsigned certs and upgrade Istio.

jaid · February 21, 2020, 6:38am

Thank you, @Tomas_Kohout. I appreciate it. Yes, that’s actually the exact page I followed.

I’m still trying to figure it out. For tonight, I reverted to self-signed certs. I will see tomorrow what state my pods will be in.

Thanks again!

Tomas_Kohout · February 21, 2020, 8:50am

And did you check the logs of Citadel? The lead could be there.

jaid · February 21, 2020, 5:07pm

Good question. I actually didn’t.

However, my pods are still restarting even when they’re now using Citadel’s self-signed certs. Something must not be right with my installation then. Another cluster is fine. My services have been running for 23 days there.

There’s really nothing in the Citadel logs except for the rootCertRotator stuff that happens every hour.

Ingress gateway’s log has the same gRPC errors every 5 minutes.

Pilot’s log has the following.

The “Handling event update” seems to coincide with the restarts. Is anyone able to decipher what could be happening here?

    2020-02-21T16:58:30.110327Z	info	Handling event update for pod helloworld-v1.0-54655d76f-vmqt8 in namespace hw -> 192.168.70.231
    2020-02-21T16:58:44.651208Z	info	Handling event update for pod helloworld-v1.0-54655d76f-vmqt8 in namespace hw -> 192.168.70.231
    2020-02-21T16:59:06.486775Z	info	Handling event update for pod helloworld-v1.0-54655d76f-v74cf in namespace hw -> 192.168.223.82
    2020-02-21T16:59:07.514642Z	info	Handling event update for pod helloworld-v1.0-54655d76f-v74cf in namespace hw -> 192.168.223.82
    2020-02-21T16:59:20.065137Z	info	Handling event update for pod helloworld-v1.0-54655d76f-v74cf in namespace hw -> 192.168.223.82
    2020-02-21T17:01:57.088210Z	info	ads	ADS: "127.0.0.1:38162" sidecar~192.168.223.82~helloworld-v1.0-54655d76f-v74cf.hw~hw.svc.cluster.local-1993 terminated rpc error: code = Canceled desc = context canceled
    2020-02-21T17:01:57.351478Z	info	ads	ADS:CDS: REQ 127.0.0.1:38162 sidecar~192.168.223.82~helloworld-v1.0-54655d76f-v74cf.hw~hw.svc.cluster.local-1997 500.508µs version:2020-02-21T01:21:25Z/35
    2020-02-21T17:01:57.352292Z	info	ads	CDS: PUSH for node:helloworld-v1.0-54655d76f-v74cf.hw clusters:36 services:12 version:2020-02-21T01:21:25Z/35
    2020-02-21T17:01:57.352734Z	info	ads	EDS: PUSH for node:helloworld-v1.0-54655d76f-v74cf.hw clusters:31 endpoints:41 empty:[]
    2020-02-21T17:01:57.354062Z	info	ads	LDS: PUSH for node:helloworld-v1.0-54655d76f-v74cf.hw listeners:28
    2020-02-21T17:01:57.354467Z	info	ads	RDS: PUSH for node:helloworld-v1.0-54655d76f-v74cf.hw routes:11
    2020-02-21T17:01:57.454052Z	info	ads	ADS: "127.0.0.1:39150" sidecar~192.168.70.231~helloworld-v1.0-54655d76f-vmqt8.hw~hw.svc.cluster.local-1995 terminated rpc error: code = Canceled desc = context canceled
    2020-02-21T17:01:57.763663Z	info	ads	ADS:CDS: REQ 127.0.0.1:39150 sidecar~192.168.70.231~helloworld-v1.0-54655d76f-vmqt8.hw~hw.svc.cluster.local-1998 472.553µs version:2020-02-21T01:21:25Z/35
    2020-02-21T17:01:57.764503Z	info	ads	CDS: PUSH for node:helloworld-v1.0-54655d76f-vmqt8.hw clusters:36 services:12 version:2020-02-21T01:21:25Z/35
    2020-02-21T17:01:57.764937Z	info	ads	EDS: PUSH for node:helloworld-v1.0-54655d76f-vmqt8.hw clusters:31 endpoints:41 empty:[]
    2020-02-21T17:01:57.765973Z	info	ads	LDS: PUSH for node:helloworld-v1.0-54655d76f-vmqt8.hw listeners:28
    2020-02-21T17:01:57.766346Z	info	ads	RDS: PUSH for node:helloworld-v1.0-54655d76f-vmqt8.hw routes:11
    2020-02-21T17:03:03.297909Z	info	ads	ADS: "127.0.0.1:41364" router~192.168.223.126~istio-ingressgateway-6469996f79-2drzd.istio-system~istio-system.svc.cluster.local-1996 terminated rpc error: code = Canceled desc = context canceled
    2020-02-21T17:03:03.422122Z	info	ads	ADS:CDS: REQ 127.0.0.1:41364 router~192.168.223.126~istio-ingressgateway-6469996f79-2drzd.istio-system~istio-system.svc.cluster.local-1999 549.39µs version:2020-02-21T01:21:25Z/35
    2020-02-21T17:03:03.423983Z	info	ads	CDS: PUSH for node:istio-ingressgateway-6469996f79-2drzd.istio-system clusters:63 services:12 version:2020-02-21T01:21:25Z/35
    2020-02-21T17:03:03.424733Z	info	ads	EDS: PUSH for node:istio-ingressgateway-6469996f79-2drzd.istio-system clusters:62 endpoints:82 empty:[]
    2020-02-21T17:03:03.424907Z	info	ads	LDS: PUSH for node:istio-ingressgateway-6469996f79-2drzd.istio-system listeners:2
    2020-02-21T17:03:03.425075Z	info	ads	RDS: PUSH for node:istio-ingressgateway-6469996f79-2drzd.istio-system routes:2
    2020-02-21T17:03:31.096313Z	info	Handling event update for pod helloworld-v1.0-54655d76f-vmqt8 in namespace hw -> 192.168.70.231
    2020-02-21T17:03:44.649434Z	info	Handling event update for pod helloworld-v1.0-54655d76f-vmqt8 in namespace hw -> 192.168.70.231
    2020-02-21T17:04:13.004731Z	info	Handling event update for pod helloworld-v1.0-54655d76f-v74cf in namespace hw -> 192.168.223.82
    2020-02-21T17:04:24.063512Z	info	Handling event update for pod helloworld-v1.0-54655d76f-v74cf in namespace hw -> 192.168.223.82

Topic		Replies	Views
Init:CrashLoopBackOff Networking	1	3501	January 7, 2020
Istio-init container has gone to crash lookbackoff	0	728	September 28, 2020
Istio-1.13.3 : sample sleep pod in Init:CrashLoopBackOff state	0	361	May 16, 2022
Issues faced while installing istio	1	1598	July 31, 2020
Istio-ingressgateway with CrashLoopBackOff after appling Gateway	1	964	June 2, 2020

Init:CrashLoopBackOff when using external CA key/cert

Related topics