Hey all, I’m having an issue with Istio Ingress Gateway pods crashing with a segfault when load increases. I see error messages like this in the logs:
Epoch 0 exited with error: signal: segmentation fault (core dumped)
About 50 users are browsing through the gateway, generating moderate activity. At first, I had 20 ingress gateway pods with requests of 1GB memory / 500m CPU and limits of 8GB memory / 4 CPU, so there should be plenty of resources available for 50 users.
When activity picks up, large groups of Ingress Gateway pods occasionally change to “Completed” and then “CrashLoopBackOff”. It was surprising to see these pods crash in groups so often, though they also crash individually sometimes. Sometimes all pods are in the CrashLoopBackOff state, breaking ingress completely. Scaling up to 40 pods makes the crash loops less frequent, but they still happen. The nodes these pods run on each have 90% of their memory available, and the Ingress Gateway pods seem to use only 200MB to 900MB each, according to Prometheus. So unless memory suddenly spikes, there seems to be plenty available.
I was able to get the full stack trace, included below. My ingress gateway has a number of different Lua filters used for auth on different parts of the site. I suspect the Lua filters might be the problem, but I don’t know how to confirm or debug this.
What can I try to debug this issue? Is this a familiar issue?
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.323][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Segmentation fault, suspect faulting address 0x8
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.323][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:70] Backtrace (use tools/stack_decode.py to get line numbers):
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.323][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:71] Envoy version: 5a93703db9baee125294bb50c9541a1a11d526b4/1.13.4/Clean/RELEASE/BoringSSL
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.325][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #0: __restore_rt [0x7f0948f3a8a0]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.334][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #1: luaL_openlibs [0x55e8d1f207b7]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.342][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #2: Envoy::Extensions::Filters::Common::Lua::ThreadLocalState::LuaThreadLocal::LuaThreadLocal() [0x55e8d1eb6ec7]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.350][56][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #3: std::__1::__function::__func<>::operator()() [0x55e8d1eb712c]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.907][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:83] Caught Segmentation fault, suspect faulting address 0x8
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.907][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:70] Backtrace (use tools/stack_decode.py to get line numbers):
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.907][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:71] Envoy version: 5a93703db9baee125294bb50c9541a1a11d526b4/1.13.4/Clean/RELEASE/BoringSSL
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.908][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #0: __restore_rt [0x7f6b6a37f8a0]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.919][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #1: luaL_openlibs [0x55e6c91497b7]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.929][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #2: Envoy::Extensions::Filters::Common::Lua::ThreadLocalState::LuaThreadLocal::LuaThreadLocal() [0x55e6c90dfec7]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.938][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #3: std::__1::__function::__func<>::operator()() [0x55e6c90e012c]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.947][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #4: std::__1::__function::__func<>::operator()() [0x55e6ca20a0a8]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.956][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #5: std::__1::__function::__func<>::operator()() [0x55e6ca20b2d8]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.965][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #6: Envoy::Event::DispatcherImpl::runPostCallbacks() [0x55e6ca276986]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.974][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #7: event_process_active_single_queue [0x55e6ca5aa806]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.984][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #8: event_base_loop [0x55e6ca5a938e]
"[Envoy (Epoch 0)] [2020-12-14 18:20:07.993][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #9: Envoy::Server::WorkerImpl::threadRoutine() [0x55e6ca26d308]
"[Envoy (Epoch 0)] [2020-12-14 18:20:08.002][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #10: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::$_0::__invoke() [0x55e6ca775ad3]
"[Envoy (Epoch 0)] [2020-12-14 18:20:08.002][47][critical][backtrace] [bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:75] #11: start_thread [0x7f6b6a3746db]
Istio version: 1.5.8
Envoy version: 1.13.4
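One way to check whether every crash goes through the same code path is to group the backtrace frames by thread and compare them. The following is only a minimal sketch, assuming the log line format shown above; the function names (`frames_by_thread`, `shared_top_frames`) and the log file argument are hypothetical, not part of Envoy or Istio.

```python
import re
import sys
from collections import defaultdict

# Matches one backtrace frame line in the format pasted above, e.g.
#   ...][56][critical][backtrace] [.../backtrace.h:75] #1: luaL_openlibs [0x55e8d1f207b7]
# Assumption: the thread id is the bracketed number right before [critical].
FRAME_RE = re.compile(
    r"\]\[(?P<tid>\d+)\]\[critical\]\[backtrace\] \[[^\]]*\] "
    r"#(?P<idx>\d+): (?P<symbol>.+) \[(?P<addr>0x[0-9a-f]+)\]\s*$"
)


def frames_by_thread(log_lines):
    """Group backtrace symbols by thread id, ordered by frame index.

    Note: if the same thread id crashes more than once in the input,
    its frames are merged; split the log per crash if that matters.
    """
    traces = defaultdict(list)
    for line in log_lines:
        m = FRAME_RE.search(line)
        if m:
            traces[m.group("tid")].append((int(m.group("idx")), m.group("symbol")))
    return {tid: [sym for _, sym in sorted(frames)] for tid, frames in traces.items()}


def shared_top_frames(traces):
    """Return the leading frames that all captured traces have in common."""
    shared = []
    for frames in zip(*traces.values()):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    return shared


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python group_frames.py gateway.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        traces = frames_by_thread(fh)
    for tid, frames in traces.items():
        print(f"thread {tid}: {len(frames)} frames")
    print("shared top frames:")
    for sym in shared_top_frames(traces):
        print("  " + sym)
```

The input file could come from something like `kubectl logs <pod> -n <namespace> --previous > gateway.log`, with the pod and namespace filled in for the crashed gateway. In the two traces pasted above, both threads fault in `luaL_openlibs` called from the `LuaThreadLocal` constructor, which at least fits the suspicion about the Lua filters; it does not by itself identify the root cause.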