Pod shows error connecting to the Kubernetes service when Istio is enabled

Hi all,

I am trying to get Promtail up and running inside minikube using the official Helm chart. I noticed that when I use Istio, Promtail shows the following error messages when starting up:

E0402 10:26:35.471498       1 reflector.go:127] github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:451: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: connection refused
E0402 10:26:36.295109       1 reflector.go:127] github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:451: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: connection refused
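
For context, I install Promtail roughly like this (the chart repo and release name are from my setup; my custom values are omitted):

$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm upgrade --install promtail grafana/promtail --namespace default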

I spent the last couple of days checking the network policies and RBAC, and I am sure that they are correct because I can use curl to emulate the call to that API and it works fine.
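
For reference, this is roughly how I emulate the call from inside the Promtail container (the pod name and the URL are the ones from the error above; this assumes curl is available in the image, otherwise a debug pod with the same service account works just as well):

$ kubectl exec promtail-f5jqq -c promtail -- sh -c '
    curl -sS --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
      -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
      "https://10.96.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500"'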

When I disable Istio, Promtail doesn’t show any errors, so I guess the problem is in the sidecar proxy, possibly in how it is configured. Here are the logs from the sidecar proxy:

2021-04-02T10:26:37.167573Z	info	FLAG: --concurrency="2"
2021-04-02T10:26:37.167606Z	info	FLAG: --domain="default.svc.cluster.local"
2021-04-02T10:26:37.167612Z	info	FLAG: --help="false"
2021-04-02T10:26:37.167615Z	info	FLAG: --log_as_json="false"
2021-04-02T10:26:37.167617Z	info	FLAG: --log_caller=""
2021-04-02T10:26:37.167620Z	info	FLAG: --log_output_level="default:info"
2021-04-02T10:26:37.167622Z	info	FLAG: --log_rotate=""
2021-04-02T10:26:37.167625Z	info	FLAG: --log_rotate_max_age="30"
2021-04-02T10:26:37.167627Z	info	FLAG: --log_rotate_max_backups="1000"
2021-04-02T10:26:37.167630Z	info	FLAG: --log_rotate_max_size="104857600"
2021-04-02T10:26:37.167632Z	info	FLAG: --log_stacktrace_level="default:none"
2021-04-02T10:26:37.167642Z	info	FLAG: --log_target="[stdout]"
2021-04-02T10:26:37.167645Z	info	FLAG: --meshConfig="./etc/istio/config/mesh"
2021-04-02T10:26:37.167647Z	info	FLAG: --outlierLogPath=""
2021-04-02T10:26:37.167650Z	info	FLAG: --proxyComponentLogLevel="misc:error"
2021-04-02T10:26:37.167652Z	info	FLAG: --proxyLogLevel="warning"
2021-04-02T10:26:37.167663Z	info	FLAG: --serviceCluster="promtail.default"
2021-04-02T10:26:37.167666Z	info	FLAG: --stsPort="0"
2021-04-02T10:26:37.167668Z	info	FLAG: --templateFile=""
2021-04-02T10:26:37.167671Z	info	FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2021-04-02T10:26:37.167721Z	info	Version 1.9.0-b63e1966c245924b10a0915a671a656540ed7a45-Clean
2021-04-02T10:26:37.167942Z	info	Apply proxy config from env {}

2021-04-02T10:26:37.168722Z	info	Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: promtail.default
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2021-04-02T10:26:37.168792Z	info	Proxy role	ips=[10.244.120.66] type=sidecar id=promtail-f5jqq.default domain=default.svc.cluster.local
2021-04-02T10:26:37.168798Z	info	JWT policy is third-party-jwt
2021-04-02T10:26:37.168806Z	info	Pilot SAN: [istiod.istio-system.svc]
2021-04-02T10:26:37.168808Z	info	CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2021-04-02T10:26:37.168842Z	info	Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2021-04-02T10:26:37.168923Z	info	citadelclient	Citadel client using custom root cert: istiod.istio-system.svc:15012
2021-04-02T10:26:37.214226Z	info	ads	All caches have been synced up in 47.613332ms, marking server ready
2021-04-02T10:26:37.214672Z	info	sds	SDS server for workload certificates started, listening on "./etc/istio/proxy/SDS"
2021-04-02T10:26:37.214704Z	info	xdsproxy	Initializing with upstream address "istiod.istio-system.svc:15012" and cluster "Kubernetes"
2021-04-02T10:26:37.214936Z	info	Starting proxy agent
2021-04-02T10:26:37.215731Z	info	sds	Start SDS grpc server
2021-04-02T10:26:37.215815Z	info	Opening status port 15020
2021-04-02T10:26:37.215975Z	info	Received new config, creating new Envoy epoch 0
2021-04-02T10:26:37.216040Z	info	Epoch 0 starting
2021-04-02T10:26:37.225935Z	info	Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster promtail.default --service-node sidecar~10.244.120.66~promtail-f5jqq.default~default.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format %Y-%m-%dT%T.%fZ	%l	envoy %n	%v -l warning --component-log-level misc:error --concurrency 2]
2021-04-02T10:26:37.267207Z	warning	envoy runtime	Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-04-02T10:26:37.267295Z	warning	envoy runtime	Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-04-02T10:26:37.267654Z	warning	envoy runtime	Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-04-02T10:26:37.267700Z	warning	envoy runtime	Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2021-04-02T10:26:37.317802Z	info	xdsproxy	connected to upstream XDS server: istiod.istio-system.svc:15012
2021-04-02T10:26:37.342615Z	info	ads	ADS: new connection for node:sidecar~10.244.120.66~promtail-f5jqq.default~default.svc.cluster.local-1
2021-04-02T10:26:37.343732Z	info	ads	ADS: new connection for node:sidecar~10.244.120.66~promtail-f5jqq.default~default.svc.cluster.local-2
2021-04-02T10:26:37.498112Z	info	cache	Root cert has changed, start rotating root cert
2021-04-02T10:26:37.498156Z	info	ads	XDS: Incremental Pushing:0 ConnectedEndpoints:2 Version:
2021-04-02T10:26:37.498173Z	info	cache	generated new workload certificate	latency=283.330466ms ttl=23h59m59.50183582s
2021-04-02T10:26:37.498212Z	info	cache	returned delayed workload certificate from cache	ttl=23h59m59.50179024s
2021-04-02T10:26:37.498667Z	info	sds	SDS: PUSH	resource=default
2021-04-02T10:26:37.543291Z	info	sds	SDS: PUSH	resource=ROOTCA
2021-04-02T10:26:37.543735Z	info	sds	SDS: PUSH	resource=ROOTCA
2021-04-02T10:26:37.590342Z	warning	envoy filter	mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
2021-04-02T10:26:37.593259Z	warning	envoy filter	mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
2021-04-02T10:26:39.338989Z	info	Initialization took 2.184529936s
2021-04-02T10:26:39.339003Z	info	Envoy proxy is ready
[2021-04-02T10:26:41.574Z] "POST /loki/api/v1/push HTTP/1.1" 204 - via_upstream - "-" 169674 0 28 28 "-" "promtail/2.2.0" "f21ddee0-bd0c-91d5-b50b-91834c615568" "loki:3100" "10.244.120.65:3100" outbound|3100||loki.default.svc.cluster.local 10.244.120.66:50206 10.101.200.103:3100 10.244.120.66:39250 - default
[2021-04-02T10:26:42.669Z] "POST /loki/api/v1/push HTTP/1.1" 204 - via_upstream - "-" 825 0 2 1 "-" "promtail/2.2.0" "2134ca32-1e36-902d-a43f-40a210490e19" "loki:3100" "10.244.120.65:3100" outbound|3100||loki.default.svc.cluster.local 10.244.120.66:50206 10.101.200.103:3100 10.244.120.66:39250 - default
...
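
In case it matters, the sidecar logs above are simply the output of kubectl logs promtail-f5jqq -c istio-proxy (istio-proxy being the container injected by Istio).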

After that, I can see only the data that Promtail sends to Loki, so there are no log entries about Promtail trying to reach the Kubernetes API service. Could anyone tell me why Envoy is not showing any logs for those requests?

Thanks a lot for any help!

I forgot to mention that I installed Istio with:

$ istioctl install --set profile=demo

OK, I found a workaround, described here.

In essence, I need to add the following annotations to the pod:

  traffic.sidecar.istio.io/includeOutboundIPRanges: "*"
  traffic.sidecar.istio.io/excludeOutboundIPRanges: 10.96.0.1/32

That tells Istio’s traffic-capture rules (the iptables redirection set up for the sidecar) to leave any traffic to 10.96.0.1, i.e. the Kubernetes API service, untouched, so those requests bypass Envoy entirely. That’s not ideal, but it is better than turning off the sidecar completely…
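
With the Promtail Helm chart I pass these through the podAnnotations value, roughly like this (the exact key name may differ between chart versions):

  # values.yaml
  podAnnotations:
    traffic.sidecar.istio.io/includeOutboundIPRanges: "*"
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "10.96.0.1/32"

and then redeploy with something like helm upgrade --install promtail grafana/promtail -f values.yaml.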