Hi community,
We have been running Istio 1.4.6, installed with the Istio operator, in a multicluster setup with replicated control planes and our own root CA. We decided to upgrade to Istio 1.5.1, again using the operator.
Everything seems to work when we use the self-signed certificates (not in a multicluster setup), but when we plug in our own CA, the Envoy proxies fail to load the certificates.
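For context, plugged-in CA material is usually handed to the control plane as a `cacerts` Secret in `istio-system` holding `ca-cert.pem`, `ca-key.pem`, `root-cert.pem` and `cert-chain.pem`, the layout described in Istio's "Plugging in existing CA Certificates" task. The sketch below builds that Secret as a manifest dict, roughly equivalent to `kubectl create secret generic cacerts -n istio-system --from-file=ca-cert.pem --from-file=ca-key.pem --from-file=root-cert.pem --from-file=cert-chain.pem`. The helper name `cacerts_secret` is hypothetical, and this is an assumption about the usual setup, not necessarily how the clusters in this post were configured.

```python
import base64
from pathlib import Path

# File names assumed from the Istio "plug in existing CA certificates" convention.
CA_FILES = ("ca-cert.pem", "ca-key.pem", "root-cert.pem", "cert-chain.pem")


def cacerts_secret(cert_dir: str, namespace: str = "istio-system") -> dict:
    """Build a Kubernetes Secret manifest named `cacerts` from the four PEM files."""
    data = {}
    for name in CA_FILES:
        raw = Path(cert_dir, name).read_bytes()
        # Values under a Secret's `data` field must be base64-encoded.
        data[name] = base64.b64encode(raw).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "cacerts", "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }
```

Serialising the returned dict with `json.dumps` and piping it to `kubectl apply -f -` would create or update the Secret.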
Below are the logs from an Envoy proxy:
2020-04-03T11:18:29.393893Z info Using user-configured CA istio-pilot.istio-system.svc:15012
2020-04-03T11:18:29.393967Z info istiod uses self-issued certificate
2020-04-03T11:18:29.394152Z info the CA cert of istiod is: -----BEGIN CERTIFICATE-----
MIIGdTCCBF2gAwIBAgIRAJwJ+0liyhvFmJ0o6pN3kOgwDQYJKoZIhvcNAQEMBQAw
STEPMA0GA1UECgwGTGF5ZXI4MTYwNAYDVQQDDC1MYXllcjggSXN0aW8gUm9vdCBD
ZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSBERVYwHhcNMjAwMjI3MDAwMDAwWhcNMzAw
MjI3MTYxNTQ4WjBJMQ8wDQYDVQQKDAZMYXllcjgxNjA0BgNVBAMMLUxheWVyOCBJ
c3RpbyBSb290IENlcnRpZmljYXRpb24gQXV0aG9yaXR5IERFVjCCAiIwDQYJKoZI
....
Z1VizAYNXCiBGa406o43aKvImvZwPA3khw==
-----END CERTIFICATE-----
2020-04-03T11:18:29.394764Z info parsed scheme: ""
2020-04-03T11:18:29.394827Z info scheme "" not registered, fallback to default scheme
2020-04-03T11:18:29.394957Z info ccResolverWrapper: sending update to cc: {[{istio-pilot.istio-system.svc:15012 <nil> 0 }] }
2020-04-03T11:18:29.394998Z info ClientConn switching balancer to "pick_first"
2020-04-03T11:18:29.395455Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000349d20, {CONNECTING <nil>}
2020-04-03T11:18:29.433962Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000349d20, {READY <nil>}
2020-04-03T11:18:29.527240Z info sds SDS gRPC server for workload UDS starts, listening on "/etc/istio/proxy/SDS"
2020-04-03T11:18:29.527352Z info PilotSAN []string{"istiod.istio-system.svc"}
2020-04-03T11:18:29.527387Z info Starting proxy agent
2020-04-03T11:18:29.527915Z info Received new config, creating new Envoy epoch 0
2020-04-03T11:18:29.527357Z info sds Start SDS grpc server
2020-04-03T11:18:29.528422Z info Epoch 0 starting
2020-04-03T11:18:29.528548Z info Opening status port 15020
2020-04-03T11:18:30.934139Z info Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: dial tcp 127.0.0.1:15000: connect: connection refused
2020-04-03T11:18:31.538366Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster <pod>.<ns> --service-node sidecar~10.36.0.2~<omitted>.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l info --component-log-level misc:error --concurrency 2]
...
2020-04-03T11:18:39.837542Z info sds node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-1 resource:ROOTCA new connection
2020-04-03T11:18:39.837880Z info sds node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-2 resource:default new connection
2020-04-03T11:18:40.838820Z error cache node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-1 resource:ROOTCA failed to get root cert for proxy
2020-04-03T11:18:40.838951Z error sds node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-1 resource:ROOTCA Close connection. Failed to get secret for proxy "sidecar~10.36.0.2~<omitted>.svc.cluster.local" from secret cache: failed to get root cert
2020-04-03T11:18:40.839164Z info sds node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-1 resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
[Envoy (Epoch 0)] [2020-04-03 11:18:40.839][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 2, failed to get root cert
2020-04-03T11:18:40.885652Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-04-03T11:18:40.886495Z info sds node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-2 resource:default pushed key/cert pair to proxy
2020-04-03T11:18:40.886570Z info sds node:sidecar~10.36.0.2~<omitted>.svc.cluster.local-2 resource:default pushed secret
2020-04-03T11:18:40.935589Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-03T11:18:40.940688Z info sds node:sidecar~10.36.0.2~gvm11-postgres-0.vuln8-qa~vuln8-qa.svc.cluster.local-3 resource:ROOTCA new connection
2020-04-03T11:18:40.941062Z info sds node:sidecar~10.36.0.2~gvm11-postgres-0.vuln8-qa~vuln8-qa.svc.cluster.local-3 resource:ROOTCA pushed root cert to proxy
2020-04-03T11:18:40.941106Z info sds node:sidecar~10.36.0.2~gvm11-postgres-0.vuln8-qa~vuln8-qa.svc.cluster.local-3 resource:ROOTCA pushed secret
[Envoy (Epoch 0)] [2020-04-03 11:18:40.942][19][warning][config] [external/envoy/source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.auth.Secret rejected: Failed to load certificate chain from <inline>
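The last line is the one that matters: Envoy rejects the pushed Secret because it cannot load the certificate chain. One cause worth ruling out (an assumption, not a confirmed diagnosis) is a malformed PEM bundle, for example two certificates glued onto one line because a file lacked a trailing newline before another was appended to it. A minimal structural check, using a hypothetical helper name `pem_problems`, could look like this:

```python
import base64
import binascii
from typing import List

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def pem_problems(text: str) -> List[str]:
    """Return structural problems found in a PEM certificate bundle (empty if none)."""
    problems: List[str] = []
    in_block = False
    body: List[str] = []
    count = 0
    for i, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line == BEGIN:
            if in_block:
                problems.append(f"line {i}: BEGIN inside an unterminated certificate")
            in_block, body = True, []
        elif line == END:
            if not in_block:
                problems.append(f"line {i}: END without a matching BEGIN")
            else:
                try:
                    # validate=True rejects characters outside the base64 alphabet.
                    base64.b64decode("".join(body), validate=True)
                    count += 1
                except binascii.Error:
                    problems.append(f"line {i}: certificate body is not valid base64")
            in_block = False
        elif "-----" in line:
            # e.g. "-----END CERTIFICATE----------BEGIN CERTIFICATE-----"
            problems.append(f"line {i}: malformed or glued PEM marker")
            # Resynchronise on the BEGIN marker that usually ends a glued line.
            in_block, body = line.endswith(BEGIN), []
        elif in_block:
            body.append(line)
    if in_block:
        problems.append("bundle ends inside an unterminated certificate")
    if count == 0 and not problems:
        problems.append("no certificates found")
    return problems
```

Running it over the contents of each plugged-in file (for example `pem_problems(open("cert-chain.pem").read())`) returns an empty list when the structure looks sane; a clean result does not rule out other causes of the rejection.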
The IstioOperator CR we are using:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane
  namespace: istio-system
spec:
  addonComponents:
    istiocoredns:
      enabled: true
  components:
    egressGateways:
    - enabled: true
      name: istio-egressgateway
  profile: default
  values:
    gateways:
      enabled: true
      istio-egressgateway:
        env:
          ISTIO_META_REQUESTED_NETWORK_VIEW: external
    global:
      mtls:
        auto: true
        enabled: false
      multiCluster:
        clusterName: <omitted>
        enabled: true
      podDNSSearchNamespaces:
      - global
      proxy:
        accessLogFile: /dev/stdout
        clusterDomain: cluster.local
        image: proxyv2
        logLevel: info
        resources:
          limits:
            cpu: 2000m
            memory: 1024Mi
          requests:
            cpu: 10m
            memory: 40Mi
      sds:
        enabled: false
    kiali:
      enabled: true
      tag: latest
    pilot:
      autoscaleEnabled: false
      enabled: true
      image: pilot
      resources:
        requests:
          cpu: 10m
          memory: 100Mi
      sidecar: true
      traceSampling: 100
    prometheus:
      enabled: false
    security:
      selfSigned: false
    sidecarInjectorWebhook:
      enabled: true
      rewriteAppHTTPProbe: false
The certificates are exactly the same ones we were using before, so there should be no problem with them.
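The "same files, so no problem" reasoning can be partly checked offline. A minimal sketch, assuming the common layout in which `cert-chain.pem` is `ca-cert.pem` followed by `root-cert.pem`; the function names are hypothetical, and comparing DER bytes only confirms that the files line up with each other, not that Istio 1.5 accepts them.

```python
import base64
import re
from typing import List

# Matches one PEM CERTIFICATE block and captures its base64 body.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.+?)\s*-----END CERTIFICATE-----", re.S
)


def der_blocks(pem_text: str) -> List[bytes]:
    """Decode every CERTIFICATE block in a PEM string to raw DER bytes."""
    return [
        base64.b64decode("".join(m.group(1).split()))
        for m in PEM_RE.finditer(pem_text)
    ]


def chain_layout_problems(ca_cert: str, cert_chain: str, root_cert: str) -> List[str]:
    """Check the assumed layout: cert-chain.pem = ca-cert.pem followed by root-cert.pem."""
    ca, chain, root = der_blocks(ca_cert), der_blocks(cert_chain), der_blocks(root_cert)
    problems: List[str] = []
    if len(ca) != 1:
        problems.append(f"ca-cert.pem holds {len(ca)} certificates, expected 1")
    if len(root) != 1:
        problems.append(f"root-cert.pem holds {len(root)} certificates, expected 1")
    if not chain:
        problems.append("cert-chain.pem contains no certificates")
        return problems
    if ca and chain[0] != ca[0]:
        problems.append("cert-chain.pem does not start with ca-cert.pem")
    if root and chain[-1] != root[0]:
        problems.append("cert-chain.pem does not end with root-cert.pem")
    return problems
```

If the CA certificate is itself the root, `ca-cert.pem` and `root-cert.pem` hold the same certificate and the check still passes when `cert-chain.pem` contains only that certificate.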
Are we missing something here?
Thank you