Ingress/egress/sidecar proxies not running when using SDS in Istio 1.5.0

I installed Istio 1.5.0 using helm template with demo_values.yaml and the following options.

--set global.controlPlaneSecurityEnabled=true --set global.mtls.enabled=true --set global.sds.enabled=true --set global.sds.udsPath="unix:/var/run/sds/uds_path"

Any service that uses envoy-proxy, including the Istio gateways and sidecar proxies, is unable to start. It fails while fetching configuration using SDS.

Below are the logs from a sidecar container.

2020-03-17T04:21:13.443129Z     info    FLAG: --binaryPath="/usr/local/bin/envoy"
2020-03-17T04:21:13.443158Z     info    FLAG: --concurrency="2"
2020-03-17T04:21:13.443163Z     info    FLAG: --configPath="/etc/istio/proxy"
2020-03-17T04:21:13.443168Z     info    FLAG: --connectTimeout="10s"
2020-03-17T04:21:13.443171Z     info    FLAG: --controlPlaneAuthPolicy="MUTUAL_TLS"
2020-03-17T04:21:13.443176Z     info    FLAG: --controlPlaneBootstrap="true"
2020-03-17T04:21:13.443179Z     info    FLAG: --customConfigFile=""
2020-03-17T04:21:13.443182Z     info    FLAG: --datadogAgentAddress=""
2020-03-17T04:21:13.443185Z     info    FLAG: --disableInternalTelemetry="false"
2020-03-17T04:21:13.443188Z     info    FLAG: --discoveryAddress="istio-pilot.istio-system:15011"
2020-03-17T04:21:13.443191Z     info    FLAG: --dnsRefreshRate="300s"
2020-03-17T04:21:13.443194Z     info    FLAG: --domain="scpsvc.svc.cluster.local"
2020-03-17T04:21:13.443197Z     info    FLAG: --drainDuration="45s"
2020-03-17T04:21:13.443200Z     info    FLAG: --envoyAccessLogService=""
2020-03-17T04:21:13.443203Z     info    FLAG: --envoyMetricsService=""
2020-03-17T04:21:13.443206Z     info    FLAG: --help="false"
2020-03-17T04:21:13.443209Z     info    FLAG: --id=""
2020-03-17T04:21:13.443212Z     info    FLAG: --ip=""
2020-03-17T04:21:13.443215Z     info    FLAG: --lightstepAccessToken=""
2020-03-17T04:21:13.443218Z     info    FLAG: --lightstepAddress=""
2020-03-17T04:21:13.443220Z     info    FLAG: --lightstepCacertPath=""
2020-03-17T04:21:13.443223Z     info    FLAG: --lightstepSecure="false"
2020-03-17T04:21:13.443230Z     info    FLAG: --log_as_json="false"
2020-03-17T04:21:13.443232Z     info    FLAG: --log_caller=""
2020-03-17T04:21:13.443235Z     info    FLAG: --log_output_level="default:info"
2020-03-17T04:21:13.443239Z     info    FLAG: --log_rotate=""
2020-03-17T04:21:13.443242Z     info    FLAG: --log_rotate_max_age="30"
2020-03-17T04:21:13.443245Z     info    FLAG: --log_rotate_max_backups="1000"
2020-03-17T04:21:13.443249Z     info    FLAG: --log_rotate_max_size="104857600"
2020-03-17T04:21:13.443252Z     info    FLAG: --log_stacktrace_level="default:none"
2020-03-17T04:21:13.443260Z     info    FLAG: --log_target="[stdout]"
2020-03-17T04:21:13.443266Z     info    FLAG: --mixerIdentity=""
2020-03-17T04:21:13.443269Z     info    FLAG: --outlierLogPath=""
2020-03-17T04:21:13.443288Z     info    FLAG: --parentShutdownDuration="1m0s"
2020-03-17T04:21:13.443293Z     info    FLAG: --pilotIdentity=""
2020-03-17T04:21:13.443298Z     info    FLAG: --proxyAdminPort="15000"
2020-03-17T04:21:13.443301Z     info    FLAG: --proxyComponentLogLevel="misc:error"
2020-03-17T04:21:13.443304Z     info    FLAG: --proxyLogLevel="warning"
2020-03-17T04:21:13.443308Z     info    FLAG: --serviceCluster="scpc-ss-configuration.scpsvc"
2020-03-17T04:21:13.443311Z     info    FLAG: --serviceregistry="Kubernetes"
2020-03-17T04:21:13.443315Z     info    FLAG: --statsdUdpAddress=""
2020-03-17T04:21:13.443318Z     info    FLAG: --statusPort="15020"
2020-03-17T04:21:13.443321Z     info    FLAG: --stsPort="0"
2020-03-17T04:21:13.443323Z     info    FLAG: --templateFile=""
2020-03-17T04:21:13.443327Z     info    FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2020-03-17T04:21:13.443330Z     info    FLAG: --trust-domain=""
2020-03-17T04:21:13.443333Z     info    FLAG: --zipkinAddress="zipkin.istio-system:9411"
2020-03-17T04:21:13.443408Z     info    Version 1.5.0-c3c353285578eb68b334fc8766746b754b6b3789-Clean
2020-03-17T04:21:13.443598Z     info    Obtained private IP [192.168.140.217]
2020-03-17T04:21:13.443642Z     info    Proxy role: &model.Proxy{ClusterID:"", Type:"sidecar", IPAddresses:[]string{"192.168.140.217", "192.168.140.217"}, ID:"scpc-ss-configuration-7647f78779-btsnk.scpsvc", Locality:(*envoy_api_v2_core.Locality)(nil), DNSDomain:"scpsvc.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), WorkloadLabels:labels.Collection(nil), IstioVersion:(*model.IstioVersion)(nil)}
2020-03-17T04:21:13.443666Z     info    PilotSAN []string{"spiffe://cluster.local/ns/istio-system/sa/istio-pilot-service-account"}
2020-03-17T04:21:13.443677Z     info    MixerSAN []string{"spiffe://cluster.local/ns/istio-system/sa/istio-mixer-service-account"}
2020-03-17T04:21:13.444113Z     info    Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: /etc/istio/proxy
connectTimeout: 10s
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istio-pilot.istio-system:15011
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: scpc-ss-configuration.scpsvc
statNameLength: 189
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2020-03-17T04:21:13.444131Z     info    JWT policy is third-party-jwt
2020-03-17T04:21:13.444181Z     info    waiting 1m0s for /var/run/sds/uds_path
2020-03-17T04:22:12.958755Z     info    waiting for file
2020-03-17T04:22:13.058943Z     info    waiting for file
2020-03-17T04:22:13.159133Z     info    waiting for file
2020-03-17T04:22:13.259337Z     info    waiting for file
2020-03-17T04:22:13.359552Z     info    waiting for file
2020-03-17T04:22:13.459802Z     warn    file still not available after1m0s
2020-03-17T04:22:13.459904Z     info    Istio Agent uses default istiod CA
2020-03-17T04:22:13.459919Z     info    istiod uses self-issued certificate
2020-03-17T04:22:13.460000Z     warn    Failed to load root cert, assume IP secure network: open var/run/secrets/istio/root-cert.pem: no such file or directory
2020-03-17T04:22:13.513162Z     info    parsed scheme: ""
2020-03-17T04:22:13.513246Z     info    scheme "" not registered, fallback to default scheme
2020-03-17T04:22:13.513323Z     info    ccResolverWrapper: sending update to cc: {[{istiod.istio-system.svc:15010  <nil> 0 <nil>}] <nil> <nil>}
2020-03-17T04:22:13.513343Z     info    ClientConn switching balancer to "pick_first"
2020-03-17T04:22:13.513597Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000b9c230, {CONNECTING <nil>}
2020-03-17T04:22:13.513783Z     info    sds     SDS gRPC server for workload UDS starts, listening on "/etc/istio/proxy/SDS"

2020-03-17T04:22:13.513860Z     info    PilotSAN []string{"spiffe://cluster.local/ns/istio-system/sa/istio-pilot-service-account"}
2020-03-17T04:22:13.513902Z     info    Starting proxy agent
2020-03-17T04:22:13.514043Z     info    sds     Start SDS grpc server
2020-03-17T04:22:13.514622Z     info    Opening status port 15020

2020-03-17T04:22:13.514708Z     info    Received new config, creating new Envoy epoch 0
2020-03-17T04:22:13.514812Z     info    Epoch 0 starting
2020-03-17T04:22:13.524689Z     info    grpc: addrConn.createTransport failed to connect to {istiod.istio-system.svc:15010  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.96.0.10:53: no such host". Reconnecting...
2020-03-17T04:22:13.524828Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000b9c230, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.96.0.10:53: no such host"}
2020-03-17T04:22:13.526250Z     warn    failed to read pod labels: open ./etc/istio/pod/labels: no such file or directory
2020-03-17T04:22:13.527603Z     info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster scpc-ss-configuration.scpsvc --service-node sidecar~192.168.140.217~scpc-ss-configuration-7647f78779-btsnk.scpsvc~scpsvc.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l warning --component-log-level misc:error --concurrency 2]
[Envoy (Epoch 0)] [2020-03-17 04:22:13.567][27][critical][main] [external/envoy/source/server/server.cc:96] error initializing configuration '/etc/istio/proxy/envoy-rev0.json': Invalid path: ./var/run/secrets/istio/root-cert.pem
Invalid path: ./var/run/secrets/istio/root-cert.pem
2020-03-17T04:22:13.569767Z     error   Epoch 0 exited with error: exit status 1
2020-03-17T04:22:13.569792Z     info    No more active epochs, terminating

Does anyone have any idea how to deploy an Istio 1.5.0 mesh with SDS and mTLS enabled? Is there a configurable option that I missed above?

Note: the istio-pilot, istio-proxy and telemetry pods are running fine even though they use a sidecar proxy. The only difference I found in the config is the presence of --templateFile="/etc/istio/proxy/envoy_pilot.yaml.tmpl" in these sidecars, which is not present in the application sidecars or the ingress/egress gateways.
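
Most of the sidecar log above is flag dumps. A small filter can pull out the handful of lines that describe the failure: the wait on uds_path timing out, the failed lookup of istiod.istio-system.svc, and Envoy rejecting the root-cert.pem path. This is only a sketch; the patterns are copied from the log in this post, and the pod and namespace names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Keep only the lines that explain why the proxy exited.
# Reads a proxy log on stdin; patterns come from the log in this thread.
triage_proxy_log() {
  grep -E 'uds_path|file still not available|no such host|Invalid path|exited with error'
}

# Usage against a live pod (placeholders, not real names):
#   kubectl logs <pod> -n <namespace> -c istio-proxy | triage_proxy_log
```

The filter does not diagnose anything by itself; it just makes the three failing steps easier to see next to each other.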

Hmm… I think SDS is broken in helm. It was an Alpha feature in helm and we did not give high priority to fixing it. Is there a reason you can't use istioctl? Alternatively, you can use the secret volume mount with helm.

Yes, we can use istioctl. In fact, I was going to try that next but got a little confused by the options for enabling SDS in istioctl. I guess it uses the same options as helm. Will the following work?
istioctl manifest apply --set values.global.sds.enabled=true
Do I need to add udsPath as well? Can you help with all the applicable options if I want to enable strict mTLS with SDS?
We have already tried the secret volume mount and it works well. We just want to explore SDS more since it brings a few advantages over the file mount (https://istio.io/pt-br/docs/tasks/security/citadel-config/auth-sds/).

@rajat Did you get this working? We are in the same boat.

Using helm? No.
I tried the istioctl way and it went well. The following options got us mTLS enabled globally.

bin/istioctl manifest apply --set profile=demo --set values.global.controlPlaneSecurityEnabled=true --set values.gateways.istio-ingressgateway.sds.enabled=true --set values.global.sds.enabled=true --set values.global.sds.udsPath="unix:/var/run/sds/uds_path" --set values.global.mtls.enabled=true
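
The command above is long enough that a pasted curly quote or a dropped dash is easy to miss. One way to keep it readable is to list the settings one per line and build the --set flags from that list. This is a sketch that only prints the command; every setting is taken from the post above, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Build the istioctl command from a list of settings (all from this thread).
istioctl_sds_cmd() {
  local settings=(
    profile=demo
    values.global.controlPlaneSecurityEnabled=true
    values.gateways.istio-ingressgateway.sds.enabled=true
    values.global.sds.enabled=true
    values.global.sds.udsPath=unix:/var/run/sds/uds_path
    values.global.mtls.enabled=true
  )
  local args=() s
  for s in "${settings[@]}"; do
    args+=(--set "$s")
  done
  # Dry run: print the command instead of running it.
  echo bin/istioctl manifest apply "${args[@]}"
}

istioctl_sds_cmd
```

To apply it for real, drop the echo so the function runs istioctl directly.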

@rajat I am not using helm; I am using istioctl + kubectl. It seems to me that SDS is broken in this version. I cannot get it working on my custom gateway: I've enabled SDS and the certs are not being picked up.

If I put sds as the certificate path, I get this:

[Envoy (Epoch 0)] [2020-06-12 16:33:14.818][20][warning][config] [external/envoy/source/common/config/grpc_mux_subscription_impl.cc:82] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) 0.0.0.0_443: Invalid path: sds

If I remove it and use only credentialName, I get this:

[Envoy (Epoch 0)] [2020-06-12 16:33:34.230][20][warning][config] [external/envoy/source/common/config/grpc_mux_subscription_impl.cc:82] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) 0.0.0.0_443: Proto constraint validation failed (DownstreamTlsContextValidationError.CommonTlsContext: ["embedded message failed validation"] | caused by CommonTlsContextValidationError.TlsCertificates[i]: ["embedded message failed validation"] | caused by TlsCertificateValidationError.CertificateChain: ["embedded message failed validation"] | caused by DataSourceValidationError.Filename: ["value length must be at least " '\x01' " bytes"]):
common_tls_context {
  tls_certificates {
    certificate_chain {
      filename: ""
    }
    private_key {
      filename: ""
    }
  }
  alpn_protocols: "h2"
  alpn_protocols: "http/1.1"
}
require_client_certificate {
}

It's so frustrating. We have multiple certs, so we need to use SDS; otherwise I need to change the mounted certs.
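
For comparison, here is the general shape of a Gateway whose TLS block names a Kubernetes secret through credentialName only, with no serverCertificate or privateKey paths. It is a sketch, not a confirmed fix: nothing in this thread shows that it clears the validation error on a custom gateway in 1.5.0. The names my-gateway and my-credential, the host, and the selector label are all hypothetical, and the selector has to match the labels on your own gateway pods.

```shell
#!/usr/bin/env bash
# Write a Gateway manifest that references a TLS secret by name only.
# All names below are placeholders chosen for illustration.
write_gateway_manifest() {
  cat > "$1" <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway   # must match your gateway pods' labels
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: my-credential   # name of the TLS secret
    hosts:
    - "example.com"
EOF
}

# Usage (needs a cluster; key/cert paths and namespace are placeholders):
#   write_gateway_manifest gateway-sds.yaml
#   kubectl create -n <gateway namespace> secret tls my-credential --key=<key file> --cert=<cert file>
#   kubectl apply -f gateway-sds.yaml
```

With several certificates, the idea would be one server entry per host, each with its own credentialName, so that no mounted files need to change.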