Ingress Gateway SDS not working with local build of 1.2.5

For internal policy reasons I have to build my own images for Istio. This is further complicated by the fact that right now I am only allowed to use a CentOS 7 base or a scratch base.

We cannot use CentOS 7 for the proxy because its older C++ ABI would require CentOS 8, so right now we build with a multi-stage Dockerfile, using an Ubuntu image as the builder and transferring pilot-agent, envoy, and the required libraries to a scratch image for runtime.

Whilst convoluted, this has worked for us for a number of months, until now.

We want to switch to using SDS for ingress gateway TLS credentials. To enable this we have moved to Istio version 1.2.5.
We have configured the deployment appropriately and can see that the node-agent sidecar is correctly watching secrets.

However, we notice that no gRPC request arrives at the node-agent sidecar from the proxy to trigger the push of the secrets.

(i.e. we do not see this message in the node-agent sidecar logs: https://github.com/istio/istio/blob/d9e231eda0e163d0f3df0103546c7a06b72cc48d/security/pkg/nodeagent/sds/sdsservice.go#L208)
Hence the sidecar never pushes secrets to the proxy.

I have validated that the gateway deployment itself is OK by replacing my locally built image with the upstream one from Docker Hub (docker.io/istio/proxyv2:1.2.5). When we swap the image we do see the initial request to the sidecar, and the credential is pushed to the proxy.

We have compared our build with the upstream image and found:

  • both build from the same Istio proxy SHA
  • the server_info dump from port 15000 on the proxy shows the same values
  • the log output from the proxy is identical (apart from timestamps), with both set at debug level
  • there are no differences in the /etc/istio/proxy/ json files
  • we temporarily added a static Unix Domain Socket echo server as the entrypoint in our proxy image and checked that we could write to, and get an echo back from, the node-agent sidecar (i.e. the shared mount for the UDS is OK)
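The UDS sanity check in the last bullet can be reproduced outside the container as well; here is a minimal Python sketch (the socket path /tmp/echo.sock is a stand-in for the real /var/run/ingress_gateway mount, and the retry loop just papers over server start-up):

```python
import os
import socket
import threading
import time

SOCK_PATH = "/tmp/echo.sock"  # stand-in for the shared /var/run/ingress_gateway mount


def echo_server(path: str) -> None:
    """Accept a single connection on a Unix Domain Socket and echo bytes back."""
    if os.path.exists(path):
        os.unlink(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))


def echo_roundtrip(path: str, payload: bytes, retries: int = 50) -> bytes:
    """Connect to the UDS, write payload, and return whatever comes back."""
    for _ in range(retries):
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
                cli.connect(path)
                cli.sendall(payload)
                return cli.recv(1024)
        except (FileNotFoundError, ConnectionRefusedError):
            time.sleep(0.05)  # server thread not bound yet; retry
    raise RuntimeError("echo server never came up")


t = threading.Thread(target=echo_server, args=(SOCK_PATH,))
t.start()
reply = echo_roundtrip(SOCK_PATH, b"ping")
t.join()
print(reply)  # b'ping' when the socket plumbing works
```

Getting the echo back only proves the shared mount and socket are fine, which is why we ruled the volume out as the culprit.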

Following the debug steps in "TLS SDS/credentialName not working with Ingress Gateway", we can see that the expected dynamic_active_listeners config is not created for our build of the proxy.
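For reference, that check amounts to pulling /config_dump from the proxy admin port (15000) and looking for dynamic_active_listeners in the listeners section. A small parsing sketch, with the JSON shape assumed from Envoy's v2alpha admin API (the fetch itself is commented out since it needs a running proxy):

```python
import json
# from urllib.request import urlopen
# config_dump = json.loads(urlopen("http://localhost:15000/config_dump").read())


def dynamic_listener_names(config_dump: dict) -> list:
    """Return the names of dynamic active listeners in an Envoy config_dump payload."""
    names = []
    for section in config_dump.get("configs", []):
        for wrapper in section.get("dynamic_active_listeners", []):
            names.append(wrapper.get("listener", {}).get("name"))
    return names


# Trimmed sample of what a healthy gateway's listeners dump roughly looks like:
sample = {
    "configs": [
        {
            "@type": "type.googleapis.com/envoy.admin.v2alpha.ListenersConfigDump",
            "dynamic_active_listeners": [
                {"listener": {"name": "0.0.0.0_443"}}
            ],
        }
    ]
}

print(dynamic_listener_names(sample))  # ['0.0.0.0_443']; our broken build yields []
```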

I am pretty much out of ideas on this right now.

My deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  labels:
    chart: gateways
    heritage: Tiller
    release: istio
    app: istio-ingressgateway
    istio: ingressgateway
spec:
  progressDeadlineSeconds: 900
  replicas: 3
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway
  template:
    metadata:
      labels:
        chart: gateways
        heritage: Tiller
        release: istio
        app: istio-ingressgateway
        istio: ingressgateway
      annotations:
        sidecar.istio.io/inject: "false"
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      serviceAccountName: istio-ingressgateway-service-account
      containers:
        - name: ingress-sds
          image: "myregistry/istio/node-agent-k8s:1.2.5-253"
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: 2000m
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 128Mi
            
          env:
          - name: "ENABLE_WORKLOAD_SDS"
            value: "false"
          - name: "ENABLE_INGRESS_GATEWAY_SDS"
            value: "true"
          - name: "INGRESS_GATEWAY_NAMESPACE"
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          volumeMounts:
          - name: ingressgatewaysdsudspath
            mountPath: /var/run/ingress_gateway
        - name: istio-proxy
          image: "myregistry/istio/proxyv2:1.2.5-253"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
            - containerPort: 443
            - containerPort: 15020
            - containerPort: 15090
              protocol: TCP
              name: http-envoy-prom
          args:
          - proxy
          - router
          - --domain
          - $(POD_NAMESPACE).svc.cluster.local
          - --log_output_level=default:info
          - --drainDuration
          - '45s' #drainDuration
          - --parentShutdownDuration
          - '1m0s' #parentShutdownDuration
          - --connectTimeout
          - '10s' #connectTimeout
          - --serviceCluster
          - istio-ingressgateway
          - --zipkinAddress
          - zipkin:9411
          - --proxyAdminPort
          - "15000"
          - --statusPort
          - "15020"
          - --controlPlaneAuthPolicy
          - NONE
          - --discoveryAddress
          - istio-pilot:15010
          readinessProbe:
            failureThreshold: 30
            httpGet:
              path: /healthz/ready
              port: 15020
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 2000m
              memory: 2Gi
            requests:
              cpu: 2000m
              memory: 2Gi
            
          env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: INSTANCE_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: HOST_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.hostIP
          - name: ISTIO_META_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: ISTIO_META_CONFIG_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: ISTIO_META_USER_SDS
            value: "true"
          - name: ISTIO_META_ROUTER_MODE
            value: sni-dnat
          volumeMounts:
          - name: sdsudspath
            mountPath: /var/run/sds
            readOnly: true
          - name: ingressgatewaysdsudspath
            mountPath: /var/run/ingress_gateway
          - name: istio-certs
            mountPath: /etc/certs
            readOnly: true
          - name: ingressgateway-certs
            mountPath: "/etc/istio/ingressgateway-certs"
            readOnly: true
          - name: ingressgateway-ca-certs
            mountPath: "/etc/istio/ingressgateway-ca-certs"
            readOnly: true
      volumes:
      - name: ingressgatewaysdsudspath
        emptyDir: {}
      - name: sdsudspath
        hostPath:
          path: /var/run/sds
      - name: istio-certs
        secret:
          secretName: istio.istio-ingressgateway-service-account
          optional: true
      - name: ingressgateway-certs
        secret:
          secretName: "istio-ingressgateway-certs"
          optional: true
      - name: ingressgateway-ca-certs
        secret:
          secretName: "istio-ingressgateway-ca-certs"
          optional: true
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: istio
                  operator: In
                  values:
                  - ingressgateway
              topologyKey: kubernetes.io/hostname
            weight: 100
           
      nodeSelector:
        affinity: ingress
        
      securityContext:
        sysctls:
        - name: net.ipv4.ip_unprivileged_port_start
          value: "80"

My dockerfile for building:

ARG ISTIO_VERSION
FROM myregistry/istio-package/istiobuilder:${ISTIO_VERSION} as builder

# re-declared: ARGs before the first FROM are not visible inside build stages
ARG ISTIO_VERSION
ARG ARTIFACTORY_USER
ARG ARTIFACTORY_PW
USER root

ENV REQUIRED='git build-essential g++-7 gcc-7 cmake jq \
    pkg-config zip zlib1g-dev unzip python \
    openjdk-8-jdk ninja-build curl vim libgomp1 \
    autoconf autogen libtool ca-certificates'

RUN apt-get update --fix-missing && \
    apt-get install -y ${REQUIRED} && \
    rm -rf /var/lib/apt/lists/*
RUN cd /usr/local/ && curl -f -s -u ${ARTIFACTORY_USER}:${ARTIFACTORY_PW} "https://artifactory.my.com/artifactory/developer-tools/go/1.12.5/go1.12.5.linux-amd64.tar.gz" | tar -xz
ENV PATH=/usr/local/go/bin:$PATH
ENV BAZEL_VERSION=0.26.1
ENV BAZEL_BIN=bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh


RUN curl -Lfs https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/${BAZEL_BIN} -o /tmp/${BAZEL_BIN} && \
    chmod +x /tmp/${BAZEL_BIN} && \
    /tmp/${BAZEL_BIN} && \
    rm -f /tmp/${BAZEL_BIN}

USER myuser

ENV GOPATH=/home/myuser/go
ENV PROXY_BUILD=/home/myuser/proxybuild
ENV ARTIFACTS_LOG=/home/myuser/artifacts.txt
ENV PROXY_FOLDER=istio-proxy

RUN export CC=/usr/bin/gcc && \
    export CXX=/usr/bin/g++ && \
    mkdir -p ${GOPATH}/src/github.com/istio && \
    cd ${GOPATH}/src/github.com/istio && \
    git clone https://github.com/istio/proxy.git ${PROXY_FOLDER} && \
    cd ${PROXY_FOLDER} && \
    export PROXY_VERSION=`cat /home/myuser/go/src/istio.io/istio/istio.deps | jq -r '.[]|select(.name=="PROXY_REPO_SHA")|.lastStableSHA'` && \
    git checkout ${PROXY_VERSION} && \
    ln -s $(pwd) ${PROXY_BUILD} && \
    echo "ISTIO-VERSION: ${ISTIO_VERSION}" > ${ARTIFACTS_LOG} && \
    echo "ISTIO-PROXY-SHA: ${PROXY_VERSION}" >> ${ARTIFACTS_LOG} && \
    export ENVOY_SHA=`cat /home/myuser/go/src/github.com/istio/${PROXY_FOLDER}/istio.deps|jq -r '.[]|select(.name=="ENVOY_SHA")|.lastStableSHA'` && \
    echo "ENVOY-SHA: ${ENVOY_SHA}" >> ${ARTIFACTS_LOG} && \
    cd ${GOPATH}/src/github.com/istio/istio-proxy && make BAZEL_BUILD_ARGS=""  build && \
    cp ${GOPATH}/src/github.com/istio/istio-proxy/tools/deb/envoy.json ${PROXY_BUILD}

USER root

RUN groupadd --gid 10001 istio-user && useradd --gid 10001 --uid 10001 istio-user && \
    mkdir -p ${PROXY_BUILD}/var/lib/istio/envoy && \
    mkdir -p ${PROXY_BUILD}/var/lib/istio/proxy && \
    mkdir -p ${PROXY_BUILD}/var/lib/istio/config && \
    mkdir -p ${PROXY_BUILD}/var/log/istio && \
    chown -R 10001.10001 ${PROXY_BUILD}/var/lib/istio && \
    chown -R 10001.10001 ${PROXY_BUILD}/var/log/istio && \
    mkdir -p ${PROXY_BUILD}/etc/istio/proxy && \
    mkdir -p ${PROXY_BUILD}/usr/local/bin && \
    mkdir -p ${PROXY_BUILD}/lib/x86_64-linux-gnu/ && \
    mkdir -p ${PROXY_BUILD}/usr/lib/x86_64-linux-gnu/ && \
    mkdir -p ${PROXY_BUILD}/var/log/istio && \
    mkdir -p ${PROXY_BUILD}/var/run/sds && mkdir -p /var/run/ingress_gateway && \
    bash -c 'cp /lib/x86_64-linux-gnu/{libc.so.6,libm.so.6,libpthread.so.0,libdl.so.2,librt.so.1} ${PROXY_BUILD}/lib/x86_64-linux-gnu/' && \
    bash -c 'cp -r /usr/lib/x86_64-linux-gnu/{libstdc++.so*,libssl.so.1.1,libcrypto.so.1.1,gconv,libgomp.so.1*} ${PROXY_BUILD}/usr/lib/x86_64-linux-gnu/' && \
    cp /home/myuser/go/out/linux_amd64/release/pilot-agent ${PROXY_BUILD}/usr/local/bin && \
    cp ${PROXY_BUILD}/bazel-out/k8-fastbuild/bin/src/envoy/envoy ${PROXY_BUILD}/usr/local/bin && \
    cp /home/myuser/go/src/istio.io/istio/tools/packaging/common/istio-iptables.sh ${PROXY_BUILD}/usr/local/bin && \
    cp /home/myuser/go/src/istio.io/istio/tools/packaging/common/envoy_bootstrap_v2.json ${PROXY_BUILD}/var/lib/istio/envoy/envoy_bootstrap_tmpl.json && \
    cp /home/myuser/go/src/istio.io/istio/tools/packaging/common/sidecar.env ${PROXY_BUILD}/var/lib/istio/envoy/ && \
    cp /home/myuser/go/src/istio.io/istio/tools/packaging/common/envoy_bootstrap_drain.json ${PROXY_BUILD}/var/lib/istio/envoy/envoy_bootstrap_drain.json && \
    echo "istio-user:x:10001:10001:istio-user:/var/lib/istio:/sbin/nologin" > ${PROXY_BUILD}/etc/passwd && \
    cp ${GOPATH}/src/istio.io/istio/pilot/docker/*.yaml.tmpl  ${PROXY_BUILD}/etc/istio/proxy  && \
    mkdir -p /home/myuser/etc/ssl/certs/ && \
    for cert in $(find /usr/share/ca-certificates -type f | sort); do cat "$cert" >> /home/myuser/etc/ssl/certs/ca-certificates.crt;done && \
    tar -czf ${PROXY_BUILD}/ca-certificates.tgz --owner=0 --group=0 /home/myuser/etc/ssl/certs/ca-certificates.crt

FROM docker.io/istio/proxyv2:1.2.5 as upstream

FROM scratch
LABEL Maintainer="me<my@my.com>"

ENV PROXY_BUILD=/home/myuser/proxybuild


COPY --from=builder /home/myuser/artifacts.txt /artifacts.txt
COPY --from=builder --chown=10001:10001 ${PROXY_BUILD} /
COPY --from=builder /lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
COPY --from=upstream /usr/local/bin/pilot-agent  /usr/local/bin/pilot-agent
USER istio-user

ENTRYPOINT ["/usr/local/bin/pilot-agent"]

SHAs on my built image:
ISTIO-PROXY-SHA: 490fa3febe68bd0ced2f92ba29df899896376ced
ENVOY-SHA: 9ddb95ee406303c484b1ce1d95810f1ca8bfb22d

So, based on further exploration, the issue is not related to the binaries I have built but to the image itself.
I have tried copying the upstream pilot-agent and envoy binaries into my scratch image. All linker dependencies resolve, and the proxy works OK apart from the SDS sidecar interaction.
What am I missing?

I have revisited this: my image build and linking were fine. It seems I was missing something when copying over to the scratch target.
I updated to 1.3 and changed to use the fetch_ca_certs script. All is good now.
Consider this resolved.