Ingress Gateway returns 404, stops letting in external traffic, and doesn't route to VirtualServices #27080


Bug description

  • The same Gateway and VirtualService configurations used to work but suddenly stopped working.
  • Routes are not seen in proxy-config despite correct Gateway and VirtualService configs (see the proxy-status check after this list).
  • I deleted the istio-system namespace, re-installed Istio from scratch, and re-injected the sidecars with kubectl rollout restart deployment --namespace staging, but no luck.
  • Connectivity within the mesh works; only external traffic through the ingress gateway, i.e. via the Gateway and VirtualService, fails.
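
As a sanity check (a generic step, not something specific to this setup), istioctl proxy-status shows whether the ingress gateway's listener/route config is in sync with istiod, and it can diff a single proxy against what istiod has pushed:

$ istioctl proxy-status
$ istioctl proxy-status istio-ingressgateway-59d7487bd6-55x58.istio-system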

Using EKS (Kubernetes v1.16)

Versions

$ istioctl version
client version: 1.6.7
privateingressgateway version: 
pilot version: 1.6.7
data plane version: 1.6.7 (6 proxies)

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.10-eks-bac369", GitCommit:"bac3690554985327ae4d13e42169e8b1c2f37226", GitTreeState:"clean", BuildDate:"2020-02-21T23:37:18Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Installed Istio

istioctl install -f overrides.yaml

where overrides.yaml is:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    gateways:
      istio-ingressgateway:
        sds:
          enabled: true
  components: # ref: https://istio.io/latest/docs/reference/config/istio.operator.v1alpha1/#IstioOperatorSpec
    # sidecarInjector:
    #   enabled: true
    ingressGateways:
    - enabled: true
      name: istio-ingressgateway
      k8s:
        resources:
          requests:
            cpu: 10m
            memory: 40Mi
        service:
          ports:
          - name: status-port
            port: 15020
            targetPort: 15020
          - name: http2
            port: 80
            targetPort: 8080
          - name: https
            port: 443
            targetPort: 8443
          - name: tcp
            port: 31400
            targetPort: 31400
          - name: tls
            port: 15443
            targetPort: 15443      
        serviceAnnotations:
          # enable ELB access log
          # ref: https://www.giantswarm.io/blog/load-balancer-service-use-cases-on-aws
          service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
          # The interval for publishing the access logs (can be 5 or 60 minutes).
          service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "5"
          service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "xxx"
          service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "public-elb" 
          # enable TLS termination at AWS ELB level
          service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:xxx:certificate/xxx" 
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    - enabled: true
      name: istio-private-ingressgateway
      label:
        istio: privateingressgateway # this will be needed as gateway will look for this selector
        app: istio-private-ingressgateway
      k8s:
        resources:
          requests:
            cpu: 10m
            memory: 40Mi
        service:
          ports:
          - name: status-port
            port: 15020
            targetPort: 15020
          - name: http2
            port: 80
            targetPort: 8080
          - name: https
            port: 443
            targetPort: 8443
          - name: tcp
            port: 31400
            targetPort: 31400
          - name: tls
            port: 15443
            targetPort: 15443      
        serviceAnnotations:
          # ref: https://medium.com/swlh/public-and-private-istio-ingress-gateways-on-aws-f968783d62fe
          service.beta.kubernetes.io/aws-load-balancer-internal: "true" # make this CLB private. Refs for service annotations for AWS ELB: https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#aws, https://docs.aws.amazon.com/eks/latest/userguide/load-balancing.html
          # enable ELB access log
          # ref: https://www.giantswarm.io/blog/load-balancer-service-use-cases-on-aws
          service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
          # The interval for publishing the access logs (can be 5 or 60 minutes).
          service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "60"
          service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "xxxx"
          service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "internal-elb" 
          # enable TLS termination at AWS ELB level
          # service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:xxx:certificate/xxx" 
          # service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
          # service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
  values:
    gateways: # configure gateways: https://istio.io/latest/docs/setup/install/istioctl/#configure-gateways
      istio-ingressgateway: # for internal ELB
        applicationPorts: ""
        autoscaleEnabled: false
        debug: info
        domain: ""
        env: {}
        meshExpansionPorts:
        - name: tcp-pilot-grpc-tls
          port: 15011
          targetPort: 15011
        - name: tcp-istiod
          port: 15012
          targetPort: 15012
        - name: tcp-citadel-grpc-tls
          port: 8060
          targetPort: 8060
        - name: tcp-dns-tls
          port: 853
          targetPort: 8853
        name: istio-private-ingressgateway
        secretVolumes:
        - mountPath: /etc/istio/ingressgateway-certs
          name: ingressgateway-certs
          secretName: istio-ingressgateway-certs
        - mountPath: /etc/istio/ingressgateway-ca-certs
          name: ingressgateway-ca-certs
          secretName: istio-ingressgateway-ca-certs
        type: LoadBalancer
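
As a sanity check (a generic istioctl step, output not shown), rendering the manifest locally shows what the operator will actually apply; worth confirming here since the file above contains two top-level values: blocks:

$ istioctl manifest generate -f overrides.yaml > rendered.yaml
$ grep -n -A3 "istio-private-ingressgateway" rendered.yaml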

gateway.yaml

apiVersion: networking.istio.io/v1alpha3
kind: Gateway 
metadata:
  name: example-gateway
  namespace: staging
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers: # defines L7 host, port, and protocol
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts: # all other external requests will be rejected with a 404 response.
    - "example.co"
    - "*.example.co"
    # tls: 
    #   httpsRedirect: true # sends 301 redirect for http requests

VirtualService

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: example-api-virtualservice
  namespace: staging
spec:
  hosts:
  - "example.co"
  - "*.example.co" 
  - example-api # for internal request among apps within Istio mesh
  gateways: # names of gateways and sidecars that should apply these routes
  - example-gateway.staging.svc.cluster.local 
  # - mesh  # applies to all the sidecars in the mesh. The reserved word mesh is used to imply all the sidecars in the mesh. When gateway field is omitted, the default gateway (mesh) will be used, which would apply the rule to all sidecars in the mesh. If a list of gateway names is provided, the rules will apply only to the gateways. To apply the rules to both gateways and sidecars, specify mesh as one of the gateway names. Ref: https://istio.io/latest/docs/reference/config/networking/virtual-service/#VirtualService
  http:
  - route: # default route
    - destination:
        host: example-api.staging.svc.cluster.local # specify service name
        port:
          number: 9999
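
One thing I am not sure about is the gateways: reference format above; the VirtualService docs use either the short name for a Gateway in the same namespace or the <gateway-namespace>/<gateway-name> form (e.g. staging/example-gateway) rather than a cluster-local FQDN. A generic way to surface unresolved references and similar misconfigurations:

$ istioctl analyze -n staging
$ istioctl analyze --all-namespaces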

Public ingress gateway’s route config

istioctl proxy-config routes istio-ingressgateway-59d7487bd6-55x58  -n istio-system -o json
[
    {
        "name": "http.80",
        "virtualHosts": [
            {
                "name": "blackhole:80",
                "domains": [
                    "*"
                ],
                "routes": [
                    {
                        "name": "default",
                        "match": {
                            "prefix": "/"
                        },
                        "directResponse": {
                            "status": 404
                        }
                    }
                ]
            }
        ],
        "validateClusters": false
    },
    {
        "virtualHosts": [
            {
                "name": "backend",
                "domains": [
                    "*"
                ],
                "routes": [
                    {
                        "match": {
                            "prefix": "/healthz/ready"
                        },
                        "route": {
                            "cluster": "agent"
                        }
                    }
                ]
            }
        ]
    },
    {
        "virtualHosts": [
            {
                "name": "backend",
                "domains": [
                    "*"
                ],
                "routes": [
                    {
                        "match": {
                            "prefix": "/stats/prometheus"
                        },
                        "route": {
                            "cluster": "prometheus_stats"
                        }
                    }
                ]
            }
        ]
    }
]
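
The listener itself can be inspected the same way (generic command, same pod, output omitted):

$ istioctl proxy-config listeners istio-ingressgateway-59d7487bd6-55x58 -n istio-system --port 8080 -o json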

When I curl, I get a 404:

curl example.com/status.html -v
*   Trying xx.xx.xx.xx...
* TCP_NODELAY set
* Connected to example.com (xx.xx.xx.xx) port 80 (#0)
> GET /status.html HTTP/1.1
> Host: example.com
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< date: Fri, 04 Sep 2020 19:44:02 GMT
< server: istio-envoy
< Content-Length: 0
< Connection: keep-alive
< 
* Connection #0 to host example.com left intact
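
For completeness, the same request can be sent straight to the load balancer with an explicit Host header matching the Gateway hosts (<ELB-DNS-name> is a placeholder):

$ curl -v -H "Host: example.co" http://<ELB-DNS-name>/status.html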

The istio-proxy log shows the request reaching Envoy in the ingress gateway pod, but it is not routed to any VirtualService destination:

2020-09-04T19:44:12.323089827Z [2020-09-04T19:44:03.154Z] "GET /status.html HTTP/1.1" 404 - "-" "-" 0 0 0 - "1.20.48.191,10.1.103.200" "curl/7.54.0" "73a3b7d3-b2d1-9fe9-aaf5-c558bd2f4901" "stagingapi.example.co" "-" - - 10.1.103.100:8080 10.1.103.200:6402 - default

I can connect to this backend pod through its Service, or directly, from a curl pod inside the cluster.
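
For reference, the in-cluster test looked roughly like this (the curlimages/curl image is just an example; any image with curl works):

$ kubectl run curl-test -n staging --rm -it --restart=Never --image=curlimages/curl --command -- \
    curl -sv http://example-api.staging.svc.cluster.local:9999/status.html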

What baffles me the most is that the Gateway and VirtualService configs haven't changed.

DNS is okay; it resolves to the AWS ELB's IP:

$ host example.com 
xx.xx.xx.xx

I don't know why the route configs do not show up in proxy-config; this seems to be the cause, even though the Gateway and VirtualService YAMLs appear on the Kiali dashboard without any errors.

$ kubectl get gw,vs -n staging
NAME                                           AGE
gateway.networking.istio.io/example-gateway   57m

NAME                                                             GATEWAYS                                       HOSTS                                               AGE
virtualservice.networking.istio.io/example-api-virtualservice   [example-gateway.staging.svc.cluster.local]   [stagingapi.example.co example.co example-api]   57m
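
A generic place to look for why istiod is not programming these routes (assuming the default istiod deployment name from the demo profile):

$ kubectl logs -n istio-system deploy/istiod --tail=200 | grep -iE "error|warn|gateway"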

istioctl proxy-config routes istio-ingressgateway-59d7487bd6-55x58  -n istio-system -o json
[
    {
        "name": "http.80",
        "virtualHosts": [
            {
                "name": "blackhole:80",
                "domains": [
                    "*"
                ],
                "routes": [
                    {
                        "name": "default",
                        "match": {
                            "prefix": "/"
                        },
                        "directResponse": {
                            "status": 404
                        }
                    }
                ]
            }
        ],
        "validateClusters": false
    },

Affected product area (please put an X in all that apply)

[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
[ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane

Expected behavior

Requests to the hosts defined in the Gateway and VirtualService are routed through the ingress gateway to the backend service instead of returning 404.

Steps to reproduce the bug

Install Istio 1.6.7 with the IstioOperator overrides above, apply the Gateway and VirtualService in the staging namespace, and curl the public ingress gateway.

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

See the Versions section above (Istio 1.6.7, Kubernetes v1.16 on EKS).

How was Istio installed?

istioctl install -f overrides.yaml (see the overrides above).

Environment where bug was observed (cloud vendor, OS, etc)

AWS EKS.


Hi,
a bit late, but I had a similar problem on GCP (Kubernetes v1.18.20-gke.4501) with Istio 1.5.9.

At some point a perfectly working cluster stopped applying Istio VirtualService routing.
VS YAML files were accepted without any problem, but were not actually applied.
Two days later all ingress stopped and the gateway returned HTTP/2 404; no problems or issues were visible anywhere.

A lot of debugging, but no luck at all.
I reinstalled Istio with a new GCP L4 load balancer. A lot of work, new DNS records, and so on.
It worked for about an hour, then the same thing happened.

What did I do to fix it?
I deleted all gateways. (This was a real pain for a production cluster, on top of all these problems, but it was not working anyway.)
After that I applied the gateways one by one to see whether routing worked. (Why? I just tried something; I had no other solution and could not lose the cluster.)
One gateway somehow broke the cluster ingress, so I deleted it again and "renamed" it: instead of gateway-name I used gateway-name2. I updated the essential Istio VirtualServices with this new gateway name and applied them one by one to check that routing worked.
And it did.

It looks like a large number of VirtualService definitions attached to one gateway causes a problem, even though no validation issues were reported anywhere.

At this point the cluster was working again, letting external traffic in and routing it through the VirtualServices to the applications.
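
Roughly, the rename-and-reapply sequence was (a sketch only; gateway-name and gateway-name2 as above, <namespace> is wherever the gateway lives):

$ kubectl get gateway gateway-name -n <namespace> -o yaml > gateway-name2.yaml
# edit metadata.name in gateway-name2.yaml to gateway-name2
$ kubectl delete gateway gateway-name -n <namespace>
$ kubectl apply -f gateway-name2.yaml
# then change spec.gateways in each affected VirtualService to gateway-name2 and apply them one by one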
