Inconsistent networking issues with mongo replica sets

I’m trying to spin up a new set of K8S clusters on AWS (using kops, cluster at 1.14.1) with Istio 1.3.1 and I’m having a few issues getting pods to communicate.

The TL;DR on this issue is the following:

I have two separate namespaces: in one, a mongo cluster with replica sets across multiple shards; in the other, a simpler set of mongo replicas (not sharded).

It seems whichever one I set up first works properly, and the other one fails to connect (i.e. mongo’s rs.initiate() command never succeeds on it).

I cannot find any supporting documentation on how to debug such an issue, so I would love some direction here. I can provide additional logs as needed.

Setup

Without being overly verbose here is what I’m doing:

  1. I followed the multi-cluster, replicated control plane setup because eventually this will be one of many clusters - link.
  2. I created the istio-system namespace and CA certs (and confirmed that cluster-to-cluster communication via mTLS worked with the ServiceEntry in the sleep/httpbin example, so I know Istio works in some capacity), then loaded istio-init and the Helm-generated Istio manifests:
    i. I got istio via: curl -L https://git.io/getLatestIstio | ISTIO_VERSION=1.3.1 sh -
    ii. I set up istio-init properly and confirmed the CRDs were created, etc.
    iii. I used the following config for setup: helm template install/kubernetes/helm/istio --name istio --namespace istio-system -f install/kubernetes/helm/istio/example-values/values-istio-multicluster-gateways.yaml --set global.mtls.enabled=true --set grafana.enabled=true --set tracing.enabled=true --set tracing.provider=zipkin > istio-multicluster.yaml
    iv. I wait for all pods in istio-system to be in a ready state
  3. I set up the DNS for eventual ServiceEntry stubbing:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"global": ["$(kubectl get svc -n istio-system istiocoredns -o jsonpath={.spec.clusterIP})"]}
EOF
  4. I set up both namespaces with istio-injection: enabled set properly
  5. I launch a replicated StatefulSet for mongo
Config for the StatefulSet and Service:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongod-capture
  labels:
    app: mongod-capture
  namespace: capt-db
spec:
  serviceName: mongod-capture
  selector:
    matchLabels:
      app: mongod-capture
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: mongod-capture
        replicaset: rs0
        cluster: mendota
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: replicaset
                  operator: In
                  values:
                  - rs0
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 10
      containers:
        - name: main
          image: mongo:4.0.10
          command:
            - "mongod"
            - "--port"
            - "27017"
            - "--bind_ip"
            - "0.0.0.0"
            - "--auth"
            - "--wiredTigerCacheSizeGB"
            - "0.5"
            - "--replSet"
            - "rs0"
            - "--keyFile"
            - "/etc/db-keys/keys"
          resources:
            requests:
              cpu: 50m
              memory: 100Mi
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongod-capture-persistent-storage
              mountPath: /data/db
            - name: db-keys-internal-capt
              mountPath: "/etc/db-keys"
              readOnly: true
      nodeSelector:
        kops.k8s.io/instancegroup: capt-db
      priorityClassName: db-config
      volumes:
      - name: db-keys-internal-capt
        secret:
          secretName: db-keys-internal-capt
          defaultMode: 0400
  volumeClaimTemplates:
  - metadata:
      name: mongod-capture-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: aws-hdd-db
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi
---
# Service definition:
apiVersion: v1
kind: Service
metadata:
  name: mongod-capture
  labels:
    name: mongod-capture
    cluster: mendota
  namespace: capt-db
spec:
  ports:
  - name: "mongo"
    port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    app: mongod-capture
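
Once this is applied, a couple of quick sanity checks confirm the pods and the headless service look right (commands assume kubectl is pointed at this cluster):

```shell
# Each pod should show 2/2 ready containers (mongod plus the istio-proxy sidecar)
kubectl -n capt-db get pods -l app=mongod-capture

# The headless service should publish one endpoint per pod on 27017
kubectl -n capt-db get endpoints mongod-capture
```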

The config above is for the simpler replicated but not sharded cluster. There is a separate set of configs I use for sharding the other mongo cluster (mostly just more stateful sets, mongos routers, config instances, etc…).
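For the non-sharded capture set, the initiate command is analogous; as a sketch, the member list can be generated from the StatefulSet's DNS naming pattern (the kubectl invocation in the trailing comment is hypothetical and only shows where I run it):

```shell
# Build the rs.initiate() JS for the non-sharded capture replica set.
# Hostnames follow Kubernetes' StatefulSet DNS pattern:
#   <pod>.<service>.<namespace>.svc.cluster.local
SVC=mongod-capture
NS=capt-db
PORT=27017
MEMBERS=""
for i in 0 1 2; do
  MEMBERS="${MEMBERS}{\"host\": \"${SVC}-${i}.${SVC}.${NS}.svc.cluster.local:${PORT}\", \"_id\": ${i}}, "
done
MEMBERS="${MEMBERS%, }"   # drop the trailing comma+space
INIT_CMD="rs.initiate({\"_id\": \"rs0\", \"members\": [${MEMBERS}]})"
echo "$INIT_CMD"
# To run it against pod 0 (hypothetical, adjust names to your cluster):
#   kubectl -n capt-db exec mongod-capture-0 -c main -- mongo --eval "$INIT_CMD"
```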

Essentially, when I run the rs.initiate command, the first cluster I run it on works and the second fails. For the config servers of the sharded cluster, the command looks like this:

rs.initiate({"configsvr": true, "_id": "ConfigDBRepSet", "members": [{"host": "mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local:27017", "_id": 0}, {"host": "mongod-configdb-1.mongod-configdb.prod-db.svc.cluster.local:27017", "_id": 1}, {"host": "mongod-configdb-2.mongod-configdb.prod-db.svc.cluster.local:27017", "_id": 2}]})

On the failing cluster it returns the following message:

{
	"ok" : 0,
	"errmsg" : "replSetInitiate quorum check failed because not all proposed set members responded affirmatively: mongod-configdb-2.mongod-configdb.prod-db.svc.cluster.local:27017 failed with Connection reset by peer, mongod-configdb-1.mongod-configdb.prod-db.svc.cluster.local:27017 failed with Connection reset by peer",
	"code" : 74,
	"codeName" : "NodeNotFound",
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("000000000000000000000000")
	},
	"lastCommittedOpTime" : Timestamp(0, 0)
}

If you look at the mongo logs on the failing cluster after doing this, connections are accepted from 127.0.0.1 (i.e. from the local Envoy sidecar, which intercepts inbound traffic) and immediately closed:

2019-10-13T19:24:41.480+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:54152 #1 (1 connection now open)
2019-10-13T19:24:41.483+0000 I NETWORK  [conn1] end connection 127.0.0.1:54152 (0 connections now open)
2019-10-13T19:24:41.692+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:54158 #2 (1 connection now open)
2019-10-13T19:24:41.693+0000 I NETWORK  [conn2] end connection 127.0.0.1:54158 (0 connections now open)
2019-10-13T19:24:41.814+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:54162 #3 (1 connection now open)
2019-10-13T19:24:41.817+0000 I NETWORK  [conn3] end connection 127.0.0.1:54162 (0 connections now open)
2019-10-13T19:24:42.015+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:54164 #4 (1 connection now open)
2019-10-13T19:24:42.017+0000 I NETWORK  [conn4] end connection 127.0.0.1:54164 (0 connections now open)
2019-10-13T19:24:42.027+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:54166 #5 (1 connection now open)

Whereas the one that works properly looks like this:

2019-10-13T19:20:19.268+0000 I REPL     [conn3] replSetInitiate config object with 3 members parses ok
2019-10-13T19:20:19.268+0000 I ASIO     [Replication] Connecting to mongod-capture-1.mongod-capture.capt-db.svc.cluster.local:27017
2019-10-13T19:20:19.269+0000 I ASIO     [Replication] Connecting to mongod-capture-2.mongod-capture.capt-db.svc.cluster.local:27017
2019-10-13T19:20:19.280+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:58148 #8 (2 connections now open)

When I exec into a pod of the cluster that doesn’t work and try to reach the other replicas via the mongo shell, here is what I get:

mongo mongodb://mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local:27017
MongoDB shell version v4.0.10
connecting to: mongodb://mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local:27017/?gssapiServiceName=mongodb
2019-10-13T20:29:49.991+0000 E QUERY    [js] Error: network error while attempting to run command 'isMaster' on host 'mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local:27017'  :
connect@src/mongo/shell/mongo.js:344:17
@(connect):2:6
exception: connect failed
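
To narrow down whether this is a mongo-level or an Envoy-level failure, these are the kinds of checks I’ve been running (pod and namespace names as above; the container name -c main mirrors the capture config, so adjust for the sharded one; istioctl is from the 1.3.1 release):

```shell
# Raw TCP check from the failing pod's app container (bypasses the mongo protocol)
kubectl -n prod-db exec mongod-configdb-0 -c main -- \
  bash -c 'exec 3<>/dev/tcp/mongod-configdb-1.mongod-configdb.prod-db.svc.cluster.local/27017 && echo open'

# What does the sidecar think it should do with 27017 traffic?
istioctl proxy-config listeners mongod-configdb-0.prod-db --port 27017
istioctl proxy-config clusters mongod-configdb-0.prod-db | grep configdb

# The sidecar's own logs often show a TLS/plaintext mismatch directly
kubectl -n prod-db logs mongod-configdb-0 -c istio-proxy --tail=50
```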

However, DNS between them works properly; when I dig, I get the correct IP:

dig mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local

; <<>> DiG 9.10.3-P4-Ubuntu <<>> mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30833
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local. IN	A

;; ANSWER SECTION:
mongod-configdb-0.mongod-configdb.prod-db.svc.cluster.local. 2 IN A 100.96.101.11

I have also confirmed that for the mongo cluster that doesn’t work, if I destroy the whole thing, remove istio-injection from that namespace, and start it back up, it works properly. (However, if I then re-enable injection and perform a rolling update to convert it into istio-enabled pods, that doesn’t work either - but that could be for other reasons around istio to non-istio pod communication that I’m not thinking of.)
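
One check that seems relevant given mTLS is enabled globally: verifying that the client sidecar and the server agree on mTLS for the failing service (using istioctl 1.3’s authn tls-check):

```shell
# A CONFLICT status here would explain the "Connection reset by peer" errors
istioctl authn tls-check mongod-configdb-0.prod-db \
  mongod-configdb.prod-db.svc.cluster.local
```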

I have tried this twice so far. The first time I set up the sharded cluster first: it worked perfectly and the replicated one failed. Then I destroyed the whole cluster (Istio and all) and did it in reverse, and the replicated one worked while the sharded one failed - so whichever is set up first wins, which also seems odd.