Hi
I’ve been struggleing with istio… So here I am seeking help from the expert.
Background
I’m trying to deploy my kubeflow application for multi-tenency with dex.
Refering to the kubeflow offical document with the manifest file from github
Here is a table of some of the key information
name | version | description |
---|---|---|
kubernetes | 1.15 | I’m running kubernetes 1.15 on GKE |
istio | 1.1.6 | It’s been used in kubeflow for service me |
kubeflow | 1.0 | kubeflow for ML |
dex | 1.0 | For authn |
With the manifest file I successfully deployed the kubeflow on my cluster. Here’s what I’ve done.
- Deploy the kubeflow application on the cluster
- Deploy Dex with OIDC service to enable authn to google Oauth2.0
- Enable the RBAC
- create envoy filter to append header “kubeflow-userid” as the login user
Here is a verification of step 3 and 4
Check RBAC enabled and envoyfilter added for kubeflow-userid
[root@gke-client-tf leilichao]# k get clusterrbacconfigs -o yaml
apiVersion: v1
items:
- apiVersion: rbac.istio.io/v1alpha1
kind: ClusterRbacConfig
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"rbac.istio.io/v1alpha1","kind":"ClusterRbacConfig","metadata":{"annotations":{},"name":"default"},"spec":{"mode":"ON"}}
creationTimestamp: "2020-07-04T01:28:52Z"
generation: 2
name: default
resourceVersion: "5986075"
selfLink: /apis/rbac.istio.io/v1alpha1/clusterrbacconfigs/default
uid: db70920e-f364-40ec-a93b-a3364f88650f
spec:
mode: "ON"
kind: List
metadata:
resourceVersion: ""
selfLink: ""
[root@gke-client-tf leilichao]# k get envoyfilter -n istio-system -o yaml
apiVersion: v1
items:
- apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"networking.istio.io/v1alpha3","kind":"EnvoyFilter","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"oidc-authservice","app.kubernetes.io/instance":"oidc-authservice-v1.0.0","app.kubernetes.io/managed-by":"kfctl","app.kubernetes.io/name":"oidc-authservice","app.kubernetes.io/part-of":"kubeflow","app.kubernetes.io/version":"v1.0.0"},"name":"authn-filter","namespace":"istio-system"},"spec":{"filters":[{"filterConfig":{"httpService":{"authorizationRequest":{"allowedHeaders":{"patterns":[{"exact":"cookie"},{"exact":"X-Auth-Token"}]}},"authorizationResponse":{"allowedUpstreamHeaders":{"patterns":[{"exact":"kubeflow-userid"}]}},"serverUri":{"cluster":"outbound|8080||authservice.istio-system.svc.cluster.local","failureModeAllow":false,"timeout":"10s","uri":"http://authservice.istio-system.svc.cluster.local"}},"statusOnError":{"code":"GatewayTimeout"}},"filterName":"envoy.ext_authz","filterType":"HTTP","insertPosition":{"index":"FIRST"},"listenerMatch":{"listenerType":"GATEWAY"}}],"workloadLabels":{"istio":"ingressgateway"}}}
creationTimestamp: "2020-07-04T01:40:43Z"
generation: 1
labels:
app.kubernetes.io/component: oidc-authservice
app.kubernetes.io/instance: oidc-authservice-v1.0.0
app.kubernetes.io/managed-by: kfctl
app.kubernetes.io/name: oidc-authservice
app.kubernetes.io/part-of: kubeflow
app.kubernetes.io/version: v1.0.0
name: authn-filter
namespace: istio-system
resourceVersion: "4715289"
selfLink: /apis/networking.istio.io/v1alpha3/namespaces/istio-system/envoyfilters/authn-filter
uid: e599ba82-315a-4fc1-9a5d-e8e35d93ca26
spec:
filters:
- filterConfig:
httpService:
authorizationRequest:
allowedHeaders:
patterns:
- exact: cookie
- exact: X-Auth-Token
authorizationResponse:
allowedUpstreamHeaders:
patterns:
- exact: kubeflow-userid
serverUri:
cluster: outbound|8080||authservice.istio-system.svc.cluster.local
failureModeAllow: false
timeout: 10s
uri: http://authservice.istio-system.svc.cluster.local
statusOnError:
code: GatewayTimeout
filterName: envoy.ext_authz
filterType: HTTP
insertPosition:
index: FIRST
listenerMatch:
listenerType: GATEWAY
workloadLabels:
istio: ingressgateway
kind: List
metadata:
resourceVersion: ""
selfLink: ""
RBAC Issue problem analysis
After I finished my deployment. I performed below functional testing:
- I can login with my google account with google oauth
- I was able to create my own profile/namespace
- I was able to create a notebook server
- However I can NOT connect to the notebook server
RBAC Issue investigation
I’m getting “RBAC: access denied” error after I successfully created the notebook server on kubeflow and trying to connect the notebook server.
I managed to updated the envoy log level and get the log below.
[2020-08-06 13:32:43.290][26][debug][rbac] [external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:64] checking request: remoteAddress: 10.1.1.2:58012, localAddress: 10.1.2.66:8888, ssl: none, headers: ':authority', 'compliance-kf-system.ml'
':path', '/notebook/roger-l-c-lei/aug06/'
':method', 'GET'
'user-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
'accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
'accept-encoding', 'gzip, deflate'
'accept-language', 'en,zh-CN;q=0.9,zh;q=0.8'
'cookie', 'authservice_session=MTU5NjY5Njk0MXxOd3dBTkZvMldsVllVMUZPU0VaR01sSk5RVlJJV2xkRFVrRTFTVUl5V0RKV1EwdEhTMU5QVjFCVlUwTkpSVFpYUlVoT1RGVlBUa0U9fN3lPBXDDSZMT9MTJRbG8jv7AtblKTE3r84ayeCYuKOk; _xsrf=2|1e6639f2|10d3ea0a904e0ae505fd6425888453f8|1596697030'
'referer', 'http://compliance-kf-system.ml/jupyter/'
'upgrade-insecure-requests', '1'
'x-forwarded-for', '10.10.10.230'
'x-forwarded-proto', 'http'
'x-request-id', 'babbf884-4cec-93fd-aea6-2fc60d3abb83'
'kubeflow-userid', 'roger.l.c.lei@XXXX.com'
'x-istio-attributes', 'CjAKHWRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZXNwYWNlEg8SDXJvZ2VyLWwtYy1sZWkKIwoYZGVzdGluYXRpb24uc2VydmljZS5uYW1lEgcSBWF1ZzA2Ck4KCnNvdXJjZS51aWQSQBI+a3ViZXJuZXRlczovL2lzdGlvLWluZ3Jlc3NnYXRld2F5LTg5Y2Q0YmQ0Yy1kdnF3dC5pc3Rpby1zeXN0ZW0KQQoXZGVzdGluYXRpb24uc2VydmljZS51aWQSJhIkaXN0aW86Ly9yb2dlci1sLWMtbGVpL3NlcnZpY2VzL2F1ZzA2CkMKGGRlc3RpbmF0aW9uLnNlcnZpY2UuaG9zdBInEiVhdWcwNi5yb2dlci1sLWMtbGVpLnN2Yy5jbHVzdGVyLmxvY2Fs'
'x-envoy-expected-rq-timeout-ms', '300000'
'x-b3-traceid', '3bf35cca1f7b75e7a42a046b1c124b1f'
'x-b3-spanid', 'a42a046b1c124b1f'
'x-b3-sampled', '1'
'x-envoy-original-path', '/notebook/roger-l-c-lei/aug06/'
'content-length', '0'
'x-envoy-internal', 'true'
, dynamicMetadata: filter_metadata {
key: "istio_authn"
value {
}
}
[2020-08-06 13:32:43.290][26][debug][rbac] [external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:108] enforced denied
From the source code it looks like the allowed function is returnning false so it’s giving the “RBAC: access denied” response.
if (engine.has_value()) {
if (engine->allowed(*callbacks_->connection(), headers,
callbacks_->streamInfo().dynamicMetadata(), nullptr)) {
ENVOY_LOG(debug, "enforced allowed");
config_->stats().allowed_.inc();
return Http::FilterHeadersStatus::Continue;
} else {
ENVOY_LOG(debug, "enforced denied");
callbacks_->sendLocalReply(Http::Code::Forbidden, "RBAC: access denied", nullptr,
absl::nullopt);
config_->stats().denied_.inc();
return Http::FilterHeadersStatus::StopIteration;
}
}
I took a search on the dumped envoy, it looks like the rule should be allowing any request with a header key as my mail address. Now I can confirm I’ve got that in my header from above log.
{
"name": "envoy.filters.http.rbac",
"config": {
"rules": {
"policies": {
"ns-access-istio": {
"permissions": [
{
"and_rules": {
"rules": [
{
"any": true
}
]
}
}
],
"principals": [
{
"and_ids": {
"ids": [
{
"header": {
"exact_match": "roger.l.c.lei@XXXX.com"
}
}
]
}
}
]
}
}
}
}
}
With the understand that the envoy config that’s been used to validate RBAC authz is from this config. And it’s distributed to the sidecar by mixer, The log and code leads me to the rbac.istio.io config of servicerolebinding.
[root@gke-client-tf leilichao]# k get servicerolebinding -n roger-l-c-lei -o yaml
apiVersion: v1
items:
- apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
annotations:
role: admin
user: roger.l.c.lei@XXXX.com
creationTimestamp: "2020-07-04T01:35:30Z"
generation: 5
name: owner-binding-istio
namespace: roger-l-c-lei
ownerReferences:
- apiVersion: kubeflow.org/v1
blockOwnerDeletion: true
controller: true
kind: Profile
name: roger-l-c-lei
uid: 689c9f04-08a6-4c51-a1dc-944db1a66114
resourceVersion: "23201026"
selfLink: /apis/rbac.istio.io/v1alpha1/namespaces/roger-l-c-lei/servicerolebindings/owner-binding-istio
uid: bbbffc28-689c-4099-837a-87a2feb5948f
spec:
roleRef:
kind: ServiceRole
name: ns-access-istio
subjects:
- properties:
request.headers[]: roger.l.c.lei@XXXX.com
status: {}
kind: List
metadata:
resourceVersion: ""
selfLink: ""
```
I wanted to have a try updating this ServiceRoleBinding to validate some assumption since I can't debug the envoy source code and there's not enough log to show why exactly is the "allow" method returnning false.
However I find myself cannot update the servicerolebinding. It resumes to its orriginal version everytime right after I finish editing it.
I'm stuck here for more than 1 week. So any tip/advice is welcome on below questions!
# Please help advice my query below
I find that there's this istio-galley validatingAdmissionConfig map that monitors these istio rbac resources.
```
[root@gke-client-tf leilichao]# k get validatingwebhookconfigurations istio-galley -oyaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
creationTimestamp: "2020-08-04T15:00:59Z"
generation: 1
labels:
app: galley
chart: galley
heritage: Tiller
istio: galley
release: istio
name: istio-galley
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: Deployment
name: istio-galley
uid: 11fef012-4145-49ac-a43c-2e1d0a460ea4
resourceVersion: "22484680"
selfLink: /apis/admissionregistration.k8s.io/v1beta1/validatingwebhookconfigurations/istio-galley
uid: 6f485e28-3b5a-4a3b-b31f-a5c477c82619
webhooks:
- admissionReviewVersions:
- v1beta1
clientConfig:
caBundle:
.
.
.
service:
name: istio-galley
namespace: istio-system
path: /admitpilot
port: 443
failurePolicy: Fail
matchPolicy: Exact
name: pilot.validation.istio.io
namespaceSelector: {}
objectSelector: {}
rules:
- apiGroups:
- config.istio.io
apiVersions:
- v1alpha2
operations:
- CREATE
- UPDATE
resources:
- httpapispecs
- httpapispecbindings
- quotaspecs
- quotaspecbindings
scope: '*'
- apiGroups:
- rbac.istio.io
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- '*'
scope: '*'
- apiGroups:
- authentication.istio.io
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- '*'
scope: '*'
- apiGroups:
- networking.istio.io
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- destinationrules
- envoyfilters
- gateways
- serviceentries
- sidecars
- virtualservices
scope: '*'
sideEffects: Unknown
timeoutSeconds: 30
- admissionReviewVersions:
- v1beta1
clientConfig:
caBundle:
.
.
.
service:
name: istio-galley
namespace: istio-system
path: /admitmixer
port: 443
failurePolicy: Fail
matchPolicy: Exact
name: mixer.validation.istio.io
namespaceSelector: {}
objectSelector: {}
rules:
- apiGroups:
- config.istio.io
apiVersions:
- v1alpha2
operations:
- CREATE
- UPDATE
resources:
- rules
- attributemanifests
- circonuses
- deniers
- fluentds
- kubernetesenvs
- listcheckers
- memquotas
- noops
- opas
- prometheuses
- rbacs
- solarwindses
- stackdrivers
- cloudwatches
- dogstatsds
- statsds
- stdios
- apikeys
- authorizations
- checknothings
- listentries
- logentries
- metrics
- quotas
- reportnothings
- tracespans
scope: '*'
sideEffects: Unknown
timeoutSeconds: 30
```
Here's something I don't understand
## I cannot update the ServiceRoleBinding even after I deleted the validating webhook
I tried to delete this webhook to update the servicerolebinding. The resource resumes right after I save it.
## Is there some kind of cache in galley that mixer uses to distribute the config
I can't find any relevent log that indicates the rbac.istio.io resource is protected/validated by some service in the istio-system namespace.
## How can I get the log of the MIXER
I need to understand which component exactly controls the policy
## Most importantly How do I debug the envoy container
I need to debug the envoy app to understand why it's returnning false for the allow function.
If we can not debug it easily. Is there a document that lets me update the code to add more log and build a new image to GCR so I can have another run and based on the log to see what's going on behind the scene.