Prometheus - Kubernetes RBAC

Date: 2017-04-07 23:22:53

Tags: kubernetes google-kubernetes-engine prometheus

I upgraded my GKE API server to 1.6 and am in the process of upgrading the nodes to 1.6, but I've run into trouble...

I have a Prometheus server (version 1.5.2) running in a pod managed by a Kubernetes Deployment, with some nodes running Kubelet version 1.5.4 and one new node running 1.6.

Prometheus can't connect to the new node; its metrics endpoint is returning 401 Unauthorized.

This appears to be an RBAC issue, but I'm not sure how to proceed. I can't find documentation on which roles the Prometheus server needs, or even on how to grant them to the server.

From the coreos/prometheus-operator repo I was able to piece together a configuration that I would expect to work:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...

But Prometheus is still getting 401s.

UPDATE: This looks like a Kubernetes authentication problem, as Jordan said. See the new, more focused question here: https://serverfault.com/questions/843751/kubernetes-node-metrics-endpoint-returns-401

3 answers:

Answer 0 (score: 2)

401 means unauthenticated, which means it is not an RBAC issue. I believe GKE no longer allows anonymous access to the kubelet in 1.6. What credentials are you using to authenticate to the kubelet?
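One way to present credentials when scraping the kubelet directly is to point Prometheus at the service-account token mounted into its pod. This is a sketch, not part of the original answer; the job name is made up, and whether the kubelet accepts the token depends on how its authentication is configured:

```yaml
# Hypothetical scrape job: authenticate to the kubelet with the pod's
# service-account token instead of scraping anonymously.
scrape_configs:
- job_name: 'kubernetes-nodes'
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    # Kubelet serving certificates are often self-signed, so verification
    # may need to be relaxed when scraping nodes directly.
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: node
```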

Answer 1 (score: 1)

This is what worked for me for the role definition and binding:



apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default




Answer 2 (score: 0)

Per the discussion on @JorritSalverda's ticket: https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099

Since GKE doesn't let you obtain client certificates that would allow you to authenticate against the kubelet, the best solution for GKE users seems to be to use the Kubernetes API server as a proxy for requests to the nodes.

To do this (quoting @JorritSalverda):

"For my Prometheus server running inside GKE, I now run it with the following relabeling:

relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
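For context, a sketch of the full scrape job this relabeling might sit inside. The job name, and the use of the in-pod service-account token and CA bundle to authenticate to the API server, are assumptions not stated in the answer:

```yaml
scrape_configs:
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    # CA bundle mounted into every pod; verifies the API server's cert.
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Rewrite each node target so the scrape goes through the API server
  # proxy rather than hitting the kubelet directly.
  - target_label: __address__
    replacement: kubernetes.default.svc.cluster.local:443
  - target_label: __scheme__
    replacement: https
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```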

And the following ClusterRole, bound to the service account used by Prometheus:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]

Since the GKE cluster still falls back to ABAC in case RBAC fails, I'm not 100% sure this covers all required permissions."
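The binding itself is not shown in this answer. Assuming a prometheus service account in the default namespace, as in the earlier answers, the ClusterRole would be attached with something like:

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```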