Prometheus monitoring Kubernetes nodes

Time: 2019-05-15 15:53:41

Tags: kubernetes prometheus amazon-eks aws-eks

I'm trying to set up Prometheus to monitor Kubernetes nodes with the following scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
    - api_servers:
      - 'https://kubernetes.default.svc.cluster.local'
      in_cluster: true
      role: node

    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - target_label: __scheme__
      replacement: https

and this node-exporter DaemonSet:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
spec:
  template:
    metadata:
      labels:
        name: node-exporter
      annotations:
         prometheus.io/scrape: "true"
         prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - ports:
            - containerPort: 9100
              protocol: TCP
          resources:
            requests:
              cpu: 0.15
          securityContext:
            privileged: true
          image: prom/node-exporter:v0.15.2
          args:
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            - '^/(sys|proc|dev|host|etc)($|/)'
          name: node-exporter
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /

With this configuration, the Prometheus pod fails with:

level=error ts=2019-05-15T15:18:45.472Z caller=main.go:717 
err="error loading config from \"/etc/prometheus/prometheus.yml\": 
couldn't load configuration (--config.file=\"/etc/prometheus/prometheus.yml\"): parsing YAML file 
/etc/prometheus/prometheus.yml: yaml: unmarshal errors:\n  line 33: field api_servers not found in type kubernetes.plain\n  
line 35: field in_cluster not found in type kubernetes.plain"

Update:

I corrected scrape_configs to:

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
    - role: endpoints
    scheme: http
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'
    scheme: http
    kubernetes_sd_configs:
    - role: node
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics

With this, node metrics do not show up in the Prometheus UI.
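For comparison, the Prometheus project's example Kubernetes configuration scrapes this apiserver-proxy path over HTTPS using the pod's service-account credentials. With scheme: http, plain HTTP requests go to port 443 and are rejected. The sketch below is the updated kubernetes-nodes job with only that change, assuming Prometheus 2.x running in-cluster under a service account that RBAC allows to get nodes/proxy. Note that /api/v1/nodes/<node>/proxy/metrics returns the kubelet's own metrics, not node-exporter's node_* series.

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    # The API server only speaks HTTPS on 443.
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Assumption: the service account is bound to a role allowing get on nodes/proxy.
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Send every request to the API server instead of the node itself...
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # ...and let it proxy to the kubelet of the discovered node.
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
```

The same scheme mismatch applies to the kubernetes-apiservers job above, which keeps only the endpoint port named https but scrapes it with scheme: http.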

1 answer:

Answer 0 (score: 0)

First, what's wrong with just omitting the api server, since that is the default behavior? You aren't customizing anything; you're only generating error messages.
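A minimal sketch of what omitting it looks like: the question's first job with the two rejected fields (api_servers, in_cluster) dropped. When api_server is absent, Prometheus assumes it runs inside the cluster and uses the pod's service-account CA certificate and token to reach the API server. Here scheme: https stands in for the question's __scheme__ relabel rule, which has the same effect. The bearer_token_file line is an assumption, only needed if the kubelets reject anonymous requests (common on managed clusters such as EKS).

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      # No api_server and no in_cluster: in-cluster discovery is the default.
      - role: node
    # Node targets default to the kubelet port, which serves HTTPS.
    scheme: https
    tls_config:
      insecure_skip_verify: true
    # Assumption: required only if kubelet anonymous auth is disabled.
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
```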

Second, what's wrong with reading the fine manual? It clearly says api_server:, not the plural (and what would more than one even mean?!).
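If the API server really does need to be set explicitly, a sketch with the singular key, a single URL (taken from the question), and the discovery credentials placed inside the kubernetes_sd_configs entry, assuming the standard service-account mount paths:

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
        # Singular key holding one URL, not a list.
        api_server: 'https://kubernetes.default.svc.cluster.local'
        # These credentials are used for talking to the API server during discovery.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
```

The tls_config and bearer_token_file inside the discovery entry only affect service discovery; the job-level fields of the same name control how the discovered targets themselves are scraped.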

Third, there are so many mechanisms to install a working Prometheus; why not learn from what they provide, even if you don't end up using them?
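One thing such setups typically include is a pod-role job that honors the prometheus.io/scrape and prometheus.io/port annotations set on the node-exporter DaemonSet in the question. Prometheus does not act on those annotations by itself; they only matter if a job relabels on them. The sketch below is modeled on the kubernetes-pods job from the Prometheus example configuration, assuming in-cluster Prometheus 2.x with RBAC permission to list pods. The kubernetes_namespace and kubernetes_pod_name target labels are illustrative names.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Rewrite host:port so the port comes from prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Copy namespace and pod name onto the scraped series (illustrative label names).
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: kubernetes_pod_name
```

Because the DaemonSet uses hostNetwork: true, each pod's address is its node's IP, so this job ends up scraping node-exporter on port 9100 of every node.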