Kubelet's cAdvisor metrics endpoint does not reliably return all metrics

Time: 2017-09-02 15:58:58

Tags: kubernetes prometheus cadvisor

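I'm having a problem with cAdvisor: when I query its metrics endpoint, it does not reliably return all metrics. Specifically, querying container_fs_limit_bytes{device=~"^/dev/.*$",id="/",kubernetes_io_hostname=~"^.*"} through Prometheus usually shows results for only a small fraction of the nodes in my Kubernetes cluster. A node drops out of the result when the metric has not been scraped from it for more than 5 minutes (because the series goes stale), but I don't understand why every successful query of the endpoint doesn't return all metrics.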

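Curling the endpoint over and over shows that certain metrics are returned only some of the time, so the Prometheus query above returns data for all nodes only if every node happened to expose the metric within the last 5 minutes, which is usually not the case.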
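The repeated-curl check described above can be automated. The following is a minimal Python sketch, not something from the original post: it fetches the endpoint several times and reports metric names that appear in some responses but not in others. The function names, the number of rounds, and the delay are all illustrative assumptions, and the URL you pass in depends on how you reach the node's cAdvisor endpoint.

```python
import re
import time
import urllib.request

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format.
_NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in one response body."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def intermittent_metrics(bodies):
    """Return names present in at least one response but not in all."""
    name_sets = [metric_names(body) for body in bodies]
    if not name_sets:
        return set()
    seen_anywhere = set().union(*name_sets)
    seen_everywhere = set.intersection(*name_sets)
    return seen_anywhere - seen_everywhere


def watch(url, rounds=10, delay=15.0):
    """Fetch `url` `rounds` times (hypothetical defaults) and compare."""
    bodies = []
    for _ in range(rounds):
        with urllib.request.urlopen(url, timeout=10) as response:
            bodies.append(response.read().decode("utf-8"))
        time.sleep(delay)
    return intermittent_metrics(bodies)
```

Calling `watch(url)` with the URL of one node's metrics endpoint returns the set of metric names that were missing from at least one response. If container_fs_limit_bytes shows up in that set, the gaps originate at the endpoint rather than in Prometheus.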

One workaround is to take the average of the metric over a window longer than 5 minutes, but this is not ideal.
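As a rough illustration of that workaround, the sketch below wraps the selector from the question in PromQL's avg_over_time and sends it to the Prometheus HTTP API (/api/v1/query). The 10m window, the helper names, and the base URL you pass in are assumptions chosen for the example, not values from the original setup.

```python
import json
import urllib.parse
import urllib.request

# Selector copied from the question.
SELECTOR = (
    'container_fs_limit_bytes'
    '{device=~"^/dev/.*$",id="/",kubernetes_io_hostname=~"^.*"}'
)


def averaged_query(selector, window="10m"):
    """Wrap a selector in avg_over_time over `window` (assumed: 10m)."""
    return "avg_over_time({}[{}])".format(selector, window)


def instant_query_url(base_url, expr):
    """Build the instant-query URL for a Prometheus server."""
    query = urllib.parse.urlencode({"query": expr})
    return "{}/api/v1/query?{}".format(base_url.rstrip("/"), query)


def run_query(base_url, expr):
    """Run an instant query and return the list of result series."""
    url = instant_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["data"]["result"]
```

For example, `run_query(base_url, averaged_query(SELECTOR))` should return a series for every node that exposed the metric at least once in the last 10 minutes. The trade-off is that the value is smoothed over the window, and a node that really has stopped reporting stays visible for longer.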

kubectl version:

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Prometheus version: 1.7.1

Prometheus config:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093
    scheme: http
    timeout: 10s
rule_files:
- /etc/prometheus-rules/alert.rules
scrape_configs:
- job_name: kubernetes-nodes
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: []
    separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: []
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}:4194/proxy/metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [id]
    separator: ;
    regex: ^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$
    target_label: rkt_container_name
    replacement: ${2}-${1}
    action: replace
  - source_labels: [id]
    separator: ;
    regex: ^/system\.slice/(.+)\.service$
    target_label: systemd_service_name
    replacement: ${1}
    action: replace

1 answer:

Answer 0 (score: 2)

This is a known bug in how cAdvisor uses the Prometheus client library.