Prometheus:无法从连接的Kubernetes集群导出指标

时间:2017-01-25 07:00:06

标签: kubernetes prometheus

问题:我在Kubernetes集群之外有一个普罗米修斯。所以,我想从远程集群中导出指标。

我从Prometheus Github repo获取了配置示例并对其进行了一些修改。所以,这是我的工作配置。

  - job_name: 'kubernetes-apiservers'

    scheme: http

    kubernetes_sd_configs:
    - role: endpoints
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;http

  - job_name: 'kubernetes-nodes'

    scheme: http

    kubernetes_sd_configs:
    - role: node
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-service-endpoints'

    scheme: http

    kubernetes_sd_configs:
    - role: endpoints
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (http?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name

  - job_name: 'kubernetes-services'

    scheme: http

    metrics_path: /probe
    params:
      module: [http_2xx]

    kubernetes_sd_configs:
    - role: service
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_service_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name

  - job_name: 'kubernetes-pods'

    scheme: http

    kubernetes_sd_configs:
    - role: pod
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: (.+):(?:\d+);(\d+)
      replacement: ${1}:${2}
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name

我没有使用与API的TLS连接,因此我想禁用它。

当我从Prometheus主机卷曲/metrics网址时 - 它会打印出来。

最后我连接到群集,但是...工作没有启动,因此Prometheus没有公开重新标记的指标。

我在Console中看到的内容。

enter image description here

enter image description here

目标状态:

enter image description here

我还检查了Prometheus调试。它认为系统获得了任何必要的信息,并且请求成功完成。

time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"UDP\", \"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-local\"}, model.LabelSet{\"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp-local\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10055\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10055\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns\", \"__meta_kubernetes_pod_container_port_protocol\":\"UDP\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_number\":\"10054\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10054\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq-metrics\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:8080\", \"__meta_kubernetes_pod_container_name\":\"healthz\", \"__meta_kubernetes_pod_container_port_number\":\"8080\", \"__meta_kubernetes_pod_container_port_name\":\"\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"kube-system\\\",\\\"name\\\":\\\"kube-dns-2924299975\\\",\\\"uid\\\":\\\"fa808d95-d7d9-11e6-9ac9-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"89\\\"}}\\n\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_affinity\":\"{\\\"nodeAffinity\\\":{\\\"requiredDuringSchedulingIgnoredDuringExecution\\\":{\\\"nodeSelectorTerms\\\":[{\\\"matchExpressions\\\":[{\\\"key\\\":\\\"beta.kubernetes.io/arch\\\",\\\"operator\\\":\\\"In\\\",\\\"values\\\":[\\\"amd64\\\"]}]}]}}}\", \"__meta_kubernetes_pod_name\":\"kube-dns-2924299975-dksg5\", \"__meta_kubernetes_pod_ip\":\"10.32.0.2\", \"__meta_kubernetes_pod_label_k8s_app\":\"kube-dns\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"2924299975\", \"__meta_kubernetes_pod_label_tier\":\"node\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_tolerations\":\"[{\\\"key\\\":\\\"dedicated\\\",\\\"value\\\":\\\"master\\\",\\\"effect\\\":\\\"NoSchedule\\\"}]\", \"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_pod_node_name\":\"cluster-manager.dev.example.net\", \"__meta_kubernetes_pod_label_component\":\"kube-dns\", \"__meta_kubernetes_pod_label_kubernetes_io_cluster_service\":\"true\", \"__meta_kubernetes_pod_host_ip\":\"54.194.166.39\", \"__meta_kubernetes_pod_label_name\":\"kube-dns\"}, Source:\"pod/kube-system/kube-dns-2924299975-dksg5\"}" 
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"10.43.0.0\", \"__meta_kubernetes_pod_container_name\":\"bot\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_host_ip\":\"172.17.101.25\", \"__meta_kubernetes_pod_label_app\":\"bot\", \"__meta_kubernetes_namespace\":\"default\", \"__meta_kubernetes_pod_name\":\"bot-272181271-pnzsz\", \"__meta_kubernetes_pod_ip\":\"10.43.0.0\", \"__meta_kubernetes_pod_node_name\":\"ip-172-17-101-25\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"default\\\",\\\"name\\\":\\\"bot-272181271\\\",\\\"uid\\\":\\\"c297b3c2-e15d-11e6-a28a-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"1465127\\\"}}\\n\", \"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"272181271\", \"__meta_kubernetes_pod_label_version\":\"v0.1\"}, Source:\"pod/default/bot-272181271-pnzsz\"}" 

Prometheus提取更新,但是......并没有将它们转换为指标。 所以,我已经打破了我的大脑,弄清楚为什么会这样。所以,请帮助,如果你能找出可能出错的地方。

2 个答案:

答案 0 :(得分:3)

如果要从外部Prometheus服务器监控Kubernetes群集,我建议设置Prometheus federation拓扑:

  • 在K8内部,安装节点导出器盒和具有短期存储空间的Prometheus实例。
  • 通过入口控制器(LB)或节点端口将Prometheus服务暴露在K8s群集之外。您可以使用HTTPS +基本身份验证来保护此端点。
  • 配置中心Prometheus,使用适当的身份验证和标记从上方端点抓取指标。

这是可扩展的解决方案。您可以将监视器添加到所需的多个K8群集中,直到达到中心Prometheus的容量。然后,您可以添加另一个中心Prometheus实例来监视其他实例。

答案 1 :(得分:0)

最后我来到了尽管在集群之外设置Kubernetes集群监控并不是一件容易的事。原因Kubernetes体系结构建议将所有基础结构保留在一个本地网络中。因此,每个解决方法都会变得混乱。 此外,我遇到了一个问题,试图调试为什么所有关于Kubernetes角色的配置作业,如点头,吊舱,服务和端点都会显示在目标状态页面中。我可能认为错了,但我没有找到如何在普罗米修斯中调试此问题。

我在外面监控Kubernetes集群的解决方案是kube-api-exporter。非常简单的Python脚本,它获取有关ds,部署和pod的所有指标,最后提供了获取它们的URL。所以,我建议每个遇到这种集成的人都要找到这个解决方案。

我也开始从etcd获取指标。这很酷,etcd提供了开箱即用的普罗米修斯式指标。

P.S。:感谢FuzzyAmi寻求帮助。