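I have installed a minikube Kubernetes cluster and want to monitor it with Prometheus. The Kubernetes version is v1.13.4, running directly on the VM host (--vm-driver=none). I added a dedicated job to the Prometheus configuration file to scrape the cAdvisor container metrics. The problem is that Prometheus cannot scrape metrics from the cAdvisor endpoint.
I have the following configuration in prometheus.yml: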
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
However, on the Prometheus targets web UI (http://my_ip:30900/targets), the "kubernetes-cadvisor" target is in the DOWN state and shows the following error message:
http://kubernetes.default.svc:443/api/v1/nodes/minikube/proxy/metrics/cadvisor: context deadline exceeded
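kubernetes.default.svc:443 should be the default cluster DNS name, reachable from inside a pod, but, as I expected, I cannot ping it from inside the Prometheus pod.
Fortunately, I noticed that I can successfully fetch all the cAdvisor container metrics from this URL: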
http://my_dashboard_ip_and_port/api/v1/nodes/minikube/proxy/metrics/cadvisor
The logs from the Prometheus pod are:
kubectl logs prometheus-deployment-6f64ff68f9-8c9xm
level=info ts=2019-03-29T14:33:18.939973334Z caller=main.go:285 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2019-03-29T14:33:18.940326462Z caller=main.go:321 msg="Starting Prometheus" version="(version=2.8.1, branch=HEAD, revision=4d60eb36dcbed725fcac5b27018574118f12fffb)"
level=info ts=2019-03-29T14:33:18.94039376Z caller=main.go:322 build_context="(go=go1.11.6, user=root@bfdd6a22a683, date=20190328-18:04:08)"
level=info ts=2019-03-29T14:33:18.940455316Z caller=main.go:323 host_details="(Linux 4.15.0 #1 SMP Tue Mar 26 02:53:14 UTC 2019 x86_64 prometheus-deployment-6f64ff68f9-8c9xm (none))"
level=info ts=2019-03-29T14:33:18.94050961Z caller=main.go:324 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-03-29T14:33:18.940570849Z caller=main.go:325 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-03-29T14:33:18.941555805Z caller=main.go:640 msg="Starting TSDB ..."
level=info ts=2019-03-29T14:33:18.941946171Z caller=web.go:418 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-03-29T14:33:18.946861683Z caller=main.go:655 msg="TSDB started"
level=info ts=2019-03-29T14:33:18.947193152Z caller=main.go:724 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2019-03-29T14:33:18.948922627Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-29T14:33:18.950164896Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-29T14:33:18.951281382Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-29T14:33:18.952276845Z caller=main.go:751 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2019-03-29T14:33:18.952303937Z caller=main.go:609 msg="Server is ready to receive web requests."
But I don't know how to configure the Prometheus yml file correctly so that these exposed metrics can also be scraped by Prometheus.
Many thanks.
Answer 0 (score: 0)
I suspect you need to use the HTTPS scheme for the scrape requests; it appears to be missing from your configuration:

    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

To skip API server certificate verification, you can add the insecure_skip_verify: true parameter to the existing tls_config.
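
Putting the pieces of this answer together, the job might look like the sketch below. It assumes the default in-cluster service account paths, and that skipping certificate verification is acceptable on a test cluster such as minikube; on a shared or production cluster you would probably leave verification enabled.

```yaml
# Sketch of the combined job; paths are the default service account
# mount locations and are assumed to exist inside the Prometheus pod.
- job_name: 'kubernetes-cadvisor'
  # Scrape through the API server over HTTPS instead of the default HTTP.
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Assumption: disables API server certificate verification (test clusters only).
    insecure_skip_verify: true
  # Authenticate against the API server with the pod's service account token.
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every scrape to the API server rather than to the node itself.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Build the per-node proxy path to the cAdvisor metrics endpoint.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

After reloading Prometheus with this job, the target URL on the targets page should start with https:// rather than http://.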