I have deployed Prometheus on a Kubernetes cluster (EKS). I can successfully scrape prometheus and traefik with the following:
scrape_configs:
  # A scrape configuration containing exactly one endpoint to scrape:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['prometheus.kube-monitoring.svc.cluster.local:9090']
  - job_name: 'traefik'
    static_configs:
      - targets: ['traefik.kube-system.svc.cluster.local:8080']
But node-exporter, deployed as a DaemonSet with the following definition, does not expose the node metrics:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      name: node-exporter
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.18.1
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
          ports:
            - containerPort: 9100
              hostPort: 9100
              name: scrape
          resources:
            requests:
              memory: 30Mi
              cpu: 100m
            limits:
              memory: 50Mi
              cpu: 200m
          volumeMounts:
            - name: proc
              readOnly: true
              mountPath: /host/proc
            - name: sys
              readOnly: true
              mountPath: /host/sys
      tolerations:
        - effect: NoSchedule
          operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
And with the following scrape_configs in Prometheus:
scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: http
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.kube-monitoring.svc.cluster.local:9100
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
I also tried to curl http://localhost:9100/metrics from one of the containers, but got curl: (7) Failed to connect to localhost port 9100: Connection refused.
What configuration am I missing here?
Answer 0 (score: 1)
The earlier advice about using Helm is very valid, and I would also recommend that.
Regarding your problem: the issue is that you are not scraping the nodes directly, you are using node-exporter for that. So role: node is incorrect; you should use role: endpoints instead. For that you also need to create a Service covering all the pods of your DaemonSet, as in the sketch below.
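For example, a minimal Service for the DaemonSet from the question could look like this sketch. The namespace and the pod selector are taken from the question's DaemonSet; the Service label app: exporter-node and the port name metrics are assumptions, chosen so that the keep rules in the scrape config below would match:

apiVersion: v1
kind: Service
metadata:
  name: node-exporter          # hypothetical name
  namespace: kube-monitoring
  labels:
    app: exporter-node         # the scrape config below filters on this Service label
spec:
  clusterIP: None              # headless; not required for endpoint discovery, but no cluster IP is needed
  selector:
    app: node-exporter         # matches the pod labels of the DaemonSet above
  ports:
    - name: metrics            # the scrape config below keeps only the port with this name
      port: 9100
      targetPort: 9100

With role: endpoints, Prometheus reads the Endpoints object this Service generates, so it scrapes every pod individually (here: every node, since the DaemonSet uses hostNetwork).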
Here is a working example from my environment (installed by Helm):
- job_name: monitoring/kube-prometheus-exporter-node/0
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_app]
      separator: ;
      regex: exporter-node
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: job
      replacement: ${1}
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: metrics
      action: replace
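Note the two keep rules at the top: every discovered endpoint is dropped except those whose backing Service itself carries the label app: exporter-node and whose port is named metrics, and only the monitoring namespace is searched. If you adapt this config to the DaemonSet from the question, make sure the Service's own label, its port name, and its namespace line up with these three filters.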
Answer 1 (score: 0)
How did you deploy Prometheus? Whenever I use the Helm chart (https://github.com/helm/charts/tree/master/stable/prometheus), node-exporter is deployed as well. Maybe that would be a simpler solution.
Answer 2 (score: 0)
I was stuck at a similar point. In my case, though, the node exporter is not part of the Helm deployment, because we get an additional node exporter from Tanzu Kubernetes Grid (the k8s cluster). So I created a ServiceMonitor, and service discovery now shows the count it should. But in the targets section it says 0/4. I cannot see the node metrics, yet when I curl localhost:9100/metrics I can see the data. I am missing some piece of the logic here.
I checked the data of the Helm-deployed node exporter and it looks the same, so what am I missing?
Please ignore the indentation, since it got lost while copy-pasting on a mobile device.
- job_name: node-exporter
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_app]
      separator: ;
      regex: exporter-node
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: job
      replacement: ${1}
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: metrics
      action: replace
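For reference, since a ServiceMonitor is mentioned above: with the prometheus-operator, a ServiceMonitor that generates roughly this scrape config might look like the following sketch. The selector label app: exporter-node, the port name metrics, and the namespace monitoring are assumptions carried over from the config above and must match the actual node-exporter Service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter          # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: exporter-node       # must equal a label on the Service itself, not on the pods
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: metrics            # refers to the *name* of the Service port
      path: /metrics
      interval: 15s
      scrapeTimeout: 10s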