Question

I am new to Prometheus / Alertmanager.

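I created a cron job that runs a shell script every minute. The script generates a "test.prom" file (containing a single metric) in the directory passed to node exporter's --collector.textfile.directory argument. I verified (using curl http://localhost:9100/metrics) that node exporter exposes the custom metric correctly.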
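
A minimal sketch of what such a script could look like. The metric name test_metric and the file name test.prom come from the question; the directory /var/lib/node_exporter/textfile, the value 42, and the function name write_metric are placeholders chosen for illustration. Writing to a temporary file and renaming it means a scrape never reads a half-written file.

```shell
#!/bin/sh
# Sketch only: the directory and the value are placeholders, not from the question.
TEXTFILE_DIR="${TEXTFILE_DIR:-/var/lib/node_exporter/textfile}"

# write_metric DIR VALUE
# Writes test.prom into DIR via a temporary file plus rename.
# The temporary name does not end in .prom, so the collector ignores it.
write_metric() {
    dir=$1
    value=$2
    tmp="$dir/test.prom.$$"
    {
        echo '# HELP test_metric Example custom metric'
        echo '# TYPE test_metric gauge'
        echo "test_metric $value"
    } > "$tmp" && mv "$tmp" "$dir/test.prom"
}

# Example call from cron (uncomment in a real script):
# write_metric "$TEXTFILE_DIR" 42
```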

When I try to run a query against that custom metric in the Prometheus dashboard, it shows no results (it says no data was found).

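I can't figure out why queries against a metric exposed through the node-exporter textfile collector fail. What am I missing? Also, how can I check and make sure that Prometheus is scraping my custom metric 'test_metric'?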

My query in the Prometheus dashboard is test_metric != 0, but it returns no results, even though test_metric is exposed through the node-exporter textfile collector.

Any help is appreciated!

By the way, node exporter runs as a Docker container in a Kubernetes environment.

Answer 1

This was my mistake. I had not included a scrape configuration for node exporter in the prometheus.yaml file. Once I added it, it worked.
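
To check whether Prometheus is actually scraping node exporter, its HTTP API can be queried directly. A rough sketch, assuming Prometheus listens on http://localhost:9090 (an assumption, adjust to your setup); the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: PROM_URL is an assumed address, not taken from the question.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# api_url BASE PATH: joins the base URL and an API path without doubling the slash.
api_url() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# list_targets: prints the scrape targets Prometheus knows about, including their health.
list_targets() {
    curl -s "$(api_url "$PROM_URL" /api/v1/targets)"
}

# query_metric NAME: runs an instant query for NAME.
# An empty "result" array in the response means no samples were ingested.
query_metric() {
    curl -s -G "$(api_url "$PROM_URL" /api/v1/query)" --data-urlencode "query=$1"
}

# Example: list_targets; query_metric test_metric
```

If node exporter does not show up among the active targets, prometheus.yaml most likely has no scrape job pointing at it, which is the situation described in this answer.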

Answer 2

I had a similar situation, but it was not a configuration problem.

Instead, my data contained timestamps:

# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 53.87 1541426242
network_connectivity_rtt{host="hop_1"} 58.8 1541426242
network_connectivity_rtt{host="hop_2"} 21.93 1541426242
network_connectivity_rtt{host="hop_3"} 71.69 1541426242

The Prometheus node exporter (PNE) had no problem reloading them. Since Prometheus runs under systemd, I had to check the logs like this:

journalctl --system -u prometheus.service --follow

There I found this line:

msg="Error on ingesting samples that are too old or are too far into the future"

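After I removed the timestamps, the values started to appear. That led me to read up on timestamps in more detail, and I found out that they have to be in milliseconds. So this format now works: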

# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 50.47 1541429581376
network_connectivity_rtt{host="hop_1"} 3.38 1541429581376
network_connectivity_rtt{host="hop_2"} 11.2 1541429581376
network_connectivity_rtt{host="hop_3"} 20.72 1541429581376
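
The only difference between the two versions is the unit of the timestamp: seconds in the first, milliseconds since the epoch in the second. A small sketch of the conversion in shell; the function name to_millis is made up for illustration, and the sample value is copied from the first example.

```shell
#!/bin/sh
# to_millis SECONDS: converts Unix epoch seconds to milliseconds.
to_millis() {
    echo $(( $1 * 1000 ))
}

# Example: the current time as a millisecond timestamp (second precision only).
now_ms=$(to_millis "$(date +%s)")
echo "network_connectivity_rtt{host=\"home\"} 53.87 $now_ms"
```

As noted above, leaving the timestamp out entirely also made the values appear.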

I hope this helps someone else.

Answer 3

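This issue is caused by stale metrics. Suppose you wrote the metric to the file at 13:00. By default, after 5 minutes Prometheus considers the metric stale, and it may no longer appear when you run a query.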

Querying a custom metric exposed through the Prometheus node exporter textfile collector fails
