Question

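We run multiple Prometheus instances in our data centers (I'll refer to them as DC Prometheus instances), plus one additional Prometheus instance (called "main" below) that collects metrics from the DC Prometheus instances using the federation feature.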

Main Prometheus scrapes {job='prometheus'} values from itself, but also from the DC Prometheus instances (each of which scrapes itself at localhost:9090).

The problem is that main Prometheus complains about out-of-order samples:

WARN[1585] Error on ingesting out-of-order samples numDropped=369 source=target.go:475 target=dc1-prometheus:443

I found out that this happens because {job="prometheus"} is included in the 'match[]' parameter.

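I'm trying to solve this by relabeling, but when I try it with a single DC Prometheus and a constant replacement, I can't get it to work (I still get the out-of-order sample error), and I don't even know what to use as the replacement when there are multiple targets.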

  - job_name: 'federate'
    scrape_interval: 15s

    honor_labels: true
    metrics_path: '/prometheus/federate'
    scheme: 'https'

    params:
      'match[]':
        - '{job="some-jobs-here..."}'
        - '{job="prometheus"}'

    relabel_configs:
    - source_labels: ['instance']
      target_label: 'instance'
      regex: 'localhost:9090'
      replacement: '??' # I've tried with 'dc1-prometheus:9090' and single target only.. no luck

    target_groups:
      - targets:
        - 'dc1-prometheus'
        - 'dc2-prometheus'
        - 'dc3-prometheus'

My question is how to use relabel_configs to get rid of the out-of-order error. I'm using Prometheus 0.17 everywhere.

Answer 1

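What you need to do here is specify unique external_labels on each of the data center Prometheus servers. This causes them to add those labels on the /federate endpoint and prevents the clashing time series you are seeing.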
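As a minimal sketch of that suggestion, each DC Prometheus could carry a distinct label in the global section of its own configuration. The label name `datacenter` and the values `dc1`, `dc2`, `dc3` are assumptions chosen for illustration; they do not come from the question.

```yaml
# prometheus.yml on the dc1 Prometheus server.
# The label name and value are hypothetical; any name works as long as
# the value is unique per server.
global:
  external_labels:
    datacenter: 'dc1'   # use 'dc2', 'dc3', ... on the other DC servers
```

With this in place, a series such as up{job="prometheus", instance="localhost:9090"} should be served on /federate with datacenter="dc1" attached, so the copies from the different data centers, and main's own copy, no longer share an identical label set. The federate job on main would keep honor_labels: true so those labels are preserved.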

The blog post on federating Prometheus has an example for a case like this: http://www.robustperception.io/scaling-and-federating-prometheus/

(I should add that relabel_configs can't help you here, as it only changes target labels. metric_relabel_configs changes what comes back from the scrape. See http://www.robustperception.io/life-of-a-label/)
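To illustrate that distinction, the sketch below shows where each block would sit in a scrape job on main. It is not the fix recommended above (unique external_labels remains the suggested approach). The job name and the replacement value are assumptions, the behaviour has not been verified against 0.17, and because the replacement is a constant it would need one job per DC.

```yaml
# Hypothetical per-DC job on main Prometheus, for illustration only.
- job_name: 'federate-dc1'          # assumed name
  honor_labels: true
  metrics_path: '/prometheus/federate'
  scheme: 'https'

  params:
    'match[]':
      - '{job="prometheus"}'

  # relabel_configs would act on the target's labels before the scrape.
  # With honor_labels: true, the instance="localhost:9090" label carried by
  # the federated samples wins over the target label, so rewriting the
  # target label has no visible effect.

  # metric_relabel_configs acts on the samples returned by the scrape.
  metric_relabel_configs:
    - source_labels: ['instance']
      regex: 'localhost:9090'
      target_label: 'instance'
      replacement: 'dc1-prometheus:9090'   # assumed value

  target_groups:
    - targets:
      - 'dc1-prometheus'
```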

How to use federation to collect Prometheus' own metrics from multiple Prometheus instances (each using instance="localhost:9090")
