Question

I am able to scrape Prometheus metrics from a Kubernetes service using this Prometheus job configuration:

- job_name: 'prometheus-potapi'
  static_configs:
  - targets: ['potapi-service.potapi:1234']

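It uses Kubernetes DNS, so each scrape returns the metrics of whichever of the three Pods behind my service happens to answer.

I would like to see the result for each Pod separately.

I am able to see the data I want using this configuration: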

- job_name: 'prometheus-potapi-pod'
  static_configs:
  - targets: ['10.1.0.126:1234']

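I have searched and experimented with the service discovery mechanisms available in Prometheus. Unfortunately, I don't understand how they should be set up. The service discovery reference isn't really helpful if you don't already know how it works.

I am looking for an example where the job that uses the IP number is replaced with some service discovery mechanism. Specifying the IP was enough for me to confirm that the data I'm looking for is exposed.

The Pods I want to scrape metrics from all live in the same namespace, potapi.

The metrics are always exposed through the same port, 1234.

Finally, the Pods are all named like this: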

potapi-deployment-754d96f855-lkh4x
potapi-deployment-754d96f855-pslgg
potapi-deployment-754d96f855-z2zj2

When I run

kubectl describe pod potapi-deployment-754d96f855-pslgg -n potapi

I get this description:

Name:           potapi-deployment-754d96f855-pslgg
Namespace:      potapi
Node:           docker-for-desktop/192.168.65.3
Start Time:     Tue, 07 Aug 2018 14:18:55 +0200
Labels:         app=potapi
                pod-template-hash=3108529411
Annotations:    <none>
Status:         Running
IP:             10.1.0.127
Controlled By:  ReplicaSet/potapi-deployment-754d96f855
Containers:
  potapi:
    Container ID:   docker://72a0bafbda9b82ddfc580d79488a8e3c480d76a6d17c43d7f7d7ab18458c56ee
    Image:          potapi-service
    Image ID:       docker://sha256:d64e94c2dda43c40f641008c122e6664845d73cab109768efa0c3619cb0836bb
    Ports:          4567/TCP, 4568/TCP, 1234/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 07 Aug 2018 14:18:57 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4fttn (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-4fttn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4fttn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

How would you rewrite the job definition given these prerequisites?

Answer 1

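Their example uses example.io/scrape=true (and similar annotations for specifying the scrape port and the scrape path if it is not /metrics), which is how one achieves the "autodiscovery" part.

If you apply that annotation (along with the relevant snippets in the Prom config) to a Service, Prom will scrape the port and path on the Service, meaning you will get stats for the Service itself and not for the individual Endpoints behind it. Similarly, if you annotate the Pods, you will gather metrics per Pod, but they will need to be rolled up to get a cross-Pod view of the state of affairs. Several other resource types can be autodiscovered as well, including node and ingress. They all behave similarly.

Unless you have grave CPU or storage concerns for your Prom instance, I absolutely wouldn't enumerate the scrape targets in the config like that. I would use the scrape annotations, which means you can change what is scraped, on which port, and so on, without having to reconfigure Prom each time.

Be aware that if you want to use their example as-is and apply those annotations from within the Kubernetes resource YAML, you must quote the 'true' value. Otherwise YAML will promote it to a boolean literal, and Kubernetes annotations can only be string values.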
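As a minimal sketch of the quoting point above, this is roughly what the annotation could look like in a Pod template. The surrounding fields are assumptions for illustration; only the example.io/scrape key and the app=potapi label come from this thread.

```yaml
# Fragment of a Pod template (for example spec.template in a Deployment).
metadata:
  annotations:
    # Quoted, so YAML keeps it as the string "true".
    example.io/scrape: 'true'
    # Unquoted, YAML would parse the value as a boolean, which is not a
    # valid annotation value because annotations must be strings:
    # example.io/scrape: true
  labels:
    app: potapi
```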

Applying the annotation from the command line works just fine:

kubectl annotate pod -l app=potapi example.io/scrape=true

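(BTW, they use example.io/ in their example, but there is nothing special about that string. It only namespaces the scrape part to keep it from colliding with something else named scrape. So feel free to use your organization's namespace if you wish to avoid having something weird named example.io/ in your cluster.)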
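For reference, here is a hedged sketch of the Prom side of this approach: a pod-discovery job that honors a scrape annotation plus optional port and path annotations. The job name and the example.io/port and example.io/path annotation names are assumptions chosen for illustration; substitute whatever prefix and names you actually use.

```yaml
- job_name: 'annotated-pods'   # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only Pods annotated with example.io/scrape: 'true'.
  - source_labels: [__meta_kubernetes_pod_annotation_example_io_scrape]
    action: keep
    regex: true
  # If example.io/path is set, use it instead of the default /metrics.
  - source_labels: [__meta_kubernetes_pod_annotation_example_io_path]
    action: replace
    regex: (.+)
    target_label: __metrics_path__
  # If example.io/port is set, replace the port part of the address with it.
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_example_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```

With rules like these, changing the port or path becomes a matter of editing the annotations on the Pods rather than the Prom config.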

Answer 2

I ended up with this solution:

...

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__address__]
    action: replace
    regex: ([^:]+)(?::\d+)?
    replacement: $1:1234
    target_label: __address__

...

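There are two parts.

1. Check for the annotation prometheus.io/scrape with the value 'true'. This is done by the first source_labels rule. It may not be self-evident that prometheus_io_scrape corresponds to prometheus.io/scrape: Prometheus replaces characters such as . and / with underscores when it builds the meta label name.
2. Get the address and add the desired port to it. This is done by the second source_labels rule. The __address__ label holds a host name or IP number, optionally followed by a port. The cryptic regex ([^:]+)(?::\d+)? captures everything before the first colon, which here is the IP number, and discards any existing port. The port I want to use is 1234, so I hardcoded it in replacement:. The result is that __address__ now contains the Pod's IP with port 1234 appended, in the form 10.1.0.172:1234, where 10.1.0.172 is the IP number that was found.

With this configuration in Prometheus, I should be able to find Pods with the proper annotation.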
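Since the question states that all Pods live in the potapi namespace, the same job can be narrowed further. The following is a sketch rather than the configuration I actually ran: the job name is hypothetical, the port name metrics assumes the container port is named that way in the Pod spec, and the pod target label is an illustrative choice.

```yaml
- job_name: 'kubernetes-pods-potapi'   # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
    # Only discover Pods in the potapi namespace.
    namespaces:
      names:
      - potapi
  relabel_configs:
  # Keep only Pods annotated with prometheus.io/scrape: 'true'.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Keep only the target for the container port named "metrics"
  # (assumes the Pod spec names port 1234 that way).
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: metrics
  # Copy the Pod name into a "pod" label so each series shows its Pod.
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
```

Selecting the port by name relies on the address that discovery generates for that port, so this variant does not rewrite __address__.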

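Where should the annotation be added, then? I ended up adding it to the Pod template in my Kubernetes deployment description.

The complete deployment description looks like this: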

apiVersion: apps/v1
kind: Deployment
metadata:
  name: potapi-deployment
  namespace: potapi
  labels:
    app: potapi
spec:
  replicas: 3
  selector:
    matchLabels:
      app: potapi
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
      labels:
        app: potapi
    spec:
      containers:
      - name: potapi
        image: potapi-service
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 4567
          name: service
        - containerPort: 1234
          name: metrics

The interesting annotation is added in the template section.
