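First, I want to get one thing clear: if I run a telegraf DaemonSet in a Kubernetes cluster, will it collect the pods' metrics, or the metrics of the physical nodes?
I created a telegraf DaemonSet in a test Kubernetes cluster that runs on my laptop under Hyper-V, based on this Kubernetes cluster installation:
I want to collect pod metrics, but nothing reaches the Kafka machines. I get this error in the logs: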
2019-05-08T02:36:35Z I! Starting Telegraf 1.9.2
2019-05-08T02:36:35Z I! Using config file: /etc/telegraf/telegraf.conf
2019-05-08T02:46:36Z E! [agent] Failed to connect to output kafka, retrying in 15s, error was 'kafka: client has run out of available brokers to talk to (Is your cluster reachable?)'
This is the DaemonSet definition file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      hostname = "$HOSTNAME"
      interval = "60s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "2s"
      precision = ""
      debug = false
      quiet = true
      logfile = ""
    [[outputs.kafka]]
      brokers = ["10.121.63.5:9092", "10.121.63.18:9092", "10.121.62.64:9092", "10.121.62.80:9092", "10.121.63.22:9092"]
      topic = "telegraf-measurements-json"
      client_id = "golangsarama__1.18.0__serverinfra__telegraf"
      routing_tag = "host"
      version = "0.11.0.2"
      compression_codec = 2
      required_acks = 1
      data_format = "json"
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.kubernetes]]
      url = "https://192.168.213.18:6443"
      insecure_skip_verify = true
---
# Section: Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: docker.io/telegraf:1.9.2
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: utmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: utmp
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name: telegraf
This is the article I followed to create the DaemonSet.
These are the pods:
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
default       nginx-65f88748fd-jztrz               1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-rl48l              1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-x8fvx              1/1     Running   0          7d18h
kube-system   etcd-k8s-master                      1/1     Running   2          7d18h
kube-system   kube-apiserver-k8s-master            1/1     Running   2          7d18h
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-96tsl          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-b884r          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-pdqmq          1/1     Running   0          7d18h
kube-system   kube-proxy-42k2g                     1/1     Running   0          7d18h
kube-system   kube-proxy-77pw9                     1/1     Running   0          7d18h
kube-system   kube-proxy-n5mbs                     1/1     Running   0          7d18h
kube-system   kube-scheduler-k8s-master            1/1     Running   2          7d18h
monitoring    telegraf-dvtcl                       1/1     Running   5          117m
monitoring    telegraf-n2mqz                       1/1     Running   5          117m
tcpdump shows what is being sent from the DaemonSet:
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
z#0........................
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
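But I don't see anything on the Grafana dashboard. If I install a standalone, rpm-based telegraf on the node, it does send data and I can see the metrics. But I am curious about the pod metrics.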
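Answer 0 (score: 0)
This Telegraf error simply means that no connection could be established to the brokers listed in the brokers array of your configuration, all of which are in the 10.x.x.x private address range. Depending on how you have set up networking and routing, you may simply have a routing problem reaching the private IPs of the Kafka cluster.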
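One way to narrow this down is to test plain TCP reachability of each broker from the same network context as the telegraf pod. The following is only a minimal sketch: the broker list is copied from the `[[outputs.kafka]]` section in the question, while the helper names `split_host_port` and `check_brokers` are made up for illustration, and it assumes Python 3 is available wherever you run it (for example in a debug pod scheduled on the same node). A successful TCP connect only shows that routing and the port are open; it does not prove that the Kafka protocol handshake will succeed.

```python
import socket

# Broker addresses copied from the [[outputs.kafka]] section in the question.
BROKERS = [
    "10.121.63.5:9092",
    "10.121.63.18:9092",
    "10.121.62.64:9092",
    "10.121.62.80:9092",
    "10.121.63.22:9092",
]


def split_host_port(address):
    """Split 'host:port' into (host, port) (hypothetical helper)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def check_brokers(brokers, timeout=3.0):
    """Try a TCP connect to every broker (hypothetical helper).

    Returns a dict mapping each address to None when the connect
    succeeded, or to the error message when it failed.
    """
    results = {}
    for address in brokers:
        host, port = split_host_port(address)
        try:
            # create_connection performs the TCP handshake and raises
            # OSError (timeout, refused, no route) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[address] = None
        except OSError as exc:
            results[address] = str(exc)
    return results
```

Calling `check_brokers(BROKERS)` and printing the result would show, per broker, either `None` or the connection error. If every broker fails from inside the cluster but the standalone telegraf on the node can reach them, that would point to pod network routing (or NAT/firewall rules for the pod subnet) rather than to the Telegraf configuration.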