We recently found that our Kafka cluster went offline in production. There are four brokers, replicationFactor is 2, and KAFKA_HEAP_OPTS is -Xmx30G -Xms30G.
Server.log:
./controller.log.2018-10-18-12:[2018-10-18 12:05:51,300] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-12:[2018-10-18 12:42:43,576] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:00:54,919] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:12:26,598] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:24:22,851] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:29:09,095] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:33:14,948] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:37:45,249] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:43:55,640] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:48:53,711] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:51:29,411] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:57:27,588] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-14:[2018-10-18 14:03:20,452] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-14:[2018-10-18 14:06:14,026] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
Before this, the cluster had already seen many ZK session expirations a few hours earlier.
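To quantify how often the session expires, a minimal sketch like the following could tally the "ZK expired" events per hour from grep output in the format shown above. The function name `expirations_per_hour` and the `sample` list are hypothetical names chosen for illustration; the sample lines are copied from the log excerpt.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the bracketed timestamp, e.g. [2018-10-18 12:05:51,300]
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def expirations_per_hour(lines):
    """Count 'ZK expired' log lines, grouped by the hour they occurred in."""
    counts = Counter()
    for line in lines:
        if "ZK expired" not in line:
            continue  # ignore unrelated log lines
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # ignore lines without a parsable timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Three lines taken from the controller log excerpt above
sample = [
    "./controller.log.2018-10-18-12:[2018-10-18 12:05:51,300] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)",
    "./controller.log.2018-10-18-12:[2018-10-18 12:42:43,576] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)",
    "./controller.log.2018-10-18-13:[2018-10-18 13:00:54,919] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)",
]

for hour, count in sorted(expirations_per_hour(sample).items()):
    print(hour, count)
```

Running the same function over the full grep output for each broker would show whether the expirations cluster around the time of the traffic spike.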
import operator
from functools import reduce

def multi_level_indexing(nested_dict, key_list):
    """Multi level index a nested dictionary, nested_dict through a list of keys in dictionaries, key_list
    """
    return reduce(operator.getitem, key_list, nested_dict)

def filtered_dict(my_dict, filtered_options):
    return {k: v for k, v in my_dict.items() if all(multi_level_indexing(my_dict, [k, f_k]) == f_v for f_k, f_v in filtered_options.items())}
Can anyone take a look?
===============================================================
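Screenshots with more details: there was an abnormal situation that day, shown below.
[Screenshot: Zabbix monitor information for Kafka topic producer volume]
The incoming volume for some topics was ten thousand times the usual level. However, when we checked the logs of the upstream producers, the volume they produced was normal.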