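Kafka brokers go offline due to heap OOM

Time: 2018-11-08 02:04:15

Tags: apache-kafka

We recently found that our Kafka cluster had gone offline in production. There are four brokers, the replicationFactor is 2, and KAFKA_HEAP_OPTS is -Xmx30G -Xms30G.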

Server.log:

./controller.log.2018-10-18-12:[2018-10-18 12:05:51,300] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-12:[2018-10-18 12:42:43,576] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:00:54,919] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:12:26,598] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:24:22,851] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:29:09,095] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:33:14,948] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:37:45,249] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:43:55,640] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:48:53,711] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:51:29,411] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-13:[2018-10-18 13:57:27,588] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-14:[2018-10-18 14:03:20,452] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
./controller.log.2018-10-18-14:[2018-10-18 14:06:14,026] INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
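
To make the pattern in these lines easier to see, here is a minimal sketch that pulls the timestamps of the "ZK expired" entries out of the log text and computes the gap between consecutive expirations. The function names `expiration_times` and `gaps_in_minutes` are made up for this illustration, and the regular expression assumes the exact line layout shown above.

```python
import re
from datetime import datetime

# Assumed layout, taken from the lines above:
# "[2018-10-18 12:05:51,300] INFO [SessionExpirationListener on 4], ZK expired; ..."
EXPIRED = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] INFO "
    r"\[SessionExpirationListener on \d+\], ZK expired"
)


def expiration_times(lines):
    """Return the timestamp of every session-expiration line (hypothetical helper)."""
    times = []
    for line in lines:
        match = EXPIRED.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the gaps between consecutive expirations, in minutes (hypothetical helper)."""
    ordered = sorted(times)
    return [
        round((later - earlier).total_seconds() / 60, 1)
        for earlier, later in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    import sys

    # Usage: grep "ZK expired" controller.log.* | python this_script.py
    print(gaps_in_minutes(expiration_times(sys.stdin)))
```

Gaps that keep shrinking would suggest the broker is losing its ZooKeeper session more and more often, which may be consistent with long GC pauses on a large heap, but that is only a guess on our side.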

Before this, the cluster had already gone through many ZK session expirations over the preceding hours.

Could someone take a look?

===============================================================

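More details, with a screenshot: something abnormal happened that day, as shown here: [screenshot: Zabbix monitor information for Kafka topic producer volume]

The incoming volume for some topics was ten thousand times the usual level. However, when we checked the logs of the upstream producers, the volume they produced was normal.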

0 answers:

No answers