Heartbeat session expired, marking coordinator dead

Asked: 2019-01-30 04:47:45

Tags: python apache-kafka

I am consuming messages with the kafka-python (1.4.4) library (Kafka version 1.1.0, Python 3.7), and it throws this error again and again. I don't know what went wrong. Here is my Python code, starting with the consumer initialization:

consumer = KafkaConsumer('dolphin-spider-google-book-bookinfo',
                         bootstrap_servers=['mq-server:9092'],
                         group_id = "google-book",
                         client_id = "dolphin-pipline-google-bookinfo-consumer-foolman",
                         # Manage kafka offsets manually
                         enable_auto_commit = False,
                         consumer_timeout_ms=50000,
                         # consume from beginning
                         auto_offset_reset = "earliest",
                         max_poll_interval_ms =350000,
                         session_timeout_ms = 60000,
                         request_timeout_ms = 700000
                         ) 

Here is the consumption logic:

def consume_bookinfo(self):
    while True:
        try:
            for books in self.consumer:
                logger.info("Get books info offset: %s", books.offset)
                self.sub_process_handle(books.value, books.offset)
        except Exception as e:
            logger.error(e)

def sub_process_handle(self, bookinfo, offset):
    number_of_threadings = len(threading.enumerate())
    if number_of_threadings < 13:
        t = threading.Thread(target=self.background_process, name="offset-" + str(offset), args=(bookinfo,), kwargs={})
        t.start()
    else:
        # If all threads are running,
        # handle in the main thread to
        # slow down the Kafka consume speed
        logger.info("Reach max handle thread,sleep 20s to wait thread release...")
        time.sleep(20)
        self.sub_process_handle(bookinfo, offset)

def background_process(self, bookinfo):
    self.parse_bookinfo(bookinfo)
    self.consumer.commit_async(callback=self.offset_commit_result)
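(Editor's note: the recursive sleep-and-retry cap above can also be expressed with a bounded pool from the standard library. This is a hedged sketch, not the asker's code; `process_record` is a hypothetical stand-in for `parse_bookinfo`, and the worker count mirrors the asker's ~13-thread limit.)

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 12  # fixed cap, similar in spirit to the asker's 13-thread limit

# Created once and reused; the pool itself enforces the thread cap,
# so no manual counting via threading.enumerate() is needed.
pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)

def process_record(value):
    # Hypothetical stand-in for parse_bookinfo(); real work goes here.
    return value.upper()

def handle_batch(records):
    # Submit records and return futures immediately; excess work queues
    # inside the pool instead of spawning unbounded threads.
    return [pool.submit(process_record, r) for r in records]

futures = handle_batch(["a", "b", "c"])
print(sorted(f.result() for f in futures))  # ['A', 'B', 'C']
```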

I start multiple threads to handle the consumption logic. But after running for a while, it throws this error:

2019-01-30 02:46:52,948 - /home/dolphin/source/dolphin-pipline/dolphin/biz/spider_bookinfo_consumer.py[line:37] - INFO: Get books info offset: 9304
2019-01-30 02:46:52,948 - /home/dolphin/source/dolphin-pipline/dolphin/biz/spider_bookinfo_consumer.py[line:51] - INFO: Reach max handle thread,sleep 20s to wait thread release...
2019-01-30 02:47:12,968 - /home/dolphin/source/dolphin-pipline/dolphin/biz/spider_bookinfo_consumer.py[line:61] - INFO: commit offset success,offsets: {TopicPartition(topic='dolphin-spider-google-book-bookinfo', partition=0): OffsetAndMetadata(offset=9305, metadata='')}
2019-01-30 04:27:47,322 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/base.py[line:964] - WARNING: Heartbeat session expired, marking coordinator dead
2019-01-30 04:27:47,323 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/base.py[line:698] - WARNING: Marking the coordinator dead (node 0) for group google-book: Heartbeat session expired.
2019-01-30 04:27:47,433 - /usr/local/lib/python3.5/site-packages/kafka/cluster.py[line:353] - INFO: Group coordinator for google-book is BrokerMetadata(nodeId=0, host='35.229.69.193', port=9092, rack=None)
2019-01-30 04:27:47,433 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/base.py[line:676] - INFO: Discovered coordinator 0 for group google-book
2019-01-30 04:27:47,433 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/consumer.py[line:341] - INFO: Revoking previously assigned partitions {TopicPartition(topic='dolphin-spider-google-book-bookinfo', partition=0)} for group google-book
2019-01-30 04:27:47,433 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/base.py[line:434] - INFO: (Re-)joining group google-book
2019-01-30 04:27:47,437 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/base.py[line:504] - INFO: Elected group leader -- performing partition assignments using range
2019-01-30 04:27:47,439 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/base.py[line:333] - INFO: Successfully joined group google-book with generation 470
2019-01-30 04:27:47,439 - /usr/local/lib/python3.5/site-packages/kafka/consumer/subscription_state.py[line:257] - INFO: Updated partition assignment: [TopicPartition(topic='dolphin-spider-google-book-bookinfo', partition=0)]
2019-01-30 04:27:47,439 - /usr/local/lib/python3.5/site-packages/kafka/coordinator/consumer.py[line:238] - INFO: Setting newly assigned partitions {TopicPartition(topic='dolphin-spider-google-book-bookinfo', partition=0)} for group google-book
2019-01-30 04:27:47,694 - /home/dolphin/source/dolphin-pipline/dolphin/biz/spider_bookinfo_consumer.py[line:63] - ERROR: commit offset failed,detail: CommitFailedError: Commit cannot be completed since the group has already
            rebalanced and assigned the partitions to another member.
            This means that the time between subsequent calls to poll()
            was longer than the configured max_poll_interval_ms, which
            typically implies that the poll loop is spending too much
            time message processing. You can address this either by
            increasing the rebalance timeout with max_poll_interval_ms,
            or by reducing the maximum size of batches returned in poll()
            with max_poll_records.


How can I avoid this problem? What should I do?

2 answers:

Answer 0 (score: 2)

First, let's look at what causes this error. As discussed in the official Kafka consumer documentation (here), Kafka detects that a consumer is alive through its calls to poll():

    After subscribing to a set of topics, the consumer will automatically join the group when poll(Duration) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the consumer sends periodic heartbeats to the server. If the consumer crashes or is unable to send heartbeats for a duration of session.timeout.ms, then the consumer will be considered dead and its partitions will be reassigned.

So, to stay in the group, you must keep calling poll. The max.poll.interval.ms setting declares how long a consumer may stay in the group without calling poll(). Each call to poll() returns many records (500 by default), which are then iterated over in "for message in consumer". The next call to poll() happens only after all of the returned records have been processed.

If your program takes too long to process the records between calls to poll(), you will be kicked out of the group.
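(Editor's note: to make the arithmetic concrete, here is a quick sanity check of the per-record time budget under the asker's configuration. The numbers come from the question; no broker is required to run it.)

```python
# Settings from the asker's configuration.
max_poll_interval_ms = 350_000   # 350 s allowed between poll() calls
max_poll_records = 500           # kafka-python's default batch size

# Worst case: a full batch must be processed before the next poll(),
# or the broker evicts the consumer and rebalances the group.
budget_per_record_ms = max_poll_interval_ms / max_poll_records
print(budget_per_record_ms)      # 700.0 ms per record on average

# Shrinking the batch raises the per-record budget proportionally:
print(max_poll_interval_ms / 1)  # 350000.0 ms with max_poll_records=1
```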

Here is what you can do:

  1. Increase max.poll.interval.ms
  2. Decrease max.poll.records

In @Dolphin's answer, he is actually reducing max_poll_records to 1. I prefer doing it like this:

self.consumer = kafka.KafkaConsumer(topic, bootstrap_servers='servers:ports', group_id='group_id', max_poll_records=1, max_poll_interval_ms=300000)

The important part is max_poll_records. Of course, you may want to set it to a value larger than 1.

Answer 1 (score: 0)

Adjust the consumer poll function:

def consume_bookinfo(self):
    while True:
        try:
            msg_pack = self.consumer.poll(timeout_ms=5000, max_records=1)
            for messages in msg_pack.items():
                for message in messages:
                    var_type = type(message)
                    if isinstance(message, TopicPartition):
                        logger.info("TopicPartition: %s", TopicPartition)
                    if var_type == list:
                        for consumer_record in message:
                            logger.info("Get books info offset: %s", consumer_record.offset)
                            self.sub_process_handle(consumer_record.value, consumer_record.offset)
        except Exception as e:
            logger.error(e)

This works fine for me!
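(Editor's note: the nested type checks above can be avoided, because poll() returns a dict mapping each TopicPartition to a list of records, which can be unpacked directly. A sketch using stand-in namedtuples in place of kafka-python's types, so it runs without a broker; the topic name and offsets are taken from the question's logs.)

```python
from collections import namedtuple

# Stand-ins for kafka-python's TopicPartition and ConsumerRecord types.
TopicPartition = namedtuple("TopicPartition", "topic partition")
ConsumerRecord = namedtuple("ConsumerRecord", "offset value")

# poll() returns {TopicPartition: [ConsumerRecord, ...]}
msg_pack = {
    TopicPartition("dolphin-spider-google-book-bookinfo", 0): [
        ConsumerRecord(9304, b"book-a"),
        ConsumerRecord(9305, b"book-b"),
    ]
}

offsets = []
for tp, records in msg_pack.items():  # unpack (partition, records) pairs
    for record in records:
        offsets.append(record.offset)

print(offsets)  # [9304, 9305]
```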