Our cluster has 3 Kafka machines (Kafka version 0.10.0.2.6) and 3 ZooKeeper servers (version 3.4.6).
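We have a problem: one of the Kafka brokers fails to start, apparently because of corrupted index files.
We noticed that the Kafka log on every Kafka machine (/var/log/kafka/server.log) reports thousands of corrupted index files, as shown below.
Example from server.log: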
[2019-02-25 12:34:44,907] INFO Completed load of log topic.pop.control.gtp.enrichment-38 with 14 log segments and log end offset 200458117 in 1583 ms (kafka.log.Log)
[2019-02-25 12:34:45,044] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index) has non-zero size but the last offset is 8068079 which is no larger than the base offset 8068079.}. deleting /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.timeindex, /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:45,217] INFO Recovering unflushed segment 8068079 in log topic.pop.control.gtp.state-50. (kafka.log.Log)
[2019-02-25 12:34:45,255] INFO Completed load of log topic.pop.control.gtp.state-50 with 6 log segments and log end offset 8095839 in 347 ms (kafka.log.Log)
[2019-02-25 12:34:45,261] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.index) has non-zero size but the last offset is 1979940988 which is no larger than the base offset 1979940988.}. deleting /var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.timeindex, /var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:47,607] INFO Recovering unflushed segment 1979940988 in log topic.pop.pri.wnr-38. (kafka.log.Log)
[2019-02-25 12:34:48,872] INFO Completed load of log topic.pop.pri.wnr-38 with 21 log segments and log end offset 1980403224 in 3617 ms (kafka.log.Log)
[2019-02-25 12:34:48,935] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.index) has non-zero size but the last offset is 216947511 which is no larger than the base offset 216947511.}. deleting /var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.timeindex, /var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:52,436] INFO Recovering unflushed segment 216947511 in log topic.pop.control.gtp-88. (kafka.log.Log)
[2019-02-25 12:34:54,508] INFO Completed load of log topic.pop.control.gtp-88 with 21 log segments and log end offset 217830559 in 5635 ms (kafka.log.Log)
[2019-02-25 12:34:54,531] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.timeindex, /var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.index and rebuilding index... (kafka.log.Log)
[2019-02-25 12:34:57,540] INFO Recovering unflushed segment 0 in log topic.pop.pri.lop-10. (kafka.log.Log)
Examples of corrupted index files:
/var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index
/var/kafka/kafka-logs/topic.pop.pri.wnr-38/00000000001979940988.index
/var/kafka/kafka-logs/topic.pop.control.gtp-88/00000000000216947511.index
/var/kafka/kafka-logs/topic.pop.pri.lop-10/00000000000000000000.index
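What is the correct way to delete the corrupted index files?
One option is to find the corrupted index files in server.log (on each Kafka machine), list them, and then delete them on each Kafka broker, for example: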
rm -f /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index
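A minimal Python sketch of this first option: it scans server.log for the WARN lines shown above and prints each reported .index path once. The regular expression assumes the exact message wording in the sample; other Kafka versions may phrase it differently. The script name in the usage note is hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Matches the WARN line written when a corrupted index is found at startup, e.g.
# "... Corrupt index found, index file (/var/kafka/kafka-logs/<topic>-<n>/<base>.index) has ..."
CORRUPT_INDEX_RE = re.compile(
    r"Found a corrupted index file.*?index file \((?P<path>[^)]+\.index)\)"
)


def corrupted_index_paths(lines: Iterable[str]) -> List[str]:
    """Return the unique .index paths reported as corrupted, in first-seen order."""
    matches = (CORRUPT_INDEX_RE.search(line) for line in lines)
    return list(dict.fromkeys(m.group("path") for m in matches if m))


if __name__ == "__main__":
    # Each command-line argument is a server.log file to scan.
    found: List[str] = []
    for log_path in sys.argv[1:]:
        with open(log_path, encoding="utf-8", errors="replace") as log:
            found.extend(corrupted_index_paths(log))
    for path in dict.fromkeys(found):  # de-duplicate across files
        print(path)
```

Usage: python3 corrupt_from_log.py /var/log/kafka/server.log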
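But this approach does not guarantee that our log (server.log) includes every corrupted index file, so there may be more corrupted index files that the log does not mention! So how can we find all the corrupted files, with a command or some other syntax that lists every corrupted index file?
I think that if we had this list, we could write a simple bash script that iterates over the list and deletes the files automatically.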
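To look beyond what server.log happened to record, a rough Python sketch can apply the condition named in the WARN message ("non-zero size but the last offset is ... no larger than the base offset") to every .index file on disk. It assumes the offset-index layout is a sequence of 8-byte entries, each a 4-byte big-endian offset relative to the segment's base offset (the number in the file name) followed by a 4-byte position, and that an index left at its preallocated, zero-filled size reads back a relative offset of 0 in its last slot. It only approximates the broker's own check and ignores .timeindex files. Run it only while the broker is stopped, because the active segment's index is assumed to be preallocated while Kafka is running and would be flagged.

```python
import os
import struct
import sys
from typing import Iterator

# Assumed offset-index entry: 4-byte relative offset + 4-byte position, big-endian.
ENTRY_SIZE = 8


def index_is_suspect(path: str) -> bool:
    """Approximate the offset-index sanity check for a single .index file."""
    base_offset = int(os.path.basename(path)[: -len(".index")])
    size = os.path.getsize(path)
    if size % ENTRY_SIZE != 0:
        return True  # length is not a whole number of entries
    entries = size // ENTRY_SIZE
    if entries == 0:
        return False  # an empty index is treated as valid
    with open(path, "rb") as f:
        f.seek((entries - 1) * ENTRY_SIZE)
        (relative_offset,) = struct.unpack(">i", f.read(4))
    # A zero-filled tail yields relative offset 0, so the last offset equals the
    # base offset: the "no larger than the base offset" case from server.log.
    return base_offset + relative_offset <= base_offset


def find_suspect_indexes(log_dir: str) -> Iterator[str]:
    """Yield every <base offset>.index file under log_dir that fails the check."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            stem = name[: -len(".index")]
            if name.endswith(".index") and stem.isdigit():
                path = os.path.join(root, name)
                if index_is_suspect(path):
                    yield path


if __name__ == "__main__":
    # Each argument is a Kafka log directory, e.g. /var/kafka/kafka-logs
    for log_dir in sys.argv[1:]:
        for path in find_suspect_indexes(log_dir):
            print(path)
```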
Answer:
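On startup, Kafka automatically rebuilds any index files that appear to be corrupted. You can see "rebuilding index" in the log line:
Found a corrupted index file due to requirement failed: Corrupt index found, index file (/var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index) has non-zero size but the last offset is 8068079 which is no larger than the base offset 8068079.}. deleting /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.timeindex, /var/kafka/kafka-logs/topic.pop.control.gtp.state-50/00000000000008068079.index and rebuilding index...
You usually get "corrupted" indexes when Kafka is not shut down cleanly.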