Samza task not receiving messages on one partition

Time: 2016-04-25 23:41:38

Tags: apache-kafka apache-samza

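I have a puzzling problem with one of my Samza tasks. It works fine, except for the messages on one partition. The topic has 9 partitions. If I send 1000 messages, I only receive about 890 of them.

I used kafka-console-consumer to check for a partition key that I know my Samza job is not processing, and the console consumer does see the messages. So I know they were written to the topic, and that at least a vanilla consumer can read them without trouble.

I have enabled debug logging in Samza, and there are a lot of messages from org.apache.samza.checkpoint.kafka.KafkaCheckpointManager saying: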

  

Adding checkpoint Checkpoint [offsets={SystemStreamPartition [kafka, com.mycompany.indexing.document, 4]=448}] for taskName Partition 4

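Partition 4 always says 448. Partition 0 has similar log lines, but in place of 448 it shows a steadily increasing number.

I'm happy to share any configuration details that might help narrow this down, but right now I'm mystified about what would even be worth sharing.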

My runtime is ThreadJobFactory.

  • samza-kafka_2.10 version 0.9.1

  • kafka_2.10 version 0.8.2.1 on the client

  • Kafka broker 0.9.0.0

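Update

I looked at the upstream Samza job, which uses the same partition key, and found the same problem on partition 4 upstream. Inspecting the Samza checkpoint topic with kafkacat, I can see that the checkpoint for partition 4 is not advancing. First I saw: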

{"SystemStreamPartition [kafka, resource.mutation, 6]":{"system":"kafka","partition":"6","offset":"96639","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 3]":{"system":"kafka","partition":"3","offset":"47135","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 0]":{"system":"kafka","partition":"0","offset":"49476","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 4]":{"system":"kafka","partition":"4","offset":"2556","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 8]":{"system":"kafka","partition":"8","offset":"62263","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 1]":{"system":"kafka","partition":"1","offset":"52151","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 7]":{"system":"kafka","partition":"7","offset":"58081","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 5]":{"system":"kafka","partition":"5","offset":"47712","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 2]":{"system":"kafka","partition":"2","offset":"45831","stream":"resource.mutation"}}
% Reached end of topic __samza_checkpoint_ver_1_for_resource-normalizer_1 [0] at offset 81713

Then, a minute later, I saw:

{"SystemStreamPartition [kafka, resource.mutation, 6]":{"system":"kafka","partition":"6","offset":"96624","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 3]":{"system":"kafka","partition":"3","offset":"47115","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 0]":{"system":"kafka","partition":"0","offset":"49462","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 4]":{"system":"kafka","partition":"4","offset":"2556","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 8]":{"system":"kafka","partition":"8","offset":"62252","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 1]":{"system":"kafka","partition":"1","offset":"52134","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 7]":{"system":"kafka","partition":"7","offset":"58063","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 5]":{"system":"kafka","partition":"5","offset":"47696","stream":"resource.mutation"}}
{"SystemStreamPartition [kafka, resource.mutation, 2]":{"system":"kafka","partition":"2","offset":"45817","stream":"resource.mutation"}}
% Reached end of topic __samza_checkpoint_ver_1_for_resource-normalizer_1 [0] at offset 81722

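That number never gets past 2556. However, looking at the actual resource.mutation topic, the latest offset on partition 4 is similar to those of the other partitions: about 61000 as of now, and growing.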
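For anyone who wants to automate this comparison, here is a minimal sketch in Python. It assumes the checkpoint snapshots are captured as text in the same shape as the kafkacat output above; the helper names parse_checkpoint and stalled_partitions are made up for illustration and are not part of Samza or kafkacat. The sample lines are copied from the two snapshots above.

```python
import json


def parse_checkpoint(lines):
    """Map partition number -> checkpointed offset from kafkacat output lines."""
    offsets = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip status lines such as "% Reached end of topic ..."
        for entry in json.loads(line).values():
            offsets[int(entry["partition"])] = int(entry["offset"])
    return offsets


def stalled_partitions(first, second):
    """Partitions present in both snapshots whose offset did not change."""
    return sorted(p for p in first if p in second and first[p] == second[p])


# Three lines from each snapshot shown above (partitions 6, 4 and 8).
first_snapshot = [
    '{"SystemStreamPartition [kafka, resource.mutation, 6]":{"system":"kafka","partition":"6","offset":"96639","stream":"resource.mutation"}}',
    '{"SystemStreamPartition [kafka, resource.mutation, 4]":{"system":"kafka","partition":"4","offset":"2556","stream":"resource.mutation"}}',
    '{"SystemStreamPartition [kafka, resource.mutation, 8]":{"system":"kafka","partition":"8","offset":"62263","stream":"resource.mutation"}}',
    "% Reached end of topic __samza_checkpoint_ver_1_for_resource-normalizer_1 [0] at offset 81713",
]
second_snapshot = [
    '{"SystemStreamPartition [kafka, resource.mutation, 6]":{"system":"kafka","partition":"6","offset":"96624","stream":"resource.mutation"}}',
    '{"SystemStreamPartition [kafka, resource.mutation, 4]":{"system":"kafka","partition":"4","offset":"2556","stream":"resource.mutation"}}',
    '{"SystemStreamPartition [kafka, resource.mutation, 8]":{"system":"kafka","partition":"8","offset":"62252","stream":"resource.mutation"}}',
    "% Reached end of topic __samza_checkpoint_ver_1_for_resource-normalizer_1 [0] at offset 81722",
]

stalled = stalled_partitions(
    parse_checkpoint(first_snapshot), parse_checkpoint(second_snapshot)
)
print("Partitions with an unchanged checkpoint:", stalled)
```

An unchanged checkpoint only proves a stall if the partition is still receiving messages, so the result should be read alongside the topic's latest offset for that partition.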

There are no error or warning messages at all. It simply stops consuming from partition 4.

1 Answer:

Answer 0: (score: 2)

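The problem was that one message was larger than the Kafka consumer's default fetch.message.max.bytes. Instead of reporting any kind of error, the thread responsible for consuming that partition simply hung on that message, while the threads for the other partitions carried on happily.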

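Once we set systems.kafka.consumer.fetch.message.max.bytes to a value large enough to consume every message on the partition and restarted the job, it picked up where it had left off and everything worked as expected.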
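As a rough illustration of how one might pick that value, here is a small Python sketch. It assumes you already have (offset, size in bytes) pairs for the messages on the partition; the sample pairs below are invented, and the helper names, the 25% headroom and the 1 MiB limit (the documented default of the old Kafka consumer, which may differ from the effective default in your setup) are all assumptions rather than anything prescribed by Samza.

```python
# Documented default of the old Kafka consumer's fetch.message.max.bytes.
# Treat this as an assumption: verify the effective value in your own job.
DEFAULT_FETCH_MAX_BYTES = 1024 * 1024


def oversized(messages, limit=DEFAULT_FETCH_MAX_BYTES):
    """Return the (offset, size) pairs whose size exceeds the fetch limit."""
    return [(offset, size) for offset, size in messages if size > limit]


def suggested_setting(messages, headroom=1.25, system="kafka"):
    """Build a Samza property line that leaves headroom above the largest message."""
    largest = max(size for _, size in messages)
    value = max(DEFAULT_FETCH_MAX_BYTES, int(largest * headroom))
    return f"systems.{system}.consumer.fetch.message.max.bytes={value}"


# Hypothetical (offset, size) pairs; real sizes would come from inspecting the topic.
messages = [(10, 4096), (11, 2_000_000), (12, 512)]

print("Too large for the default fetch size:", oversized(messages))
print(suggested_setting(messages))
```

The broker and topic also enforce their own message size limits, so the consumer setting only needs to be at least as large as what the producers are actually allowed to write.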