Apache Flume: kafka.consumer.ConsumerTimeoutException

Date: 2016-02-26 15:44:19

Tags: apache-kafka flume flume-ng

I am trying to build a pipeline with Apache Flume: spooldir -> Kafka channel -> HDFS sink

Events reach the Kafka topic without a problem, and I can see them with kafkacat. But the Kafka channel fails to write the files to HDFS through the sink. The error is:

Timed out while waiting for data from Kafka

Full log:

2016-02-26 18:25:17,125 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms

2016-02-26 18:25:19,127 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 1ms

2016-02-26 18:25:21,129 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms

2016-02-26 18:25:21,775 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:327)] Timed out while waiting for data from Kafka
kafka.consumer.ConsumerTimeoutException
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:69)
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
    at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
    at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
    at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:306)
    at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
    at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:374)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)

My Flume configuration is:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c2

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/alex/spoolFlume

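# Describe the sink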
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://10.12.0.1:54310/logs/flumetest/
a1.sinks.k1.hdfs.filePrefix = flume-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

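# Describe the channel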
a1.channels.c2.type   = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000
a1.channels.c2.brokerList=kafka10:9092,kafka11:9092,kafka12:9092
a1.channels.c2.topic=flume_test_001
a1.channels.c2.zookeeperConnect=zoo00:2181,zoo01:2181,zoo02:2181

# Bind the source and sink to the channel
a1.sources.r1.channels = c2
a1.sinks.k1.channel = c2

Using a memory channel instead of the Kafka channel, everything works fine.
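For comparison, this is a minimal memory channel definition that can be swapped in for c2 (a sketch using Flume's built-in memory channel; the capacities simply mirror the Kafka channel settings above):

# Memory channel alternative (sketch; same channel name c2)
a1.channels.c2.type = memory
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000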

Thanks in advance for any ideas!

3 Answers:

Answer 0 (score: 0)

ConsumerTimeoutException means that no new messages arrived for a long time; it does not mean the connection to Kafka timed out.

http://kafka.apache.org/documentation.html

consumer.timeout.ms (default: -1): throw a timeout exception to the consumer if no message is available for consumption after the specified interval.
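For a standalone consumer on the old high-level API, the same property would be set in the consumer's properties, for example (an illustrative snippet; the ZooKeeper and group values are taken from the question's setup, and the 5000 ms timeout is an assumed example):

# consumer.properties (illustrative)
zookeeper.connect=zoo00:2181,zoo01:2181,zoo02:2181
group.id=flume
# throw ConsumerTimeoutException after 5 s without messages instead of blocking forever
consumer.timeout.ms=5000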

Answer 1 (score: 0)

Kafka's ConsumerConfig class has a "consumer.timeout.ms" configuration property, which Kafka sets to -1 by default. Any new Kafka consumer is expected to override the property with a suitable value.

Here is the reference from the Kafka documentation:

consumer.timeout.ms (default: -1)
By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. By setting the value to a positive integer, a timeout exception is thrown to the consumer if no message is available for consumption after the specified timeout value.

When Flume creates the Kafka channel, it sets the consumer.timeout.ms value to 100, as seen in the Flume logs at INFO level. That explains why we see so many of these ConsumerTimeoutExceptions.

 level: INFO Post-validation flume configuration contains configuration for agents: [agent]
 level: INFO Creating channels
 level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
 level: INFO Creating instance of channel c1 type org.apache.flume.channel.kafka.KafkaChannel
 level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
 level: INFO Group ID was not specified. Using flume as the group id.
 level: INFO {metadata.broker.list=kafka:9092, request.required.acks=-1, group.id=flume, 
              zookeeper.connect=zookeeper:2181, consumer.timeout.ms=100, auto.commit.enable=false}
 level: INFO Created channel c1

Going by the Flume user guide on Kafka channel settings, I tried to override this value by specifying the following, but that did not seem to work:

agent.channels.c1.kafka.consumer.timeout.ms=5000

Also, we ran a load test with data continuously pounding through the channel, and this exception did not occur during the tests.

Answer 2 (score: 0)

I read the Flume source code and found that Flume reads the "timeout" key and uses its value for "consumer.timeout.ms".

So you can configure the "consumer.timeout.ms" value like this:

agent1.channels.kafka_channel.timeout=-1
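Applied to the agent from the question (channel c2), a sketch of the same override; note that -1 makes the consumer block indefinitely when the topic is idle, per the Kafka documentation quoted in Answer 1, while a large positive value keeps a finite timeout (the 60000 ms figure is an illustrative assumption):

# Same override for the question's agent (illustrative values)
a1.channels.c2.timeout = -1
# ...or keep a finite, quieter timeout instead of blocking forever:
# a1.channels.c2.timeout = 60000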