I'm trying to build a pipeline with Apache Flume: spooldir -> Kafka channel -> HDFS sink.
Events get to the Kafka topic without problems, and I can see them with a kafkacat request. But the Kafka channel cannot write files to HDFS through the sink. The error is:

Timed out while waiting for data to come from Kafka

Full log:
2016-02-26 18:25:17,125 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms
2016-02-26 18:25:19,127 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 1ms
2016-02-26 18:25:21,129 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms
2016-02-26 18:25:21,775 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:327)] Timed out while waiting for data to come from Kafka
kafka.consumer.ConsumerTimeoutException
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:69)
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
    at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
    at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
    at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:306)
    at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
    at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:374)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
My Flume configuration is:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c2
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/alex/spoolFlume
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://10.12.0.1:54310/logs/flumetest/
a1.sinks.k1.hdfs.filePrefix = flume-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000
a1.channels.c2.brokerList=kafka10:9092,kafka11:9092,kafka12:9092
a1.channels.c2.topic=flume_test_001
a1.channels.c2.zookeeperConnect=zoo00:2181,zoo01:2181,zoo02:2181
# Bind the source and sink to the channel
a1.sources.r1.channels = c2
a1.sinks.k1.channel = c2
Using a memory channel instead of the Kafka channel, everything works fine.
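For reference, the working memory-channel variant looks roughly like this (a minimal sketch; the channel name c1 and the capacities are assumed, and the source and sink stay as above):

a1.channels = c1
# Plain in-memory channel instead of the Kafka-backed one
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Rebind source and sink to the memory channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1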
Thanks in advance for any ideas!
Answer 0 (score: 0)

ConsumerTimeoutException means that no new messages arrived for a long time; it does not mean the connection to Kafka timed out.
http://kafka.apache.org/documentation.html
consumer.timeout.ms  -1  Throw a timeout exception to the consumer if no message is available for consumption after the specified interval
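The same behavior is easy to reproduce outside Flume. Below is a minimal sketch using the old Kafka 0.8 high-level consumer API (the ZooKeeper quorum and topic are taken from the question; the group id and timeout value are assumptions):

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class TimeoutDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zoo00:2181,zoo01:2181,zoo02:2181");
        props.put("group.id", "timeout-demo");      // hypothetical group id
        props.put("consumer.timeout.ms", "100");    // same value the Flume Kafka channel uses

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("flume_test_001", 1));
        ConsumerIterator<byte[], byte[]> it =
                streams.get("flume_test_001").get(0).iterator();
        try {
            while (it.hasNext()) {
                System.out.println(new String(it.next().message()));
            }
        } catch (ConsumerTimeoutException e) {
            // Thrown when no message arrives within consumer.timeout.ms --
            // the broker connection itself is still fine.
        }
        connector.shutdown();
    }
}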
Answer 1 (score: 0)

Kafka's ConsumerConfig class has a "consumer.timeout.ms" configuration property, which Kafka sets to -1 by default. Any new Kafka consumer is expected to override the property with a suitable value.

Here is the reference from the Kafka documentation:
consumer.timeout.ms -1
By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. By setting the value to a positive integer, a timeout exception is thrown to the consumer if no message is available for consumption after the specified timeout value.
When Flume creates the Kafka channel, it sets consumer.timeout.ms to 100, as the INFO-level Flume logs show. That explains why we see so many of these ConsumerTimeoutExceptions.
level: INFO Post-validation flume configuration contains configuration for agents: [agent]
level: INFO Creating channels
level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
level: INFO Creating instance of channel c1 type org.apache.flume.channel.kafka.KafkaChannel
level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
level: INFO Group ID was not specified. Using flume as the group id.
level: INFO {metadata.broker.list=kafka:9092, request.required.acks=-1, group.id=flume,
zookeeper.connect=zookeeper:2181, **consumer.timeout.ms=100**, auto.commit.enable=false}
level: INFO Created channel c1
Going by the Flume user guide on Kafka channel settings, I tried to override this value by specifying the following, but that did not seem to work:
agent.channels.c1.kafka.consumer.timeout.ms=5000
Also, we ran a load test with data constantly pounding through the channel, and this exception did not occur during the tests.
Answer 2 (score: 0)

I read the Flume source code and found that Flume reads the "timeout" key for the value of "consumer.timeout.ms".

So you can configure the value of "consumer.timeout.ms" like this:
agent1.channels.kafka_channel.timeout=-1
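Applied to the configuration from the question, a minimal sketch of the same override would be (channel c2 as above; whether -1 or a larger positive value is appropriate depends on how long you are willing to let the sink block):

# Per the answer above, the channel-level "timeout" key (in ms) is what Flume
# copies into the consumer's consumer.timeout.ms; -1 blocks until data arrives
# instead of throwing ConsumerTimeoutException.
a1.channels.c2.timeout = -1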