Flume aggregates data inconsistently

Asked: 2015-04-11 10:02:35

Tags: hadoop flume flume-ng

I've run into a problem. I'm using Apache Flume to read log records from a txt file and sink them into HDFS, but some records are dropped along the way. I'm using a file channel; please check the configuration below.

agent2.sources = file_server
agent2.sources.file_server.type=exec
agent2.sources.file_server.command = tail -F /home/datafile/error.log
agent2.sources.file_server.channels = fileChannel


agent2.channels = fileChannel
agent2.channels.fileChannel.type=file
agent2.channels.fileChannel.capacity = 12000
agent2.channels.fileChannel.transactionCapacity = 10000
agent2.channels.fileChannel.checkpointDir=/home/data/flume/checkpoint
agent2.channels.fileChannel.dataDirs=/home/data/flume/data


# Agent2 sinks
agent2.sinks = hadooper loged
agent2.sinks.hadooper.type = hdfs
agent2.sinks.loged.type=logger
agent2.sinks.hadooper.hdfs.path = hdfs://localhost:8020/flume/data/file
agent2.sinks.hadooper.hdfs.fileType = DataStream
agent2.sinks.hadooper.hdfs.writeFormat = Text
agent2.sinks.hadooper.hdfs.rollInterval = 600
agent2.sinks.hadooper.hdfs.rollCount = 0
agent2.sinks.hadooper.hdfs.rollSize = 67108864
agent2.sinks.hadooper.hdfs.batchSize = 10
agent2.sinks.hadooper.hdfs.idleTimeout=0
agent2.sinks.hadooper.channel = fileChannel
agent2.sinks.loged.channel = fileChannel
agent2.sinks.hadooper.hdfs.threadsPoolSize = 20

Please help.

1 answer:

Answer 0 (score: 0)

I think the problem is that you have two sinks reading from a single channel. In that setup, sinks compete for events rather than each receiving a copy: a Flume event consumed by one sink is never seen by the other, and vice versa.

If you want both sinks to receive a copy of the same Flume events, you need a dedicated channel for each sink. Once those channels are created, the default channel selector, ReplicatingChannelSelector, will put a copy of every event into each of them.
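A minimal sketch of that layout, reusing the agent, source, and sink names from the question. The second channel name `fileChannel2` and its checkpoint/data directories are assumptions; the HDFS sink's own settings (path, roll sizes, etc.) would stay as in the original config and are omitted here:

```properties
# Source fans out a copy of every event to both channels.
# "replicating" is the default selector, but it is shown explicitly here.
agent2.sources = file_server
agent2.sources.file_server.type = exec
agent2.sources.file_server.command = tail -F /home/datafile/error.log
agent2.sources.file_server.channels = fileChannel fileChannel2
agent2.sources.file_server.selector.type = replicating

# One file channel per sink; file channels must not share directories.
agent2.channels = fileChannel fileChannel2
agent2.channels.fileChannel.type = file
agent2.channels.fileChannel.checkpointDir = /home/data/flume/checkpoint
agent2.channels.fileChannel.dataDirs = /home/data/flume/data
agent2.channels.fileChannel2.type = file
agent2.channels.fileChannel2.checkpointDir = /home/data/flume/checkpoint2
agent2.channels.fileChannel2.dataDirs = /home/data/flume/data2

# Each sink now drains its own channel, so both see every event.
agent2.sinks = hadooper loged
agent2.sinks.hadooper.type = hdfs
agent2.sinks.hadooper.channel = fileChannel
agent2.sinks.loged.type = logger
agent2.sinks.loged.channel = fileChannel2
```

With this wiring, the logger sink receiving an event no longer prevents the HDFS sink from writing it, which should remove the apparent gaps in HDFS.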