What flume.conf parameters should be set to save tweets into a single FlumeData file per hour?

Time: 2016-06-29 11:52:49

Tags: hadoop cloudera flume tweetstream flume-twitter

We save tweets in hourly directories, e.g. /user/flume/2016/06/28/13/FlumeData..., but more than 100 FlumeData files are created every hour. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB), but the same thing happened again. After that I tried changing the rollCount parameter, but that did not work either. How do I set the parameters so that only one FlumeData file is written per hour?

3 Answers:

Answer 0 (score: 0):

What about rollInterval? Have you set it to zero? If so, the problem may be something else. If rollInterval is set to a non-zero value, it can trigger a roll regardless of rollSize and rollCount, so a file may be rotated before it reaches the rollSize value. Also, check the HDFS block size you have configured; if it is set too small, that can also cause files to roll.

Try this -

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 1000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

Answer 1 (score: 0):

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000

Answer 2 (score: 0):

I solved this by setting rollInterval = 3600, rollCount = 0, and batchSize = 100 in flume.conf, as suggested by @vkgade.
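
For reference, here is a minimal sketch of how those settings fit into the sink configuration, reusing the TwitterAgent/HDFS/MemChannel names and the HDFS path from the answers above (the host and path are carried over from those examples and may differ on your cluster):

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
    # Disable the size- and count-based roll triggers so that only the
    # time-based trigger applies; 3600 seconds = one new file per hour.
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600

Note that the sink still opens a new file if the agent restarts or an HDFS write fails, so an occasional extra file within an hour is still possible.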