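We store tweets in a date-based directory layout, e.g. /user/flume/2016/06/28/13/FlumeData...., but more than 100 FlumeData files are created every hour. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB),
but the same thing happened again. After that I tried changing the rollCount parameter, but that did not work either. How can I set the parameters to get one FlumeData file per hour?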
Answer 0 (score: 0)
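What about rollInterval? Did you set it to zero? If so, the problem is probably somewhere else. If rollInterval is set to a nonzero value, the file is rolled when that interval elapses regardless of rollSize and rollCount, so rotation can happen before the file reaches the rollSize value. Also, check the HDFS block size you have set. If it is too small, that alone can cause files to roll.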
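To make the interaction between the three roll settings concrete, here is a rough Python sketch. It is a simplified model, not Flume's actual code: the names RollPolicy, should_roll and files_per_hour are made up for illustration, the stream is assumed to be perfectly steady, and the event rate and event size in the usage note are invented numbers. It assumes a trigger set to 0 is disabled and that a file rolls as soon as any enabled trigger is reached.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    """Hypothetical model of the HDFS sink roll settings (0 disables a trigger)."""
    roll_interval: int  # seconds, mirrors hdfs.rollInterval
    roll_size: int      # bytes, mirrors hdfs.rollSize
    roll_count: int     # events, mirrors hdfs.rollCount


def should_roll(policy, age_seconds, bytes_written, events_written):
    """Return True as soon as any enabled trigger has been reached."""
    if policy.roll_interval > 0 and age_seconds >= policy.roll_interval:
        return True
    if policy.roll_size > 0 and bytes_written >= policy.roll_size:
        return True
    if policy.roll_count > 0 and events_written >= policy.roll_count:
        return True
    return False


def files_per_hour(policy, events_per_second, bytes_per_event):
    """Count files closed in one hour of a steady stream, checked once per second."""
    files = 0
    age = size = count = 0
    for _ in range(3600):
        age += 1
        count += events_per_second
        size += events_per_second * bytes_per_event
        if should_roll(policy, age, size, count):
            files += 1
            age = size = count = 0  # a fresh file starts from zero
    return files


if __name__ == "__main__":
    # Only rollSize raised to 50 MB, rollInterval left at 30 seconds.
    print(files_per_hour(RollPolicy(30, 52428800, 0), 5, 2000))
    # Size and count triggers disabled, one roll per hour.
    print(files_per_hour(RollPolicy(3600, 0, 0), 5, 2000))
```

With the assumed 5 events per second of 2000 bytes each, the first policy closes a file every 30 seconds (120 per hour) because the time trigger fires long before 50 MB is reached, while the second closes one file per hour. Flume's documented default for hdfs.rollInterval is 30 seconds, so raising rollSize alone would behave like the first case if rollInterval was never changed.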
Try this:
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Answer 1 (score: 0)
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
Answer 2 (score: 0)
I solved this by setting rollInterval = 3600, rollCount = 0 and batchSize = 100 in flume.conf, as @vkgade suggested.
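One way to check that the change took effect is to count files per hourly directory. The sketch below is only an illustration: count_files_per_hour is a made-up helper, the file names in the usage example are hypothetical, and it assumes you already have a list of full file paths (for example taken from an `hdfs dfs -ls -R` listing).

```python
from collections import Counter
import posixpath


def count_files_per_hour(paths):
    """Group file paths by parent directory, i.e. the %Y/%m/%d/%H bucket."""
    return Counter(posixpath.dirname(p) for p in paths)


if __name__ == "__main__":
    # Hypothetical paths; real FlumeData file names will differ.
    listing = [
        "/user/flume/tweets/2016/06/28/13/FlumeData.1",
        "/user/flume/tweets/2016/06/28/13/FlumeData.2",
        "/user/flume/tweets/2016/06/28/14/FlumeData.3",
    ]
    for directory, n in sorted(count_files_per_hour(listing).items()):
        print(directory, n)
```

With rollInterval = 3600 and the other two triggers disabled, each hourly directory should hold only a small number of files rather than a hundred or more.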