Configuring the sink to write ~100 MB files (close to the 120 MB HDFS file size)

Asked: 2014-07-24 07:28:54

Tags: hadoop flume-ng

I'm trying to configure Flume so that it writes files at least as large as the HDFS block size, which in my case is 128 MB. This is my configuration, and it writes files of roughly 10 MB each:

###############################
httpagent.sources = http-source
httpagent.sinks = k1
httpagent.channels = ch3

# Define / Configure Source (multiport seems to support newer "stuff")
###############################
httpagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
httpagent.sources.http-source.channels = ch3
httpagent.sources.http-source.port = 5140

httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.5/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.HDFS.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollCount = 0
httpagent.sinks.k1.hdfs.batchSize = 10000
httpagent.sinks.k1.hdfs.rollSize = 0

httpagent.sinks.log-sink.channel = memory
httpagent.sinks.log-sink.type = logger

# Channels
###############################

httpagent.channels = ch3
httpagent.channels.ch3.type = memory
httpagent.channels.ch3.capacity = 100000
httpagent.channels.ch3.transactionCapacity = 80000

So the problem is that I can't get it to write 100 MB files. I would expect it to write at least 100 MB files if I change the configuration like this:

httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.4test/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.HDFS.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollSize = 100000000                                   
httpagent.sinks.k1.hdfs.rollCount = 0

But then the files get even smaller: it writes files of about 3-8 MB. Since it is not possible to aggregate the files once they are in HDFS, I really want to make these files bigger. Is there something I'm not getting about the rollSize parameter? Or is there some default that prevents files that large from ever being written?

1 answer:

Answer 0 (score: 3)

You need to override rollInterval to 0 so that it never rolls based on a time interval:

httpagent.sinks.k1.hdfs.rollInterval = 0
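
For context: the HDFS sink's rollInterval defaults to 30 seconds, so with the configurations above the files were being rolled on time long before they reached 100 MB. Below is a minimal sketch of the relevant sink section with all three roll triggers set explicitly, assuming the sink name k1 from the question. Note also that the writeFormat line in the original config addresses a sink named HDFS rather than k1, so that property most likely never took effect:

```properties
# Roll only by size: disable time-based and event-count-based rolling
httpagent.sinks.k1.hdfs.rollInterval = 0      # default is 30 seconds; 0 disables time-based rolls
httpagent.sinks.k1.hdfs.rollCount = 0         # default is 10 events; 0 disables count-based rolls
httpagent.sinks.k1.hdfs.rollSize = 100000000  # roll after ~100 MB (value is in bytes)
httpagent.sinks.k1.hdfs.writeFormat = Text    # original config set this on "sinks.HDFS" by mistake
```

With only rollSize left active, files should grow to roughly 100 MB before rolling, though an agent restart or a closed channel can still finish a file early.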