I'm trying to configure Flume so that it writes files at least as large as the HDFS block size, 128 MB in my case. This is my configuration, and it writes files of roughly 10 MB each:
###############################
httpagent.sources = http-source
httpagent.sinks = k1
httpagent.channels = ch3
# Define / Configure Source (multiport seems to support newer "stuff")
###############################
httpagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
httpagent.sources.http-source.channels = ch3
httpagent.sources.http-source.port = 5140
httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.5/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.HDFS.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollCount = 0
httpagent.sinks.k1.hdfs.batchSize = 10000
httpagent.sinks.k1.hdfs.rollSize = 0
httpagent.sinks.log-sink.channel = memory
httpagent.sinks.log-sink.type = logger
# Channels
###############################
httpagent.channels = ch3
httpagent.channels.ch3.type = memory
httpagent.channels.ch3.capacity = 100000
httpagent.channels.ch3.transactionCapacity = 80000
So the problem is that it doesn't write 100 MB files. I'd like it to write at least 100 MB per file. If I change the configuration like this:
httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.4test/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.HDFS.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollSize = 100000000
httpagent.sinks.k1.hdfs.rollCount = 0
then the files become even smaller; it writes files of about 3-8 MB... Since it isn't possible to aggregate the files once they are in HDFS, I'd really like these files to be bigger. Is there something I'm not getting about the rollSize parameter? Or is there some default value that keeps it from ever writing files that large?
Answer 0 (score: 3)
You need to override rollInterval to 0 so the sink never rolls based on a time interval (the HDFS sink's hdfs.rollInterval defaults to 30 seconds, which is why you keep getting small files):
httpagent.sinks.k1.hdfs.rollInterval = 0
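For reference, a minimal sketch of the roll-related settings combined, reusing the agent and sink names from your question; the rollSize value of 134217728 (128 MB in bytes) is just an assumption to match your block size and can be tuned:

# roll only on size: disable time-based and event-count-based rolling
httpagent.sinks.k1.hdfs.rollInterval = 0
httpagent.sinks.k1.hdfs.rollCount = 0
# close the file once it reaches roughly one HDFS block (128 MB)
httpagent.sinks.k1.hdfs.rollSize = 134217728

With rollInterval and rollCount both set to 0, only rollSize triggers a roll, so each file should keep growing until it reaches approximately the configured size.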