I want to copy the files from local disk to HDFS exactly as they are, but what is happening is that Flume merges the data from all the .gz files and writes it into a single HDFS file. I also want the .gz files written to HDFS to be organized by the timestamp embedded in each file name.
AGENT_CONFIGURATION
# Identify the components on agent agent1:
agent1.sources = agent1_source
agent1.sinks = agent1_sink
agent1.channels = agent1_channel
# Configure the source:
agent1.sources.agent1_source.type = spooldir
agent1.sources.agent1_source.spoolDir = /data/
agent1.sources.agent1_source.fileSuffix = .COMPLETED
agent1.sources.agent1_source.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
# Describe the sink:
agent1.sinks.agent1_sink.type = hdfs
agent1.sinks.agent1_sink.hdfs.path = /user/hadoop/data
agent1.sinks.agent1_sink.hdfs.writeFormat = Text
#agent1.sinks.agent1_sink.hdfs.fileType = DataStream
agent1.sinks.agent1_sink.hdfs.fileType = CompressedStream
agent1.sinks.agent1_sink.hdfs.codeC = gzip
agent1.sinks.agent1_sink.hdfs.rollInterval = 0
agent1.sinks.agent1_sink.hdfs.rollSize = 0
agent1.sinks.agent1_sink.hdfs.rollCount = 0
agent1.sinks.agent1_sink.hdfs.idleTimeout = 1
# Configure a channel that buffers events in memory:
agent1.channels.agent1_channel.type = memory
agent1.channels.agent1_channel.capacity = 20000
agent1.channels.agent1_channel.transactionCapacity = 100
# Bind the source and sink to the channel:
agent1.sources.agent1_source.channels = agent1_channel
agent1.sinks.agent1_sink.channel = agent1_channel
My files are named in this format: text1_2018-02-01.txt.gz, text2_2018-02-02.txt.gz
I want them stored on HDFS as /user/hadoop/data/event_date=2018-02-01/text1.txt.gz and /user/hadoop/data/event_date=2018-02-02/text2.txt.gz. Thanks in advance.
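A sketch of config changes that might get one HDFS file per source file, assuming Flume 1.6+ (since BlobDeserializer turns each spooled file into a single event, rolling after every event should keep files separate):

# Sketch, assuming Flume 1.6+: carry the original file name in an event header
agent1.sources.agent1_source.basenameHeader = true
agent1.sources.agent1_source.basenameHeaderKey = basename
# BlobDeserializer emits one event per file, so rolling after each event
# produces one HDFS file per source file
agent1.sinks.agent1_sink.hdfs.rollCount = 1
# Use the header in the output file name (hdfs.filePrefix supports %{header} escapes)
agent1.sinks.agent1_sink.hdfs.filePrefix = %{basename}

Note that the event_date= partition directory would need the date available as an event header; the stock regex_extractor interceptor matches against the event body rather than the file name, so extracting the date from the name would likely require a custom interceptor or renaming the files upstream.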