I want to dump files from the local file system to HDFS exactly as they are

Time: 2018-03-26 11:34:41

Tags: hadoop streaming flume

I want to dump files from local to HDFS exactly as they are, but what happens instead is that Flume merges the data from all of the .gz files and writes it into a single file. I would also like the .gz files written to HDFS to be organized on the basis of the current timestamp.

AGENT_CONFIGURATION

# Identify the components on agent agent1:
agent1.sources = agent1_source
agent1.sinks = agent1_sink
agent1.channels = agent1_channel

# Configure the source:
agent1.sources.agent1_source.type = spooldir
agent1.sources.agent1_source.spoolDir = /data/
agent1.sources.agent1_source.fileSuffix = .COMPLETED
agent1.sources.agent1_source.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Describe the sink:
agent1.sinks.agent1_sink.type = hdfs
agent1.sinks.agent1_sink.hdfs.path = /user/hadoop/data
agent1.sinks.agent1_sink.hdfs.writeFormat = Text
#agent1.sinks.agent1_sink.hdfs.fileType = DataStream
agent1.sinks.agent1_sink.hdfs.rollInterval = 0
agent1.sinks.agent1_sink.hdfs.rollSize = 0
agent1.sinks.agent1_sink.hdfs.fileType = CompressedStream
agent1.sinks.agent1_sink.hdfs.codeC = gzip
agent1.sinks.agent1_sink.hdfs.rollCount = 0
agent1.sinks.agent1_sink.hdfs.idleTimeout = 1

# Configure a channel that buffers events in memory:
agent1.channels.agent1_channel.type = memory
agent1.channels.agent1_channel.capacity = 20000
agent1.channels.agent1_channel.transactionCapacity = 100

# Bind the source and sink to the channel:
agent1.sources.agent1_source.channels = agent1_channel
agent1.sinks.agent1_sink.channel = agent1_channel
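
For context: with BlobDeserializer each spooled file becomes a single event, and because rollInterval, rollSize and rollCount are all 0, the HDFS sink keeps appending successive events to the same open file until it has been idle for idleTimeout seconds, which is why the .gz files end up merged. A minimal sketch (untested, assuming Flume 1.x) that keeps one HDFS file per source file by carrying the original file name in a header and rolling after every event:

# Sketch: expose each source file's name as a "basename" header
agent1.sources.agent1_source.basenameHeader = true
agent1.sources.agent1_source.basenameHeaderKey = basename

# Name each output file after its source file and close the file after
# every event (one event is one whole input file with BlobDeserializer)
agent1.sinks.agent1_sink.hdfs.filePrefix = %{basename}
agent1.sinks.agent1_sink.hdfs.rollCount = 1

Note that the HDFS sink still appends its own counter to every file name, so the output comes out as something like text1_2018-02-01.txt.gz.<counter>.gz rather than matching the input name exactly.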

My HDFS file format is as follows: text1_2018-02-01.txt.gz text2_2018-02-02.txt.gz

I want them stored on HDFS as /user/hadoop/data/event_date=2018-02-01/text1.txt.gz and /user/hadoop/data/event_date=2018-02-02/text2.txt.gz. Thanks in advance.
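
Since the question asks for output organized by the current timestamp, one hedged sketch for the partitioned layout, assuming the agent's wall-clock date is acceptable as event_date (hdfs.path expands %Y-%m-%d escapes once hdfs.useLocalTimeStamp is enabled):

# Sketch: partition output directories by the agent's local date
agent1.sinks.agent1_sink.hdfs.useLocalTimeStamp = true
agent1.sinks.agent1_sink.hdfs.path = /user/hadoop/data/event_date=%Y-%m-%d

Deriving event_date from the date embedded in the file name instead (2018-02-01 in text1_2018-02-01.txt.gz) does not appear to be possible with the stock interceptors, since regex_extractor matches against the event body rather than headers, so that variant would likely need a custom interceptor.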

0 answers:

No answers yet.