Multiplexing a Flume stream into several channels

Time: 2018-08-24 18:27:10

Tags: flume-ng

I want to multiplex a Flume stream into several channels based on the file name. How can I do this? I am using a spooling directory source with a channel selector, which should route the stream by the file name carried in the event header.

I have many files named CA, AZ, CA2, AZ2, and so on. The CA files should be written to the /flume_sink/CA directory, the AZ files to /flume_sink/AZ, and KT is the default directory. I am using the configuration below, but it is not multiplexing; it is replicating.

What is wrong with the configuration?

agent1.sinks=hdfs-sink1_1 hdfs-sink1_2 hdfs-sink1_3
agent1.sources=source1_1
agent1.channels=fileChannel1_1 fileChannel1_2 fileChannel1_3

agent1.channels.fileChannel1_1.type=file
agent1.channels.fileChannel1_1.capacity=200000
agent1.channels.fileChannel1_1.transactionCapacity=1000
agent1.channels.fileChannel1_1.checkpointDir=/home/Flume/alpha/001
agent1.channels.fileChannel1_1.dataDirs=/home/Flume/alpha_data
agent1.channels.fileChannel1_1.checkpointOnClose=true
agent1.channels.fileChannel1_1.dataOnClose=true


agent1.sources.source1_1.type=spooldir
agent1.sources.source1_1.spoolDir=/home/ABC/
agent1.sources.source1_1.recursiveDirectorySearch=true
#agent1.sources.source1_1.fileHeader=true
#agent1.sources.source1_1.fileHeaderKey=file
agent1.sources.source1_1.fileSuffix=.COMPLETED
agent1.sources.source1_1.basenameHeader = true
agent1.sources.source1_1.basenameHeaderKey = basename

agent1.sinks.hdfs-sink1_1.type=hdfs
agent1.sinks.hdfs-sink1_1.hdfs.filePrefix = %{basename}
agent1.sinks.hdfs-sink1_1.hdfs.path=hdfs://10.44.209.44:9000/flume_sink/CA
agent1.sinks.hdfs-sink1_1.hdfs.batchSize=1000
agent1.sinks.hdfs-sink1_1.hdfs.rollSize=268435456
agent1.sinks.hdfs-sink1_1.hdfs.rollInterval=0
agent1.sinks.hdfs-sink1_1.hdfs.rollCount=50000000
agent1.sinks.hdfs-sink1_1.hdfs.fileType=DataStream
agent1.sinks.hdfs-sink1_1.hdfs.writeFormat=Text
agent1.sinks.hdfs-sink1_1.hdfs.useLocalTimeStamp=false


agent1.channels.fileChannel1_2.type=file
agent1.channels.fileChannel1_2.capacity=200000
agent1.channels.fileChannel1_2.transactionCapacity=1000
agent1.channels.fileChannel1_2.checkpointDir=/home/Flume/beta/001
agent1.channels.fileChannel1_2.dataDirs=/home/Flume/beta_data
agent1.channels.fileChannel1_2.checkpointOnClose=true
agent1.channels.fileChannel1_2.dataOnClose=true



agent1.sinks.hdfs-sink1_2.type=hdfs
agent1.sinks.hdfs-sink1_2.hdfs.filePrefix = %{basename}
agent1.sinks.hdfs-sink1_2.hdfs.path=hdfs://10.44.209.44:9000/flume_sink/AZ
agent1.sinks.hdfs-sink1_2.hdfs.batchSize=1000
agent1.sinks.hdfs-sink1_2.hdfs.rollSize=268435456
agent1.sinks.hdfs-sink1_2.hdfs.rollInterval=0
agent1.sinks.hdfs-sink1_2.hdfs.rollCount=50000000
agent1.sinks.hdfs-sink1_2.hdfs.fileType=DataStream
agent1.sinks.hdfs-sink1_2.hdfs.writeFormat=Text
agent1.sinks.hdfs-sink1_2.hdfs.useLocalTimeStamp=false

agent1.channels.fileChannel1_3.type=file
agent1.channels.fileChannel1_3.capacity=200000
agent1.channels.fileChannel1_3.transactionCapacity=10
agent1.channels.fileChannel1_3.checkpointDir=/home/Flume/gamma/001
agent1.channels.fileChannel1_3.dataDirs=/home/Flume/gamma_data
agent1.channels.fileChannel1_3.checkpointOnClose=true
agent1.channels.fileChannel1_3.dataOnClose=true


agent1.sinks.hdfs-sink1_3.type=hdfs
agent1.sinks.hdfs-sink1_3.hdfs.filePrefix = %{basename}
agent1.sinks.hdfs-sink1_3.hdfs.path=hdfs://10.44.209.44:9000/flume_sink/KT
agent1.sinks.hdfs-sink1_3.hdfs.batchSize=1000
agent1.sinks.hdfs-sink1_3.hdfs.rollSize=268435456
agent1.sinks.hdfs-sink1_3.hdfs.rollInterval=0
agent1.sinks.hdfs-sink1_3.hdfs.rollCount=50000000
agent1.sinks.hdfs-sink1_3.hdfs.fileType=DataStream
agent1.sinks.hdfs-sink1_3.hdfs.writeFormat=Text
agent1.sinks.hdfs-sink1_3.hdfs.useLocalTimeStamp=false


agent1.sources.source1_1.channels=fileChannel1_1 fileChannel1_2 fileChannel1_3

agent1.sinks.hdfs-sink1_1.channel=fileChannel1_1
agent1.sinks.hdfs-sink1_2.channel=fileChannel1_2
agent1.sinks.hdfs-sink1_3.channel=fileChannel1_3


agent1.sources.source1_1.selector.type=replicating
agent1.sources.source1_1.selector.header=basename
agent1.sources.source1_1.selector.mapping.CA=fileChannel1_1
agent1.sources.source1_1.selector.mapping.AZ=fileChannel1_2
agent1.sources.source1_1.selector.default=fileChannel1_3

1 Answer:

Answer 0: (score: 0)

Try the fileHeader and fileHeaderKey properties of the spooling directory source: https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
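
A rough sketch of what that suggestion could look like, reusing the agent, source, and channel names from the question (the mapping values are illustrative guesses, not a tested configuration). fileHeader=true makes the spooling directory source store the absolute path of the file in the event header named by fileHeaderKey. For a header to drive routing, the channel selector also has to be of the multiplexing type; the posted configuration uses selector.type=replicating, which copies every event to all channels, and the multiplexing selector matches the header value exactly, so a file named CA2 will not match a mapping for CA.

# put the absolute path of the spooled file into the "file" header
agent1.sources.source1_1.fileHeader = true
agent1.sources.source1_1.fileHeaderKey = file

# route on that header with a multiplexing selector instead of replicating
agent1.sources.source1_1.selector.type = multiplexing
agent1.sources.source1_1.selector.header = file
# illustrative mappings only: the selector needs the exact header value,
# which for fileHeader is the full path of the file being spooled
agent1.sources.source1_1.selector.mapping./home/ABC/CA = fileChannel1_1
agent1.sources.source1_1.selector.mapping./home/ABC/AZ = fileChannel1_2
agent1.sources.source1_1.selector.default = fileChannel1_3

If routing on just the file name is enough, the basename header that the question already enables (basenameHeader=true, basenameHeaderKey=basename) can be used as the selector header instead of the full path.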