无法将数据从水槽输入hdfs hadoop以获取日志

时间:2015-04-05 10:22:22

标签: apache hadoop hdfs flume

我正在使用以下配置将数据从日志文件推送到hdfs。

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity=5000
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /home/training/Downloads/log.txt
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.batchSize=10
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/flume/data/log.txt
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink

我没有收到错误消息,但仍然无法找到hdfs中的输出。 在中断我可以看到下沉中断异常&该日志文件的一些数据。 我正在运行以下命令: flume-ng agent --conf / etc / flume-ng / conf / --conf-file /etc/flume-ng/conf/flume.conf -Dflume.root.logger = DEBUG,console -n agent;

2 个答案:

答案 0 :(得分:1)

我有类似的问题

在我的情况下现在它正在运作 以下是conf文件:

#Exec Source
execAgent.sources=e
execAgent.channels=memchannel
execAgent.sinks=HDFS
#channels
execAgent.channels.memchannel.type=file
execAgent.channels.memchannel.capacity = 20000
execAgent.channels.memchannel.transactionCapacity = 1000
#Define Source
execAgent.sources.e.type=org.apache.flume.source.ExecSource
execAgent.sources.e.channels=memchannel
execAgent.sources.e.shell=/bin/bash -c
execAgent.sources.e.fileHeader=false
execAgent.sources.e.fileSuffix=.txt
execAgent.sources.e.command=cat /home/sample.txt
#Define Sink
execAgent.sinks.HDFS.type=hdfs
execAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/user/flume/
execAgent.sinks.HDFS.hdfs.fileType=DataStream
execAgent.sinks.HDFS.hdfs.writeFormat=Text
execAgent.sinks.HDFS.hdfs.batchSize=1000
execAgent.sinks.HDFS.hdfs.rollSize=268435
execAgent.sinks.HDFS.hdfs.rollInterval=0
#Bind Source Sink Channel
execAgent.sources.e.channels=memchannel
execAgent.sinks.HDFS.channel=memchannel
`

我希望这可以帮到你。

答案 1 :(得分:0)

我建议在HDFS中放置文件时使用前缀配置:

agent.sinks.hdfs-sink.hdfs.filePrefix = log.out