So I configured Flume to write my apache2 access logs to HDFS... and judging by the Flume logs the configuration appears to be correct, but I don't know why it still isn't writing to HDFS. Here is my Flume configuration file:
#agent and component of agent
search.sources = so
search.sinks = si
search.channels = sc
# Configure a channel that buffers events in memory:
search.channels.sc.type = memory
search.channels.sc.capacity = 20000
search.channels.sc.transactionCapacity = 100
# Configure the source:
search.sources.so.channels = sc
search.sources.so.type = exec
search.sources.so.command = tail -F /var/log/apache2/access.log
# Describe the sink:
search.sinks.si.channel = sc
search.sinks.si.type = hdfs
search.sinks.si.hdfs.path = hdfs://localhost:9000/flumelogs/
search.sinks.si.hdfs.writeFormat = Text
search.sinks.si.hdfs.fileType = DataStream
search.sinks.si.hdfs.rollSize = 0
search.sinks.si.hdfs.rollCount = 10000
search.sinks.si.hdfs.batchSize = 1000
search.sinks.si.rollInterval = 1
And here is my Flume log:
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Creating channels
14/12/18 17:47:56 INFO channel.DefaultChannelFactory: Creating instance of channel sc type memory
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Created channel sc
14/12/18 17:47:56 INFO source.DefaultSourceFactory: Creating instance of source so, type exec
14/12/18 17:47:56 INFO sink.DefaultSinkFactory: Creating instance of sink: si, type: hdfs
14/12/18 17:47:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/18 17:47:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Channel sc connected to [so, si]
14/12/18 17:47:56 INFO node.Application: Starting new configuration:{ sourceRunners:{so=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:so,state:IDLE} }} sinkRunners:{si=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3de76481 counterGroup:{ name:null counters:{} } }} channels:{sc=org.apache.flume.channel.MemoryChannel{name: sc}} }
14/12/18 17:47:56 INFO node.Application: Starting Channel sc
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: sc: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: sc started
14/12/18 17:47:56 INFO node.Application: Starting Sink si
14/12/18 17:47:56 INFO node.Application: Starting Source so
14/12/18 17:47:56 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: si: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: si started
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: so: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: so started
This is the command I used to start Flume:
flume-ng agent -n search -c conf -f ../conf/flume-conf-search
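When events silently never arrive in HDFS, running the agent with console debug logging usually reveals why. A sketch of the same invocation with Flume's standard logging override added:

```shell
# Same command as above, but with DEBUG output to the console so
# sink errors (e.g. HDFS connection or permission failures) become visible.
flume-ng agent -n search -c conf -f ../conf/flume-conf-search \
    -Dflume.root.logger=DEBUG,console
```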
I have already created the path in HDFS:
hadoop fs -mkdir hdfs://localhost:9000/flumelogs
But I don't know why it isn't writing to HDFS. I can see the apache2 access logs, but Flume isn't sending them to the /flumelogs directory in HDFS... please help!!
Answer 0 (score: 1)
I don't think this is a permissions issue; you would see exceptions when Flume flushes to HDFS. There are two possible causes:
1) There isn't enough data in the buffer, so Flume doesn't think it has to flush yet. Your sink batch size is 1000 and your channel capacity is 20000. To verify this, press Ctrl-C on your Flume process; that forces it to flush to HDFS.
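To illustrate, a sketch of the relevant sink properties tuned for debugging (not a drop-in fix; the values are arbitrary examples):

```properties
# Hypothetical tuning: flush smaller batches to HDFS sooner.
search.sinks.si.hdfs.batchSize = 100
# Roll a new file every 30 seconds regardless of size/count.
# Note the hdfs. prefix -- the config in the question sets
# si.rollInterval without it, which the HDFS sink does not read.
search.sinks.si.hdfs.rollInterval = 30
```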
2) The more likely cause is that your exec source isn't running properly. This may be a path problem with the tail command. Use the full path to tail in your command, e.g. /bin/tail -F /var/log/apache2/access.log or /usr/bin/tail -F /var/log/apache2/access.log (depending on your system). Run

which tail

to get the correct path.
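For example, the path can be resolved and checked like this (a minimal sketch; the exact location varies by system):

```shell
# Resolve the absolute path of tail; Flume's exec source runs the
# command without a login shell, so a full path is the safest bet.
TAIL_PATH=$(command -v tail)
echo "tail is at: $TAIL_PATH"
```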
Answer 1 (score: 0)
Please check the permissions on this folder: hdfs://localhost:9000/flumelogs/

My guess is that Flume doesn't have write permission on that folder.
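A sketch of how to check and, if needed, open up the directory (assuming HDFS is running at localhost:9000 as in the question; requires a running cluster):

```shell
# Show owner, group and mode bits of the target directory.
hadoop fs -ls hdfs://localhost:9000/
# If the Flume agent runs as a different user than the directory's
# owner, grant write access (or chown it to the agent's user instead;
# 777 is only for debugging, not for production).
hadoop fs -chmod 777 hdfs://localhost:9000/flumelogs
```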