Here is my configuration file, which worked before but has suddenly started throwing errors. What I am actually trying to do is move all logs from local to HDFS, and each log should land in HDFS as a single file rather than being split into parts:
#create source, channels, and sink
agent1.sources=S1
agent1.sinks=H1
agent1.channels=C1
#bind the source and sink to the channel
agent1.sources.S1.channels=C1
agent1.sinks.H1.channel=C1
#Specify the source type and directory
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir
#Specify the Sink type, directory, and parameters
agent1.sinks.H1.type=HDFS
agent1.sinks.H1.hdfs.path=/user/hive/warehouse
agent1.sinks.H1.hdfs.filePrefix=events
agent1.sinks.H1.hdfs.fileSuffix=.log
agent1.sinks.H1.hdfs.inUsePrefix=processing
A1.sinks.H1.hdfs.fileType=DataStream
#Specify the channel type (Memory vs File)
agent1.channels.C1.type=file
I run my agent with this command:
flume-ng agent --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1
Then I received this warning:
Warning: No configuration directory set! Use --conf <dir> to override.
and also:
16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration for 'A1' does not contain any channels. Marking it as invalid.
16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'A1'. It will be removed.
Then it keeps creating, closing, and renaming logs to HDFS over and over, like this:
16/10/14 16:22:41 INFO node.Application: Starting Sink H1
16/10/14 16:22:41 INFO node.Application: Starting Source S1
16/10/14 16:22:41 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /tmp/spooldir
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: H1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: H1 started
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: S1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: S1 started
16/10/14 16:22:41 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
16/10/14 16:22:42 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561961.log.tmp to /user/hive/warehouse/events.1476476561961.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561962.log.tmp to /user/hive/warehouse/events.1476476561962.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561963.log.tmp to /user/hive/warehouse/events.1476476561963.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561964.log.tmp to /user/hive/warehouse/events.1476476561964.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561965.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561965.log.tmp
...
Why does Flume keep writing the same file to HDFS again and again, and how can I move each log from local to HDFS without splitting it into parts? My logs are typically between 50 KB and 300 KB in size.
Update: I am now also seeing this warning:
16/10/18 10:10:05 INFO tools.DirectMemoryUtils: Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
16/10/18 10:10:05 WARN file.ReplayHandler: Ignoring /home/USER/.flume/file-channel/data/log-18 due to EOF
java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
at org.apache.flume.channel.file.LogFileFactory.getSequentialReader(LogFileFactory.java:169)
at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:264)
at org.apache.flume.channel.file.Log.doReplay(Log.java:529)
at org.apache.flume.channel.file.Log.replay(Log.java:455)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Answer 0 (score: 0)
Flume uses the conf directory to pick up its JRE and logging properties, so you can fix that error message by passing the --conf argument, like so:
flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1
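For context, the directory passed to --conf is where the flume-ng launcher looks for its environment script and logging settings. A minimal sketch of what /usr/local/flume/conf would typically hold (these are the standard file names shipped with the Flume distribution; your layout may differ):

flume-env.sh         # optional environment overrides (JAVA_HOME, JAVA_OPTS); can be copied from flume-env.sh.template
log4j.properties     # logging setup picked up once the conf dir is on the classpath
spoolingToHDFS.conf  # your agent definition, still passed explicitly via --conf-file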
The warnings about A1 appear because you probably have a typo near the end of your agent configuration file:
A1.sinks.H1.hdfs.fileType=DataStream
should read:

agent1.sinks.H1.hdfs.fileType=DataStream

(You can see this typo taking effect in your log: the line "hdfs.HDFSSequenceFile: writeFormat = Writable" shows the sink falling back to its default SequenceFile format, because the DataStream setting was attached to a nonexistent agent named A1 and was therefore ignored.)
As for the files: you have not configured a deserializer for the spooldir source, and the default is LINE, so you get one HDFS file for every line of every file in your spoolDir. If you want Flume to treat an entire file as a single event, you need to use the BlobDeserializer (https://flume.apache.org/FlumeUserGuide.html#blobdeserializer):

agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
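Putting both fixes together, a corrected version of your file could look like the sketch below. Note that the three hdfs.roll* lines are my assumption, not part of your original config: with the BlobDeserializer each spooled file becomes one event, and disabling size- and time-based rolling while rolling after every event keeps each 50-300 KB log as exactly one HDFS file. (The default roll settings would otherwise close files by size, event count, or a 30-second timer.)

#create source, channels, and sink
agent1.sources=S1
agent1.sinks=H1
agent1.channels=C1
#bind the source and sink to the channel
agent1.sources.S1.channels=C1
agent1.sinks.H1.channel=C1
#Specify the source type and directory
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir
#treat each spooled file as a single event
agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
#Specify the Sink type, directory, and parameters
agent1.sinks.H1.type=HDFS
agent1.sinks.H1.hdfs.path=/user/hive/warehouse
agent1.sinks.H1.hdfs.filePrefix=events
agent1.sinks.H1.hdfs.fileSuffix=.log
agent1.sinks.H1.hdfs.inUsePrefix=processing
agent1.sinks.H1.hdfs.fileType=DataStream
#assumption: roll after every event so one spooled file maps to one HDFS file
agent1.sinks.H1.hdfs.rollCount=1
agent1.sinks.H1.hdfs.rollSize=0
agent1.sinks.H1.hdfs.rollInterval=0
#Specify the channel type (Memory vs File)
agent1.channels.C1.type=file

One more caveat: BlobDeserializer lives in Flume's morphline Solr sink module, so the flume-ng-morphline-solr-sink jar has to be on the agent's classpath; as far as I know it is bundled in the lib directory of the standard binary distribution.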