I am using flume-ng-1.2.0 with cdh3u5. I am simply trying to extract data from a text file and put it into HDFS. Here is the configuration I am using:
agent1.sources = tail1
agent1.channels = Channel-2
agent1.sinks = HDFS
agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /usr/games/sample1.txt
agent1.sources.tail1.channels = Channel-2
agent1.sinks.HDFS.channel = Channel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://10.12.1.2:8020/user/hdfs/flume
agent1.sinks.HDFS.hdfs.fileType = DataStream
agent1.channels.Channel-2.type = memory
agent1.channels.Channel-2.capacity = 1000
I am running: bin/flume-ng agent -n agent1 -c ./conf/ -f conf/flume.conf
The log output I get is:
2012-10-11 12:10:36,626 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
2012-10-11 12:10:36,631 INFO node.FlumeNode: Flume node starting - agent1
2012-10-11 12:10:36,639 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
2012-10-11 12:10:36,639 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 12
2012-10-11 12:10:36,641 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
2012-10-11 12:10:36,646 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:conf/flume.conf
2012-10-11 12:10:36,657 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,670 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,670 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,670 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,671 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: agent1
2012-10-11 12:10:36,758 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]
2012-10-11 12:10:36,758 INFO properties.PropertiesFileConfigurationProvider: Creating channels
2012-10-11 12:10:36,800 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: Channel-2, registered successfully.
2012-10-11 12:10:36,800 INFO properties.PropertiesFileConfigurationProvider: created channel Channel-2
2012-10-11 12:10:36,835 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
2012-10-11 12:10:37,753 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
2012-10-11 12:10:37,896 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: HDFS, registered successfully.
2012-10-11 12:10:37,899 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{tail1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource@362f0d54 }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4b142196 counterGroup:{ name:null counters:{} } }} channels:{Channel-2=org.apache.flume.channel.MemoryChannel@16a9255c} }
2012-10-11 12:10:37,900 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel Channel-2
2012-10-11 12:10:37,901 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: Channel-2 started
2012-10-11 12:10:37,901 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink HDFS
2012-10-11 12:10:37,905 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2012-10-11 12:10:37,910 INFO nodemanager.DefaultLogicalNodeManager: Starting Source tail1
2012-10-11 12:10:37,912 INFO source.ExecSource: Exec source starting with command:tail -F /usr/games/sample1.txt
I don't know where I am going wrong. I am a beginner; nothing shows up in HDFS, and the Flume agent just keeps running. Any suggestions or corrections would be very helpful. Thanks.
Answer 0 (score: 2)
One problem is that you set agent1.sinks.HDFS.hdfs.file.Type = DataStream, but the property is actually named hdfs.fileType — see https://flume.apache.org/FlumeUserGuide.html#hdfs-sink for details.
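For reference, the corrected sink line would look like this (property names in the HDFS sink configuration are case-sensitive and must match the user guide exactly):

```
agent1.sinks.HDFS.hdfs.fileType = DataStream
```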
I would first try a logger sink (sink.type = logger), just to see whether any events are coming through at all. Also make sure you actually get output when you run the tail -F command from a shell.
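A minimal debugging configuration along those lines might look like the following sketch (keeping the same source and channel names as in the question; the logger sink simply writes each event to the Flume log, so you can see whether the exec source is producing anything at all):

```
agent1.sources = tail1
agent1.channels = Channel-2
agent1.sinks = LOG

agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /usr/games/sample1.txt
agent1.sources.tail1.channels = Channel-2

agent1.sinks.LOG.type = logger
agent1.sinks.LOG.channel = Channel-2

agent1.channels.Channel-2.type = memory
agent1.channels.Channel-2.capacity = 1000
```

If events appear in the log with this configuration but not with the HDFS sink, the problem is on the HDFS side rather than the source side.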
One more thing, possibly a red herring: there is a backtick (`) at the end of one of the log messages. Maybe it is just a paste error, but if not, and it is actually in your configuration file, I would not be surprised if it caused trouble. The message I am referring to is the last line of the log:
Exec source starting with command:tail -F /usr/games/value.txt`
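To rule out the source side independently of Flume, here is a quick shell sanity check (a sketch using temporary files, not the real /usr/games path) confirming that tail -F streams lines as they are appended:

```shell
# Sanity check: confirm that tail -F emits newly appended lines.
# Uses temp files so the real sample1.txt is untouched.
f=$(mktemp)
out=$(mktemp)
echo "first line" >> "$f"
tail -F "$f" > "$out" &      # follow the file in the background
tailpid=$!
sleep 1
echo "second line" >> "$f"   # append while tail is running
sleep 1
kill "$tailpid"
cat "$out"                   # should show both lines
```

If both lines appear, tail -F works, and the problem is inside the Flume pipeline (channel or sink) rather than the command itself.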