当我尝试通过Flume将流数据传输到hadoop时,我收到以下错误。
我在flume / lib中创建了指向hadoop / share / hadoop /
中的.jar
文件的链接
我仔细检查了网址,我认为它们都是正确的。想发布以获得更多的眼睛和一些反馈。
2017-07-20 10:53:18,959 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN -org.apache.flume.sink.hdfs.HDFSEventSink.process HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
这是Flume Sink Config
agent1.sinks.PurePathSink.type = hdfs
agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp
agent1.sinks.PurePathSink.hdfs.fileType = DataStream
agent1.sinks.PurePathSink.hdfs.filePrefix = export
agent1.sinks.PurePathSink.hdfs.fileSuffix = .txt
agent1.sinks.PurePathSink.hdfs.rollInterval = 120
agent1.sinks.PurePathSink.hdfs.rollSize = 131072
core-site.xml - Hadoop 2.8
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home1/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
答案 0 :(得分:0)
查看您的Flume Sink,您似乎没有在群集上运行此功能,而是在本地主机上运行。
检查HDFS路径是否可访问:
agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp
端口号通常为8020(如果您使用的是Cloudera Distribution)
还请检查以下链接以获取错误复制和解决方案: [Cloudera解决:FLUME + IO错误问题]
答案 1 :(得分:0)
在我的案例中,我发现明确声明路径解决了这个问题。它与它正在拾取的Jar有关。
感谢@ V.Bravo的回复。我没有使用发行版,而是站在我自己的集群中
答案 2 :(得分:0)
在我的情况下,将hdfs jar文件从hadoop / hdfs复制到flume / lib解决了问题。
$ cp my_hadoop_path/share/hadoop/hdfs/*.jar my_flume_path/lib/