I am using the following stack for web crawling with Nutch:
But when I inject the URLs with this command:
hadoop@ubuntu:~$ nutch inject /home/gsingh/urls/seed.txt
I get the following error:
> InjectorJob: starting at 2016-05-19 11:12:57
> InjectorJob: Injecting urlDir: /home/gsingh/urls/seed.txt
> InjectorJob: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
> at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
> at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2559)
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2569)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:352)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
> at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:212)
> at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
> at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Here is the classpath value:
hadoop@ubuntu:~$ echo $CLASSPATH
/usr/local/nutch/runtime/local/lib/*:.
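Since that CLASSPATH is a wildcard over the Nutch lib directory, it may help to see which Hadoop jars it actually pulls in. This exception is often reported when jars from different Hadoop releases end up on the same classpath; whether that applies here is an assumption, not something the trace confirms. A minimal sketch of the check, where `list_hadoop_jars` is a hypothetical helper name and the default path is simply the directory from the CLASSPATH above:

```shell
#!/bin/sh
# List the Hadoop jars found in a lib directory, one per line, sorted.
# list_hadoop_jars is a hypothetical helper; the default directory is
# taken from the CLASSPATH value shown above.
list_hadoop_jars() {
    lib_dir="${1:-/usr/local/nutch/runtime/local/lib}"
    for jar in "$lib_dir"/hadoop-*.jar; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$jar" ] || continue
        basename "$jar"
    done | sort
}

list_hadoop_jars "$@"
```

If the output shows jars whose version numbers come from different major Hadoop releases, that mismatch would be the first thing to rule out.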
Does anyone know how to fix this error?