I'm a newbie and going a bit crazy here. I am following https://wiki.apache.org/nutch/NutchTutorial#A3._Crawl_your_first_website to run a crawl with Nutch 1.12, which I set up on Windows using Cygwin.
The bin/nutch command runs fine, but in order to crawl I made the following change (to conf/nutch-site.xml, per the tutorial):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>My Nutch Spider</value>
</property>
</configuration>
Here is the content of the urls/seed.txt file I created.
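(The actual file contents did not make it into this post; for reference, a seed file per the tutorial is simply a plain-text list of start URLs, one per line, for example:)

http://nutch.apache.org/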
Now I run the following command: bin/nutch inject crawl/crawldb urls
I get a NullPointerException, as shown below:
MithL@DESKTOP-K3INBH0 /home/apache-nutch-1.12
$ bin/nutch inject crawl/crawldb urls
Injector: starting at 2017-02-21 14:03:51
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58)
at org.apache.nutch.crawl.Injector.inject(Injector.java:357)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
Please suggest what I should do. Thanks.
UPDATE:
I added hadoop-core-1.2.1.jar to the apache-nutch-1.12/lib folder and set the HADOOP_HOME environment variable to C:\winutils\bin\winutils.exe.
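(In Cygwin that change amounts to roughly the following; the exact commands and the jar's source path are my reconstruction, since only the end state is described above:)

# copy the downloaded Hadoop core jar into Nutch's lib directory (source path assumed)
cp ~/Downloads/hadoop-core-1.2.1.jar /home/apache-nutch-1.12/lib/
# point HADOOP_HOME at the winutils.exe path described above
export HADOOP_HOME='C:\winutils\bin\winutils.exe'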
Now it throws an UnsupportedOperationException, as shown below:
$ bin/nutch inject crawl/crawldb urls
Injector: starting at 2017-02-21 21:37:32
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2365)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.nutch.crawl.Injector.inject(Injector.java:347)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
Please suggest what I should do. Thanks.