Error when attempting to crawl with Nutch: java.net.UnknownHostException on local hostname

Date: 2015-01-30 03:52:00

Tags: java hadoop solr nutch

I am trying to crawl with Nutch 1.9 on CentOS 6.6.

I am trying to start my first crawl after following this guide:

http://wiki.apache.org/nutch/NutchTutorial

However, I get the following exception at startup:

  

    Injector: Converting injected urls to crawl db entries.
    Injector: java.net.UnknownHostException: Sparky.LITK: Sparky.LITK: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:960)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:324)
        at org.apache.nutch.crawl.Injector.run(Injector.java:380)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Injector.main(Injector.java:370)
    Caused by: java.net.UnknownHostException: Sparky.LITK: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
        ... 12 more

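It seems to be trying to crawl the machine's own hostname (Sparky.LITK), which is not what I want it to do. I set up a seed.txt list as described in the tutorial, but it gets stuck here.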

1 Answer:

Answer 0: (score: 0)

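The fix is as simple as adding the machine's hostname to the /etc/hosts file, pointing at the loopback address (127.0.0.1). As the stack trace shows, Nutch is not crawling the hostname: Hadoop's JobClient calls InetAddress.getLocalHost() when it submits the job, so the machine's own hostname has to be resolvable.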

I edited my hosts file as follows:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 Sparky.LITK
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 Sparky.LITK

It works!
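To confirm the hosts change without running a full crawl, the lookup from the top of the stack trace can be repeated on its own. This is only a minimal sketch: the class name LocalHostCheck and the method describeLocalHost are made up for illustration and are not part of Nutch or Hadoop. It calls the same InetAddress.getLocalHost() that failed in the trace and reports the outcome.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of Nutch or Hadoop.
public class LocalHostCheck {

    // Repeats the lookup that Hadoop's JobClient performs on job submission
    // and returns a one-line description of the outcome.
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Same exception type as in the Injector stack trace.
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If the line starts with FAILED, the local hostname still does not resolve and the Injector would most likely hit the same exception. If it starts with OK, this particular lookup is no longer the problem.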