Thread FetcherThread has no more work available; fetch of .com/ failed with: java.net.SocketTimeoutException: connect timed out

Time: 2016-01-13 09:32:31

Tags: nutch

I am facing the issue below when running bin/nutch fetch $s1 while following the tutorial at https://wiki.apache.org/nutch/NutchTutorial:

Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
fetch of http://nutch.apache.org/ failed with: java.net.SocketTimeoutException: connect timed out

Can someone clarify what is going on here?

1 answer:

Answer 0 (score: 0)

First, delete your crawl folder.
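
A minimal sketch of that cleanup step, assuming the crawl data lives in a directory named crawl relative to where Nutch is run (the path used by the commands in this answer). The helper name reset_crawl_dir is hypothetical, not part of Nutch:

```shell
# Hypothetical helper: remove a previous crawl directory so the next run starts clean.
# Assumes the crawl data lives in ./crawl unless another path is passed in.
reset_crawl_dir() {
  dir="${1:-crawl}"
  # Refuse the filesystem root so a typo cannot wipe the wrong place.
  if [ "$dir" = "/" ]; then
    echo "refusing to remove '$dir'" >&2
    return 1
  fi
  if [ -d "$dir" ]; then
    rm -rf -- "$dir"
    echo "removed $dir"
  else
    echo "nothing to remove at $dir"
  fi
}
```

Calling reset_crawl_dir crawl from the Nutch directory would then clear the old crawldb and segments in one step.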

Then run the following commands:

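# Seed the crawldb from the dmoz and urls directories
bin/nutch inject crawl/crawldb dmoz
bin/nutch inject crawl/crawldb urls
# Generate a fetch list in a new segment
bin/nutch generate crawl/crawldb crawl/segments
# Pick the most recently created segment
s1=`ls -d crawl/segments/2* | tail -1`
echo $s1
# Fetch and parse the segment, then merge the results back into the crawldb
bin/nutch fetch $s1
bin/nutch parse $s1
bin/nutch updatedb crawl/crawldb $s1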
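
If the generate step produces no segment, s1 ends up empty and the later commands fail with confusing errors. The following is a sketch of a more defensive way to pick the segment, assuming segments are named by timestamp so that a lexical sort puts the newest last. The helper name latest_segment is hypothetical:

```shell
# Hypothetical helper: print the newest segment directory under a segments root.
# Assumes timestamp-style segment names, so the last entry in sorted order is the newest.
latest_segment() {
  root="${1:-crawl/segments}"
  seg=$(ls -d "$root"/2* 2>/dev/null | tail -1)
  if [ -z "$seg" ]; then
    echo "no segment found under $root" >&2
    return 1
  fi
  printf '%s\n' "$seg"
}
```

It could be used as s1=$(latest_segment crawl/segments) && bin/nutch fetch "$s1", so that fetch only runs when a segment actually exists.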

Make sure all the configuration is in place first, for example the URLs under dmoz, nutch-site.xml, and so on.
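
A rough pre-flight check along those lines, assuming seed URLs are plain-text files in a directory such as urls or dmoz and that conf/nutch-site.xml declares the http.agent.name property the tutorial asks for. The function name check_nutch_setup and the default paths are hypothetical:

```shell
# Hypothetical pre-flight check, meant to be run from the Nutch install directory.
# Assumes seed URLs are plain-text files in a directory and that
# conf/nutch-site.xml should declare the http.agent.name property.
check_nutch_setup() {
  url_dir="${1:-urls}"
  site_xml="${2:-conf/nutch-site.xml}"
  status=0
  # At least one non-blank line should exist across the seed files.
  if ! cat "$url_dir"/* 2>/dev/null | grep -q '[^[:space:]]'; then
    echo "no seed URLs found in $url_dir" >&2
    status=1
  fi
  # Only checks that the property is declared, not that its value is non-empty.
  if ! grep -q '<name>http.agent.name</name>' "$site_xml" 2>/dev/null; then
    echo "http.agent.name is not declared in $site_xml" >&2
    status=1
  fi
  return $status
}
```

Running check_nutch_setup dmoz and check_nutch_setup urls before the inject commands would flag a missing seed list or agent name early. It does not test network reachability, so a connect timeout can still occur if the host cannot be reached.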