Nutch 1.9 and JavaScript-generated content

Time: 2015-01-27 09:51:37

Tags: selenium ant nutch

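I am using Nutch 1.9.

Most of the web pages I crawl are generated with JavaScript, and Nutch ignores the JavaScript-generated content. Is it possible to fetch it?

I found that Selenium might be a way to do this, but it seems to be supported only in Nutch 2.x. Can it be integrated with Nutch 1.9, and if so, how?

I followed the installation instructions for nutch-selenium, but running ant produces many errors: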

compile:
     [echo] Compiling plugin: protocol-selenium
    [javac] Compiling 2 source files to $NUTCH_HOME/build/protocol-selenium/classes
    [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:14: error: package org.apache.nutch.storage does not exist
    [javac] import org.apache.nutch.storage.WebPage;
    [javac]                                ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:15: error: package org.apache.nutch.storage.WebPage does not exist
    [javac] import org.apache.nutch.storage.WebPage.Field;
    [javac]                                        ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:26: error: package WebPage does not exist
    [javac]   private static final Collection<WebPage.Field> FIELDS = new HashSet<WebPage.Field>();
    [javac]                                          ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:49: error: cannot find symbol
    [javac]     protected Response getResponse(URL url, WebPage page, boolean redirect)
    [javac]                                             ^
    [javac]   symbol:   class WebPage
    [javac]   location: class Http
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:55: error: package WebPage does not exist
    [javac]   public Collection<WebPage.Field> getFields() {
    [javac]                            ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java:16: error: package org.apache.nutch.storage does not exist
    [javac] import org.apache.nutch.storage.WebPage;
    [javac]                                ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java:47: error: cannot find symbol
    [javac]     public HttpResponse(Http http, URL url, WebPage page, Configuration conf) throws ProtocolException, IOException {
    [javac]                                             ^
    [javac]   symbol:   class WebPage
    [javac]   location: class HttpResponse
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:26: error: package WebPage does not exist
    [javac]   private static final Collection<WebPage.Field> FIELDS = new HashSet<WebPage.Field>();
    [javac]                                                                              ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:29: error: package WebPage does not exist
    [javac]     FIELDS.add(WebPage.Field.MODIFIED_TIME);
    [javac]                       ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:30: error: package WebPage does not exist
    [javac]     FIELDS.add(WebPage.Field.HEADERS);
    [javac]                       ^
    [javac] $NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java:54: error: method does not override or implement a method from a supertype
    [javac]   @Override
    [javac]   ^
    [javac] 11 errors
    [javac] 1 warning

BUILD FAILED
$NUTCH_HOME/build.xml:112: The following error occurred while executing this line:
$NUTCH_HOME/src/plugin/build.xml:77: The following error occurred while executing this line:
$NUTCH_HOME/src/plugin/build-plugin.xml:133: Compile failed; see the compiler error output for details.
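
To narrow the log down, the javac errors can be grouped by message. The following is only a minimal sketch: `JavacErrorSummary` is a hypothetical helper written for this post, not part of Nutch or Ant, and the regular expression assumes the `[javac] <file>.java:<line>: error: <message>` layout seen above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch or Ant): groups javac errors by message.
final class JavacErrorSummary {

    // Matches lines such as:
    //   [javac] /path/Http.java:14: error: package org.apache.nutch.storage does not exist
    private static final Pattern ERROR_LINE =
            Pattern.compile("\\[javac\\]\\s+\\S+\\.java:\\d+: error: (.+)$");

    // Returns each distinct error message with its number of occurrences,
    // in order of first appearance. Lines that are not error headers are skipped.
    static Map<String, Integer> summarize(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1).trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines copied from the build output above.
        String dir = "$NUTCH_HOME/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/";
        List<String> sample = Arrays.asList(
                "    [javac] " + dir + "Http.java:14: error: package org.apache.nutch.storage does not exist",
                "    [javac] import org.apache.nutch.storage.WebPage;",
                "    [javac] " + dir + "Http.java:26: error: package WebPage does not exist",
                "    [javac] " + dir + "Http.java:55: error: package WebPage does not exist",
                "    [javac] " + dir + "HttpResponse.java:16: error: package org.apache.nutch.storage does not exist");
        summarize(sample).forEach((msg, n) -> System.out.println(n + " x " + msg));
    }
}
```

Applied to the full output, the eleven errors collapse to five distinct messages, and all of them point at `org.apache.nutch.storage.WebPage` or at methods that take it as a parameter. So the plugin seems to be written against a class that the Nutch 1.9 source tree does not provide, which fits the impression that nutch-selenium targets Nutch 2.x.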

Or is there another option?

0 answers:

No answers