Crawljax - which jar files are needed to crawl dynamic web pages

Date: 2014-12-10 05:31:07

Tags: java web-crawler

I am trying to use Crawljax to crawl a JavaScript-driven web page (the content inside an IFrame HTML tag). I have added the slf4j, crawljax 2.1 and Guava 18.0 jars to the application.

The IDE shows this error message in a pop-up:

cannot find symbol 
import com.crawljax.core.configuration.CrawljaxConfiguration.CrawljaxConfigurationBuilder;
symbol: class CrawljaxConfigurationBuilder 
location: class CrawljaxConfiguration.

Code:

import com.crawljax.core.CrawlerContext;
import com.crawljax.core.CrawljaxRunner;
import com.crawljax.core.configuration.CrawljaxConfiguration;
import com.crawljax.core.configuration.CrawljaxConfiguration.CrawljaxConfigurationBuilder;
import com.crawljax.core.plugin.OnNewStatePlugin;
import com.crawljax.core.state.StateVertex;

public class CrawljaxExamples {

    public static void main(String[] args) {

        CrawljaxConfigurationBuilder builder
                = CrawljaxConfiguration.builderFor("http://help.syncfusion.com/ug/wpf/default.htm#!documents/overview.htm");
        builder.addPlugin(new OnNewStatePlugin() {

            @Override
            public void onNewState(CrawlerContext context, StateVertex newState) {
            }

            @Override
            public String toString() {
                return "Our example plugin";
            }
        });
        CrawljaxRunner crawljax = new CrawljaxRunner(builder.build());
        crawljax.call();
    }
}

Error message at runtime:

java.lang.ExceptionInInitializerError
Caused by: java.lang.RuntimeException: Uncompilable source code - cannot find symbol
  symbol:   class CrawljaxConfigurationBuilder
  location: class com.crawljax.core.configuration.CrawljaxConfiguration
    at crawljaxexamples.CrawljaxExamples.<clinit>(CrawljaxExamples.java:12)
Exception in thread "main" Java Result: 1

The same code can be found at the following link:

https://github.com/crawljax/crawljax/blob/master/examples/src/main/java/com/crawljax/examples/PluginExample.java

Can anyone tell me which jar files are required to run this program? Or is there a setting in the IDE that needs to be changed?

Thanks

1 answer:

Answer 0 (score: 0)

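You appear to be using an old version of Crawljax. The example you copied comes from the master branch of the repository, while your classpath contains crawljax 2.1, so the compiler cannot find the nested CrawljaxConfigurationBuilder class.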

Download the latest release, crawljax-cli-3.5.1.zip.

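Add all the jars in its lib folder, plus crawljax-cli-3.5.1.jar from the main folder, to your project's library path (classpath), and remove the old crawljax 2.1 jar.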

I tested this setup and the example now works.
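
If you want to confirm which situation you are in before or after swapping jars, a small check like the one below may help. It is only a sketch and not part of Crawljax: the class name ClasspathCheck and the method isOnClasspath are made up for illustration. It uses the standard Class.forName call to ask whether the nested builder class from your import can be resolved on the current classpath. The binary name uses `$` because the builder is nested inside CrawljaxConfiguration.

```java
// Hypothetical helper, not part of Crawljax: reports whether a class resolves on the classpath.
class ClasspathCheck {

    // Binary name of the nested class from the failing import in the question.
    static final String BUILDER_CLASS =
            "com.crawljax.core.configuration.CrawljaxConfiguration$CrawljaxConfigurationBuilder";

    /** Returns true if the named class can be loaded by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' skips static initialization; we only need to know the class resolves.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its dependencies is missing or incompatible.
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(BUILDER_CLASS)) {
            System.out.println("Builder class found: the Crawljax jar on the classpath has the builder API.");
        } else {
            System.out.println("Builder class missing: check which Crawljax jar is on the classpath.");
        }
    }
}
```

Run it with the same classpath as your project. If it reports the class as missing while a Crawljax jar is present, that is consistent with the jar being a version that predates the builder API.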