Pre-rendering a JavaScript website with HtmlUnit (HTML snapshot)

Date: 2017-06-13 14:55:15

Tags: java htmlunit

I am trying to build a pre-renderer powered by HtmlUnit, and I am testing it with this URL: https://demo.tutorialzine.com/2009/09/simple-ajax-website-jquery/demo.html#page3

Here is my code:

final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
WebClientOptions options = webClient.getOptions();
options.setCssEnabled(true);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
//    webClient.setAjaxController(new AjaxController(){
//        @Override
//        public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
//            return true;
//        }
//    });
options.setThrowExceptionOnScriptError(false);
options.setThrowExceptionOnFailingStatusCode(false);
options.setRedirectEnabled(false);
options.setAppletEnabled(false);
options.setJavaScriptEnabled(true);
//options.setUseInsecureSSL(true);
options.setTimeout(50000);
webClient.addRequestHeader("Access-Control-Allow-Origin", "*");

HtmlPage page = webClient.getPage(path);

// important!  Give the headless browser enough time to execute JavaScript
// The exact time to wait may depend on your application.
webClient.setJavaScriptTimeout(10000);
webClient.waitForBackgroundJavaScript(10000);
//just wait
for (int i = 0; i < 20; i++) {
    synchronized (page) {
        page.wait(500);
    }
}
String xml = page.asXml();

The problem is that the output HTML does not contain the content that should be fetched by JavaScript.

What could be wrong here?

1 answer:

Answer 0 (score: 0)

With 2.28-SNAPSHOT, the following code retrieves the page, including the dynamically loaded text:

Donec in massa vel lectus aliquam laoreet nec et turpis. ....

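import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// The class name is only a placeholder so that the snippet compiles on its own.
public class PrerenderDemo {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the WebClient and its background threads
        try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
            WebClientOptions options = webClient.getOptions();
            options.setCssEnabled(true);
            webClient.setAjaxController(new NicelyResynchronizingAjaxController());
            options.setTimeout(50000);
            webClient.addRequestHeader("Access-Control-Allow-Origin", "*");

            HtmlPage page = webClient.getPage("https://demo.tutorialzine.com/2009/09/simple-ajax-website-jquery/demo.html#page3");

            // important!  Give the headless browser enough time to execute JavaScript
            // The exact time to wait may depend on your application.
            webClient.setJavaScriptTimeout(10000);
            webClient.waitForBackgroundJavaScript(10000);
            // just wait
            Thread.sleep(10000);

            String xml = page.asXml();
            System.out.println(xml);
        }
    }
}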

What are you still missing?
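
Both snippets above wait for a fixed amount of time, which, as the code comments note, may depend on the application. One possible alternative is to poll for a condition and stop as soon as it holds. The following is only a minimal sketch of that idea in plain Java, not part of HtmlUnit: the class `PollUntil` and the method `waitUntil` are hypothetical names introduced here for illustration, and the condition in `main` is a stand-in for a real check on the page.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not an HtmlUnit class.
public class PollUntil {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 300 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 5_000, 50);
        System.out.println("condition met: " + ready);
    }
}
```

In the HtmlUnit code above, the condition might be something like `() -> page.asXml().contains("Donec")`, assuming that the text is known to appear only after the AJAX call has finished. The value returned by `webClient.waitForBackgroundJavaScript(...)` (the number of background JavaScript jobs still pending) could serve as an additional signal.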