HtmlUnit WebClient does not fully load a web page's dynamic content

Asked: 2019-05-22 11:04:57

Tags: javascript java htmlunit

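I am trying to load a web page (https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en) with HtmlUnit WebClient in order to scrape it, but the content does not load correctly. For example, I cannot find the "Apply" button. My WebClient code is as follows: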

webClient.setCssErrorHandler(new DefaultCssErrorHandler());
webClient.setJavaScriptErrorListener(new DefaultJavaScriptErrorListener());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getCookieManager().setCookiesEnabled(true);
webClient.waitForBackgroundJavaScript(60000);

Can anyone help me?

1 answer:

Answer 0: (score: 0)

This works for me:

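import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// The class name is arbitrary; the original answer showed only the main method.
public class JobSearchScraper {
    public static void main(String[] args) throws IOException {
        final String url = "https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en";

        // try-with-resources closes the WebClient when the block exits
        try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
            HtmlPage page = webClient.getPage(url);

            // waitForBackgroundJavaScript has to be called after every action
            // (here: after getPage), not while configuring the client.
            // This page is really slow, so keep waiting until the last part of the
            // dynamic content (the pagination links) shows up. Note that this loops
            // forever if the marker text never appears.
            while (!page.asText().contains("Previous\r\n1\r\n2\r\n3\r\n4\r\n")) {
                webClient.waitForBackgroundJavaScript(1_000);
            }

            System.out.println("-------------------------------------------------------------------------------");
            System.out.println(page.asText());
            System.out.println("-------------------------------------------------------------------------------");
        }
    }
}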
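
A possible refinement, offered as a sketch rather than as part of the original answer: the wait loop above never gives up, and its marker string hard-codes "\r\n", which may not match the line separator HtmlUnit emits on every platform or version (that is an assumption on my part, not something I have verified). The helper names below (BoundedWait, pollUntil, normalizeNewlines) are made up for illustration and are not HtmlUnit API. The demo in main uses a counter as a stand-in for the slow page so that it runs without HtmlUnit on the classpath.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {

    /** Converts Windows (\r\n) and lone \r line endings to \n. */
    static String normalizeNewlines(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /**
     * Runs waitStep until condition holds or maxAttempts wait steps have run.
     * Returns true if the condition was met, false if the attempts ran out.
     */
    static boolean pollUntil(BooleanSupplier condition, Runnable waitStep, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            waitStep.run();
        }
        // One last check after the final wait step.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        // Stand-in for a slow page: the "content" is ready after three wait steps.
        int[] steps = {0};
        boolean loaded = pollUntil(() -> steps[0] >= 3, () -> steps[0]++, 60);
        System.out.println("loaded=" + loaded + " after " + steps[0] + " steps");

        // With HtmlUnit the call would look roughly like this (not run here):
        // boolean ok = pollUntil(
        //     () -> normalizeNewlines(page.asText()).contains("Previous\n1\n2\n3\n4\n"),
        //     () -> webClient.waitForBackgroundJavaScript(1_000),
        //     60);
    }
}
```

With a cap of 60 attempts and a 1-second wait per step, the scraper would stop after about a minute instead of hanging if the site changes its pagination markup; the caller can then log the failure or retry.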