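I am trying to use HtmlUnit's WebClient to load a web page (https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en) for scraping, but the content does not load correctly. For example, I cannot find the "Apply" button. My WebClient setup is as follows: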
webClient.setCssErrorHandler(new DefaultCssErrorHandler());
webClient.setJavaScriptErrorListener(new DefaultJavaScriptErrorListener());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getCookieManager().setCookiesEnabled(true);
webClient.waitForBackgroundJavaScript(60000);
Can anyone help me?
Answer
This works for me:
import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Main { // class name is arbitrary
    public static void main(String[] args) throws IOException {
        final String url = "https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en";
        try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
            HtmlPage page = webClient.getPage(url);
            // waitForBackgroundJavaScript has to be called after every action.
            // This page is really slow, so wait until the last part of the dynamic content (the pager) shows up.
            // The 60-attempt cap (about 60 s) is an added safeguard so the loop cannot spin forever.
            for (int i = 0; i < 60 && !page.asText().contains("Previous\r\n1\r\n2\r\n3\r\n4\r\n"); i++) {
                webClient.waitForBackgroundJavaScript(1_000);
            }
            System.out.println("-------------------------------------------------------------------------------");
            System.out.println(page.asText());
            System.out.println("-------------------------------------------------------------------------------");
        }
    }
}
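The wait-and-check loop in the answer can be factored into a small reusable helper that gives up after a fixed number of attempts. This is a minimal sketch, not part of HtmlUnit: the class `Polling` and the method `pollUntil` are hypothetical names, and the condition and wait step are passed in as plain Java functional interfaces so the helper itself depends only on the JDK.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit API): bounded polling.
final class Polling {
    private Polling() {}

    /**
     * Checks {@code condition}; while it is false, runs {@code waitStep} and checks again,
     * for at most {@code maxAttempts} wait steps.
     * Returns true if the condition became true, false if we gave up.
     */
    static boolean pollUntil(BooleanSupplier condition, Runnable waitStep, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // With HtmlUnit this would be e.g. webClient.waitForBackgroundJavaScript(1_000)
            waitStep.run();
        }
        // One final check after the last wait step.
        return condition.getAsBoolean();
    }
}
```

With the answer's variables it could be called as `Polling.pollUntil(() -> page.asText().contains("Previous\r\n1\r\n2\r\n3\r\n4\r\n"), () -> webClient.waitForBackgroundJavaScript(1_000), 60)`. The boolean result tells you whether the page actually finished loading or whether you timed out and are about to scrape incomplete content.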