How can I get HtmlUnit to load all of a website's URLs?

Asked: 2018-07-24 07:22:43

Tags: javascript java google-analytics htmlunit

Some URLs are missing when I fetch a page with HtmlUnit's WebClient.

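List of issues:

  1. The facebook.com URLs of type JSON and the Google Analytics URLs of type JavaScript, gif, and XHR are not loaded.

  2. For example, for Google Analytics, only 4 of the 5 URLs were triggered; 1 URL was not. Please check our code and let us know how to trigger all of the URLs in conversion.async.js in a single run.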

Here is my code:

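import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlLink;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The statements below belong inside a method that declares "throws IOException".

// resourceUrls is not declared in the original snippet; a synchronized list is
// assumed here because background JavaScript threads may also issue requests.
final List<URL> resourceUrls = Collections.synchronizedList(new ArrayList<>());

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52);
webClient.getCookieManager().clearCookies();
webClient.getCache().clear();
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.getOptions().setTimeout(120000);

webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setDoNotTrackEnabled(false);

// The wrapper installs itself as the client's web connection, so every request passes through it.
new WebConnectionWrapper(webClient) {
  int conversionUrlCount = 0;

  @Override
  public WebResponse getResponse(WebRequest request) throws IOException {
    System.out.println(request.getUrl());
    WebResponse response = super.getResponse(request);
    System.out.println(response.getStatusCode());
    // Record only requests that did not fail with a 4xx/5xx status.
    if (response.getStatusCode() < 400) {
      resourceUrls.add(request.getUrl());
    }
    return response;
  }
};

// getPage needs a full URL including the scheme; "abc.com" is a placeholder host.
String url = "http://abc.com";
HtmlPage page = webClient.getPage(url);

// Wait for AJAX and other background JavaScript. This must come after getPage:
// before a page is loaded there are no background jobs to wait for.
webClient.waitForBackgroundJavaScript(60000);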

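// Force loading of the first <link> resource, if the page has one.
HtmlLink link = page.getFirstByXPath("//link");
if (link != null) {
  link.getWebResponse(true);
}

// Force loading of the first <img>, if the page has one.
try {
  HtmlImage image = page.getFirstByXPath("//img");
  if (image != null) {
    image.getImageReader();
  }
} catch (IOException e) {
  // No need to crash at this point;
  // just let the user know that a bad image file was encountered.
  System.err.println("Could not read image: " + e.getMessage());
}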
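
To pin down which of the expected requests never fired (the "4 of 5" case described in the list of issues), the captured URLs can be compared against a list of expected URL fragments. The following is only a minimal sketch under stated assumptions: the class name CapturedUrlCheck, the method missing, and the plain substring matching are hypothetical and not part of HtmlUnit or of the original code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which expected URL fragments never appeared among the captured requests. */
final class CapturedUrlCheck {

  private CapturedUrlCheck() {}

  /**
   * Returns every expected fragment that is not contained in any captured URL,
   * in the order the fragments were given.
   * Matching is a plain substring test, which is an assumption of this sketch.
   */
  static List<String> missing(List<String> capturedUrls, List<String> expectedFragments) {
    List<String> notSeen = new ArrayList<>();
    for (String fragment : expectedFragments) {
      boolean seen = false;
      for (String url : capturedUrls) {
        if (url.contains(fragment)) {
          seen = true;
          break;
        }
      }
      if (!seen) {
        notSeen.add(fragment);
      }
    }
    return notSeen;
  }
}
```

The captured list would be built by converting each entry of resourceUrls with toString() once the background JavaScript wait has finished. Calling CapturedUrlCheck.missing(captured, expected) then returns the fragments that were never requested, which narrows the problem down to specific scripts or beacons rather than a raw count.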

0 answers:

No answers yet