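I first tried HtmlUnit to get the HTML source of a web page, and then PhantomJS, but both have let me down. The page source I get back still contains JavaScript that does not seem to have been executed. I don't really understand what is happening. The HtmlUnit version I tried: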
// HtmlUnit client emulating Firefox 38
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
// Resynchronize AJAX calls made from the main thread
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
// Wait up to 10 seconds for background JavaScript (called here before the page is requested)
webClient.waitForBackgroundJavaScript(10000);
webClient.getOptions().setThrowExceptionOnScriptError(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = webClient.getPage("https://www.flickr.com/search/?text=cats&view_all=1");
// Close the client, then print the page as XML
webClient.close();
System.out.println(page.asXml());
The PhantomJS version:
// Unpack the PhantomJS binary bundled with Phanbedder
File phantomjs = Phanbedder.unpack();
DesiredCapabilities dcaps = new DesiredCapabilities();
dcaps.setJavascriptEnabled(true);
dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, phantomjs.getAbsolutePath());
// Present a desktop Chrome user agent
dcaps.setCapability("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
WebDriver driver = new PhantomJSDriver(dcaps);
// Timeout for asynchronous scripts (executeAsyncScript)
driver.manage().timeouts().setScriptTimeout(10, TimeUnit.SECONDS);
driver.get("https://www.flickr.com/search/?text=cats&view_all=1");
// Print the page source right after get() returns
System.out.println(driver.getPageSource());
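One thing that might be worth checking is whether the scripts simply need more time, since the page source is read immediately after the page is requested. Below is a minimal sketch of a polling helper that retries a condition until it holds or a timeout expires. It uses only the Java standard library; the names `PollUntil` and `waitUntil` are hypothetical and not part of HtmlUnit or Selenium, and the `main` method uses a timer as a stand-in for a page that finishes rendering late. Whether waiting longer is enough for this particular site is not confirmed.

```java
import java.util.function.BooleanSupplier;

final class PollUntil {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held in time, false otherwise.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe check that the deadline has passed
            if (deadline - System.nanoTime() <= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for content that only appears about 300 ms after loading
        long readyAt = System.currentTimeMillis() + 300;
        boolean rendered = waitUntil(() -> System.currentTimeMillis() >= readyAt, 2000, 50);
        System.out.println("rendered in time: " + rendered);
    }
}
```

With the PhantomJS driver above, the condition could be something like `() -> driver.getPageSource().contains(marker)`, where `marker` is a hypothetical string expected only in the rendered page. Selenium's own `WebDriverWait` covers the same idea and is probably the more idiomatic choice there.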
If anyone can help, I would be very grateful. Thanks.