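I am trying to scrape dynamic content from a website. I could not do this with Jsoup,
because it only gives me the static page source, so I switched to HtmlUnit for this task.
I am new to HtmlUnit, and I get an exception when I try to click the search button.
Here is my code: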
    public static void main(String[] args) throws IOException {
        final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        try {
            System.out.println("Querying");
            String url = "http://www.agoda.com/city/mumbai-in.html";
            HtmlPage page = webClient.getPage(url);
            HtmlForm form = page.getHtmlElementById("aspnetForm");
            HtmlSelect checkIn = (HtmlSelect) form.getSelectsByName("ddlCheckInDay").get(0);
            checkIn.setSelectedAttribute("12", true);
            HtmlSelect checkInMY = (HtmlSelect) form.getSelectsByName("ddlCheckInMonthYear").get(0);
            checkInMY.setSelectedAttribute("8,2014", true);
            HtmlSelect nights = (HtmlSelect) form.getSelectsByName("ctl00$ctl00$MainContent$area_promo$CitySearchBox1$ddlNights").get(0);
            nights.setSelectedAttribute("1", true);
            final HtmlSubmitInput button = form.getInputByName("ctl00$ctl00$MainContent$area_promo$CitySearchBox1$SearchButton");
            final HtmlPage secondPage = button.click();
            System.out.println(secondPage.asXml());
            System.out.println("Success");
        } catch (final FailingHttpStatusCodeException e) {
            System.out.println("One");
            e.printStackTrace();
        } catch (final MalformedURLException e) {
            System.out.println("Two");
            e.printStackTrace();
        } catch (final IOException e) {
            System.out.println("Three");
            e.printStackTrace();
        } catch (final Exception e) {
            System.out.println("Four");
            e.printStackTrace();
        }
        System.out.println("Finished");
    }
I get the following exception:
EcmaError: lineNumber=[767] column=[0] lineSource=[null] name=[TypeError] sourceName=[script in http://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?asq=bs17wTmKLORqTfZUfjFABspANSEBVRKOEdlgdhMDKXq9AiQm2HGc5Vnb5H3nW0yov9IqRFL8sIj4SMPGpGP7KXtq7DnKdEOZThQP5gmE%2bQqFC%2b63so2JAJDOwZSQfHRSODUrGKb78ZtjV5%2fnfvQuD6eARBQJoMfccTv7dm7lSbHi9gFJ3zoRUUxA1bXicT8i&tick=635424957490 from (759, 32) to (801, 10)] message=[TypeError: Cannot read property "timing" from undefined (script in http://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?asq=bs17wTmKLORqTfZUfjFABspANSEBVRKOEdlgdhMDKXq9AiQm2HGc5Vnb5H3nW0yov9IqRFL8sIj4SMPGpGP7KXtq7DnKdEOZThQP5gmE%2bQqFC%2b63so2JAJDOwZSQfHRSODUrGKb78ZtjV5%2fnfvQuD6eARBQJoMfccTv7dm7lSbHi9gFJ3zoRUUxA1bXicT8i&tick=635424957490 from (759, 32) to (801, 10)#767)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "timing" from undefined (script in http://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?asq=bs17wTmKLORqTfZUfjFABspANSEBVRKOEdlgdhMDKXq9AiQm2HGc5Vnb5H3nW0yov9IqRFL8sIj4SMPGpGP7KXtq7DnKdEOZThQP5gmE%2bQqFC%2b63so2JAJDOwZSQfHRSODUrGKb78ZtjV5%2fnfvQuD6eARBQJoMfccTv7dm7lSbHi9gFJ3zoRUUxA1bXicT8i&tick=635424957490 from (759, 32) to (801, 10)#767)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:705)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:637)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:612)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:1001)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventListeners(EventListenersContainer.java:179)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:239)
at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:824)
at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:748)
at com.gargoylesoftware.htmlunit.html.HtmlElement$1.run(HtmlElement.java:920)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:925)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeEventHandlersIfNeeded(HtmlPage.java:1298)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:290)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:475)
at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2074)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:733)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:820)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1325)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1268)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1216)
at com.dhanraj.Main.main(Main.java:57)
The JavaScript:
function (a) {
return typeof f != "undefined" && (!a || f.event.triggered !== a.type) ? f.event.dispatch.apply(i.elem, arguments) : b;
}
Can anyone help me or point out what I am doing wrong? Thanks in advance.
Answer 0 (score: 0)
It works fine now. I just added one line of code:
    webClient.getOptions().setJavaScriptEnabled(false);
It works like a charm!
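
Note that turning JavaScript off avoids the ScriptException, but it also stops HtmlUnit from running any page scripts, so content generated by those scripts will not appear in the result. If that content is what you need, a possible alternative is to leave JavaScript enabled and tell HtmlUnit not to throw on script errors. The following is only a sketch under assumptions: the class name LenientScraper, the helper newLenientClient and the 10-second wait are made up for illustration, the API calls match the older HtmlUnit 2.x releases used in the question, and whether the page then renders what you expect depends on how much the failing script matters.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;

// Hypothetical class name, used only for this illustration.
public class LenientScraper {

    /** Builds a client that keeps JavaScript on but does not throw on script errors. */
    static WebClient newLenientClient() {
        // FIREFOX_24 mirrors the question; it only exists in older HtmlUnit 2.x releases.
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setJavaScriptEnabled(true);
        // Script errors are then reported through HtmlUnit's logging instead of a ScriptException.
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        return webClient;
    }

    public static void main(String[] args) throws IOException {
        WebClient webClient = newLenientClient();
        try {
            HtmlPage page = webClient.getPage("http://www.agoda.com/city/mumbai-in.html");
            // Give background scripts (AJAX, timers) up to 10 s to finish; the timeout is an arbitrary guess.
            webClient.waitForBackgroundJavaScript(10_000);
            System.out.println(page.asXml());
        } finally {
            // In these older releases the client is released with closeAllWindows().
            webClient.closeAllWindows();
        }
    }
}
```

The same two option calls can be added to the code in the question right after the WebClient is created; the form handling and button.click() stay unchanged.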