How do I get the right page?

Posted: 2011-09-28 21:33:10

Tags: java htmlunit

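I am using the HtmlUnit library to scrape the Yellowpages.com website. I want to enter a search term and click the "Find" button, but after the click I get 2 pages:

http://www.yellowpages.com/ny/sport?g=NY&q=Sport (the one I want)
https://dealoftheday.yellowpages.com/join?ic=deal_pop-under_signup-v- (a pop-up window)

I have this code: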

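import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import java.io.IOException;

public class YellowPages {
    private HtmlPage page;

    public void getPage() throws IOException {
        WebClient webClient = new WebClient();
        // Load the home page and type the search term into the search box
        page = webClient.getPage("http://www.yellowpages.com");
        HtmlTextInput searchInput = (HtmlTextInput) page.getElementById("search-terms");
        searchInput.setText("Law");

        // Click "Find"; this opens the results page and a pop-up window
        HtmlSubmitInput button = (HtmlSubmitInput) page.getElementById("search-submit");
        page = button.click();
        System.out.println(page.getTitleText());
    }
}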

This code prints:

Deal of the Day on YP.com - Join

But I want it to print the title of the first page, which is:

NY Sport | New York Sports - YP.com

How can I get the first page?

Edit: after adding the line webClient.setPopupBlockerEnabled(true), I get a lot of warnings followed by an exception. Here is part of the console output:


Exception in thread "main" ======= EXCEPTION START ========
EcmaError: lineNumber=[56] column=[0] lineSource=[null] name=[TypeError] sourceName=[http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455] message=[TypeError: Cannot call method "blur" of null (http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455#56)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "blur" of null (http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455#56)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:601)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:531)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:906)
    at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventListeners(EventListenersContainer.java:164)
    at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:223)
    at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:686)
    at com.gargoylesoftware.htmlunit.html.HtmlElement$2.run(HtmlElement.java:885)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:890)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:865)
    at com.gargoylesoftware.htmlunit.html.HtmlForm.submit(HtmlForm.java:108)
    at com.gargoylesoftware.htmlunit.html.HtmlSubmitInput.doClickAction(HtmlSubmitInput.java:77)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1263)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1214)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1177)
    at YellowPages.getPage(YellowPages.java:39)
    at YellowPages.main(YellowPages.java:22)
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot call method "blur" of null (http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455#56)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3772)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3750)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3778)

3 answers:

Answer 0 (score: 2)

Sounds like a JavaScript error. Disable JavaScript:

webClient.setJavaScriptEnabled(false);

Or how about this:

webClient.setThrowExceptionOnScriptError(false);

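If you are using HtmlUnit 2.11+, call these setters through webClient.getOptions() instead, e.g. webClient.getOptions().setThrowExceptionOnScriptError(false);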

Answer 1 (score: 1)

Have you tried

webClient.setPopupBlockerEnabled(true)

Then you should get only one page.

Answer 2 (score: 1)

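Untested, but I think you can iterate over the WebClient's top-level windows (using WebClient.getTopLevelWindows()), call getEnclosedPage() on each one, and test whether the page's title text is the one you are looking for.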
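The selection step from this answer can be sketched as a small generic helper. This is a minimal sketch, assuming you only need to pick a window by its title; the class name WindowPicker and the method findByTitle are hypothetical. It is written against plain Java types so it compiles without HtmlUnit, and the commented-out lines show one possible (untested) way to wire it to WebClient.getTopLevelWindows() and getEnclosedPage().

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

public class WindowPicker {

    /**
     * Returns the first item whose title (as extracted by titleOf) satisfies
     * the wanted predicate. Items whose extracted title is null are skipped,
     * e.g. windows that do not hold an HTML page.
     */
    public static <T> Optional<T> findByTitle(List<T> items,
                                              Function<? super T, String> titleOf,
                                              Predicate<String> wanted) {
        for (T item : items) {
            String title = titleOf.apply(item);
            if (title != null && wanted.test(title)) {
                return Optional.of(item);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-ins for the titles of the two windows the search opens
        // (the titles quoted in the question).
        List<String> titles = List.of(
                "Deal of the Day on YP.com - Join",
                "NY Sport | New York Sports - YP.com");

        Optional<String> results = findByTitle(titles, t -> t, t -> t.startsWith("NY Sport"));
        System.out.println(results.orElse("not found"));

        // Hypothetical HtmlUnit wiring (untested; kept as a comment so this
        // file compiles without the HtmlUnit jar):
        //
        // Optional<TopLevelWindow> win = findByTitle(
        //         webClient.getTopLevelWindows(),
        //         w -> w.getEnclosedPage() instanceof HtmlPage
        //                 ? ((HtmlPage) w.getEnclosedPage()).getTitleText()
        //                 : null,
        //         t -> t.startsWith("NY Sport"));
        // HtmlPage page = (HtmlPage) win.get().getEnclosedPage();
    }
}
```

Matching on a title prefix rather than the full title is a deliberate choice here, since the exact title text may change with the search term; matching on the window's URL host would be an equally valid predicate.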