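HTMLUnit in Java - How to navigate to a GridView page

Time: 2016-09-24 07:32:36

Tags: javascript java jsoup htmlunit

I'm trying to create an application in Java that reads information from a web page. To download the information from the elements I want, I used jsoup (excellent tool!), but I also want to load the next page of the GridView used on the web page. The page is an .aspx page, and the link to the second page looks like this: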

    <a href="javascript:__doPostBack('GridView1','Page$2')" style="color:White;">2</a>

The following is the JavaScript function used:

    //<![CDATA[
    var theForm = document.forms['form1'];
    if (!theForm) {
        theForm = document.form1;
    }
    function __doPostBack(eventTarget, eventArgument) {
        if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
            theForm.__EVENTTARGET.value = eventTarget;
            theForm.__EVENTARGUMENT.value = eventArgument;
            theForm.submit();
        }
    }
    //]]>

Currently I'm trying to use HTMLUnit, but it doesn't seem to work. Here is the code I'm using:

    final WebClient webClient = new WebClient(BrowserVersion.CHROME);
    HtmlPage page = webClient.getPage("http://www.webpage.com/Main.aspx");
    HtmlAnchor anchor = null;
    List<HtmlAnchor> anchors = page.getAnchors();
    for (int j = 0; j < anchors.size(); j++)
    {
        anchor = anchors.get(j);
        String sAnchor = anchor.asText();
        String sAnchorxml = anchor.asXml();
        if (sAnchor.equals("2"))
        {
            HtmlPage page2 = anchor.click();
            doc = Jsoup.parse(page2.asXml());
            .....

When I read the page using the same code as for page 1, I get the following error:

    Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at test.advacus.com.MainProgram.main(MainProgram.java:148)

I think my error is in the click() line. Just to clarify: once you click the next page, the URL doesn't change, only the information in the GridView does, so I can't parse the page using a new URL.

Any additional help, or a suggestion of a tool that works better with jsoup than HTMLUnit, would be really helpful! Thanks in advance!

Edited with additional info: It looks like 'Jsoup.parse()' is not working... I modified the code, and the newPage body appears to contain the same information as the first page.

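1 Answer:

Answer 0 (score: 2)

Looking at the anchor, as you already pointed out, __doPostBack is called. It is therefore much simpler to invoke that JavaScript call directly than to first grab the anchor and then click it.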
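To make that concrete: the href only carries two strings, which __doPostBack copies into the hidden __EVENTTARGET and __EVENTARGUMENT form fields before submitting the form (as the function in the question shows). Below is a minimal sketch, using only the JDK, that pulls those two values out of such an href. The class and method names (PostBackHref, formFields) are hypothetical and not part of HtmlUnit or jsoup, and the regular expression assumes single-quoted arguments as in the link from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of HtmlUnit or jsoup.
class PostBackHref {
    // Matches __doPostBack('target','argument') with single-quoted arguments.
    private static final Pattern POSTBACK =
            Pattern.compile("__doPostBack\\('([^']*)'\\s*,\\s*'([^']*)'\\)");

    // Returns the hidden form fields the postback would set,
    // or an empty map if the href is not a __doPostBack link.
    static Map<String, String> formFields(String href) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = POSTBACK.matcher(href);
        if (m.find()) {
            fields.put("__EVENTTARGET", m.group(1));
            fields.put("__EVENTARGUMENT", m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String href = "javascript:__doPostBack('GridView1','Page$2')";
        // prints {__EVENTTARGET=GridView1, __EVENTARGUMENT=Page$2}
        System.out.println(formFields(href));
    }
}
```

Since the target and argument are plain strings, nothing about the anchor element itself is needed to trigger the paging.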

Sample code

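import java.util.logging.Level;
import java.util.logging.Logger;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.ScriptResult;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// The class name is arbitrary; it only makes the snippet compilable.
public class GridViewPaging {
    public static void main(String[] args) {
        // Silence HtmlUnit's logging
        Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
        final WebClient webClient = new WebClient(BrowserVersion.CHROME);

        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setTimeout(10000);

        try {
            HtmlPage htmlPage = webClient.getPage("http://qatarsale.com/EnMain.aspx");

            // Page 1: hand the rendered DOM to jsoup and print the labels
            Document doc = Jsoup.parse(htmlPage.asXml());
            System.out.println(doc.select("[id$=Label10]").text());

            // Call the postback function directly instead of clicking the anchor
            ScriptResult result = htmlPage.executeJavaScript("__doPostBack('GridView1','Page$2')");
            htmlPage = (HtmlPage) result.getNewPage();

            Thread.sleep(3000); // delay needed for lazy loading, there might be something cleaner

            // Page 2: parse the page returned by the postback
            doc = Jsoup.parse(htmlPage.asXml());
            System.out.println(doc.select("[id$=Label10]").text());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            webClient.close();
        }
    }
}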

Output

Toyota Porsche Mercedes-Benz Cadillac Jeep Porsche Porsche Nissan Mitsubishi BMW Porsche Ford Mitsubishi Toyota Nissan Land Rover Nissan Mercedes-Benz Nissan Nissan Toyota Toyota Porsche Mitsubishi Mitsubishi Nissan Nissan Mercedes-Benz Nissan Jeep Mercedes-Benz Lexus BMW Lexus
BMW Lexus Toyota Toyota Lexus Nissan Mercedes-Benz Mercedes-Benz Ferrari Dodge BMW Mercedes-Benz Aston Martin Mitsubishi Suzuki Maserati Porsche Maserati Land Rover Chevrolet Land Rover GMC Toyota Porsche Lexus Land Rover GMC Mercedes-Benz Toyota Lexus Toyota Lexus Toyota Nissan
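
On the fixed Thread.sleep(3000) in the sample code: a bounded polling loop is one possible cleaner alternative. The sketch below is plain JDK code, and PollingWait and waitUntil are hypothetical names. With HtmlUnit, the condition could for instance check whether the re-parsed label text differs from the text of page 1. HtmlUnit's WebClient also offers waitForBackgroundJavaScript(timeoutMillis), which may be enough on its own, though I have not verified that against this particular site.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of HtmlUnit or jsoup.
class PollingWait {
    // Polls the condition every pollMillis until it holds or timeoutMillis elapses.
    // Returns true if the condition held, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after roughly 200 ms
        boolean met = waitUntil(() -> System.currentTimeMillis() - start >= 200, 3000, 50);
        System.out.println("condition met: " + met);
    }
}
```

Compared with a fixed sleep, this returns as soon as the new content is there and still gives up after a known upper bound.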