Retrieving the results page with HtmlUnit

Asked: 2013-03-27 14:22:03

Tags: java web-crawler htmlunit

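I am trying to use HtmlUnit to simulate a search on a train ticket website. The goal is to get the page that contains the search results. Instead, my code returns the intermediate search page (the "waiting for results..." page).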

Here is the code:

import java.io.IOException;
import java.net.URL;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.RefreshHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class TestHtmlUnit {

    public static void main(String[] args) throws Exception {

        // Create and initialize the WebClient
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_10);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setRefreshHandler(new RefreshHandler() {
            @Override
            public void handleRefresh(Page page, URL url, int seconds) throws IOException {
                System.out.println("handleRefresh");
            }
        });

        // Load the home page and get the train search form
        HtmlPage page = (HtmlPage) webClient.getPage("http://www.voyages-sncf.com/");
        HtmlForm form = page.getFormByName("TrainTypeForm");

        // Fill in the origin, destination and outward date
        form.getInputByName("origin_city").setValueAttribute("paris");
        form.getInputByName("destination_city").setValueAttribute("marseille");
        form.getInputByName("outward_date").setValueAttribute("28/03/2013");

        // Click the "Rechercher" (search) button
        page = (HtmlPage) form.getInputByValue("Rechercher").click();

        // Print the page returned by the click (currently the "waiting for results" page)
        // System.out.println(page.asXml());
        System.out.println(page.asText());
    }
}

1 answer:

Answer 0 (score: 1)

After the click, you should wait for the page to finish loading.

Try this:

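// Give background JavaScript up to 1000 ms to finish after the click
webClient.waitForBackgroundJavaScript(1000);

// Make Ajax calls run synchronously. Set the controller before the click so it
// applies to the requests the search triggers. Use only one of the two options
// below: a second setAjaxController call replaces the first.
webClient.setAjaxController(new NicelyResynchronizingAjaxController());

webClient.setAjaxController(new AjaxController() {
    @Override
    public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
        return true;
    }
});

// Or poll until no JavaScript jobs are left for the page's window
JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
while (manager.getJobCount() > 0) {
    Thread.sleep(100);
}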
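
One caveat about the polling loop above: it has no upper bound, so it can spin forever if the page keeps rescheduling a job (a repeating timer, for example). Below is a minimal sketch of a bounded version. It is plain Java with no HtmlUnit dependency; the `JobWaiter` class and `waitUntilIdle` method are hypothetical names made up for this illustration, not part of HtmlUnit.

```java
import java.util.function.IntSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a "pending job count"
// until it reaches zero or a timeout expires.
public final class JobWaiter {

    private JobWaiter() {
    }

    /**
     * Polls pendingJobs every pollMillis milliseconds.
     *
     * @return true if the count dropped to 0 before the timeout,
     *         false if the timeout expired or the thread was interrupted
     */
    public static boolean waitUntilIdle(IntSupplier pendingJobs, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (pendingJobs.getAsInt() > 0) {
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

With the `manager` variable from the snippet above, the call would presumably look like `JobWaiter.waitUntilIdle(manager::getJobCount, 10000, 100)`. If it returns false, the results may simply not be ready yet, so it is worth checking the page content before parsing it.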