How do I extract data from an AJAX/JavaScript website with HtmlUnit? I'm trying to extract a shipment history

Posted: 2015-09-19 11:21:12

Tags: java htmlunit

I'm trying to extract the shipment history from this page: http://www.aramex.com/express/track-results.aspx?q=aWQ9MzU2NDQ4MTQ3Jg%3d%3d-ULINyZQtKrw%3d

Here is my code:

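import java.io.IOException;
import java.util.List;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButtonInput;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextArea;

public void aramexTracking() {
    String trackingId = "9181468833";

    // try-with-resources closes the WebClient (and its windows) when done
    try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
        // Configure the client before the first request so the options apply to it
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setPrintContentOnFailingStatusCode(false);
        webClient.setCssErrorHandler(new SilentCssErrorHandler());

        HtmlPage page1 = webClient.getPage("http://www.aramex.com/express/track.aspx");

        // Submitting form on Tracking Page
        HtmlForm form = page1.getFormByName("aspnetForm");
        HtmlButtonInput button = form.getInputByName("ctl00$ctl00$MainContent$InnerMainContent$btnGo");
        HtmlTextArea textArea = form.getTextAreaByName("ShipmentNumber");
        textArea.setText(trackingId);

        HtmlPage page2 = button.click();

        // Selects only the text nodes that are direct children of the div
        List<?> list = page2.getByXPath("//div[@id='dvSearchResults']/text()");
    } catch (FailingHttpStatusCodeException | IOException e) {
        e.printStackTrace();
    }
}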
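Because the results on such a page are filled in by JavaScript after the click, the page returned by button.click() may not contain them yet. HtmlUnit offers WebClient.waitForBackgroundJavaScript(long) for this; another common pattern is to poll until the expected element appears or a timeout expires. The sketch below shows only that generic polling idea using the JDK. The Supplier is a hypothetical stand-in for a lookup such as page2.getFirstByXPath(...), and the timeout and interval values are arbitrary assumptions.

```java
import java.util.function.Supplier;

public class PollUntilPresent {

    /**
     * Repeatedly calls lookup until it returns a non-null value or
     * timeoutMillis elapses. Returns the value, or null on timeout.
     */
    public static <T> T waitFor(Supplier<T> lookup, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            T result = lookup.get();
            if (result != null) {
                return result;                 // content has appeared
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;                   // gave up: content never appeared
            }
            Thread.sleep(intervalMillis);      // wait before checking again
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for AJAX content that becomes available after ~200 ms
        long readyAt = System.currentTimeMillis() + 200;
        Supplier<String> fakeLookup =
                () -> System.currentTimeMillis() >= readyAt ? "shipment table" : null;

        String found = waitFor(fakeLookup, 2000, 50);
        System.out.println(found);             // prints "shipment table"

        String none = waitFor(() -> null, 100, 20);
        System.out.println(none);              // prints "null"
    }
}
```

With HtmlUnit, the lookup would typically re-run the XPath against the current page on each iteration, since the DOM is updated in place as scripts run.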

1 answer:

Answer 0 (score: 0)

Please post a valid tracking number. I tried a random one, 3974937493, and would suggest a different XPath:

HtmlTable table = (HtmlTable) page2.getFirstByXPath("//div[@id='MainContent']//table//table");
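To see why this XPath behaves differently from the one in the question: a path ending in /text() returns only the text nodes that are direct children of the div, so text inside nested tables is not included, while //table//table selects a table nested inside another table. The sketch below evaluates both expressions with the JDK's javax.xml.xpath instead of HtmlUnit's getByXPath; the markup is a hypothetical, simplified fragment, and the real page structure is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCompare {

    // Hypothetical, simplified markup; not the actual Aramex page
    public static final String XHTML =
            "<html><body><div id='MainContent'>Results:"
          + "<table><tr><td>"
          + "<table><tr><th>Date</th><th>Activity</th></tr>"
          + "<tr><td>2015-09-18</td><td>Delivered</td></tr></table>"
          + "</td></tr></table></div></body></html>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        XPath xpath = XPathFactory.newInstance().newXPath();

        // "/text()" returns only the div's own text children, not text inside the tables
        NodeList direct = (NodeList) xpath.evaluate(
                "//div[@id='MainContent']/text()", doc, XPathConstants.NODESET);
        System.out.println(direct.getLength());             // prints 1
        System.out.println(direct.item(0).getNodeValue());  // prints "Results:"

        // "//table//table" selects the table nested inside another table
        Node inner = (Node) xpath.evaluate(
                "//div[@id='MainContent']//table//table", doc, XPathConstants.NODE);
        System.out.println(inner.getTextContent());         // prints "DateActivity2015-09-18Delivered"
    }
}
```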

After that, parse the table's rows as usual, for example:

if (table != null && table.getCellAt(1, 0) != null) {
    System.out.println(table.getCellAt(1, 0).asText());
}
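Parsing "the rows as usual" generally means walking each tr and reading its cells, skipping the header row. The sketch below does that with the plain DOM API from the JDK rather than HtmlUnit's HtmlTable, on a hypothetical table with made-up sample data; the column layout is an assumption. In this sketch rows(...).get(0).get(0) plays the role of table.getCellAt(1, 0), the first cell of the first data row.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableRows {

    // Hypothetical shipment-history table with made-up data
    public static final String TABLE =
            "<table><tr><th>Date</th><th>Activity</th></tr>"
          + "<tr><td>2015-09-18</td><td>Delivered</td></tr>"
          + "<tr><td>2015-09-17</td><td>Out for delivery</td></tr></table>";

    /** Returns each data row (header row skipped) as a list of trimmed cell texts. */
    public static List<List<String>> rows(Element table) {
        List<List<String>> result = new ArrayList<>();
        NodeList trs = table.getElementsByTagName("tr");
        for (int r = 1; r < trs.getLength(); r++) {        // r = 0 is the header row
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(r).getChildNodes();
            for (int c = 0; c < children.getLength(); c++) {
                Node cell = children.item(c);
                if (cell.getNodeType() == Node.ELEMENT_NODE) { // td/th elements only
                    cells.add(cell.getTextContent().trim());
                }
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(TABLE)));
        for (List<String> row : rows(doc.getDocumentElement())) {
            System.out.println(String.join(" | ", row));
        }
        // prints:
        // 2015-09-18 | Delivered
        // 2015-09-17 | Out for delivery
    }
}
```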