Unable to scrape a website in Java using HtmlUnit

Time: 2015-02-02 08:28:14

Tags: java web-scraping htmlunit

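I'm new to scraping, and I'm trying to scrape a website with HtmlUnit. I can log in to the site, but after navigating to the second page I am logged out again. The site is https://hotelservice.hrs.com/portal/?lang=en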
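One common reason a scripted session seems to end after the first page is that the session cookie is not sent with the next request, for example because its Path (or Domain) attribute does not cover the URL of the second page. Whether that is what happens on this site is an assumption; the question does not show which cookies HRS sets. The sketch below uses only the JDK's `java.net.CookieManager` (not HtmlUnit) to show the effect. The cookie name `JSESSIONID`, the `Path=/portal` attribute and the `/hsa/start` URL are hypothetical values chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class CookiePathDemo {

    // Returns the Cookie header values a cookie jar would attach to a request for targetUri,
    // after the server sent the given Set-Cookie header on a response from loginUri.
    public static List<String> cookiesSentTo(String setCookieHeader, URI loginUri, URI targetUri) {
        try {
            CookieManager jar = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
            // Simulate the login response storing the session cookie.
            jar.put(loginUri, Map.of("Set-Cookie", List.of(setCookieHeader)));
            // Ask which cookies would accompany a follow-up request to targetUri.
            Map<String, List<String>> headers = jar.get(targetUri, Map.of());
            return headers.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical values: the real cookie, its path and the HSA URL are not known.
        String setCookie = "JSESSIONID=abc123; Path=/portal";
        URI login = URI.create("https://hotelservice.hrs.com/portal/login");
        URI portalPage = URI.create("https://hotelservice.hrs.com/portal/home");
        URI otherApp = URI.create("https://hotelservice.hrs.com/hsa/start");

        // Path /portal/home starts with /portal, so the session cookie is sent.
        System.out.println("portal page: " + cookiesSentTo(setCookie, login, portalPage));
        // Path /hsa/start does not, so no session cookie is sent and the server sees a new visitor.
        System.out.println("other app:   " + cookiesSentTo(setCookie, login, otherApp));
    }
}
```

In HtmlUnit, the cookies the client currently holds can be listed with `webClient.getCookieManager().getCookies()`, and their domain and path compared with the URL of the page that bounces back to the login form.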

Code

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;


public class HrsHtmlUnit {

    public static void main(String[] args) {
        try {
            HrsHtmlUnit unitTest = new HrsHtmlUnit();
            // Arguments: hotel key, password, user name (placeholder values)
            unitTest.homePage("601296", "xcvgxcvxcv", "xvxcvxc");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // uName is the hotel key, pwd the password and cate the user name
    public void homePage(String uName, String pwd, String cate) {
        try {
            // Headless browser emulating Chrome
            final WebClient webClient = new WebClient(BrowserVersion.CHROME);

            // Load the portal start page, which contains the login form
            HtmlPage page = webClient.getPage("https://hotelservice.hrs.com/portal/?lang=en");

            // Serialized markup of the start page (useful for debugging; not used below)
            final String pageAsXml = page.asXml();

            // Look up the login form and its fields by name
            HtmlForm form = page.getFormByName("loginForm");
            HtmlTextInput hotelNumber = form.getInputByName("loginForm:hotelKey");
            HtmlPasswordInput password = form.getInputByName("loginForm:password");
            HtmlTextInput userName = form.getInputByName("loginForm:username");
            HtmlSubmitInput submit = form.getInputByName("loginForm:submitButton");

            // Fill in the credentials
            hotelNumber.setValueAttribute(uName);
            password.setValueAttribute(pwd);
            userName.setValueAttribute(cate);

            // Submit the login form; the result is the landing page after login
            HtmlPage adminPage = submit.click();

            // Link to the Hotel Self Administration (HSA) tool on the landing page
            HtmlAnchor anc = adminPage.getHtmlElementById("allApplications:division:1:application:0:hsv");

            // Follow the HSA link (this is the step that returns the login page again)
            HtmlPage click = anc.click();
            System.out.println("Title :: " + click.getTitleText());

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
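To narrow down at which step the session is lost, it can help to check every page the code receives for the login form. A minimal sketch, assuming the login page is the only page containing a `<form>` whose name or id is `loginForm` (the form name used in the question's code). `LoginBounceCheck` and `looksLikeLoginPage` are hypothetical helper names, and this is plain JDK regex code rather than an HtmlUnit feature.

```java
import java.util.regex.Pattern;

public class LoginBounceCheck {

    // Matches an opening <form> tag with a name or id attribute equal to "loginForm",
    // case-insensitively and with either single or double quotes.
    private static final Pattern LOGIN_FORM = Pattern.compile(
            "<form\\b[^>]*\\b(?:name|id)\\s*=\\s*[\"']loginForm[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns true if the given HTML looks like the portal's login page.
    public static boolean looksLikeLoginPage(String html) {
        return html != null && LOGIN_FORM.matcher(html).find();
    }

    public static void main(String[] args) {
        String login = "<html><body><form id=\"loginForm\" name=\"loginForm\" method=\"post\"></form></body></html>";
        String other = "<html><body><form name=\"searchForm\"></form><a id=\"x\">loginForm</a></body></html>";
        System.out.println(looksLikeLoginPage(login)); // true
        System.out.println(looksLikeLoginPage(other)); // false
    }
}
```

For example, calling `looksLikeLoginPage(adminPage.asXml())` after `submit.click()` and `looksLikeLoginPage(click.asXml())` after `anc.click()` would show whether the login itself failed or the session was dropped when following the HSA link.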

I am logged in to the site, but when I click the Hotel Self Administration (HSA) tool link on the landing page, I get the login page again.

Please help!

0 answers

No answers