Log in & extract data from a web page with Jsoup

Time: 2015-06-19 01:39:58

Tags: java jsoup webpage extraction

So I am trying to log in to a website and then scrape an element from a different page on the same site, "http://www.website.com".

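import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TicketingJsoup {

    public static void main(String[] args) throws IOException {
        try {
            // Jsoup needs an absolute URL, scheme included
            String url = "http://www.website.com";

            // Initial GET to pick up the cookies the site sets before login
            Connection.Response response = Jsoup.connect(url).method(Connection.Method.GET).execute();

            // Submit the login form, sending those cookies back
            response = Jsoup.connect(url)
                    .cookies(response.cookies())
                    .data("Action", "Login")
                    .data("User", "myuser")
                    .data("Password", "mypass")
                    .method(Connection.Method.POST)
                    .followRedirects(true)
                    .execute();

            Document document = response.parse();
            System.out.println(document);

            // Cookies of the logged-in session
            Map<String, String> loginCookies = response.cookies();

            // Select the headline element from the parsed page
            Elements ticketNumber = document.select("body > div.MainBox.ARIARoleMain.UseArticleColors > div.Headline > h1");
            System.out.println(ticketNumber);
            System.out.println("TEST");

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}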

Stack trace

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:516)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:534)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at TicketingJsoup.main(TicketingJsoup.java:25)
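
The trace above ends inside `HttpURLConnection`, which Jsoup calls into here, so the exception means the request went out but no response arrived within the read timeout. As an illustration of that behavior (not a diagnosis of this particular site), here is a minimal sketch using JDK classes only. `ReadTimeoutDemo`, `startSlowServer`, `fetchWithin`, and the local slow server are hypothetical names made up for the example, and it assumes Java 11 or newer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ReadTimeoutDemo {

    // Hypothetical slow server: waits delayMillis before sending any response bytes.
    static HttpServer startSlowServer(long delayMillis) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            // One daemon thread per request, so a stalled request cannot block later ones.
            server.setExecutor(task -> {
                Thread t = new Thread(task);
                t.setDaemon(true);
                t.start();
            });
            server.createContext("/", exchange -> {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                byte[] body = "<h1>done</h1>".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the GET completed within the budget, false if it timed out.
    static boolean fetchWithin(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis); // the limit behind "Read timed out"
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes();
                return true;
            } finally {
                conn.disconnect();
            }
        } catch (SocketTimeoutException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startSlowServer(500);
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println("100 ms budget ok?  " + fetchWithin(url, 100));
            System.out.println("5000 ms budget ok? " + fetchWithin(url, 5_000));
        } finally {
            server.stop(0);
        }
    }
}
```

In Jsoup the corresponding setting is `Connection.timeout(int millis)`, for example `Jsoup.connect(url).timeout(10_000)`. Whether a larger value helps depends on why the server is slow to answer.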

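What am I doing wrong? I have been racking my brain trying to figure out how to do this, and I have tried plenty of other things, but they all led to dead ends. I think this is the best approach, but if I am wrong, please tell me a better one.

Thanks.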

1 Answer:

Answer 0 (score: 1)

Try this code:

    try {
        // Jsoup needs an absolute URL, scheme included
        String url = "http://www.website.com";

        // Initial GET to pick up the cookies the site sets before login
        Connection.Response response = Jsoup.connect(url).method(Connection.Method.GET).execute();

        // Submit the login form, sending those cookies back
        response = Jsoup.connect(url)
                .cookies(response.cookies())
                .data("Action", "Login")
                .data("User", "your_login")
                .data("Password", "your_password")
                .method(Connection.Method.POST)
                .followRedirects(true)
                .execute();

        // Parse and print the page returned after login
        Document document = response.parse();
        System.out.println(document);

    } catch (IOException e) {
        e.printStackTrace();
    }
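
The code above stops at the login response, while the question asks for an element from a different page. As a rough sketch of that last step, the example below logs in, keeps the session cookie, and sends it along when requesting the second page. It deliberately uses plain JDK classes instead of Jsoup so it can run on its own against a local stand-in server. `SessionCookieDemo`, the `/login` and `/ticket` paths, the `SESSION` cookie, and the `demo`/`secret` credentials are all made up for illustration; the real site's form fields and cookie names are unknown. Java 11 or newer is assumed.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SessionCookieDemo {

    // Hypothetical stand-in site: /login issues a cookie, /ticket requires it.
    static HttpServer startFakeSite() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/login", ex -> {
                String form = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                boolean ok = form.contains("User=demo") && form.contains("Password=secret");
                if (ok) {
                    ex.getResponseHeaders().add("Set-Cookie", "SESSION=abc123; Path=/");
                }
                reply(ex, ok ? 200 : 401, ok ? "welcome" : "denied");
            });
            server.createContext("/ticket", ex -> {
                String cookie = ex.getRequestHeaders().getFirst("Cookie");
                boolean ok = cookie != null && cookie.contains("SESSION=abc123");
                reply(ex, ok ? 200 : 403, ok ? "<h1>Ticket#1001</h1>" : "forbidden");
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void reply(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Opens a connection with explicit timeouts and, if given, the session cookie.
    static HttpURLConnection open(String url, String cookie) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(5_000);
        if (cookie != null) {
            conn.setRequestProperty("Cookie", cookie);
        }
        return conn;
    }

    // Step 1: POST the login form; returns the "name=value" part of Set-Cookie, or null.
    static String login(String baseUrl, String user, String password) {
        try {
            HttpURLConnection conn = open(baseUrl + "/login", null);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            String form = "Action=Login"
                    + "&User=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                    + "&Password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(form.getBytes(StandardCharsets.UTF_8));
            }
            if (conn.getResponseCode() != 200) {
                return null;
            }
            String setCookie = conn.getHeaderField("Set-Cookie");
            return setCookie == null ? null : setCookie.split(";", 2)[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 2: GET another page with the cookie; returns the body or "HTTP <status>".
    static String fetch(String url, String cookie) {
        try {
            HttpURLConnection conn = open(url, cookie);
            int status = conn.getResponseCode();
            if (status != 200) {
                return "HTTP " + status;
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer site = startFakeSite();
        try {
            String base = "http://127.0.0.1:" + site.getAddress().getPort();
            String cookie = login(base, "demo", "secret");
            System.out.println(fetch(base + "/ticket", null));   // request without the session cookie
            System.out.println(fetch(base + "/ticket", cookie)); // request with the session cookie
        } finally {
            site.stop(0);
        }
    }
}
```

With Jsoup the same handoff would look like `Jsoup.connect(otherPageUrl).cookies(response.cookies()).get()`, where `otherPageUrl` is a placeholder for the page that holds the element; `select(...)` can then be called on the returned `Document`.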