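How to log in to a website with Jsoup for web scraping

Time: 2019-04-18 12:26:43

Tags: java web-scraping jsoup

I can't log in to a website using jsoup.

I have tried almost everything and read various blogs and tutorials, but nothing helped. This is the code I have written so far: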


Connection.Response res = Jsoup.connect(url)
        .method(Method.GET)
        .execute();

Connection.Response login = Jsoup.connect(url)
        .data("username", uname, "password", pass, "anchor", "")
        .cookies(res.cookies())
        .method(Method.GET)
        .execute();

Document doc = Jsoup.connect(url)
        .cookies(login.cookies())
        .get();
String title = doc.title();

System.out.println("title is: " + title);

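1 answer:

Answer 0: (score: 0)

I ran into a problem with the Jsoup login process and found that the login involves two pages. The first page (GET) is https://www.dgr.gub.uy/sr/loginStart.jsf and the second page (POST) is https://www.dgr.gub.uy/sr/j_security_check: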

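import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Logs in through the two-step flow and returns the page served after the POST,
// or null if either request fails.
// getRequestHeaderBefore and getRequestHeaderAfter are helpers of the enclosing class (not shown here).
public Document getDocumentSix(String defaultUrl, String host) {
    Document document = null;
    try {
        // Step 1: GET the login page so the server issues the session cookies (JSESSIONID).
        Connection.Response response = Jsoup.connect(defaultUrl)
                .headers(this.getRequestHeaderBefore(host))
                .method(Connection.Method.GET)
                .execute();
        Map<String, String> cookies = response.cookies();

        // Step 2: POST the credentials to j_security_check, sending the same cookies back.
        Connection.Response responseTwo = Jsoup.connect("https://www.dgr.gub.uy/sr/j_security_check")
                .headers(this.getRequestHeaderAfter(cookies.get("JSESSIONID")))
                .data("j_username", "6551990")
                .data("j_password", "VNZANU")
                .cookies(cookies)
                .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36")
                .method(Connection.Method.POST)
                .execute();

        // Parse the page returned after the login POST.
        document = responseTwo.parse();
    } catch (Exception e) {
        // A failure in either step ends up here and the method returns null.
        e.printStackTrace();
    }

    return document;
}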
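
One thing to watch when fetching further pages after the login: a Jsoup response's cookies() map holds the cookies the server set in that response, so if the login POST does not set the session cookie again, passing only the login response's cookies to the next request can drop JSESSIONID. A minimal sketch of carrying both sets forward is below. SessionCookies and merge are hypothetical names, not part of Jsoup, and the maps in main are stand-ins for response.cookies() and responseTwo.cookies(); the cookie values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jsoup): keeps one cookie map for the whole session.
class SessionCookies {

    // Returns a new map with the earlier cookies, overridden by any later ones
    // that share the same name. Neither input map is modified.
    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>(earlier);
        merged.putAll(later);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for response.cookies() after the initial GET (value is made up).
        Map<String, String> fromGet = Map.of("JSESSIONID", "abc123");
        // Stand-in for responseTwo.cookies() after the login POST (name and value are made up).
        Map<String, String> fromPost = Map.of("auth", "xyz");

        // The merged map is what would be passed to .cookies(...) on later requests.
        Map<String, String> session = merge(fromGet, fromPost);
        System.out.println(session);
    }
}
```

In the method above, this would amount to something like Jsoup.connect(nextUrl).cookies(SessionCookies.merge(cookies, responseTwo.cookies())).get(), where nextUrl is whichever protected page is needed next.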