Two cookies with the same name but different values in Jsoup

Date: 2018-10-22 15:41:27

Tags: java servlets web-crawler jsoup java-api

Here is my code:

    import java.util.Map;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    String url        = "http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&";
    String articleURL = "https://www.metalbulletin.com/Article/3838710/Home/CHINA-REBAR-Domestic-prices-recover-after-trading-pick-up.html";

    // Step 1: fetch the login page to obtain the form action, the XSRF token and the initial cookies.
    Connection.Response loginForm = Jsoup.connect(url)
            .method(Connection.Method.GET)
            .execute();

    Document welcomePage = loginForm.parse();
    Element formElement  = welcomePage.body().getElementsByTag("form").get(0);
    String formAction    = formElement.attr("action");

    Elements input = welcomePage.select("input[name=idsrv.xsrf]");
    String securityTokenValue = input.attr("value");

    // Step 2: post the credentials together with the token and the cookies from step 1.
    Connection.Response mainPage = Jsoup.connect("https://account.metalbulletin.com" + formAction)
            .data("idsrv.xsrf", securityTokenValue)
            .data("username", "test@tempmail.com")
            .data("password", "p@ssw0rd")
            .cookies(loginForm.cookies())
            .method(Connection.Method.POST)
            .execute();

    // Only the cookies returned by the login POST are kept here.
    Map<String, String> cookies = mainPage.cookies();

    System.out.println("\n\nloginForm.cookies()==>\n" + loginForm.cookies());
    System.out.println("\n\nmainPage.cookies()==>\n" + mainPage.cookies());

    // Step 3: request the article with the cookies from step 2.
    Document articlePage = Jsoup.connect(articleURL).cookies(cookies).get();
    Element article      = articlePage.getElementById("article-body");
    Elements lead1       = article.getElementsByClass("articleContainer");
    System.out.println("\n\nNews Article==>\n" + lead1);

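The output shows that:

  1. Jsoup connects and gets status 200,
  2. Jsoup receives the cookies

But the code cannot read the full text of the news article.

Can anyone help me find where I am going wrong? Please note: the login/password are real, so you can log in to test my code.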

*********Update*********
I noticed that the site above generates two cookies with the same name but different values:

loginForm.cookies()==>
{emlbaz1=c61c6ce278bb1f7deb4815e59d4af24a, idsrv.xsrf=iJiP2cNExN8G52A4mDDGEmsQU3NuE2dHV4QIMYT9f1-7Lh_7om59Bkxx7xdYkJLW91dW6Tm1GBO66iP0W1_2Hsw-x0UYaAmaQANOPOOGQVc, SignInMessage.480d6d22b06d45a3c285fd357988d1ce=4btvKnOHpXeDRvzc5qH1aH8QX2UoO5ag0DPpE98J99abfOuxv6IBr-ygLq7NdGpHYjiuJzJWIeSr7MM8VUHynWFTVgIYJbNxPy1rJHD2guJKbCQR63Olh-w5gqeIuC4MbwCm4UEcX4KrMnEclU5Z7L-paPJgRYjmHKq2wrzba6tCsosZamFaMaRFN7Hkq7Vv2cYEKDCVFaOEyk_1DZXlshlNEDOME1WFgjCQ5R8065AJmHik5OdMOIH4ji_HeYVwupa1jc4KJO1kv7GnLOqZzhIJxXjV5N0dyQ1HYmBD4N8YLS6bvIp5guhxX7IhBASiqnpaMB_ZH8BazaOJRW6-vFlU0Q3fItW6_h2UkQwb9QBh4Ig9K3hclncY4MLxtJJGTVAildPaandRD0JNNeDMs1n2GQ78oHKqGxXpLTOa6dauN-5gm8bBKMv0mvb90G8SjmMIblW-uFWEJ4k13AKtTm6cZrQXB3vmsS6yZpoyAUdxx4E5YOP_pbJmmgHRWq8Tg5f496gpwWOirKEYsF-_YCuJP9cMkHsAHeQmYn3pKPjuug64fJaIGzKqgpByjwKYwQgPQWCd9UQ21uDWJVA1FCoHdllYbrjoc_5WvR9CttL0vMInA8AnxGcUUSmmmN5yi-KXttiUryYZgcw_gwbpFQ, clientId=mb, ASP.NET_SessionId=bz0ezutsr1mizdqhunbeans0, EMWF_MBR.ASPXAUTH=, titan-nonce=05c9ea8c-145a-4d08-a56d-9ffa277d39fd, visid_incap_884336=lcF2GyNEQ/2MAa/2Yh70X+poz1sAAAAAQUIPAAAAAABJrURuQ31kxFwvtR4xuU55, incap_ses_532_884336=V4UEIbvIzBFqc0OXJwxiB+poz1sAAAAAaAJN/2THMp/WElZBXl1aaw==}


mainPage.cookies()==>
{emlbaz1=c61c6ce278bb1f7deb4815e59d4af24a, idsvr.clients=WyJtYiJd, SignInMessage.480d6d22b06d45a3c285fd357988d1ce=., idsvr.session=af9f1a6a46a5133d9751ad3ff90eb23e, idsrv.partial=, idsrv.external=, idsrv=jDTFFRmvLNgO3GEqKKI1U68aI7_M5m1BmN7qsMsOi0WzAvAiz2GCsVFPuSzcPpeeCZ-UUUbP8A7T3A21bRDVUoSpEnNlkqeOhsVnmykf3sZvvfnRG-rPAFQDp-KObHopQ5cPZkYKMnEguStM6Mfvvu-XBM5fj8Z86B_rZp4YyaVEa_k2tpGFdT8PjRB64zTHpdPmgyG0GMBayV1AhvPVSZCnO41vZlwTCm_p8B5OJFmRnCqimRhwSIi6dUA9P8D11G01Gn7aYxqAr8Q125-bqnoMORC1ESYgy2YwjxGRd2tletrqSrsd3VqwEpAlOuSiECZJuvCGmW0VQt2ErXcYKJ-me-uXm8FAWVj3oml7iT4mMgysKcFUtOYNAbiQunuz6_-3udZgiCsG8cBZLRXu1Nvs_We7Y0rqgbm1Lmn4aduf4XC8bV3IF-MHY_pObX-1WtN7xiZYZOuqyI8leZWSIoPLBOHbSR5oyUKxGmGrZybFQm4pkpBy9OkYFUx15Qh5R6nClwuqCLu6iX6KXB7jySu4LgvYntaEP8jkJSxiGFo}

This raises the question: which cookies must be sent in order to access the site's other URLs?
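One thing visible in the two dumps above: the article request sends only `mainPage.cookies()`, so cookies that appear only in `loginForm.cookies()` (`ASP.NET_SessionId` or `clientId`, for example) are not sent. Whether that is the cause here is not confirmed. The following is a minimal sketch of merging the two maps so that values from the later response override the earlier ones; the class name `CookieMerge` and the placeholder values are hypothetical and stand in for the real maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieMerge {

    /** Returns a new map with every entry of {@code earlier}, overridden by {@code later}. */
    static Map<String, String> merge(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>(earlier);
        merged.putAll(later); // a later value replaces an earlier one with the same name
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder values standing in for loginForm.cookies() and mainPage.cookies().
        Map<String, String> fromLoginForm = new LinkedHashMap<>();
        fromLoginForm.put("ASP.NET_SessionId", "session-value");
        fromLoginForm.put("SignInMessage", "long-value");

        Map<String, String> fromMainPage = new LinkedHashMap<>();
        fromMainPage.put("SignInMessage", ".");
        fromMainPage.put("idsrv", "token-value");

        System.out.println(merge(fromLoginForm, fromMainPage));
    }
}
```

The merged map could then be passed to `Jsoup.connect(articleURL).cookies(merged)` in place of `cookies`. Note that a plain name-to-value map carries no domain or path information, so this sends every cookie to every host.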

Update 2
I inspected the form that is submitted via POST and found the same parameters that I am already sending:

    Element testForm = document.body().getElementsByTag("form").get(0);
    String formAction = testForm.attr("action");
    System.out.println(formAction);
    Elements inputElements = testForm.getElementsByTag("input");

    for (Element inputElement : inputElements) {
        String key = inputElement.attr("name");
        String value = inputElement.attr("value");
        System.out.println("Parameter Name: " + key);
        System.out.println("Parameter Value: " + value);
        System.out.println("");
    }

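Jsoup also reports status 200 ("OK"), but perhaps something goes wrong in how the login credentials and cookies are handled when scraping the site's other pages.

As it turns out, Jsoup does log in to the site, but the login does not carry over to the site's other URLs.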
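A 200 status by itself does not show whether the session is authenticated. One cheap check is to list which cookies came back with an empty value: in the `loginForm.cookies()` dump, `EMWF_MBR.ASPXAUTH` is empty. That this cookie is the site's authentication ticket is only an assumption based on its name. A minimal sketch, with the hypothetical class name `EmptyCookieCheck` and placeholder values:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EmptyCookieCheck {

    /** Returns the names of all cookies whose value is null or blank, in map order. */
    static List<String> emptyCookieNames(Map<String, String> cookies) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Placeholder map shaped like the loginForm.cookies() dump (values shortened).
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("clientId", "mb");
        cookies.put("EMWF_MBR.ASPXAUTH", "");
        cookies.put("titan-nonce", "placeholder");

        System.out.println("Empty cookies: " + emptyCookieNames(cookies));
    }
}
```

Running the same check on the cookies returned after the login POST (and after any redirect back to the main site) would show whether that cookie is ever filled in.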

Update 3
I have pinned down the exact problem: the site uses two cookies with the same name but different values. Does Jsoup support this?
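For context, `Connection.Response.cookies()` returns a `Map<String, String>`, so each name can hold only one value and a second cookie with the same name cannot appear there. To see duplicates, the raw `Set-Cookie` header values are needed. The sketch below groups such header lines by cookie name using only the standard library. The class name `SetCookieGrouping` and the sample headers are hypothetical, and obtaining the list from jsoup (for example via a multi-valued header accessor on the response) is an assumption to verify against the jsoup version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SetCookieGrouping {

    /** Groups raw Set-Cookie header values by cookie name, keeping every value seen. */
    static Map<String, List<String>> groupByName(List<String> setCookieHeaders) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            // The name=value pair is everything before the first ';' (attributes follow it).
            int semicolon = header.indexOf(';');
            String pair = (semicolon >= 0 ? header.substring(0, semicolon) : header).trim();
            int equalsAt = pair.indexOf('=');
            if (equalsAt <= 0) {
                continue; // skip malformed entries that have no name
            }
            String name = pair.substring(0, equalsAt).trim();
            String value = pair.substring(equalsAt + 1).trim();
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Hypothetical header lines; real ones would come from the HTTP response.
        List<String> headers = Arrays.asList(
                "SignInMessage.abc=first-value; path=/; HttpOnly",
                "SignInMessage.abc=.; path=/core; HttpOnly",
                "idsvr.session=xyz; path=/");

        groupByName(headers).forEach((name, values) ->
                System.out.println(name + " -> " + values));
    }
}
```

Any name that maps to more than one value in the result is a cookie that a name-to-value map would collapse to a single entry.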

Best regards

0 answers:

No answers