Here is my code:
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MetalBulletinLogin {
    public static void main(String[] args) throws IOException {
        String url = "http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&";
        String articleURL = "https://www.metalbulletin.com/Article/3838710/Home/CHINA-REBAR-Domestic-prices-recover-after-trading-pick-up.html";

        // 1. GET the login page (Jsoup follows redirects by default)
        Connection.Response loginForm = Jsoup.connect(url)
                .method(Connection.Method.GET)
                .execute();
        Document welcomePage = loginForm.parse();

        // 2. Read the form action and the anti-forgery token
        Element formElement = welcomePage.body().getElementsByTag("form").get(0);
        String formAction = formElement.attr("action");
        Elements input = welcomePage.select("input[name=idsrv.xsrf]");
        String securityTokenValue = input.attr("value");

        // 3. POST the credentials together with the cookies from step 1
        Connection.Response mainPage = Jsoup.connect("https://account.metalbulletin.com" + formAction)
                .data("idsrv.xsrf", securityTokenValue)
                .data("username", "test@tempmail.com")
                .data("password", "p@ssw0rd")
                .cookies(loginForm.cookies())
                .method(Connection.Method.POST)
                .execute();

        Map<String, String> cookies = mainPage.cookies();
        System.out.println("\n\nloginForm.cookies()==>\n" + loginForm.cookies());
        System.out.println("\n\nmainPage.cookies()==>\n" + mainPage.cookies());

        // 4. GET the article with the cookies returned by the login POST
        Document articlePage = Jsoup.connect(articleURL).cookies(cookies).get();
        Element article = articlePage.getElementById("article-body");
        Elements lead1 = article.getElementsByClass("articleContainer");
        System.out.println("\n\nNews Article==>\n" + lead1);
    }
}
The output shows that the code cannot read the full text of the news article.
Can someone help me find where I went wrong? Note: the login/password are real, so you can log in to test my code.
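One way to narrow this down is to let a cookie store track cookies per host instead of copying `Map<String, String>` objects between requests by hand. The sketch below swaps in the JDK's own `java.net.http.HttpClient` (Java 11+) with a `java.net.CookieManager`; it is not the Jsoup API, and the class `CookieAwareLogin` and its methods are hypothetical. The idea is to send every request of the session (the GET of the login page, the POST of the form, the GET of the article) through the same client, so that each `Set-Cookie` is stored under the host that sent it and replayed only to matching URLs. The login page body returned by `getPage` can still be handed to `Jsoup.parse(html, baseUri)` to read `idsrv.xsrf` and the form action as in the code above.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: one HttpClient per login session, so every request
// shares a single domain-aware cookie store.
public class CookieAwareLogin {

    // Client whose CookieManager stores each Set-Cookie under the host that sent it;
    // redirects are followed except from HTTPS to HTTP.
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Encodes form fields as application/x-www-form-urlencoded, in map iteration order.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // GETs a page; response.uri() is the final URL after redirects, a suitable
    // base URI for Jsoup.parse(response.body(), response.uri().toString()).
    static HttpResponse<String> getPage(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    // POSTs the login form through the same client, so cookies from the GET
    // above are sent automatically and new ones are stored per host.
    static HttpResponse<String> postForm(HttpClient client, String actionUrl,
                                         Map<String, String> fields)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(actionUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

In use, the flow would be `getPage(client, url)`, extract the token and action with Jsoup, `postForm(client, action, fields)`, then `getPage(client, articleURL)`, with no cookie map passed around by hand.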
*********UPDATE*********
I noticed that the site above generates two different cookies with the same name:
loginForm.cookies()==>
{emlbaz1=c61c6ce278bb1f7deb4815e59d4af24a, idsrv.xsrf=iJiP2cNExN8G52A4mDDGEmsQU3NuE2dHV4QIMYT9f1-7Lh_7om59Bkxx7xdYkJLW91dW6Tm1GBO66iP0W1_2Hsw-x0UYaAmaQANOPOOGQVc, SignInMessage.480d6d22b06d45a3c285fd357988d1ce=4btvKnOHpXeDRvzc5qH1aH8QX2UoO5ag0DPpE98J99abfOuxv6IBr-ygLq7NdGpHYjiuJzJWIeSr7MM8VUHynWFTVgIYJbNxPy1rJHD2guJKbCQR63Olh-w5gqeIuC4MbwCm4UEcX4KrMnEclU5Z7L-paPJgRYjmHKq2wrzba6tCsosZamFaMaRFN7Hkq7Vv2cYEKDCVFaOEyk_1DZXlshlNEDOME1WFgjCQ5R8065AJmHik5OdMOIH4ji_HeYVwupa1jc4KJO1kv7GnLOqZzhIJxXjV5N0dyQ1HYmBD4N8YLS6bvIp5guhxX7IhBASiqnpaMB_ZH8BazaOJRW6-vFlU0Q3fItW6_h2UkQwb9QBh4Ig9K3hclncY4MLxtJJGTVAildPaandRD0JNNeDMs1n2GQ78oHKqGxXpLTOa6dauN-5gm8bBKMv0mvb90G8SjmMIblW-uFWEJ4k13AKtTm6cZrQXB3vmsS6yZpoyAUdxx4E5YOP_pbJmmgHRWq8Tg5f496gpwWOirKEYsF-_YCuJP9cMkHsAHeQmYn3pKPjuug64fJaIGzKqgpByjwKYwQgPQWCd9UQ21uDWJVA1FCoHdllYbrjoc_5WvR9CttL0vMInA8AnxGcUUSmmmN5yi-KXttiUryYZgcw_gwbpFQ, clientId=mb, ASP.NET_SessionId=bz0ezutsr1mizdqhunbeans0, EMWF_MBR.ASPXAUTH=, titan-nonce=05c9ea8c-145a-4d08-a56d-9ffa277d39fd, visid_incap_884336=lcF2GyNEQ/2MAa/2Yh70X+poz1sAAAAAQUIPAAAAAABJrURuQ31kxFwvtR4xuU55, incap_ses_532_884336=V4UEIbvIzBFqc0OXJwxiB+poz1sAAAAAaAJN/2THMp/WElZBXl1aaw==}
mainPage.cookies()==>
{emlbaz1=c61c6ce278bb1f7deb4815e59d4af24a, idsvr.clients=WyJtYiJd, SignInMessage.480d6d22b06d45a3c285fd357988d1ce=., idsvr.session=af9f1a6a46a5133d9751ad3ff90eb23e, idsrv.partial=, idsrv.external=, idsrv=jDTFFRmvLNgO3GEqKKI1U68aI7_M5m1BmN7qsMsOi0WzAvAiz2GCsVFPuSzcPpeeCZ-UUUbP8A7T3A21bRDVUoSpEnNlkqeOhsVnmykf3sZvvfnRG-rPAFQDp-KObHopQ5cPZkYKMnEguStM6Mfvvu-XBM5fj8Z86B_rZp4YyaVEa_k2tpGFdT8PjRB64zTHpdPmgyG0GMBayV1AhvPVSZCnO41vZlwTCm_p8B5OJFmRnCqimRhwSIi6dUA9P8D11G01Gn7aYxqAr8Q125-bqnoMORC1ESYgy2YwjxGRd2tletrqSrsd3VqwEpAlOuSiECZJuvCGmW0VQt2ErXcYKJ-me-uXm8FAWVj3oml7iT4mMgysKcFUtOYNAbiQunuz6_-3udZgiCsG8cBZLRXu1Nvs_We7Y0rqgbm1Lmn4aduf4XC8bV3IF-MHY_pObX-1WtN7xiZYZOuqyI8leZWSIoPLBOHbSR5oyUKxGmGrZybFQm4pkpBy9OkYFUx15Qh5R6nClwuqCLu6iX6KXB7jySu4LgvYntaEP8jkJSxiGFo}
So the question now is: which cookies have to be sent to access the site's other URLs?
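On the question of which cookies to send: cookies are scoped. Under RFC 6265, a cookie set without a `Domain` attribute is host-only, so one issued by `account.metalbulletin.com` is returned only to that host, and every cookie is also limited to its `Path`. A client should therefore not send one merged map to every URL, but only the cookies that match each request's host and path. The offline sketch below lets `java.net.CookieManager` make that decision; the class `CookieScopeDemo` and the `Set-Cookie` strings are made up for illustration and are not the site's real headers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Offline sketch: feed Set-Cookie headers "received" from two hosts into a
// CookieManager, then ask which Cookie header each host would be sent.
public class CookieScopeDemo {

    static final URI ACCOUNT = URI.create("https://account.metalbulletin.com/");
    static final URI WWW = URI.create("https://www.metalbulletin.com/");

    // Records a Set-Cookie header as if it came in a response from 'from'.
    // With no Domain attribute, CookieManager scopes the cookie to from's host.
    static void receive(CookieManager manager, URI from, String setCookie) {
        try {
            manager.put(from, Map.of("Set-Cookie", List.of(setCookie)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Cookie header values the manager would attach to a request for 'to'.
    static List<String> cookiesSentTo(CookieManager manager, URI to) {
        try {
            return manager.get(to, Map.of()).getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Made-up cookies: one name used by both hosts, one used only by the account host.
    static CookieManager demo() {
        CookieManager manager = new CookieManager();
        receive(manager, ACCOUNT, "ASP.NET_SessionId=from-account; Path=/");
        receive(manager, ACCOUNT, "idsvr.session=abc123; Path=/");
        receive(manager, WWW, "ASP.NET_SessionId=from-www; Path=/");
        return manager;
    }

    public static void main(String[] args) {
        CookieManager manager = demo();
        System.out.println("sent to account: " + cookiesSentTo(manager, ACCOUNT));
        System.out.println("sent to www:     " + cookiesSentTo(manager, WWW));
    }
}
```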
Update 2
I inspected the form that is submitted via POST and found the same parameters I am already sending:
// 'document' is the parsed login page (assumed to be the same page as welcomePage above)
Element testForm = document.body().getElementsByTag("form").get(0);
String formAction = testForm.attr("action");
System.out.println(formAction);
// Print every input the form would submit, including hidden ones
Elements inputElements = testForm.getElementsByTag("input");
for (Element inputElement : inputElements) {
    String key = inputElement.attr("name");
    String value = inputElement.attr("value");
    System.out.println("Parameter Name: " + key);
    System.out.println("Parameter Value: " + value);
    System.out.println();
}
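Jsoup also reports status 200 "OK", so perhaps the problem lies in how the login credentials and cookies are carried over when scraping the site's other pages.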
It turns out that Jsoup does log in to the site, but the login does not apply to the site's other URLs.
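A `200 OK` only says that the last response in the redirect chain succeeded; a login page that rejects the credentials, or an intermediate page, usually returns 200 as well. A more telling check is whether the host that serves the articles ended up with a non-empty authentication cookie. In the log above, `EMWF_MBR.ASPXAUTH` appears with an empty value in `loginForm.cookies()` and is absent from `mainPage.cookies()`; treating it as the site's auth cookie is an assumption based only on its name (ASP.NET forms authentication cookies conventionally end in `.ASPXAUTH`). The hypothetical helper below checks a `java.net.CookieStore`, such as the one behind a `CookieManager`, for such a cookie scoped to a given host.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical post-login check: is there a non-empty cookie with the given
// name that would be sent to 'site'? A 200 status alone does not tell you this.
public class LoginCheck {

    static boolean hasAuthCookie(CookieStore store, URI site, String name) {
        // CookieStore.get(uri) only returns cookies whose domain matches site's host.
        for (HttpCookie cookie : store.get(site)) {
            if (cookie.getName().equals(name) && !cookie.getValue().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Builds a cookie scoped to a single host and path "/", as a response might set it.
    static HttpCookie hostCookie(String name, String value, String host) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setDomain(host);
        cookie.setPath("/");
        return cookie;
    }

    public static void main(String[] args) {
        URI www = URI.create("https://www.metalbulletin.com/");
        URI account = URI.create("https://account.metalbulletin.com/");
        CookieStore store = new CookieManager().getCookieStore();
        String authName = "EMWF_MBR.ASPXAUTH"; // assumed auth cookie, based on its name only

        store.add(www, hostCookie(authName, "", "www.metalbulletin.com"));
        System.out.println("empty value:     " + hasAuthCookie(store, www, authName));

        store.add(account, hostCookie(authName, "token", "account.metalbulletin.com"));
        System.out.println("only on account: " + hasAuthCookie(store, www, authName));

        store.add(www, hostCookie(authName, "token", "www.metalbulletin.com"));
        System.out.println("set for www:     " + hasAuthCookie(store, www, authName));
    }
}
```

If that cookie stays empty after the POST, one thing worth inspecting (again an assumption, not verified against this site) is the HTML body of the login response: some single-sign-on servers answer with a page containing another form that the browser submits automatically via JavaScript to the main site. Jsoup does not execute scripts, so such a form would have to be submitted explicitly.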
Update 3
I have found the exact problem: the site uses two cookies with the same name but different values. Does Jsoup support this?
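Jsoup's `Connection.Response.cookies()` and `Connection.cookies(Map)` work with a `Map<String, String>` keyed by cookie name, so two cookies that differ only in domain or path cannot both live in one such map: the last one put wins, and it is then sent to every URL the map is passed to. If the two `SignInMessage.480d6d22b06d45a3c285fd357988d1ce` values come from different hosts (the log alone does not show which host set them), a domain-aware store keeps them apart. The offline sketch below contrasts the two using `java.net.CookieStore`; the class `SameNameCookies`, the cookie name `SignInMessage.x` and its values are invented. jsoup 1.14.1 and later also provide `Jsoup.newSession()`, whose requests share a `java.net.CookieStore` (see `Connection.cookieStore()`); whether that alone fixes this site is untested.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Offline contrast: a name-keyed Map versus a domain-aware CookieStore.
public class SameNameCookies {

    static final URI ACCOUNT = URI.create("https://account.metalbulletin.com/");
    static final URI WWW = URI.create("https://www.metalbulletin.com/");

    // Cookie scoped to the host of 'site' with path "/".
    static HttpCookie cookie(String name, String value, URI site) {
        HttpCookie c = new HttpCookie(name, value);
        c.setDomain(site.getHost());
        c.setPath("/");
        return c;
    }

    // Two cookies that share a name but belong to different hosts.
    static CookieStore store() {
        CookieStore store = new CookieManager().getCookieStore();
        store.add(ACCOUNT, cookie("SignInMessage.x", "first", ACCOUNT));
        store.add(WWW, cookie("SignInMessage.x", "second", WWW));
        return store;
    }

    // What a Map<String, String> of cookies can hold: one value per name.
    static Map<String, String> byName(CookieStore store) {
        Map<String, String> map = new LinkedHashMap<>();
        for (HttpCookie c : store.getCookies()) {
            map.put(c.getName(), c.getValue()); // later entries overwrite earlier ones
        }
        return map;
    }

    // Values a domain-aware client would send to 'site' for the given cookie name.
    static List<String> valuesFor(CookieStore store, URI site, String name) {
        return store.get(site).stream()
                .filter(c -> c.getName().equals(name))
                .map(HttpCookie::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CookieStore store = store();
        System.out.println("cookies in store: " + store.getCookies().size());
        System.out.println("name-keyed map:   " + byName(store));
        System.out.println("sent to account:  " + valuesFor(store, ACCOUNT, "SignInMessage.x"));
        System.out.println("sent to www:      " + valuesFor(store, WWW, "SignInMessage.x"));
    }
}
```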
Best regards