I'm trying to connect to a website with JSoup, but it isn't working.
Here is my code:
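import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
// Post the credentials to the login page and keep the cookies it returns
Connection.Response res = Jsoup.connect("http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&")
    .data("username", "94mkr@mail4gmail.com", "password", "jakdjique&THFI#")
    .method(Method.POST)
    .execute();
Map<String, String> loginCookies = res.cookies();
// Request the article with those cookies and pull out the parts of interest
Document doc = Jsoup.connect("https://www.metalbulletin.com/Article/3838710/Home/CHINA-REBAR-Domestic-prices-recover-after-trading-pick-up.html")
    .cookies(loginCookies)
    .get();
Element article = doc.getElementById("article-body");
Elements heading = article.getElementsByTag("h1");
Elements lead = article.getElementsByClass("lead");
Elements lead1 = article.getElementsByClass("articleContainer");
System.out.println(lead);
System.out.println(lead1);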
I've put in a temporary login/password so that you can test it.
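jsoup's `.data()` sends the fields of a POST as an `application/x-www-form-urlencoded` body, so the `&` and `#` in the password should be percent-encoded and are probably not the problem by themselves. What matters more is that the field names match the `name` attributes of the real login form; the question does not show that `username` and `password` are those names, so treat them as an assumption. The JDK-only sketch below (Java 10+; the class `FormBodySketch` and its method are hypothetical) shows the body such a POST would carry.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the URL-encoded body of a form POST.
// Characters such as '@', '&' and '#' are percent-encoded, so they
// cannot split the body into extra fields.
class FormBodySketch {

    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Field names are assumed; they must match the real form's input names.
        fields.put("username", "94mkr@mail4gmail.com");
        fields.put("password", "jakdjique&THFI#");
        // prints username=94mkr%40mail4gmail.com&password=jakdjique%26THFI%23
        System.out.println(encode(fields));
    }
}
```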
I noticed that http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&
generates a new link, for example:
https://account.metalbulletin.com/identity/login?signin=fab48076d8a4f74f52565dd6a9f47e65
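Judging only from these two URLs (this is an inference, not something the question confirms), the actual login form seems to be served by a separate identity host, account.metalbulletin.com, and to carry a per-visit `signin` value. If so, a POST to Login.html may never reach the form handler. The JDK-only sketch below (Java 10+; `LoginRedirectInfo` and `queryParams` are hypothetical names) simply takes the two URLs apart so the difference in host and query is explicit; it makes no network calls.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares the URL the code posts to with the URL the
// site redirects to. Purely local; nothing here contacts the server.
class LoginRedirectInfo {

    // Splits the raw query of a URI into decoded key/value pairs.
    static Map<String, String> queryParams(URI uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String raw = uri.getRawQuery();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerates stray '&' separators
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        URI postedTo = URI.create("http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&");
        URI loginForm = URI.create(
                "https://account.metalbulletin.com/identity/login?signin=fab48076d8a4f74f52565dd6a9f47e65");

        System.out.println(postedTo.getHost() + " vs " + loginForm.getHost());
        System.out.println("ReturnURL = " + queryParams(postedTo).get("ReturnURL"));
        System.out.println("signin    = " + queryParams(loginForm).get("signin"));
    }
}
```

If that reading is right, the credentials would have to be posted to the form on account.metalbulletin.com, together with the `signin` value and any hidden fields that form contains, rather than to Login.html.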
I've tried many things, but I still can't get into the site.
Update
I refined the code as follows:
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// 1. GET the login page first to pick up its session cookies
Connection.Response response = Jsoup.connect("http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&")
    .method(Connection.Method.GET)
    .execute();
// 2. POST the credentials, sending the cookies from step 1
response = Jsoup.connect("http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&")
    .data("username", "94mkr@mail4gmail.com", "password", "jakdjique&THFI#")
    .cookies(response.cookies())
    .method(Connection.Method.POST)
    .execute();
Map<String, String> cookies = new HashMap<String, String>(); // never used below
// 3. Request the article with the cookies returned by the POST
Document doc = Jsoup.connect("https://www.metalbulletin.com/Article/3838710/Home/CHINA-REBAR-Domestic-prices-recover-after-trading-pick-up.html")
    .cookies(response.cookies())
    .get();
System.out.println(response.statusMessage() + "\n" + response.statusCode());
When I run it, the output is:
OK
200
But when I move on to the next part, extracting the data:
Element article = doc.getElementById("article-body");
Elements lead = article.getElementsByClass("lead");
Elements lead1 = article.getElementsByClass("articleContainer");
System.out.println(lead);
System.out.println(lead1);
it gives up and I only get the content that is shown to users who are not logged in.
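One detail in the updated code: `response` is reassigned to the POST response, so the article request only sends the cookies set by the POST, and the `cookies` map is created but never filled. If the site sets its session cookie on the first GET, the article request may arrive without it. Below is a minimal JDK-only sketch (Java 9+) of accumulating cookies across responses; the class `CookieJarSketch` and the cookie names in `main` are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: merge the cookie maps of several responses so that
// later requests send everything the server has set so far.
class CookieJarSketch {

    // Later maps win when a cookie name repeats, mirroring a server that
    // overwrites a cookie in a later response.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... responses) {
        Map<String, String> all = new LinkedHashMap<>();
        for (Map<String, String> r : responses) {
            all.putAll(r);
        }
        return all;
    }

    public static void main(String[] args) {
        // Invented cookie names, for illustration only.
        Map<String, String> fromGet = Map.of("sessionId", "abc", "lang", "en");
        Map<String, String> fromPost = Map.of("sessionId", "xyz", "authToken", "t0k");
        System.out.println(merge(fromGet, fromPost));
    }
}
```

With jsoup this would amount to keeping the GET response in its own variable and passing `merge(getResponse.cookies(), postResponse.cookies())` to `.cookies(...)` on the article request. Whether that alone fixes the login depends on the site, which the question does not establish.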
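Answer 0 (score: 0)
Assuming you want to browse the site with the given credentials, I suggest logging in from a regular browser, copying the cookies the site generates, and adding them to a CookieStore instance.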
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.cookie.BasicClientCookie;
// Cookies copied from a browser session that is already logged in
BasicCookieStore cookieStore = new BasicCookieStore();
BasicClientCookie cookie1 = new BasicClientCookie("__gads", "ID=958b183c83ede6e8:T=1539776783:S=ALNI_MbFRRpTafZvTiJAjKmTB9oBQelWWw");
cookie1.setDomain(".metalbulletin.com");
cookie1.setPath("/");
BasicClientCookie cookie2 = new BasicClientCookie("__utma", "167598498.350699797.1539776871.1539776871.1539776871.1");
cookie2.setDomain(".metalbulletin.com");
cookie2.setPath("/");
....
cookieStore.addCookie(cookie1);
cookieStore.addCookie(cookie2);
....
Then pass the cookie store to the HTTP client when you build it, together with its connection pool:
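import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
connManager.setMaxTotal(256);
connManager.setDefaultMaxPerRoute(64);
ConnectionKeepAliveStrategy myStrategy = new DefaultConnectionKeepAliveStrategy();
// cookieStore is the BasicCookieStore filled with the browser cookies above
CloseableHttpClient closeableHttpClient = HttpClientBuilder.create()
    .setDefaultCookieStore(cookieStore)
    .setDefaultRequestConfig(RequestConfig.custom()
        .setCookieSpec(CookieSpecs.STANDARD).build())
    .setConnectionManager(connManager).setKeepAliveStrategy(myStrategy).build();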
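Either way, if you want to log in to a website, you need some way to handle cookies and tokens. With this approach the cookie store takes care of the cookies: you just call the site with the HTTP client and then parse the returned HTML with jsoup.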
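The answer uses Apache HttpClient 4.x. If you would rather avoid the extra dependency, the same idea can be sketched with the JDK's own `java.net.http.HttpClient` and `java.net.CookieManager` (Java 11+); note this swaps in a different client from the one the answer uses. The class and method names are hypothetical, and the cookie values are the ones from the answer. Be aware that `__gads` and `__utma` are Google advertising/analytics cookies, so in practice you would copy whichever cookies actually carry the site's login session. Nothing is sent over the network unless you uncomment the `send` line.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

// Hypothetical JDK-only counterpart of the BasicCookieStore + HttpClientBuilder setup.
class BrowserCookieClient {

    static final URI SITE = URI.create("https://www.metalbulletin.com/");

    // Pre-loads a cookie manager with name/value pairs copied from a browser.
    static CookieManager managerWithBrowserCookies(String[][] nameValuePairs) {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        for (String[] nv : nameValuePairs) {
            HttpCookie cookie = new HttpCookie(nv[0], nv[1]);
            cookie.setDomain(".metalbulletin.com");
            cookie.setPath("/");
            cookie.setVersion(0); // plain browser-style cookie
            manager.getCookieStore().add(SITE, cookie);
        }
        return manager;
    }

    // The client sends matching stored cookies and stores new ones the server sets.
    static HttpClient clientFor(CookieManager manager) {
        return HttpClient.newBuilder()
                .cookieHandler(manager)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(20))
                .build();
    }

    public static void main(String[] args) throws Exception {
        CookieManager manager = managerWithBrowserCookies(new String[][] {
            {"__gads", "ID=958b183c83ede6e8:T=1539776783:S=ALNI_MbFRRpTafZvTiJAjKmTB9oBQelWWw"},
            {"__utma", "167598498.350699797.1539776871.1539776871.1539776871.1"},
        });
        HttpClient client = clientFor(manager);
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "https://www.metalbulletin.com/Article/3838710/Home/CHINA-REBAR-Domestic-prices-recover-after-trading-pick-up.html"))
                .GET()
                .build();
        // Uncomment to fetch; the body can then go to Jsoup.parse(html, request.uri().toString()).
        // String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        List<HttpCookie> stored = manager.getCookieStore().get(request.uri());
        System.out.println(stored.size() + " cookie(s) stored for " + request.uri().getHost());
    }
}
```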
Edit: these are the steps you need to follow:
Good luck.