I have several apps in which I obtain the cookie for a logged-in web page from a WebView and reuse it directly with jsoup to scrape content, like this:
final String url = "https://need.authentication.com";
// -- Android Cookie part here --
CookieSyncManager.getInstance().sync();
CookieManager cm = CookieManager.getInstance();
String cookie = cm.getCookie(url); // returns cookie for url
// ...
// -- JSoup part here --
// Jsoup uses cookies as "name/value pairs"
doc = Jsoup.connect("https://need.authentication.com").cookie(url, cookie).get();
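This does not work for every URL. Obtaining the cookie is never the problem, but jsoup sometimes fails to use it.
What I am considering now is adding this existing cookie to HttpClient (or some other deprecated option) to download the page, and then handing the result to jsoup for further scraping, because I suspect jsoup is not handling the cookie correctly.
The jsoup debug output only shows: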
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:512)
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
For reference, the cookie looks like this:
__indbg=481084b1-3d71-461a-b6e1-93d;
__gads=ID=0058c3ccb75f72f2:T=1458162316:S=ALN;
INSESSION=ct8njokkc4uadlmjjg8a3gvp1ng4m0acvvveea66bkpmn32fvc;
INEP=%5B%22nw01_101_B_0%22%2C%22mp04_103_B_0%22%2C%22in01_244;
WASLOGGEDIN=1;
INREMEMBERME=cHlMQlRVbzVOUkhJTU5kU25tMlplZ2RvNWxvbkN4TmdsR0RBVWp6Qkp6dkpONW1Tb2o3MH;
INBP=mobile;
__utmt=1;
__utma=68558281.1607821733.1458162272.1458240416.1;
__utmb=68558281.1.10.1458327475;
__utmc=68558281;
__utmz=68558281.1458162272.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
__utmv=68558281.|2=community=sanj=1^3=loggedIn=1=1^5=experiment=%7Cst01_267_B_2%7Cmt01
Answer 0 (score: 2)
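cookie(name, value) expects the name of the cookie as its first argument, not the URL the cookie belongs to. The string returned by CookieManager.getCookie(url) is a complete "name=value; name2=value2" cookie header, so it can be sent as-is in the Cookie request header.
Try this instead: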
doc = Jsoup //
.connect("https://need.authentication.com") //
.header("Cookie", cookie) //
.get();
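If you would rather hand jsoup real name/value pairs than a raw header, the cookie string can be split first. The following is only a minimal sketch: CookieHeaderParser and parse are hypothetical names introduced here (they are not part of jsoup or Android), and it assumes the string has the "name=value; name2=value2" shape shown in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of jsoup or the Android SDK.
final class CookieHeaderParser {

    private CookieHeaderParser() {
    }

    /**
     * Splits a cookie header such as "a=1; b=2" into an ordered name/value map.
     * Only the first '=' separates name from value, so values that themselves
     * contain '=' (like the __utmz cookie in the question) stay intact.
     */
    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null) {
            return cookies;
        }
        for (String pair : header.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty segments and entries without a name
            }
            cookies.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
        return cookies;
    }
}
```

The resulting map could then be passed to jsoup with Jsoup.connect(url).cookies(CookieHeaderParser.parse(cookie)).get(), which sets every cookie by its own name rather than by the URL.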