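I need to visit different pages on a website and collect information from them, and I'm not sure how to handle the cookies. In Chrome's developer tools (F12), the Network tab shows the request headers and the cookies being sent. If I hard-code the cookie for one particular page (see the commented-out con.setRequestProperty("Cookie", ...) line), the information is retrieved successfully.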
URL url = new URL(urlStr);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("GET");
con.setRequestProperty("Host", county +"." +referer +".com");
con.setRequestProperty("Connection", "keep-alive");
con.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
con.setRequestProperty("X-Requested-With", "XMLHttpRequest");
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
con.setRequestProperty("Origin", "http://evil.com/");
con.setRequestProperty("Referer", "https://" +county +"." +referer +".com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=" +df.format(date));
con.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
//con.setRequestProperty("Cookie", "cfid=9ed9c083-4696-4712-950d-1c0ad0727883; cftoken=0; AWSELB=CF13C5A70AE16731FBD093515EF0DDB58935BEB4D69838721C70C3BED039F919AF343D891D9A2001BD1070AC4C076AA72DF0A7EA6AEED1091BCD24CC7203622E75C0DE5C92; _gcl_au=1.1.1696117075.1563489288; __utmc=119398810; __utmz=119398810.1563489288.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_TC=1563505029291; __utma=119398810.1711105058.1563489288.1563498837.1563505090.3; __utmt_UA-51657054-1=1; __utmb=119398810.10.10.1563505090; testcookiesenabled=disabled; CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_LV=1563508162268; CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_HC=221");
//handle cookies
String cookiesHeader = con.getHeaderField("Set-Cookie");
List<HttpCookie> cookies = HttpCookie.parse(cookiesHeader);
CookieManager cookieManager = new CookieManager();
cookies.forEach(cookie -> cookieManager.getCookieStore().add(null, cookie));
con.disconnect();
con = (HttpURLConnection) url.openConnection(); //create new connection with cookies
con.setRequestProperty("Cookie", StringUtils.join(cookieManager.getCookieStore().getCookies(), ";"));
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
StringBuilder stringBuilder = new StringBuilder();
String str;
while ((str = in.readLine()) != null) {
    stringBuilder.append(str);
}
in.close();
con.disconnect();
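However, if I use the code in the "handle cookies" section instead (taken from the tutorial https://www.baeldung.com/java-http-request), an empty data set is returned. Can anyone spot what I'm doing wrong?
Answer 0 (score: 2)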
String cookiesHeader = con.getHeaderField("Set-Cookie");
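is meant to read the cookies from the response. In your case, however, it reads nothing, because the HTTP request has not been executed yet.
So you first need to execute the request; only then can String cookiesHeader = con.getHeaderField("Set-Cookie"); read the cookies from the response. Just add a call to con.connect() before String cookiesHeader = con.getHeaderField("Set-Cookie");. This executes the request, after which the cookies can be read from the response, and the rest of your code then adds the received cookies back to the request: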
con.connect();
String cookiesHeader = con.getHeaderField("Set-Cookie");
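A minimal sketch of letting the JDK do the copying, as an alternative to parsing Set-Cookie and joining the cookies by hand: java.net.CookieManager stores cookies from a response's header map with put() and returns the matching Cookie header values for a later request with get(). The header map below is fabricated so the example runs offline; the URI and cookie values are placeholders, and the class and method names are hypothetical. With a real connection you would pass con.getHeaderFields() to put(), or install the manager once with CookieHandler.setDefault(...), in which case HttpURLConnection consults it automatically on every request.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CookieManagerDemo {

    // Stores the Set-Cookie values of a (simulated) response, then returns the
    // Cookie header values the manager would attach to the next request to uri.
    static List<String> cookiesForNextRequest(URI uri, List<String> setCookieValues) throws IOException {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        // In real code this map would be con.getHeaderFields() from the first response.
        Map<String, List<String>> responseHeaders = Collections.singletonMap("Set-Cookie", setCookieValues);
        manager.put(uri, responseHeaders);
        // get() returns a map whose "Cookie" entry holds one "name=value" string per cookie.
        Map<String, List<String>> requestHeaders = manager.get(uri, Collections.<String, List<String>>emptyMap());
        return requestHeaders.getOrDefault("Cookie", Collections.<String>emptyList());
    }

    public static void main(String[] args) throws IOException {
        URI uri = URI.create("https://example.com/index.cfm"); // placeholder URI
        List<String> cookies = cookiesForNextRequest(uri, Arrays.asList("cfid=abc; Path=/", "cftoken=0; Path=/"));
        // Joined with "; " this is the usual Cookie request header format.
        System.out.println(String.join("; ", cookies));
    }
}
```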
You can also check whether the request succeeded before reading the cookies and carrying out the rest of the process:
int statusCode = con.getResponseCode();
if (statusCode == 200) {
    String cookiesHeader = con.getHeaderField("Set-Cookie");
    //rest of the code
}
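Putting the answer's steps together, a minimal sketch might look like the class below. It assumes the question's URL and Referer values are passed in as plain strings, and the class and method names are hypothetical. It makes two changes that the answer does not mention: it collects every Set-Cookie header through getHeaderFields(), because getHeaderField returns only the last value when a header is sent more than once, and it sets the request headers again on the second connection, since a new HttpURLConnection does not inherit the headers of the previous one. The body is decoded as UTF-8, following the Content-Type in the captured response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpCookie;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TwoStepFetch {

    // Turns raw Set-Cookie values into one Cookie request header, e.g. "a=1; b=2".
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                // Only name=value goes back; attributes such as Path are not sent.
                pairs.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return String.join("; ", pairs);
    }

    // Headers copied from the question's code; must be applied to every new connection.
    static void applyHeaders(HttpURLConnection con, String refererUrl) {
        con.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
        con.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        con.setRequestProperty("Referer", refererUrl);
    }

    static String fetch(String urlStr, String refererUrl) throws IOException {
        URL url = new URL(urlStr);
        HttpURLConnection first = (HttpURLConnection) url.openConnection();
        applyHeaders(first, refererUrl);
        first.connect(); // execute the first request
        if (first.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IOException("First request failed with status " + first.getResponseCode());
        }
        List<String> setCookies = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : first.getHeaderFields().entrySet()) {
            // The status line is stored under a null key; equalsIgnoreCase(null) is false.
            if ("Set-Cookie".equalsIgnoreCase(e.getKey())) {
                setCookies.addAll(e.getValue());
            }
        }
        first.disconnect();

        HttpURLConnection second = (HttpURLConnection) url.openConnection();
        applyHeaders(second, refererUrl); // not inherited from the first connection
        second.setRequestProperty("Cookie", toCookieHeader(setCookies));
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(second.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        } finally {
            second.disconnect();
        }
        return body.toString();
    }
}
```

HttpCookie.parse throws IllegalArgumentException on a malformed header, so a production version would probably want to catch that per value.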
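Answer 1 (score: -1)
It looks like I may have been barking up the wrong tree. The parameters in the URL apparently change over time, as you can see below.
I don't know what the numbers mean or how to supply the right numbers at the right time. The first URL, created yesterday, returns an empty set, while the second, created just now, returns good data.
Edit: OK, I figured out what the numbers mean. There is a separate query that gets the current New York time in milliseconds, plus an offset. I have implemented that query, and the URL it produces now always returns good data when I paste it on its own into a new browser window. But the Java code still doesn't show me the data.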
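In the URL captured from DevTools (the UPDATE request), tx and _ carry the same 13-digit value, 1563630471816. Read as Unix epoch milliseconds it is 2019-07-20T13:47:51.816Z, one second before the response's Date header, which suggests a plain epoch timestamp. A sketch of rebuilding that URL, assuming both parameters can simply be set to the current epoch time: the separate New York time query and its offset are not shown in the post, so they are left out, and the class and method names are hypothetical. Epoch milliseconds are the same in every time zone, so whether a New York offset is needed at all depends on what the site checks.

```java
import java.time.Instant;

public class AuctionUrl {

    // Rebuilds the UPDATE URL seen in DevTools; tx and _ carry the same timestamp.
    // host is e.g. "brevard.realforeclose.com"; the other parameters are copied from the capture.
    static String buildUpdateUrl(String host, long timestampMillis) {
        return "https://" + host + "/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W"
                + "&PageDir=0&doR=1&tx=" + timestampMillis
                + "&bypassPage=1&test=1&_=" + timestampMillis;
    }

    public static void main(String[] args) {
        // System.currentTimeMillis() counts milliseconds since the Unix epoch (UTC).
        long now = System.currentTimeMillis();
        System.out.println(Instant.ofEpochMilli(now) + " -> " + buildUpdateUrl("brevard.realforeclose.com", now));
    }
}
```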
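These are the request headers and other data I see in the Network tab of Chrome's debugger (F12) when I access the data through the official link:
General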
Request URL: https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563630471816&bypassPage=1&test=1&_=1563630471816
Request Method: GET
Status Code: 200 OK
Remote Address: 34.236.53.129:443
Referrer Policy: no-referrer-when-downgrade
Response Headers
Access-Control-Allow-Headers: content-type
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE
Access-Control-Allow-Origin: *
Allow: POST, GET, OPTIONS, PUT, DELETE
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 1179
Content-Type: text/html;charset=UTF-8
Date: Sat, 20 Jul 2019 13:47:52 GMT
Server: Realforeclose/1a
Vary: Accept-Encoding
Request Headers
Provisional headers are shown
Accept: application/json, text/javascript, */*; q=0.01
Referer: https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/25/2019
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
X-Requested-With: XMLHttpRequest
Query String Parameters
zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563630471816&bypassPage=1&test=1&_=1563630471816
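To mirror the captured request more closely from Java, here is a sketch that sets exactly the four request headers Chrome listed. It is only an approximation: Chrome marks these as provisional headers, so the real request (including any Cookie header) may have carried more. Host, Connection and Origin from the question's code are left out because they are on HttpURLConnection's restricted-header list and, as far as I know, are silently ignored unless the sun.net.http.allowRestrictedHeaders system property is true. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class DevToolsHeaders {

    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36";

    // Opens, but does not yet send, a GET request carrying the headers shown in DevTools.
    // The request is sent later by connect(), getResponseCode() or getInputStream().
    static HttpURLConnection open(String urlStr, String refererUrl) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(urlStr).openConnection();
        con.setRequestMethod("GET");
        con.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
        con.setRequestProperty("Referer", refererUrl);
        con.setRequestProperty("User-Agent", USER_AGENT);
        con.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        return con;
    }
}
```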