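Java: how do I get cookies and add them to the request headers correctly?

Time: 2019-07-19 16:54:47

Tags: java cookies request httpurlconnection

I need to visit different pages on a website and collect information, and I'm not sure how to handle cookies. If I watch the Network activity in the Chrome debugger console (F12), I can see the request properties and cookies being sent. If I add the cookie for one specific page by hand (see the commented-out con.setRequestProperty("Cookie", ...) line), the information is retrieved successfully.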

            URL url = new URL(urlStr);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("GET");
            con.setRequestProperty("Host", county +"." +referer +".com");
            con.setRequestProperty("Connection", "keep-alive");
            con.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
            con.setRequestProperty("X-Requested-With", "XMLHttpRequest");
            con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
            con.setRequestProperty("Origin", "http://evil.com/");
            con.setRequestProperty("Referer", "https://" +county +"." +referer +".com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=" +df.format(date));
            con.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
            //con.setRequestProperty("Cookie", "cfid=9ed9c083-4696-4712-950d-1c0ad0727883; cftoken=0; AWSELB=CF13C5A70AE16731FBD093515EF0DDB58935BEB4D69838721C70C3BED039F919AF343D891D9A2001BD1070AC4C076AA72DF0A7EA6AEED1091BCD24CC7203622E75C0DE5C92; _gcl_au=1.1.1696117075.1563489288; __utmc=119398810; __utmz=119398810.1563489288.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_TC=1563505029291; __utma=119398810.1711105058.1563489288.1563498837.1563505090.3; __utmt_UA-51657054-1=1; __utmb=119398810.10.10.1563505090; testcookiesenabled=disabled; CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_LV=1563508162268; CF_CLIENT_" +county.toUpperCase() +"_" +referer.toUpperCase() +"_HC=221");

            //handle cookies
            String cookiesHeader = con.getHeaderField("Set-Cookie");
            List<HttpCookie> cookies = HttpCookie.parse(cookiesHeader);
            CookieManager cookieManager = new CookieManager();
            cookies.forEach(cookie -> cookieManager.getCookieStore().add(null, cookie));
            con.disconnect();
            con = (HttpURLConnection) url.openConnection();     //create new connection with cookies
            con.setRequestProperty("Cookie", StringUtils.join(cookieManager.getCookieStore().getCookies(), ";"));

            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            StringBuilder stringBuilder = new StringBuilder();
            while ((str = in.readLine()) != null) {
                stringBuilder.append(str);
            }
            in.close();
            con.disconnect();

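However, if I use the code in the "handle cookies" section (from the tutorial at https://www.baeldung.com/java-http-request), an empty data set is returned. Can anyone spot what I am doing wrong?

2 answers:

Answer 0 (score: 2)

String cookiesHeader = con.getHeaderField("Set-Cookie"); is used to read cookies from the response. In your case, however, the HTTP request has not been executed yet, so it reads nothing.

So you first need to execute the request; only then can you read the cookies from the response with String cookiesHeader = con.getHeaderField("Set-Cookie");. Just add con.connect() before that line: it executes the request, so the cookies can then be read from the response. The rest of your code will then add the received cookies back to the request.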

con.connect();
String cookiesHeader = con.getHeaderField("Set-Cookie");
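
One detail that may matter here: a response can carry several Set-Cookie headers, and per the URLConnection Javadoc, getHeaderField(name) returns only the last value when a header is repeated. The following is a minimal sketch, not the asker's actual code, of turning every Set-Cookie value into a single Cookie request header. The class name CookieHeaderSketch, the method buildCookieHeader, and the sample cookie values are made up for illustration.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CookieHeaderSketch {

    // Hypothetical helper: converts the raw Set-Cookie values of a response
    // into one "name=value; name=value" string for a Cookie request header.
    static String buildCookieHeader(List<String> setCookieHeaders) {
        List<String> pairs = new ArrayList<>();
        if (setCookieHeaders == null) {
            return ""; // the response carried no Set-Cookie header
        }
        for (String header : setCookieHeaders) {
            // HttpCookie.parse drops attributes such as Path or HttpOnly
            // from the name/value pair we send back.
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                pairs.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Sample header values, invented for this example.
        List<String> sample = Arrays.asList(
                "cfid=abc; Path=/; HttpOnly",
                "cftoken=0; Path=/");
        System.out.println(buildCookieHeader(sample));
    }
}
```

With a live connection, the input would presumably be con.getHeaderFields().get("Set-Cookie"), and the result would go into setRequestProperty("Cookie", ...) on the second connection. Note that a connection obtained from a fresh url.openConnection() call starts without the request headers set on the first one, so those would need to be set again as well.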

You can also check that the request succeeded before reading the cookies and running the rest of the process:

int statusCode = con.getResponseCode();
if (statusCode == 200) {
   String cookiesHeader = con.getHeaderField("Set-Cookie");
   //rest of the code
}
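
As an alternative to copying cookies by hand, the JDK lets you register a system-wide CookieHandler, which the HTTP protocol handler behind HttpURLConnection consults for each request and response. The sketch below is one possible setup, not something the question or this answer used. The names DefaultCookieHandlerSketch, installCookieManager, and cookieHeaderAfter are hypothetical, the cookie values are invented, and the response is simulated through CookieManager.put so that no network access is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DefaultCookieHandlerSketch {

    // Registers an in-memory CookieManager as the system-wide handler.
    // ACCEPT_ALL is an assumption; a stricter policy may be more appropriate.
    static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Hypothetical helper: feeds simulated Set-Cookie values to the manager
    // and returns the Cookie header it would attach to a later request.
    static String cookieHeaderAfter(CookieManager manager, URI uri, List<String> setCookieValues) {
        try {
            Map<String, List<String>> responseHeaders = new HashMap<>();
            responseHeaders.put("Set-Cookie", setCookieValues);
            manager.put(uri, responseHeaders);
            Map<String, List<String>> requestHeaders = manager.get(uri, new HashMap<>());
            List<String> cookieValues = requestHeaders.get("Cookie");
            return cookieValues == null ? "" : String.join("; ", cookieValues);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager manager = installCookieManager();
        URI uri = URI.create("https://brevard.realforeclose.com/index.cfm");
        // Cookie values invented for this example.
        System.out.println(cookieHeaderAfter(manager, uri,
                Arrays.asList("cfid=abc; Path=/", "cftoken=0; Path=/")));
    }
}
```

Once the handler is installed, a first request to the site should populate the cookie store and later HttpURLConnection requests to the same host should carry those cookies without any explicit Cookie header in the calling code.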

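Answer 1 (score: -1)

It looks like I may have been barking up the wrong tree. The parameters in the URL apparently change over time, as you can see below.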

https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563563124890&bypassPage=1&test=1&_=1563563124891

https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563508160468&bypassPage=1&test=1&_=1563508160468

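I don't know what the numbers mean or how to supply the right number at the right time. The first URL, created yesterday, returns an empty set, while the second, created just now, returns good data.

Edit: OK, I figured out what the numbers mean. There is a separate query that returns the New York time in milliseconds, plus an offset. I have implemented that query, and if I paste the resulting URL by itself into a new browser window, it is a valid URL that always returns good data. However, my Java code still does not show the data.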
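
For illustration, here is a rough sketch of assembling such a URL from a millisecond timestamp. The parameter names are copied from the URLs above; the class name UpdateUrlSketch and the method buildUpdateUrl are hypothetical. Using System.currentTimeMillis() in main is only a stand-in: the real value comes from the site's separate time query plus an offset, which is not reproduced here. The sketch also reuses one value for both tx and _, although in one of the captured URLs the two differ by a millisecond.

```java
class UpdateUrlSketch {

    // Builds the auction update URL for a county subdomain and site name,
    // inserting the same millisecond timestamp into the tx and _ parameters.
    static String buildUpdateUrl(String county, String site, long timestampMillis) {
        return "https://" + county + "." + site + ".com/index.cfm"
                + "?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1"
                + "&tx=" + timestampMillis
                + "&bypassPage=1&test=1"
                + "&_=" + timestampMillis;
    }

    public static void main(String[] args) {
        // Assumption: local clock used in place of the server-provided time.
        long now = System.currentTimeMillis();
        System.out.println(buildUpdateUrl("brevard", "realforeclose", now));
    }
}
```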

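When I access the data through the official link, these are the request headers and other data I see in the Network tab of the Chrome debugger (F12):

General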

Request URL: https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563630471816&bypassPage=1&test=1&_=1563630471816
Request Method: GET
Status Code: 200 OK
Remote Address: 34.236.53.129:443
Referrer Policy: no-referrer-when-downgrade

Response Headers

Access-Control-Allow-Headers: content-type
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE
Access-Control-Allow-Origin: *
Allow: POST, GET, OPTIONS, PUT, DELETE
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 1179
Content-Type: text/html;charset=UTF-8
Date: Sat, 20 Jul 2019 13:47:52 GMT
Server: Realforeclose/1a
Vary: Accept-Encoding

Request Headers

Provisional headers are shown
Accept: application/json, text/javascript, */*; q=0.01
Referer: https://brevard.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/25/2019
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
X-Requested-With: XMLHttpRequest

Query String Parameters

zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563630471816&bypassPage=1&test=1&_=1563630471816