我正在尝试向网站发送PHP GET请求:
问题是,如果我将Cookie信息附加到请求标题,本网站将仅处理我的请求。
或者在图片方面,如果我在浏览器中禁用cookie,我会得到:
这意味着该网站认识到这是我第一次“访问”该网站。
问题是,如果我现在使用右上方的搜索栏,它将不处理此请求: 它只会显示相同的(一般)屏幕。
例如:如果我有禁用的Cookie并且我搜索“AAPL”,则不会显示任何结果。
现在,如果我启用了启用了,那么请求处理得很好:
所以显示了“AAPL”结果。
你也可以自己尝试一下:
启用Cookie ,请访问http://www.pennystocktweets.com/user_posts/feeds?cat=search&lptyp=prep&usrstk=AAPL
使用Cookie 已停用,请再次访问该链接:http://www.pennystocktweets.com/user_posts/feeds?cat=search&lptyp=prep&usrstk=AAPL
现在比较一下答案,只有第一个是正确的。
这意味着该网站仅在 客户端下载了cookie之后才能工作,然后在附加了此Cookie信息的情况下向服务器发出了另一个(新的)GET请求。
(这是否意味着网站需要会话Cookie才能正常运行?)
现在我要做的是用Apache HttpClient模仿请求,如下:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;
import java.util.List;
import java.util.StringTokenizer;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
public class downloadTweets {
private String cookies;
private HttpClient client = new DefaultHttpClient();
private final String USER_AGENT = "Mozilla/5.0";
public static void main(String[] args) throws Exception {
String ticker = "AAPL";
String lptyp = "prep";
int opid = 0;
int lpid = 0;
downloadTweets test = new downloadTweets();
String url = test.constructURL(ticker, lptyp, opid, lpid);
// make sure cookies is turn on
CookieHandler.setDefault(new CookieManager());
downloadTweets http = new downloadTweets();
String page = http.GetPageContent(url, ticker);
System.out.println(page);
}
public String constructURL(String ticker, String lptyp, int opid, int lpid)
{
String link = "http://www.pennystocktweets.com/user_posts/feeds?cat=search" +
"&lptyp=" + lptyp +
"&usrstk=" + ticker;
if (opid != 0)
{
link = link +
"&opid=" + opid +
"&lpid=" + lpid;
}
return link;
}
private String GetPageContent(String url, String ticker) throws Exception {
HttpGet request = new HttpGet(url);
String RefererLink = "http://www.pennystocktweets.com/search/post/" + ticker.toUpperCase();
request.setHeader("Host", "www.pennystocktweets.com");
request.setHeader("Connection", "Keep-alive");
request.setHeader("Accept", "*/*");
request.setHeader("X-Requested-With", "XMLHttpRequest");
request.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36");
request.setHeader("Referer", RefererLink);
request.setHeader("Accept-Language", "nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4,fr;q=0.2");
HttpResponse response = client.execute(request);
int responseCode = response.getStatusLine().getStatusCode();
System.out.println("\nSending 'GET' request to URL : " + url);
System.out.println("Response Code : " + responseCode);
BufferedReader rd = new BufferedReader(
new InputStreamReader(response.getEntity().getContent()));
StringBuffer result = new StringBuffer();
String line = "";
while ((line = rd.readLine()) != null) {
result.append(line);
}
// set cookies
setCookies(response.getFirstHeader("Set-Cookie") == null ? "" :
response.getFirstHeader("Set-Cookie").toString());
return result.toString();
}
public String getCookies() {
return cookies;
}
public void setCookies(String cookies) {
this.cookies = cookies;
}
}
现在,同样的事情就是:如果我附上(我的)cookie信息,响应就可以了,如果我不这样做,那么响应就不起作用了。
但我不知道如何获取Cookie信息,然后在新的GET请求中使用它。
所以我的问题是:
如何向网站提出2个请求:
在第一个GET请求上,我从网站获取cookie信息并将其存储在我的Java程序中
在第二个GET请求上,我使用存储的cookie信息(作为标题)发出新请求。
注意: 我不知道cookie是普通的cookie还是会话cookie,但我怀疑它是会话cookie !
非常感谢所有帮助!
答案 0 :(得分:1)
正如Apache HttpClient Cookie handling part中的Apache公共httpclient状态所述:
HttpClient supports automatic management of cookies, including allowing the server to set cookies and automatically return them to the server when required. It is also possible to manually set cookies to be sent to the server.
每当http客户端收到cookie时,它们都会持久存储到HttpState
并自动添加到新请求中。这是默认行为。
在以下示例代码中,我们可以看到两个GET请求返回的cookie。我们无法直接看到发送到服务器的cookie,但我们可以使用协议/网络嗅探器或ngrep
等工具来查看通过网络传输的数据:
import java.io.IOException;
import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.methods.GetMethod;
public class HttpTest {
public static void main(String[] args) throws HttpException, IOException {
String url = "http://www.whatarecookies.com/cookietest.asp";
HttpClient client = new HttpClient();
client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
HttpMethod method = new GetMethod(url);
int res = client.executeMethod(method);
System.out.println("Result: " + res);
printCookies(client.getState());
method = new GetMethod(url);
res = client.executeMethod(method);
System.out.println("Result: " + res);
printCookies(client.getState());
}
public static void printCookies(HttpState state){
System.out.println("Cookies:");
Cookie[] cookies = state.getCookies();
for (Cookie cookie : cookies){
System.out.println(" " + cookie.getName() + ": " + cookie.getValue());
}
}
}
这是输出:
Result: 200
Cookies:
active_template::468: %2Fresponsive%2Fthree_column_inner_ad3b74de5a1c2f311bee7bca5c368aaa4e:b326b5062b2f0e69046810717534cb09
Result: 200
Cookies:
active_template::468: %2Fresponsive%2Fthree_column_inner_ad%2C+3b74de5a1c2f311bee7bca5c368aaa4e%3Db326b5062b2f0e69046810717534cb09
3b74de5a1c2f311bee7bca5c368aaa4e: b326b5062b2f0e69046810717534cb09
以下是ngrep
的摘录:
MacBook$ sudo ngrep -W byline -d en0 "" host www.whatarecookies.com
interface: en0 (192.168.11.0/255.255.255.0)
filter: (ip) and ( dst host www.whatarecookies.com )
#####
T 192.168.11.70:56267 -> 54.228.218.117:80 [AP]
GET /cookietest.asp HTTP/1.1.
User-Agent: Jakarta Commons-HttpClient/3.1.
Host: www.whatarecookies.com.
.
####
T 54.228.218.117:80 -> 192.168.11.70:56267 [A]
HTTP/1.1 200 OK.
Server: nginx/1.4.0.
Date: Wed, 27 Nov 2013 10:22:14 GMT.
Content-Type: text/html; charset=iso-8859-1.
Content-Length: 36397.
Connection: keep-alive.
Vary: Accept-Encoding.
Vary: Cookie,Host,Accept-Encoding.
Set-Cookie: active_template::468=%2Fresponsive%2Fthree_column_inner_ad; expires=Fri, 29-Nov-2013 10:22:01 GMT; path=/; domain=whatarecookies.com; httponly.
Set-Cookie: 3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09; expires=Thu, 27-Nov-2014 10:22:01 GMT.
X-Middleton-Response: 200.
Cache-Control: max-age=0, no-cache.
X-Mod-Pagespeed: 1.7.30.1-3609.
.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
...
##
T 192.168.11.70:56267 -> 54.228.218.117:80 [AP]
GET /cookietest.asp HTTP/1.1.
User-Agent: Jakarta Commons-HttpClient/3.1.
Host: www.whatarecookies.com.
Cookie: active_template::468=%2Fresponsive%2Fthree_column_inner_ad.
Cookie: 3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09.
.
##
T 54.228.218.117:80 -> 192.168.11.70:56267 [A]
HTTP/1.1 200 OK.
Server: nginx/1.4.0.
Date: Wed, 27 Nov 2013 10:22:18 GMT.
Content-Type: text/html; charset=iso-8859-1.
Content-Length: 54474.
Connection: keep-alive.
Vary: Accept-Encoding.
Vary: Cookie,Host,Accept-Encoding.
Set-Cookie: active_template::468=%2Fresponsive%2Fthree_column_inner_ad%2C+3b74de5a1c2f311bee7bca5c368aaa4e%3Db326b5062b2f0e69046810717534cb09; expires=Fri, 29-Nov-2013 10:22:05 GMT; path=/; domain=whatarecookies.com; httponly.
Set-Cookie: 3b74de5a1c2f311bee7bca5c368aaa4e=b326b5062b2f0e69046810717534cb09; expires=Thu, 27-Nov-2014 10:22:05 GMT.
X-Middleton-Response: 200.
Cache-Control: max-age=0, no-cache.
X-Mod-Pagespeed: 1.7.30.1-3609.
.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
...