I have set up a Squid server for deploying my web scraper.
For HTTP URLs, Squid works fine. For example:
curl -is -L http://www.baidu.com --proxy http://<squid_address>:3129
returns the source of the Baidu index page.
However, when I try to connect to an HTTPS URL, something strange happens. The following curl command returns only some useless HTML:
curl -is -L https://www.baidu.com --proxy http://<squid_address>:3129
Result (including response headers):
HTTP/1.1 200 OK
Server: bfe/1.0.8.14
Date: Fri, 04 Mar 2016 01:24:26 GMT
Content-Type: text/html
Content-Length: 227
Connection: keep-alive
Last-Modified: Thu, 09 Oct 2014 10:47:57 GMT
Set-Cookie: BD_NOT_HTTPS=1; path=/; Max-Age=300
Set-Cookie: BIDUPSID=4617F4EBA3F7D1A94B06FFE0B72E02B7; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1457054666; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
P3P: CP=" OTI DSP COR IVA OUR IND COM "
X-UA-Compatible: IE=Edge,chrome=1
Pragma: no-cache
Cache-control: no-cache
BDPAGETYPE: 1
BDQID: 0xd7f5c039000f7c84
BDUSERID: 0
Accept-Ranges: bytes
Set-Cookie: __bsi=16735848709246631383_00_175_N_N_1_0303_C02F_N_N_Y_0; expires=Fri, 04-Mar-16 01:24:31 GMT; domain=www.baidu.com; path=/

<html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
I know the cause may be that the Baidu server detects that the incoming connection is not using HTTPS (see the BD_NOT_HTTPS=1 cookie).
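To make that check repeatable from the scraper side, here is a minimal sketch that looks for the BD_NOT_HTTPS=1 cookie in a `curl -is` response. The function name `has_not_https_cookie` is hypothetical (it is not part of curl or Squid), and the sketch only detects the symptom; it does not change how Squid handles the request.

```shell
#!/bin/sh
# Hypothetical helper: reads a raw `curl -is` response on stdin and exits 0
# if any Set-Cookie header sets BD_NOT_HTTPS=1, non-zero otherwise.
has_not_https_cookie() {
    grep -qi '^Set-Cookie:[[:space:]]*BD_NOT_HTTPS=1'
}

# Usage, with <squid_address> filled in as in the commands above:
#   curl -is -L https://www.baidu.com --proxy http://<squid_address>:3129 \
#       | has_not_https_cookie && echo "Baidu treated the request as non-HTTPS"
```

Running the same pipeline without `--proxy` should show whether the cookie appears only when the request goes through Squid.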
How can I configure the Squid server so that it supports HTTPS URLs and this problem goes away?