我想使用请求模块访问zoomeye.org,来自firebug的cookie如下:
__ jsluid = 470133a1338c0be13b6fdccf396772c3; csrftoken = WG6eSMS9XaLZfLjICiin8esg1qO3UOFl; Hm_lvt_e58da53564b1ec3fb2539178e6db042e = 1448411456; Hm_lpvt_e58da53564b1ec3fb2539178e6db042e = 1448505898; __jsl_clearance = 1448505830.313 | 0 | EwXSRp%2BrIEF5DR0E5WALlzLMV2Q%3D
阅读网页内容的脚本:
import requests
headers = {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding":"gzip, deflate",
"Accept-Language": "en-GB,en;q=0.5",
"Connection": "keep-alive",
"Host": "www.zoomeye.org",
"Referer": "https://www.zoomeye.org/",
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:41.0) Gecko/20100101 Firefox/41.0"
}
data = open("cookie.txt", "r").read()
cookieDict = {}
for item in data.split(";"):
keyValue = item.split("=")
cookieDict[keyValue[0]] = keyValue[1]
url = "https://www.zoomeye.org/search?q=apache"
r = requests.get(url,cookies=cookieDict, headers=headers)
print r.content
但我无法阅读网页内容,输出如下:
<script>var dc="";var t_d={hello:"world",t_c:function(x){if(x==="")return;if(x.s
lice(-1)===";"){x=x+" ";};if(x.slice(-2)!=="; "){x=x+"; ";};dc=dc+x;}};(function
(a){eval(function(p,a,c,k,e,d){e=function(c){return(c<a?"":e(parseInt(c/a)))+((c
=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){
while(c--)d[e(c)]=k[c]||e(c);k=[function(e){return d[e]}];e=function(){return'\\
w+'};c=1;};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);
return p;}('b d=[5,4,0,1,2,3];b o=[];b p=0;g(b i=d.c;i--;){o[d[i]]=a[i]}o=o.m(\'
\');g(b i=0;i<o.c;i++){l(o.q(i)===\';\'){s(o,p,i);p=i+1}}s(o,p,o.c);j s(t,r,n){k
.h(t.y(r,n))};w("f.e=f.e.v(/[\\?|&]u-x/, \'\')",z);',36,36,'|||||||||||var|lengt
h||href|location|for|t_c||function|t_d|if|join||||charAt||||captcha|replace|setT
imeout|challenge|substring|1500'.split('|'),0,{}));})(['45 GMT;Path=/;', ' 26-No
v-15 03:52:', '__jsl_clearance=1448506365.', '687|0|rtcCTV', 'xuWxRiE8%2BC0', 'W
WncvYkCpQ%3D;Expires=Thu,']);document.cookie=dc;</script>
问题出在哪里?如果您对这个问题有更好的解决方案,请告诉我。感谢
答案 0 :(得分:0)
由于某种原因,该网站不喜欢您的用户代理。删除用户代理标头,它将起作用。