Question

My code does not work for articles on nytimes. Try changing the url variable to something else and you will see that it works. Why is that?

import urllib

#url = "http://www.nytimes.com"
url = "http://www.nytimes.com/interactive/2014/07/07/upshot/how-england-italy-and-germany-are-dominating-the-world-cup.html"
htmlfile = urllib.urlopen(url)
htmltext = htmlfile.read()
print htmltext

Please advise. Thanks.

Answer 1

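I think NYT uses cookies to validate your request. If the request does not look like a normal web browser request, the server responds with a Location header (a redirect). A client that does not store and send back cookies never gets past that redirect, so your request gets lost.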

The solution is simple. Use a cookiejar like this:

import cookielib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

url = "http://www.nytimes.com/interactive/2014/07/07/upshot/how-england-italy-and-germany-are-dominating-the-world-cup.html"
htmlfile = opener.open(url)
htmltext = htmlfile.read()

print htmltext
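
For readers on Python 3, where cookielib and urllib2 were renamed to http.cookiejar and urllib.request, the same idea might look roughly like the sketch below. The helper names build_cookie_opener and fetch, the 10 second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original answer, and whether NYT still behaves this way today is not something this sketch can confirm.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar): an opener that stores cookies and sends them back."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url, timeout=10):
    """Fetch url through a cookie-aware opener and return the decoded body.

    Hypothetical helper: timeout and the UTF-8 fallback are illustrative defaults.
    """
    opener, _jar = build_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch(url) with the article URL above would then replace the urllib2-based lines. Any cookie the server sets on the first response is kept in the jar and sent again when the redirect is followed.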

Answer 2

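By "doesn't work" I assume you mean that it does not give you the content you expected. When I access that URL with urllib I get an empty result, so this may be another aspect of the NYT "paywall."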
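
To check what "an empty result" actually means for a given URL, it can help to look at the final URL, the status code, and the body size instead of just printing the body. The following is a minimal Python 3 sketch. The helper name describe_response and the 10 second timeout are assumptions for illustration. Note that without a cookie jar a redirect loop may surface as a urllib.error.HTTPError rather than an empty body.

```python
import urllib.request


def describe_response(url, timeout=10):
    """Return the final URL, HTTP status (if any), and body size for url.

    Hypothetical diagnostic helper; it does not handle cookies.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
        return {
            # URL after any redirects were followed.
            "final_url": response.url,
            # Non-HTTP responses may not carry a status code.
            "status": getattr(response, "status", None),
            "body_bytes": len(body),
        }
```

If final_url differs from the URL you requested, or body_bytes is 0, the server most likely redirected or rejected the request rather than serving the article.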

urllib.urlopen does not work for this URL, although mechanize does.
