Question

I wrote some code that I use for web scraping:

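import urllib2  # Python 2 standard library

# div comes from earlier parsing code that is not shown here
link = 'http://www.cmegroup.com'+div.findAll('a')[3]['href']
user_agent = 'Mozilla/5.0'
headers = {'User-Agent':user_agent}
req = urllib2.Request(link, headers=headers)
page = urllib2.urlopen(req).read()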

What I don't understand is that requesting a link sometimes raises an error and sometimes doesn't. For example, the error:

urllib2.URLError: <urlopen error [Errno -2] Name or service not known>

was raised for this link:

http://www.cmegroup.com/trading/energy/refined-products/mini-european-naphtha-platts-cif-nwe-swap-futures_product_calendar_futures.html

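When I rerun the code, I no longer get the error for this link, but I get it for some other links instead. Could this be caused by my wireless connection?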

Answer 1

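This looks like a DNS or network problem: "Name or service not known" means the hostname could not be resolved. If you run the same code against the same URL several times and it sometimes works and sometimes fails, the problem is probably not your code.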
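
One way to check the DNS theory is to resolve the hostname several times in a row and see whether the lookups fail intermittently. The following is only a rough sketch: check_dns and its resolve parameter are hypothetical names introduced here for illustration, and the operating system may cache lookups, so a clean run does not prove the network is healthy.

```python
import socket


def check_dns(hostname, attempts=5, port=80, resolve=socket.getaddrinfo):
    """Resolve hostname several times.

    Returns a list of (attempt_number, error) tuples, where error is
    None when the lookup succeeded. check_dns is a hypothetical helper,
    not part of any library.
    """
    results = []
    for attempt in range(1, attempts + 1):
        try:
            resolve(hostname, port)
        except socket.gaierror as ex:
            # Name resolution failed, e.g. [Errno -2] Name or service not known
            results.append((attempt, ex))
        else:
            results.append((attempt, None))
    return results


# Example use (performs real lookups, so it needs network access):
# for attempt, error in check_dns('www.cmegroup.com'):
#     print(attempt, 'ok' if error is None else error)
```

If some attempts fail and others succeed for the same hostname, that points at the resolver or the connection rather than at the scraping code.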

To debug this, you can wrap the statement in a try-except block and drop into pdb (or ipdb, if it is installed) from there:

try:
    response = urllib2.urlopen(req)
except urllib2.URLError as ex:
    import pdb; pdb.set_trace()  # Use ipdb if installed
else:
    page = response.read()

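From there you can inspect the exception and its traceback (and, when the request succeeds, the response and its status code), and so on.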
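
If the debugger confirms that the failures are transient, a common workaround (my suggestion, not something required by the above) is to retry the request a few times with a short pause in between. This is a minimal sketch: fetch_with_retries is a hypothetical helper, and it catches IOError on the assumption that the fetch raises urllib2.URLError, which is an IOError subclass.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times and return its result.

    fetch is any zero-argument callable. If every attempt raises
    IOError, the last error is re-raised. Hypothetical helper.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IOError as ex:  # urllib2.URLError is a subclass of IOError
            last_error = ex
            if attempt < attempts:
                # Wait a little longer after each failed attempt
                sleep(delay * attempt)
    raise last_error


# Example use with the request from the question:
# page = fetch_with_retries(lambda: urllib2.urlopen(req).read())
```

Retrying only hides the symptom. If the lookups fail often, the connection or the DNS server is still worth investigating.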

(As a side note, if an external dependency isn't a problem, I strongly recommend using the requests package instead of urllib2.)
