Python's urllib2.urlopen() either raises [Errno 2] or hangs forever

Posted: 2013-07-01 02:23:25

Tags: python urllib2

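I'm new to Python and I'm currently learning to write some web-scraping scripts with it. But something strange keeps happening and I can't figure out why. After some testing, I believe the problem lies in the urllib2.urlopen() function. Here's what happens: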

When I start the Python interpreter from bash by running python and type:

import urllib2
urllib2.urlopen("http://www.baidu.com/") # which is a Chinese version of Google that most of us use only to test if the network connection is fine
things get really ugly:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 2] No such file or directory>

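I don't know exactly what this means, but I did some research online. Although most of the results didn't apply to my case, I did see someone claim that everything works fine with sudo.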
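A small diagnostic sketch, not a fix: urllib2 wraps the low-level socket error in URLError.reason, so printing its errno and symbolic name shows which system call failed (on Linux, Errno 2 is ENOENT). The helper name describe_urlerror is hypothetical, and the fallback import from urllib.error only exists so the sketch also runs on Python 3.

```python
import errno

try:
    from urllib2 import URLError  # Python 2.7, as in the question
except ImportError:
    from urllib.error import URLError  # Python 3 location of the same class


def describe_urlerror(exc):
    """Summarise a URLError as (errno number, symbolic name, message).

    Hypothetical helper. URLError keeps the underlying exception in
    ``exc.reason``; for socket failures that is an OSError/socket.error
    carrying ``errno`` and ``strerror``. For other failures ``reason``
    may be a plain string, in which case no errno is available.
    """
    reason = exc.reason
    code = getattr(reason, "errno", None)
    # Map the number to its symbolic name, e.g. 2 -> "ENOENT" on Linux.
    name = errno.errorcode.get(code, "unknown") if code is not None else None
    message = getattr(reason, "strerror", None) or str(reason)
    return code, name, message
```

At the interpreter, wrap the failing urlopen() call in try/except URLError as e and pass e to describe_urlerror to see the errno, its name and the message side by side.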

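So I gave it a try: I started Python from bash with sudo python and ran exactly the same code as above. This time it seemed to hang forever. In the end I had to interrupt it with Ctrl+C (KeyboardInterrupt), and no matter how long I waited while it was stuck, I always got the same traceback: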

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1181, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "/usr/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 562, in create_connection
    sock.connect(sa)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
KeyboardInterrupt
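The second traceback stops inside sock.connect(), and no timeout was passed, so urlopen() falls back to the global socket default, which is no timeout at all. A minimal sketch, assuming failing fast is acceptable: pass timeout explicitly (socket.setdefaulttimeout() would do the same globally). fetch_with_timeout is a hypothetical helper name.

```python
import socket

try:
    from urllib2 import urlopen, URLError  # Python 2.7
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalents
    from urllib.error import URLError


def fetch_with_timeout(url, timeout=10):
    """Return the response body, or None if the request failed or timed out.

    Hypothetical helper. Without ``timeout``, urlopen() uses the global
    socket default (None, i.e. block indefinitely), which is consistent
    with the hang in sock.connect() shown above.
    """
    try:
        response = urlopen(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except URLError as e:
        # Connection-level failures, including a timeout during connect().
        print("request failed: %s" % (e.reason,))
    except socket.timeout:
        # A timeout while waiting for or reading the response may be
        # raised directly instead of being wrapped in URLError.
        print("request timed out after %s seconds" % timeout)
    return None
```

This does not explain why the connection stalls behind the proxy; it only turns the indefinite hang into an error that can be inspected.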

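I'm running Python on an Ubuntu 13.04 desktop installed on a portable flash drive, and right now it sits behind my company's proxy server. I don't know whether this is a proxy problem, but I tried setting the proxy environment variable: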
$ export http_proxy="http://domain\username:password@proxyserver:port"

so that at least wget works properly.
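Since wget only works after http_proxy is exported, it may help to check which proxy settings urllib actually sees. urllib2 reads the *_proxy environment variables through getproxies(), and ProxyHandler lets you pass a proxy explicitly instead. A sketch only: the function names are hypothetical, and any proxy URL passed to them is a placeholder.

```python
try:
    import urllib2 as request  # Python 2.7
    from urllib import getproxies
except ImportError:
    import urllib.request as request  # Python 3
    from urllib.request import getproxies


def proxies_from_environment():
    """Return the {scheme: proxy_url} mapping urllib reads from *_proxy vars."""
    return getproxies()


def build_proxy_opener(proxy_url):
    """Build an opener that routes http and https through proxy_url.

    Hypothetical helper; proxy_url is a placeholder such as
    "http://user:password@proxyserver:port".
    """
    handler = request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return request.build_opener(handler)
```

Printing proxies_from_environment() under both python and sudo python shows whether the exported http_proxy reaches each interpreter; sudo typically resets the environment unless invoked as sudo -E. An opener from build_proxy_opener is used as opener.open("http://www.baidu.com/", timeout=10).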

For comparison, when I ssh back into my desktop computer at home and run the same code, everything seems fine, with or without sudo:

Python 2.7.3 (default, Apr 10 2013, 05:09:49) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> urllib2.urlopen("http://www.baidu.com/")
<addinfourl at 3071291372L whose fp = <socket._fileobject object at 0xb71d6eec>>
>>>     

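I also tried running Ubuntu from the same flash drive on my laptop, and it didn't work well there either, but I don't remember the details. I'll take it home after work today, test it to get more information, and post the results here. Until then, can anyone help?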

0 answers:

No answers yet.