eduardo@camizao:/$ python2.7
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> url1 = 'http://www.google.com'
>>> url2 = 'https://www.google.com'
>>> f = urllib.urlopen(url1)
>>> f = urllib.urlopen(url2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 211, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 355, in open_http
    'got a bad status line', None)
IOError: ('http protocol error', 0, 'got a bad status line', None)
>>>
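When I try to connect to an https site with urllib, I get the error above. The proxy is set correctly. Debugging the Python code, I noticed that the import of the ssl library in urllib.py is not executed, so the https call is never made either. Can anyone help? I have to use urllib, not urllib2 or anything else. Thanks in advance.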
Answer 0 (score: 0)
At least there is nothing wrong with the way you wrote it:
$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> url1 = 'http://www.google.com'
>>> url2 = 'https://www.google.com'
>>> f = urllib.urlopen(url1)
>>> f = urllib.urlopen(url2)
>>> f.read()[:15]
'<!doctype html>'
>>>
So that's not it. It must be related to your environment or configuration. You said you're using a proxy?
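To narrow down the environment side, here is a rough sketch that reports whether the ssl module can be imported and which proxy settings Python picks up on its own. The helper names `ssl_available` and `describe_environment` are made up for illustration. `getproxies` is the standard-library function (`urllib.getproxies` on Python 2, `urllib.request.getproxies` on Python 3).

```python
import sys

try:
    from urllib import getproxies          # Python 2
except ImportError:
    from urllib.request import getproxies  # Python 3


def ssl_available():
    """Return True if this interpreter can import the ssl module."""
    try:
        import ssl  # noqa: F401
    except ImportError:
        return False
    return True


def describe_environment():
    """Collect the facts most relevant to an https-through-proxy problem."""
    return {
        "python": sys.version.split()[0],
        "ssl_available": ssl_available(),
        # Proxy settings detected from environment variables / the OS.
        "proxies": getproxies(),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%s: %r" % (key, value))
```

If `ssl_available` comes back False, the interpreter was probably built without SSL support. If the `proxies` mapping has an `https` entry, that is likely the setting your https request is being routed through.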
Edit:
I was able to open it through an open proxy (I won't include the actual proxy, since who knows how sketchy it is; substitute your own):
$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> proxy_handler = urllib2.ProxyHandler({'http': 'http://some-sketchy-open-proxy'})
>>> opener = urllib2.build_opener(proxy_handler)
>>> opener.open('https://www.google.com')
<addinfourl at 140512985881056 whose fp = <socket._fileobject object at 0x7fcbba9b1ed0>>
>>> _.read()[:15]
'<!doctype html>'
>>>
Try it with your own proxy URL (note that I used urllib2, not urllib). Hope that helps!
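One caveat, as far as I can tell: `ProxyHandler` picks the proxy by the scheme of the URL being requested, so a mapping with only an `'http'` key may leave `https://` requests unproxied. Below is a minimal sketch that registers the same proxy for both schemes. The helper name `build_proxy_opener` and the address `proxy.example.com:3128` are placeholders, not anything from the session above.

```python
try:
    import urllib2 as request_module         # Python 2
except ImportError:
    import urllib.request as request_module  # Python 3


def build_proxy_opener(proxy_url):
    """Build an opener that maps both http and https requests to proxy_url."""
    proxies = {"http": proxy_url, "https": proxy_url}
    handler = request_module.ProxyHandler(proxies)
    return request_module.build_opener(handler), proxies


# Placeholder proxy address (an assumption): substitute your own.
opener, proxies = build_proxy_opener("http://proxy.example.com:3128")

# No network traffic happens until you actually open a URL, e.g.:
# response = opener.open('https://www.google.com')
# print(response.read()[:15])
```

Building the opener is harmless by itself, so you can inspect `opener.handlers` first and only then try a real request against your proxy.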
Edit 2:
Using only urllib:
$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26)
[GCC 4.7.3] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> import urllib
>>> proxies = {'http': '189.112.3.87:3128'}
>>> url = 'https://www.google.com'
>>> filehandle = urllib.urlopen(url,proxies=proxies)
>>> filehandle.read()[:15]
'<!doctype html>'
>>>