I have a very basic script that downloads a website using Python's urllib2.
It has worked fine for the past 6 months, but this morning it stopped working.
#!/usr/bin/python
import urllib2
proxy_support = urllib2.ProxyHandler({'http': 'http://DOMAIN\USER:PASS@PROXY:PORT/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
translink = open('/tmp/trains.html' ,'w')
response = urllib2.urlopen('http://translink.com.au')
html = response.read()
translink.write(html)
translink.close()
I now get the following error:
Traceback (most recent call last):
File "./gettrains.py", line 7, in <module>
response = urllib2.urlopen('http://translink.com.au')
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 407, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 520, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 445, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 528, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 502: Proxy Error ( The HTTP message includes an unsupported header or an unsupported combination of headers. )
I'm new to Python, so any help is much appreciated.
Cheers
#!/usr/bin/python
import requests
proxies = {
"http": "http://domain\user:pass@proxy:port",
"https": "http://domain\user:pass@proxy:port",
}
html = requests.get("http://translink.com.au", proxies=proxies)
translink = open('/tmp/trains.html' ,'w')
translink.write(html.content)
translink.close()
Answer 0 (score: 0)
Try changing the headers. For example:
opener = urllib2.build_opener(proxy_support)
opener.addheaders = ([('User-Agent' , 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)')])
urllib2.install_opener(opener)
I ran into the same problem a few days ago. My proxy did not accept the default User-Agent header, 'Python-urllib/2.7'.
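Put together, the suggestion above might look like the following minimal sketch. The proxy URL, the helper names `build_opener_with_agent` and `download`, and the Python 3 import fallback are assumptions for illustration; substitute your own proxy details.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption)

# Hypothetical placeholder; replace with your real proxy URL.
PROXY_URL = 'http://proxy.example:8080/'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'


def build_opener_with_agent(proxy_url=PROXY_URL, user_agent=USER_AGENT):
    """Return an opener that uses the proxy and sends a browser-like User-Agent."""
    proxy_support = urllib2.ProxyHandler({'http': proxy_url})
    opener = urllib2.build_opener(proxy_support)
    # Replaces the default ('User-agent', 'Python-urllib/x.y') header.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


def download(url, path, opener=None):
    """Fetch url through the opener and write the raw bytes to path."""
    opener = opener or build_opener_with_agent()
    response = opener.open(url)
    try:
        with open(path, 'wb') as out:
            out.write(response.read())
    finally:
        response.close()
```

Calling `download('http://translink.com.au', '/tmp/trains.html')` would then mirror what the original script does, without installing the opener globally.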
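Answer 1 (score: 0)
To simplify things, I would avoid configuring the proxy in Python and just let your operating system manage it for you. You can do this by setting an environment variable (on Linux, for example, export http_proxy="your_proxy"). Then fetch the file directly from Python; you can use urllib2 or requests, or consider the wget module.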
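As a rough sketch of this approach, the standard library can pick up the proxy from the environment on its own. `getproxies()` is the standard helper that reads variables such as `http_proxy`; the function names `proxies_from_environment` and `fetch` and the Python 3 import fallback are assumptions for illustration.

```python
try:
    from urllib import getproxies  # Python 2
    from urllib2 import urlopen
except ImportError:
    from urllib.request import getproxies, urlopen  # Python 3 fallback


def proxies_from_environment():
    """Return the proxy mapping the standard library detects.

    Reads *_proxy environment variables; on some platforms it may fall
    back to OS-level proxy settings when none are set.
    """
    return getproxies()


def fetch(url):
    """Download url; the default opener applies the detected proxies."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        response.close()
```

With `http_proxy` exported in the shell, printing `proxies_from_environment()` is a quick way to confirm that Python sees the setting before calling `fetch('http://translink.com.au')`.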
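It is entirely possible that something changed on your proxy so that it now forwards requests with headers that your final destination does not accept. In that case, there is little you can do.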
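Before concluding that, it can help to look at exactly what the proxy sent back. The sketch below is an illustration, not part of the original answer: it catches the `HTTPError` and reports the status and response headers. `describe_http_error` and `try_fetch` are hypothetical helper names, and the Python 3 import fallback is an assumption.

```python
try:
    from urllib2 import HTTPError, urlopen  # Python 2
except ImportError:
    from urllib.error import HTTPError  # Python 3 fallback
    from urllib.request import urlopen


def describe_http_error(err):
    """Summarise an HTTPError: status code, reason, and response headers."""
    return 'HTTP %s %s\n%s' % (err.code, err.msg, err.headers)


def try_fetch(url):
    """Return the response body, or print diagnostics and return None."""
    try:
        response = urlopen(url)
    except HTTPError as err:
        print(describe_http_error(err))
        return None
    try:
        return response.read()
    finally:
        response.close()
```

The headers on a 502 response often identify which proxy generated the error, which may be useful when raising the issue with whoever administers it.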