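I wrote a script in Python that uses cookies and POST/GET. I also included proxy support in the script. However, when someone enters a dead proxy, the script crashes. Is there a way to check whether a proxy is dead or alive before running the rest of my script?
Also, I noticed that some proxies don't handle cookies/POST headers properly. Is there any way to fix this?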
Answer 0 (score: 16)
The easiest way is to simply catch the IOError exception from urllib:
# Python 2
import urllib

try:
    urllib.urlopen(
        "http://example.com",
        proxies={'http': 'http://example.com:8080'}
    )
except IOError:
    print "Connection error! (Check proxy)"
else:
    print "All was fine"
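In Python 3, `urllib.urlopen` and its `proxies` argument are gone. A rough equivalent, sketched here under the assumption that you only need an HTTP proxy, builds an opener with `urllib.request.ProxyHandler` and catches `OSError` (which `IOError` is now an alias of, and which `urllib.error.URLError` subclasses). `fetch_via_proxy` is a hypothetical helper name. It also catches `http.client.HTTPException`, because a misbehaving proxy can send a malformed status line, which is not an `OSError`.

```python
import http.client
import urllib.request


def fetch_via_proxy(url, proxy, timeout=10):
    """Fetch `url` through an HTTP `proxy`; return the body, or None on failure.

    `fetch_via_proxy` is a hypothetical helper, not part of any library.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    try:
        # A per-request timeout keeps a dead proxy from hanging the script.
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, refused connections and socket timeouts are all OSError subclasses.
        print("Connection error! (Check proxy):", exc)
        return None
```

Usage: `body = fetch_via_proxy("http://example.com", "http://example.com:8080")`, then treat `None` as "proxy failed" instead of letting the exception crash the script.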
Also, from this blog post, "check status proxy address" (slightly improved):
For Python 2:
import urllib2
import socket

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib2.install_opener(opener)
        req = urllib2.Request('http://www.example.com')  # change the URL to test here
        sock = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print "Bad Proxy %s" % (currentProxy)
        else:
            print "%s is working" % (currentProxy)

if __name__ == '__main__':
    main()
For Python 3:
import urllib.request
import socket
import urllib.error

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib.request.ProxyHandler({'http': pip})
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        req = urllib.request.Request('http://www.example.com')  # change the URL to test here
        sock = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        print('Error code: ', e.code)
        return e.code
    except Exception as detail:
        print("ERROR:", detail)
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '25.176.126.9:80']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print("Bad Proxy %s" % (currentProxy))
        else:
            print("%s is working" % (currentProxy))

if __name__ == '__main__':
    main()
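Both versions above install the opener globally, rely on `socket.setdefaulttimeout`, and test proxies one at a time, so a list of dead proxies costs one full timeout each. A possible alternative, sketched under the assumption that proxies are plain HTTP `host:port` strings and the same `http://www.example.com` test URL is acceptable: give each check its own opener and timeout, and run the checks concurrently with `concurrent.futures.ThreadPoolExecutor`. `check_proxy` and `check_all` are hypothetical names.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TEST_URL = "http://www.example.com"  # change the URL to test here


def check_proxy(pip, timeout=10):
    """Return True if a request through `pip` ('host:port') succeeds."""
    # A private opener per check: nothing is installed globally.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": "http://" + pip})
    )
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    try:
        # Non-2xx responses raise HTTPError, so reaching the body means success.
        with opener.open(TEST_URL, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        return False


def check_all(proxies, timeout=10, workers=20):
    """Check every proxy concurrently; return {proxy: is_working}."""
    proxies = list(proxies)  # iterate twice: once for map, once for zip
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: check_proxy(p, timeout), proxies)
        return dict(zip(proxies, results))
```

With this, the total wait for a list of dead proxies is roughly one timeout rather than one per proxy, as long as the list is no longer than `workers`.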
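Keep in mind that this can double the time your script takes if the proxy is down, because you will have to wait for two connection timeouts. Unless you specifically need to know that the proxy is at fault, handling the IOError is far cleaner, simpler and quicker.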
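Following that advice, one way to avoid the double timeout when you have several candidate proxies is to skip the pre-check entirely and try each proxy in turn for the real request, moving on when one raises. A minimal sketch, assuming a list of HTTP proxies in `host:port` form; `fetch_with_fallback` is a hypothetical helper.

```python
import http.client
import urllib.request


def fetch_with_fallback(url, proxies, timeout=10):
    """Fetch `url` through each proxy in turn; return (proxy, body) from the first that works.

    Re-raises the last error if every proxy fails, so the caller decides what
    a total failure means instead of the script crashing on the first dead proxy.
    """
    last_error = None
    for pip in proxies:
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": "http://" + pip})
        )
        try:
            with opener.open(url, timeout=timeout) as resp:
                return pip, resp.read()
        except (OSError, http.client.HTTPException) as exc:
            last_error = exc  # dead or misbehaving proxy: try the next one
    if last_error is None:
        raise ValueError("no proxies given")
    raise last_error
```

Each dead proxy costs exactly one timeout here, because the failed attempt is the request itself rather than a separate probe.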
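Answer 1 (score: 1)
I think the better approach is what dbr said: handle the exception.
Another solution that may be better in some cases is to use an external online proxy checker tool to check whether the proxy server is alive, and then keep using your script without any modification.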
Answer 2 (score: 1)
You can use a simple library such as Proxy-checker, like this:
from proxy_checker import ProxyChecker
checker = ProxyChecker()
checker.check_proxy('<ip>:<port>')
Output:
{
    "country": "United States",
    "country_code": "US",
    "protocols": [
        "socks4",
        "socks5"
    ],
    "anonymity": "Elite",
    "timeout": 1649
}
It is possible to generate your own proxies and check them with two lines of code.
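The output above includes a `timeout` field; judging by its value it looks like a response time in milliseconds, but that reading is an assumption about the library. If you only want a rough latency figure without an extra dependency, a standard-library sketch could time one request through the proxy. `proxy_latency_ms` is a hypothetical name, and unlike the library it does not report country, protocols, or anonymity.

```python
import http.client
import time
import urllib.request


def proxy_latency_ms(pip, url="http://www.example.com", timeout=10):
    """Return the round-trip time in ms for one request through `pip`, or None on failure."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": "http://" + pip})
    )
    start = time.monotonic()  # monotonic clock: unaffected by system clock changes
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measurement
    except (OSError, http.client.HTTPException):
        return None
    return round((time.monotonic() - start) * 1000)
```

A `None` result doubles as the dead/alive answer, so one call gives both pieces of information.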
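Answer 3 (score: 0)
There is a nice package, Grab. So, if that's OK for you, you can write something like this (a simple and effective proxy-checker generator):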
from grab import Grab, GrabError

def get_valid_proxy(proxy_list):  # format of items e.g. '128.2.198.188:3124'
    g = Grab()
    for proxy in proxy_list:
        g.setup(proxy=proxy, proxy_type='http', connect_timeout=5, timeout=5)
        try:
            g.go('google.com')
        except GrabError:
            # logging.info("Test error")
            pass
        else:
            yield proxy
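If you would rather not add Grab as a dependency, a similar lazy generator can be written with the standard library. This is a sketch, not a port of Grab's internals: it assumes plain HTTP proxies in `host:port` form, uses `http://google.com` as the test URL, and `valid_proxies` is a hypothetical name. Note that urllib has a single `timeout` rather than Grab's separate `connect_timeout` and `timeout`.

```python
import http.client
import urllib.request


def valid_proxies(proxy_list, test_url="http://google.com", timeout=5):
    """Lazily yield each 'host:port' proxy that can fetch `test_url`."""
    for proxy in proxy_list:
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": "http://" + proxy})
        )
        try:
            # One timeout covers the connect and each blocking socket read.
            with opener.open(test_url, timeout=timeout):
                pass
        except (OSError, http.client.HTTPException):
            continue  # dead proxy: skip it
        yield proxy
```

Because it is a generator, `next(valid_proxies(candidates), None)` tests proxies only until the first working one is found, and returns `None` if none work.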