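I have a lot of free proxies in a txt file, and I want to use them as proxies for scraping websites. When I use a proxy, such as 127.0.0.1 below, how can I tell whether the proxy is still usable?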
import urllib2

proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
Answer 0 (score: 0)
Use this function:
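import urllib2

def is_OK(ip):
    # Return True if a request routed through the proxy `ip` succeeds.
    print 'Trying %s ...' % ip
    try:
        proxy_handler = urllib2.ProxyHandler({'http': ip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib2.install_opener(opener)
        req = urllib2.Request('http://www.icanhazip.com')
        # The timeout is an addition so that dead proxies fail fast
        # instead of hanging; 10 seconds is an arbitrary choice.
        urllib2.urlopen(req, timeout=10)
        print '%s is OK' % ip
        return True
    except Exception:
        # Covers urllib2.HTTPError, urllib2.URLError, timeouts, etc.
        print '%s is not OK' % ip
    return False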
Taken from this answer: Python, checking if a proxy is alive?
So you just iterate over the file (assuming one IP address per line) and check whether is_OK() returns True:
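with open('ip_addresses.txt') as fp:
    for line in fp:
        ip = line.strip()  # each line ends with a newline; remove it
        if ip and is_OK(ip):
            do_something()  # placeholder for your own scraping code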
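For anyone on Python 3, where urllib2 was merged into urllib.request, the same idea might look roughly like the sketch below. This is not code from the original answer: the names is_ok and working_proxies, the 10-second timeout, and the injectable check parameter are all assumptions made for illustration. It calls the opener directly rather than installing it globally, and it collects the proxies that respond into a list.

```python
import urllib.request


def is_ok(proxy, url='http://www.icanhazip.com', timeout=10):
    """Return True if `url` can be fetched through the HTTP proxy `proxy`.

    The test URL is the one used in the answer above; the timeout value
    is an arbitrary assumption.
    """
    handler = urllib.request.ProxyHandler({'http': proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    try:
        # Use the opener directly instead of install_opener(), so the
        # process-wide default opener is left untouched.
        opener.open(url, timeout=timeout).close()
        return True
    except Exception:
        # URLError, HTTPError, socket timeouts, malformed proxy strings...
        return False


def working_proxies(lines, check=is_ok):
    """Return the stripped, non-blank entries of `lines` that pass `check`.

    `check` is a hypothetical hook so the filter can be exercised
    without any network access.
    """
    candidates = (line.strip() for line in lines)
    return [proxy for proxy in candidates if proxy and check(proxy)]
```

Since a file object yields its lines when iterated, opening ip_addresses.txt in a with block and passing the file object to working_proxies() should give back the usable entries. A proxy that passes once can still die a minute later, so it is probably worth re-checking right before use.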