Opening a URL through multiple proxies with urllib2

Asked: 2013-05-25 05:59:24

Tags: python proxy urllib2

What I'm trying to do is read one line (an IP address) at a time, open a website through that address, and then repeat for every address in the file. Instead, I get an error. I'm new to Python, so maybe it's a simple mistake. Thanks in advance!!!

CODE:

>>> import urllib2
>>> f = open("proxy.txt", "r")          # file containing the list of IP addresses
>>> address = f.readline().strip()      # strip() removes the trailing \n
>>> 
>>> while address:                      # readline() returns '' at end of file
        proxy = urllib2.ProxyHandler({'http': address})
        opener = urllib2.build_opener(proxy)
        urllib2.install_opener(opener)
        urllib2.urlopen('http://www.google.com')
        address = f.readline().strip()

ERROR:

Traceback (most recent call last):
  File "<pyshell#15>", line 5, in <module>
    urllib2.urlopen('http://www.google.com')
  File "D:\Programming\Python\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "D:\Programming\Python\lib\urllib2.py", line 394, in open
    response = self._open(req, data)
  File "D:\Programming\Python\lib\urllib2.py", line 412, in _open
    '_open', req)
  File "D:\Programming\Python\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "D:\Programming\Python\lib\urllib2.py", line 1199, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "D:\Programming\Python\lib\urllib2.py", line 1174, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

1 Answer:

Answer 0 (score: 1)

It means the proxy is not available: it either refused the connection or failed to respond before the timeout.
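A single dead proxy shouldn't abort the whole run, so the loop in the question needs a try/except around the request. Here's a minimal sketch of that fix (the `fetch_via_proxies` helper name is mine, not from the question); it tries each address in turn and returns the first one that answers:

```python
try:  # Python 2
    from urllib2 import ProxyHandler, build_opener
except ImportError:  # Python 3
    from urllib.request import ProxyHandler, build_opener

def fetch_via_proxies(url, proxies, timeout=5):
    """Return (proxy, body) for the first proxy that works, else (None, None)."""
    for proxy in proxies:
        opener = build_opener(ProxyHandler({'http': proxy}))
        try:
            response = opener.open(url, timeout=timeout)
            return proxy, response.read()
        except EnvironmentError:  # dead proxy: URLError, socket errors, ...
            continue  # move on to the next address instead of crashing
    return None, None
```

Checking each proxy one at a time like this is slow when the list is long, which is where the concurrent checker below comes in.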

Here's a proxy checker that can test several proxies concurrently:

#!/usr/bin/env python
import fileinput # accept proxies from files or stdin

try:
    from gevent.pool import Pool # $ pip install gevent
    import gevent.monkey; gevent.monkey.patch_all() # patch stdlib
except ImportError: # fallback on using threads
    from multiprocessing.dummy import Pool

try:
    from urllib2 import ProxyHandler, build_opener
except ImportError: # Python 3
    from urllib.request import ProxyHandler, build_opener

def is_proxy_alive(proxy, timeout=5):
    opener = build_opener(ProxyHandler({'http': proxy})) # test redir. and such
    try: # send request, read response headers, close connection
        opener.open("http://example.com", timeout=timeout).close()
    except EnvironmentError:
        return None
    else:
        return proxy

candidate_proxies = (line.strip() for line in fileinput.input())
pool = Pool(20) # use 20 concurrent connections
for proxy in pool.imap_unordered(is_proxy_alive, candidate_proxies):
    if proxy is not None:
        print(proxy)

Usage:

$ python alive-proxies.py proxy.txt
$ echo user:password@ip:port | python alive-proxies.py
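The reason the checker works with either gevent or plain threads is that `multiprocessing.dummy.Pool` exposes the same `imap_unordered` API as `gevent.pool.Pool`. The pattern can be exercised without any network access; this sketch substitutes a trivial filter for `is_proxy_alive`:

```python
from multiprocessing.dummy import Pool  # thread-based Pool, same API as gevent's

def check(item):
    # stand-in for is_proxy_alive: keep even numbers, drop odd ones
    return item if item % 2 == 0 else None

pool = Pool(4)  # 4 worker threads
alive = [x for x in pool.imap_unordered(check, range(10)) if x is not None]
pool.close()
pool.join()
print(sorted(alive))  # imap_unordered yields in completion order, so sort for display
```

`imap_unordered` yields results as soon as each worker finishes, which is exactly what you want here: a fast, live proxy is printed immediately instead of waiting behind a slow, dead one.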