Using a proxy in Python to fetch a web page

Date: 2010-03-27 00:33:19

Tags: python proxy

I am trying to write a function in Python that uses a public anonymous proxy to fetch a web page, but I get a rather strange error.
The code (I have Python 2.4):

import urllib2

def get_source_html_proxy(url, pip, timeout):
    # timeout: maximum number of seconds the code is willing to wait in case
    # the proxy is not working; after that it gives up.
    proxy_handler = urllib2.ProxyHandler({'http': pip})
    opener = urllib2.build_opener(proxy_handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    urllib2.install_opener(opener)
    req = urllib2.Request(url)
    sock = urllib2.urlopen(req)
    timp = 0  # counter that approximates the time until the result (webpage) is returned
    # Computed before the loop so it is always defined when checked afterwards.
    timpLimita = 50000000 * timeout  # 5 million is about 1 second
    while 1:
        data = sock.read(1024)
        timp = timp + 1
        if len(data) < 1024: break
        if timp == timpLimita:
            break
    if timp == timpLimita:
        print pip + ": Connection is working, but the webpage is fetched in more than " + str(timeout) + " seconds. This proxy returns the following IP: " + str(data)
        return str(data)
    else:
        print "This proxy " + pip + " = good proxy. " + "It returns the following IP: " + str(data)
        return str(data)

# Now I call the function to test it with one single proxy (IP:port) that does not
# need a user and password (a public high-anonymity proxy).
# (I used a proxy that I know is working: slow, but working.)
rez = get_source_html_proxy("http://www.whatismyip.com/automation/n09230945.asp", "93.84.221.248:3128", 50)
print rez

The error:

Traceback (most recent call last):
  File "./public_html/cgi-bin/teste5.py", line 43, in ?
    rez=get_source_html_proxy("http://www.whatismyip.com/automation/n09230945.asp", "xx.yy.zzz.ww:3128", 50)
  File "./public_html/cgi-bin/teste5.py", line 18, in get_source_html_proxy
    sock=urllib2.urlopen(req)
  File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/usr/lib64/python2.4/urllib2.py", line 358, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.4/urllib2.py", line 376, in _open
    '_open', req)
  File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.4/urllib2.py", line 573, in <lambda>
    lambda r, proxy=url, type=type, meth=self.proxy_open: \
  File "/usr/lib64/python2.4/urllib2.py", line 580, in proxy_open
    if '@' in host:
TypeError: iterable argument required

I do not know why the character "@" is a problem (I have no such character in my code. Should I?)
Thanks in advance for your help.

2 answers:

Answer 0: (score: 3)

urllib2.build_opener takes a list of handlers:

opener = urllib2.build_opener([proxy_handler])
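For readers on Python 3, here is a minimal sketch of the same opener setup, assuming urllib.request in place of urllib2. In urllib.request, build_opener accepts handler instances as separate positional arguments. The helper name build_proxy_opener and the step that prefixes a missing scheme are illustrative assumptions, not part of the question's code.

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Build an opener that routes http:// requests through proxy_address.

    build_proxy_opener is a hypothetical helper for illustration only.
    """
    # Assumption: a bare "IP:port" string is meant as an HTTP proxy,
    # so an explicit scheme is added when none is present.
    if "://" not in proxy_address:
        proxy_address = "http://" + proxy_address
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_address})
    # Handlers are passed as positional arguments.
    opener = urllib.request.build_opener(proxy_handler)
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    return opener


opener = build_proxy_opener("93.84.221.248:3128")
```

Calling opener.open(url, timeout=50) would then fetch the page through the proxy and give up after 50 seconds of socket inactivity, which could stand in for the hand-rolled counter loop in the question.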

Answer 1: (score: 0)

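The @ itself is a red herring. The traceback comes from the fact that the code is trying to perform an `x in host` operation, which in this case means that host must be iterable (such as a string). You need to check the value of host; it is probably None or a number rather than what you intended.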
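One way host could end up as None is sketched below, under the assumption that the proxy string is first split into a scheme and then into a `//host` part. The helper split_proxy and its two regular expressions are hypothetical stand-ins written for illustration; they are not copied from urllib2.

```python
import re

# Assumed parsing rules: "scheme:" up to the first colon, then "//host".
_TYPE_RE = re.compile(r'^([^/:]+):')
_HOST_RE = re.compile(r'^//([^/?]*)(.*)$')


def split_proxy(proxy):
    """Return (scheme, host) as a scheme-then-//host parser would see them."""
    scheme, rest = None, proxy
    match = _TYPE_RE.match(proxy)
    if match:
        scheme = match.group(1).lower()
        rest = proxy[len(match.group(0)):]
    # Without a leading "//" there is nothing to read a host from.
    match = _HOST_RE.match(rest)
    host = match.group(1) if match else None
    return scheme, host


print(split_proxy("93.84.221.248:3128"))
print(split_proxy("http://93.84.221.248:3128"))
```

Under these assumptions the bare "IP:port" form treats the IP as the scheme and leaves host as None, while the form with an explicit "http://" prefix yields a usable host string. If that matches what urllib2 does, passing "http://93.84.221.248:3128" as the proxy address may be enough to avoid the TypeError.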