How can I open a website with urllib through a proxy in Python?

Asked: 2010-07-02 18:19:48

Tags: python proxy

I have this program that checks a website, and I want to know how I can check it through a proxy in Python...

This is the code, just as an example:

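import time
import urllib  # Python 2; in Python 3 this function is urllib.request.urlopen

while True:
    try:
        h = urllib.urlopen(website)  # website holds the URL to check
        break
    except IOError:
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)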

4 Answers:

Answer 0 (score: 38)

By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
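
For reference, a minimal Python 3 sketch of the same lookup, assuming the proxy address from the shell example. urllib.request.getproxies_environment() returns the proxies that urllib picks up from environment variables named <scheme>_proxy. Setting os.environ inside the script is only for illustration; normally the variable comes from the shell.

```python
import os
import urllib.request

# Hypothetical proxy address, same as in the shell example.
# Normally this is exported in the shell, not set from Python.
os.environ["http_proxy"] = "http://myproxy.example.com:1234"

# Collect every <scheme>_proxy environment variable into a dict
# keyed by scheme, e.g. {'http': 'http://myproxy.example.com:1234'}.
detected = urllib.request.getproxies_environment()

print("HTTP proxy from the environment:", detected.get("http"))
```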

If you would rather specify the proxy inside your application, you can pass the proxies argument to urlopen:

proxies = {'http': 'http://myproxy.example.com:1234'}
print "Using HTTP proxy %s" % proxies['http']
urllib.urlopen("http://www.google.com", proxies=proxies)

Edit: If I understand your comment correctly, you want to try several proxies and print each one as you try it. How about something like this?

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print "Trying HTTP proxy %s" % proxy
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print "Got URL using proxy %s" % proxy
        break
    except IOError:
        print "Trying next proxy in 5 seconds"
        time.sleep(5)
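
A rough Python 3 version of the same loop, as a sketch rather than a drop-in replacement. The function names, the timeout default and the opener_factory parameter are my own illustrative choices, not part of urllib. Each proxy gets its own opener, so nothing is installed globally.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_with_fallback(url, candidate_proxies, timeout=10,
                        opener_factory=build_proxy_opener):
    """Try each proxy in order; return (proxy, body) for the first that works."""
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        opener = opener_factory(proxy)
        try:
            with opener.open(url, timeout=timeout) as response:
                return proxy, response.read()
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            print("Proxy %s failed: %s" % (proxy, exc))
    raise RuntimeError("no candidate proxy could fetch %s" % url)
```

Calling fetch_with_fallback("http://www.google.com", candidate_proxies) with the list above would return the first proxy that answered together with the response body. Add a time.sleep(5) in the except branch if you want the pause from the original loop.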

Answer 1 (score: 26)

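Python 3 is slightly different here. It will try to auto-detect proxy settings, but if you need specific or manual proxy settings, consider code like this: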

#!/usr/bin/env python3
import urllib.request

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port', 
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

# url is the address you want to fetch
with urllib.request.urlopen(url) as response:
    html = response.read()  # ... process the response here

See also the relevant section in the Python 3 docs.

Answer 2 (score: 3)

Here is example code showing how to connect through a proxy with urllib:

import urllib.request

authinfo = urllib.request.HTTPBasicAuthHandler()

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.google.com/')
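
Note that the authinfo handler in this answer is created without any credentials. If the proxy itself asks for a username and password, one option is ProxyBasicAuthHandler with a password manager. A sketch, assuming hypothetical credentials; the helper name build_authenticated_proxy_opener is mine, not part of urllib:

```python
import urllib.request


def build_authenticated_proxy_opener(proxy_url, username, password):
    """Return an opener that uses proxy_url and answers its Basic auth challenge."""
    # Password manager with a default realm: realm=None matches any realm.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_url, username, password)

    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    # Handles "407 Proxy Authentication Required" responses from the proxy.
    proxy_auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)

    return urllib.request.build_opener(proxy_handler, proxy_auth_handler)


# Hypothetical credentials, for illustration only.
opener = build_authenticated_proxy_opener("http://ahad-haam:3128", "alice", "secret")
```

Use opener.open(url) for a single request, or pass the opener to urllib.request.install_opener() if every urlopen call should go through it.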

Answer 3 (score: 0)

For both http and https, use:

proxies = {'http':'http://proxy-source-ip:proxy-port',
           'https':'https://proxy-source-ip:proxy-port'}

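Proxies for other schemes can be added similarly. A dict holds one value per key, so each scheme maps to a single proxy; repeating the 'http' key would silently keep only the last entry.

proxies = {'http':'http://proxy1-source-ip:proxy-port',
           'ftp':'http://proxy2-source-ip:proxy-port',
           ...
          }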

Usage:

filehandle = urllib.urlopen( external_url , proxies=proxies)

To bypass any proxy (for example, for a link inside your own network):

filehandle = urllib.urlopen(external_url, proxies={})

To use proxy authentication with a username and password:

proxies = {'http':'http://username:password@proxy-source-ip:proxy-port',
           'https':'https://username:password@proxy-source-ip:proxy-port'}
  

Note: avoid special characters such as : and @ in the username and password.
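
If the credentials do contain such characters, percent-encoding them before building the proxy URL is one possible workaround. This is a sketch under the assumption that your Python version decodes percent-escapes in proxy credentials (Python 3's urllib.request does, as far as I know); the function name and the example values are made up.

```python
from urllib.parse import quote


def proxy_url_with_credentials(username, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded username and password."""
    # safe="" makes quote() escape ':', '@' and '/' as well.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return "%s://%s:%s@%s:%d" % (scheme, user, pwd, host, port)


# Hypothetical credentials containing '@' and ':'.
proxy = proxy_url_with_credentials("user@corp", "p:ss@word", "proxy.example.com", 3128)
proxies = {"http": proxy}
```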