urllib.request returns a proxy auto-config (PAC) file

Asked: 2018-07-09 12:04:35

Tags: python urllib pac

I am using Martin Konecny's code from here to query an HTTP website from behind a corporate firewall.

The code looks like this:

    import urllib.request

    req = urllib.request.Request(
        'http://www.espncricinfo.com/',
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )

    f = urllib.request.urlopen(req)
    g = open('writing.txt', 'w')
    g.write(f.read().decode('utf-8'))
    g.close()

However, when I run this code, I receive a PAC file instead of the content at the URL.

How can I download the website's content from the URL?

Thanks!

1 answer:

Answer 0 (score: 1)

    import urllib.request

    req = urllib.request.Request('http://www.espncricinfo.com/', data=None, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
        }
    )

    # 'ip:port' is a placeholder: replace it with your proxy's address and port.
    proxy_support = urllib.request.ProxyHandler({'http': 'ip:port'})
    opener = urllib.request.build_opener(proxy_support)
    # Make the opener object the global default opener.
    urllib.request.install_opener(opener)

    f = urllib.request.urlopen(req)

    g = open('writing.txt', 'w')
    g.write(f.read().decode('utf-8'))
    # close must be called; a bare g.close does nothing.
    g.close()
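
If you do not know which value to put in place of 'ip:port', the PAC file you received usually names the proxy. A PAC file is JavaScript that defines a function FindProxyForURL, and that function returns strings such as "PROXY host:port; DIRECT". The following is only a rough sketch under that assumption: the helper names (looks_like_pac, proxies_from_pac_result, build_opener_for) are made up for illustration, 10.0.0.1:8080 is a placeholder address, and the code does not evaluate the PAC JavaScript itself, so you would read the return string out of the file by hand.

```python
import urllib.request


def looks_like_pac(body: str) -> bool:
    """Heuristic check: PAC files define a function named FindProxyForURL."""
    return "FindProxyForURL" in body


def proxies_from_pac_result(result: str) -> dict:
    """Convert a PAC return string into a mapping for ProxyHandler.

    Only the first 'PROXY host:port' entry is used. 'DIRECT' and
    SOCKS entries are ignored in this sketch.
    """
    for entry in result.split(";"):
        parts = entry.split()
        if len(parts) == 2 and parts[0].upper() == "PROXY":
            address = "http://" + parts[1]
            return {"http": address, "https": address}
    # No PROXY entry found: an empty mapping means "connect directly".
    return {}


def build_opener_for(result: str) -> urllib.request.OpenerDirector:
    """Build an opener that uses the proxy named in a PAC return string."""
    handler = urllib.request.ProxyHandler(proxies_from_pac_result(result))
    return urllib.request.build_opener(handler)
```

For example, build_opener_for("PROXY 10.0.0.1:8080; DIRECT") gives an opener you could pass to urllib.request.install_opener in place of the one above, and looks_like_pac(f.read().decode('utf-8')) tells you whether the response is still the PAC file rather than the page.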