I am using Martin Konecny's code from here to query an HTTP website from behind a corporate firewall.
The code looks like this:
import urllib.request
req = urllib.request.Request(
'http://www.espncricinfo.com/',
data=None,
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
)
f = urllib.request.urlopen(req)
g = open('writing.txt', 'w', encoding='utf-8')
g.write(f.read().decode('utf-8'))
g.close()
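However, after running this code, what I receive is a PAC file rather than the contents of the URL.
How can I download the actual website content from the URL?
Thanks!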
Answer 0 (score: 1)
import urllib.request
req = urllib.request.Request(
    'http://www.espncricinfo.com/',
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
# Replace 'ip:port' with the address and port of your proxy server.
proxy_support = urllib.request.ProxyHandler({'http': 'ip:port'})
opener = urllib.request.build_opener(proxy_support)
# Make the opener object the global default opener.
urllib.request.install_opener(opener)
f = urllib.request.urlopen(req)
g = open('writing.txt', 'w', encoding='utf-8')
g.write(f.read().decode('utf-8'))
g.close()
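
If you would rather not install a global opener, the same idea can be wrapped in a small helper. This is only a minimal sketch: the names build_proxy_opener and fetch_to_file and the address http://proxy.example.com:8080 are hypothetical placeholders, not anything from your network. The real proxy address is usually listed inside the PAC file you are receiving, or your network administrator can tell you.

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Return an opener that sends HTTP and HTTPS requests through proxy_address."""
    proxy_support = urllib.request.ProxyHandler({
        'http': proxy_address,
        'https': proxy_address,
    })
    # Passing our own ProxyHandler replaces the default one that
    # build_opener would otherwise add.
    return urllib.request.build_opener(proxy_support)


def fetch_to_file(opener, url, path, user_agent='Mozilla/5.0'):
    """Download url through opener and write the decoded body to path."""
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    # Both the response and the file are closed when the with block exits.
    with opener.open(req) as response, open(path, 'w', encoding='utf-8') as out:
        out.write(response.read().decode('utf-8'))


# Hypothetical proxy address; substitute the one your company uses.
opener = build_proxy_opener('http://proxy.example.com:8080')
```

You would then call fetch_to_file(opener, 'http://www.espncricinfo.com/', 'writing.txt'). Using opener.open directly keeps the proxy setting local to this opener instead of changing the behaviour of every later urllib.request.urlopen call.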