I'm trying to use a proxy IP address for web scraping with Selenium. I'm running Python 2.7.3 on Mac OS X 10.7.5, and I have the following Python code:
import urllib2
from selenium import webdriver
fileproxylist = open('proxylist.txt', 'r')
proxyList = fileproxylist.readlines()
indexproxy = 0
totalproxy = len(proxyList)
def get_source_html_proxy(url, proxip):
    proxyip=urllib2.ProxyHandler({'http':proxip})
    opener = urllib2.build_opener(proxyip)
    urllib2.install_opener(opener)
    req=urllib2.Request(url)
    sock=urllib2.urlopen(req)
    data = sock.read()
    return data
browser = webdriver.Chrome()
browser.get(get_source_html_proxy(MyUrl,proxyList[0]))
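where MyUrl is the URL of the page I want to scrape, and proxyList[0] is the IP address I want the request to come from, instead of my local machine's IP address. When I run this code, I get the following error: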
Traceback (most recent call last):
  File "Scrape.py", line 89, in <module>
    browser.get(get_source_html_proxy(MyUrl,proxyList[0]))
  File "Scrape.py", line 83, in get_source_html_proxy
    sock=urllib2.urlopen(req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
I'm not sure what the problem is here. Can someone help me figure out what's going on? Thanks!