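I'm trying to use Python and Selenium to loop through a list of web pages and download a file from each one. Using a while loop, I can open one page at a time and download the first file I want, but when the loop reaches the second element of the list, Selenium seems to fail.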
Here is my code:
from selenium import webdriver
import time

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.get("file:///path to html file")
# these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']
index = 0
while (index <= 2):
    url = all_trails[index]
    browser.get(url)
    browser.find_element_by_link_text('Sign In').click()
    username = browser.find_element_by_xpath("//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')
    username.send_keys("username")
    password.send_keys("password")
    browser.find_element_by_xpath("//button[@type='submit' and @class='btn btn-primary btn-lg' and contains(text(), 'Log In')]").click()
    results_url = browser.find_element_by_xpath("//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1
    browser.quit()
    time.sleep(5)
I can download the file from the first element of the list, www.google.com. The loop does reach the second element, www.yahoo.com, but as soon as it gets to browser.get(url), I get this error:
Traceback (most recent call last):
  File "trails_scraper.py", line 22, in <module>
    browser.get(url)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in get
    self.execute(Command.GET, {'url': url})
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 306, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 460, in execute
    return self._request(command_info[0], url, body=data)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 483, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1053, in request
    self._send_request(method, url, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1093, in _send_request
    self.endheaders(body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
    self.send(msg)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
    self.connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 575, in create_connection
    raise err
socket.error: [Errno 61] Connection refused
Does anyone know what is going on? I know a for loop would be less error-prone, but logically my code seems correct.
Any help would be greatly appreciated :)
Answer:
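The problem is that you create the browser outside the loop but close it inside the loop. At the end of the first iteration, browser.quit() shuts down Chrome and the chromedriver process that Selenium sends its commands to. On the second iteration, browser.get(url) therefore fails: there is no browser left, so the connection to chromedriver is refused.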
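To make that lifecycle concrete without launching Chrome, here is a small simulation. FakeDriver is a hypothetical stand-in, not Selenium's API: it only mimics the fact that every command goes to a local server and that quit() stops that server, so the next get() is refused with the same errno the macOS traceback shows.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium session (not Selenium's API).

    A real driver sends each command over HTTP to a local chromedriver
    process; quit() stops that process, so later commands hit a closed port.
    """

    def __init__(self):
        self.server_running = True
        self.visited = []

    def get(self, url):
        if not self.server_running:
            # Errno 61 is ECONNREFUSED on macOS, as in the question's traceback.
            raise ConnectionRefusedError(61, "Connection refused")
        self.visited.append(url)

    def quit(self):
        self.server_running = False


def loop_like_the_question(urls):
    """Create the browser once, but quit it inside the loop (the bug)."""
    browser = FakeDriver()
    for url in urls:
        try:
            browser.get(url)
        except ConnectionRefusedError as exc:
            return browser.visited, exc
        browser.quit()  # closes the only browser after the first URL
    return browser.visited, None


visited, error = loop_like_the_question(
    ["https://www.google.com", "https://www.yahoo.com", "https://www.bing.com"]
)
print(visited)  # ['https://www.google.com']
print(error)    # [Errno 61] Connection refused
```

Only the first URL is visited; the second get() fails exactly where the real script does.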
There are two ways to fix it:
1) Create the browser inside the loop, so every URL gets its own browser:
from selenium import webdriver
import time

path_to_chromedriver = 'path to chromedriver location'
# these are example webpages (WebDriver needs absolute URLs, including the scheme)
all_trails = ['https://www.google.com', 'https://www.yahoo.com', 'https://www.bing.com']
index = 0
while (index <= 2):
    # start a new browser for this URL
    browser = webdriver.Chrome(executable_path = path_to_chromedriver)
    browser.get("file:///path to html file")
    url = all_trails[index]
    browser.get(url)
    browser.find_element_by_link_text('Sign In').click()
    username = browser.find_element_by_xpath("//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')
    username.send_keys("username")
    password.send_keys("password")
    browser.find_element_by_xpath("//button[@type='submit' and @class='btn btn-primary btn-lg' and contains(text(), 'Log In')]").click()
    results_url = browser.find_element_by_xpath("//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1
    # wait before closing, so a download that is still running is not cut off
    time.sleep(5)
    browser.quit()
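The same idea as option 1 can be written with a for loop and try/finally, so each browser is closed even if a page raises. This is a minimal sketch, assuming the per-page steps (sign in, click the GPX link, wait for the download) are moved into a callable; visit_each_with_fresh_browser, make_driver, handle_page and RecordingDriver are hypothetical names, and RecordingDriver is only a test double that logs calls instead of driving Chrome.

```python
def visit_each_with_fresh_browser(urls, make_driver, handle_page):
    """Option 1: start a new browser for every URL and always close it.

    make_driver: zero-argument callable returning a WebDriver-like object
    handle_page: callable doing the per-page work with the open driver
    """
    for url in urls:
        driver = make_driver()       # fresh session, used for this URL only
        try:
            driver.get(url)
            handle_page(driver)
        finally:
            driver.quit()            # runs even if handle_page raises


class RecordingDriver:
    """Hypothetical test double: logs calls instead of driving Chrome."""

    def __init__(self, log):
        self.log = log
        log.append("start")

    def get(self, url):
        self.log.append("get " + url)

    def quit(self):
        self.log.append("quit")


calls = []
visit_each_with_fresh_browser(
    ["https://www.google.com", "https://www.yahoo.com"],
    lambda: RecordingDriver(calls),
    lambda driver: None,             # no per-page work in this dry run
)
print(calls)
# ['start', 'get https://www.google.com', 'quit',
#  'start', 'get https://www.yahoo.com', 'quit']
```

With the real browser you would pass something like lambda: webdriver.Chrome(executable_path=path_to_chromedriver) (the Selenium 3 call used above) as make_driver.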
2) Close the browser after the loop, so one browser is reused for every URL:
from selenium import webdriver
import time

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.get("file:///path to html file")
# these are example webpages (WebDriver needs absolute URLs, including the scheme)
all_trails = ['https://www.google.com', 'https://www.yahoo.com', 'https://www.bing.com']
index = 0
while (index <= 2):
    url = all_trails[index]
    browser.get(url)
    browser.find_element_by_link_text('Sign In').click()
    username = browser.find_element_by_xpath("//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')
    username.send_keys("username")
    password.send_keys("password")
    browser.find_element_by_xpath("//button[@type='submit' and @class='btn btn-primary btn-lg' and contains(text(), 'Log In')]").click()
    results_url = browser.find_element_by_xpath("//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1
    time.sleep(5)
# quit only once, after every URL has been processed
browser.quit()
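Option 2 also fits the for loop mentioned in the question, which removes the hard-coded index <= 2 that must be kept in sync with the list length. A minimal sketch, assuming the per-page work lives in a callable; visit_all_with_one_browser, handle_page and the RecordingDriver test double are hypothetical names, not part of Selenium.

```python
def visit_all_with_one_browser(urls, driver, handle_page):
    """Option 2: reuse one browser for every URL and close it once at the end."""
    try:
        for url in urls:             # iterates the whole list, whatever its length
            driver.get(url)
            handle_page(driver)
    finally:
        driver.quit()                # after the loop, or as soon as an error escapes


class RecordingDriver:
    """Hypothetical test double: logs calls instead of driving Chrome."""

    def __init__(self, log):
        self.log = log
        log.append("start")

    def get(self, url):
        self.log.append("get " + url)

    def quit(self):
        self.log.append("quit")


calls = []
visit_all_with_one_browser(
    ["https://www.google.com", "https://www.yahoo.com", "https://www.bing.com"],
    RecordingDriver(calls),
    lambda driver: None,             # no per-page work in this dry run
)
print(calls)
# ['start', 'get https://www.google.com', 'get https://www.yahoo.com',
#  'get https://www.bing.com', 'quit']
```

Option 2 starts Chrome only once, so it is faster; option 1 pays the startup cost for every URL but gives each page a clean session, with no cookies or login state carried over from the previous site.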