Opening multiple pages with Python Selenium

Date: 2017-12-15 08:52:14

Tags: python selenium-webdriver web-scraping selenium-chromedriver

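I am trying to use Python and Selenium to iterate over a list of web pages and download a file from each one. Using a while loop, I can open one page and download the first file I want, but when I reach the second element in the list, Selenium seems to fail.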

Here is my code:

import time

from selenium import webdriver

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)

browser.get("file:///path to html file")

#these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']

index = 0

while (index <= 2):

    url = all_trails[index]
    browser.get(url)

    browser.find_element_by_link_text('Sign In').click()

    username = browser.find_element_by_xpath(
        "//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')

    username.send_keys("username")
    password.send_keys("password")

    browser.find_element_by_xpath(
        "//button[@type='submit' and @class='btn btn-primary btn-lg' "
        "and contains(text(), 'Log In')]").click()

    results_url = browser.find_element_by_xpath(
        "//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1

    browser.quit()
    time.sleep(5)

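I can download the file for the first element of the list, www.google.com. The loop then moves on to the second element, www.yahoo.com, but as soon as it reaches browser.get(url) I get this error: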

Traceback (most recent call last):
  File "trails_scraper.py", line 22, in <module>
    browser.get(url)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in get
    self.execute(Command.GET, {'url': url})
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 306, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 460, in execute
    return self._request(command_info[0], url, body=data)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 483, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1053, in request
    self._send_request(method, url, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1093, in _send_request
    self.endheaders(body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
    self.send(msg)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
    self.connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 575, in create_connection
    raise err
socket.error: [Errno 61] Connection refused

Does anyone know what is going on? I know a for loop would be less error-prone, but logically my code seems correct.

Any help would be greatly appreciated :)

1 Answer:

Answer 0: (score: 1)

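The problem is that you create the browser outside the loop but call browser.quit() inside it. At the end of the first iteration the browser is closed, so on the second iteration

browser.get(url)

fails because there is no longer any browser to talk to. quit() also shuts down the chromedriver process, which is why the error you see is "Connection refused".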
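The "Connection refused" at the bottom of the traceback can be reproduced without Selenium. The sketch below is only an illustration in Python 3: a plain listening socket stands in for the chromedriver process, closing it plays the role of browser.quit(), and the helper name can_connect is made up for this example.

```python
import socket


def can_connect(port):
    """Return True if something accepts connections on 127.0.0.1:port."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            return True
    except OSError:  # includes ConnectionRefusedError
        return False


# Stand-in for the chromedriver process: a local listening socket.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

before_quit = can_connect(port)  # the "driver" is still running
server.close()                   # plays the role of browser.quit()
after_quit = can_connect(port)   # nothing is listening any more

print(before_quit, after_quit)
```

The second connection attempt corresponds to the second browser.get(url) in the question: the client still holds the old address, but the process behind it is gone.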

You have 2 solutions:

1) Move the browser declaration inside the loop

import time

from selenium import webdriver

path_to_chromedriver = 'path to chromedriver location'


#these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']

index = 0

while (index <= 2):
    browser = webdriver.Chrome(executable_path = path_to_chromedriver)

    browser.get("file:///path to html file")

    url = all_trails[index]
    browser.get(url)

    browser.find_element_by_link_text('Sign In').click()

    username = browser.find_element_by_xpath(
        "//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')

    username.send_keys("username")
    password.send_keys("password")

    browser.find_element_by_xpath(
        "//button[@type='submit' and @class='btn btn-primary btn-lg' "
        "and contains(text(), 'Log In')]").click()

    results_url = browser.find_element_by_xpath(
        "//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1

    browser.quit()
    time.sleep(5)

2) Close the browser after the loop

import time

from selenium import webdriver

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)

browser.get("file:///path to html file")

#these are example webpages
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']

index = 0

while (index <= 2):

    url = all_trails[index]
    browser.get(url)

    browser.find_element_by_link_text('Sign In').click()

    username = browser.find_element_by_xpath(
        "//input[@placeholder='Log in with email']")
    password = browser.find_element_by_name('pass')

    username.send_keys("username")
    password.send_keys("password")

    browser.find_element_by_xpath(
        "//button[@type='submit' and @class='btn btn-primary btn-lg' "
        "and contains(text(), 'Log In')]").click()

    results_url = browser.find_element_by_xpath(
        "//a[@class='require-user' and contains(text(), 'GPX File')]").click()
    index += 1
    time.sleep(5)
browser.quit()
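
Since the question mentions a for loop, here is a rough sketch of the second solution written that way, with try/finally so the browser is quit exactly once even if a page raises. It is an illustration rather than a drop-in fix: visit_all, handle_page and FakeBrowser are hypothetical names, and FakeBrowser only imitates get() and quit() so the example runs in Python 3 without Selenium installed.

```python
class FakeBrowser:
    """Stand-in for webdriver.Chrome so the sketch runs without Selenium."""

    def __init__(self):
        self.visited = []
        self.closed = False

    def get(self, url):
        # Mimics the failure in the question: no commands after quit().
        if self.closed:
            raise RuntimeError("browser session has already been quit")
        self.visited.append(url)

    def quit(self):
        self.closed = True


def visit_all(urls, make_browser, handle_page):
    """Open one browser, visit every URL in order, then quit exactly once."""
    browser = make_browser()
    try:
        for url in urls:          # no index counter to keep in sync
            browser.get(url)
            handle_page(browser)  # log in, click the download link, ...
    finally:
        browser.quit()            # runs once, even if a page raises
    return browser


# these are example webpages, as in the question
all_trails = ['www.google.com', 'www.yahoo.com', 'www.bing.com']
browser = visit_all(all_trails, FakeBrowser, lambda b: None)
print(browser.visited)
print(browser.closed)
```

With Selenium, make_browser would be something like lambda: webdriver.Chrome(executable_path=path_to_chromedriver), and handle_page would contain the sign-in and download steps from the loop body above.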