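Selenium PhantomJS: Load pages in different windows in parallel

Time: 2016-10-21 02:27:50

Tags: python selenium download phantomjs

Is it possible to control Selenium so that it opens multiple pages in parallel in different windows?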

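Selenium's get command waits for the onload event to fire, but I want to start the page load sequence and move on without waiting for onload. What I would like to do needs something like get_url_async and wait_for_onload, which are two commands I just made up.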

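Another option would be to open separate webdriver instances, but I would like those instances to share cookies and state.

1 answer:

Answer 0 (score: 0)

I figured it out. You can open windows asynchronously by executing window.open from JavaScript. With that, you can visit multiple pages in parallel. Once the downloads have been started, you can implement your own wait logic to determine when each page has finished loading.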

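from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def download_parallel(urls, driver, process_page):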
  '''Download pages in parallel using Selenium
  urls: a list of urls to download
  driver: The selenium webdriver
  process_page: a function that takes in the url and the driver to process 
                the page as you see fit.
  '''
  start_handle = driver.current_window_handle
  handles = []

  # Step 1: Initiate all of the page downloads
  for i, url in enumerate(urls):
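    # Always open new windows from the starting window; the order of
    # driver.window_handles is not guaranteed, so indexing it is fragile.
    driver.switch_to.window(start_handle)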
    old_handles = driver.window_handles
    # Initiate a page get without waiting for onload
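    # Pass the url and window name as script arguments so that quotes in
    # the url cannot break the JavaScript string.
    driver.execute_script('window.open(arguments[0], arguments[1], '
            '"height=450,width=800,menubar=yes,scrollbars=yes,toolbar=yes,'
            'location=no,resizable=yes");', url, 'para_win_%02d' % i)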
    # We have to determine the handle for the new window.
    for h in driver.window_handles:
      if h not in old_handles:
        handles.append(h)
        break

  # Step 2: Wait for the pages to download.
  for i, url in enumerate(urls):
    driver.switch_to.window(handles[i])
    # Wait for some css to load. There are other waiting functions you can use.
    WebDriverWait(driver, 10).until(
      EC.visibility_of_element_located((By.CSS_SELECTOR, "body #my_main"))
    )
    # Do more processing of the page here
    process_page(url, driver)
    # Close the window now that we're done with it.
    driver.close()

  # Go back to the window we started in
  driver.switch_to.window(start_handle)
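
The answer says you can implement your own wait logic. As one possible sketch of that idea (not part of the original answer), the helper below polls document.readyState in each window instead of waiting for a specific CSS selector. The function name wait_for_document_ready and its timeout and poll parameters are hypothetical; it only relies on driver.switch_to.window and driver.execute_script. One caveat: a freshly opened window may briefly report "complete" for its initial blank document, so a real implementation might also want to check the window's URL.

```python
import time


def wait_for_document_ready(driver, handles, timeout=10.0, poll=0.25):
    """Poll each window until document.readyState is 'complete'.

    Hypothetical helper: returns the handles that finished loading
    before the timeout, in the order they became ready.
    """
    pending = list(handles)
    ready = []
    deadline = time.monotonic() + timeout
    while pending:
        # Iterate over a copy because finished handles are removed.
        for handle in list(pending):
            driver.switch_to.window(handle)
            state = driver.execute_script("return document.readyState")
            if state == "complete":
                pending.remove(handle)
                ready.append(handle)
        # Stop when everything is ready or the time budget is used up.
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll)
    return ready
```

Handles that are still loading when the timeout expires are simply left out of the result, so the caller can decide whether to retry them or close those windows.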