PhantomJS behaves differently from the Firefox webdriver

Time: 2015-07-12 19:02:11

Tags: javascript python firefox selenium-webdriver phantomjs

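I am working on some code that uses the Selenium webdriver with Firefox. Most things seem to work, but when I switch the browser to PhantomJS, it starts to behave differently.

The page I am working with has to be scrolled slowly so that it loads more and more results, and that may be where the problem lies.

Here is the code that works with the Firefox webdriver but not with PhantomJS: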

def get_url(destination,start_date,end_date): #the date is like %Y-%m-%d 
    return "https://www.pelikan.sk/sk/flights/listdfc=%s&dtc=C%s&rfc=C%s&rtc=%s&dd=%s&rd=%s&px=1000&ns=0&prc=&rng=0&rbd=0&ct=0&view=list" % ('CVIE%20BUD%20BTS',destination, destination,'CVIE%20BUD%20BTS', start_date, end_date)



def load_whole_page(self,destination,start_date,end_date):
        deb()

        url = get_url(destination,start_date,end_date)

        self.driver.maximize_window()
        self.driver.get(url)

        wait = WebDriverWait(self.driver, 60)
        wait.until(EC.invisibility_of_element_located((By.XPATH, '//img[contains(@src, "loading")]')))
        wait.until(EC.invisibility_of_element_located((By.XPATH,
                                                       u'//div[. = "Poprosíme o trpezlivosť, hľadáme pre Vás ešte viac letov"]/preceding-sibling::img')))
        i=0
        old_driver_html = ''
        end = False
        while end==False:
            i+=1

            results = self.driver.find_elements_by_css_selector("div.flightbox")
            print len(results)
            if len(results)>=__THRESHOLD__: # for testing purposes. Default value: 999
                break
            try:
                self.driver.execute_script("arguments[0].scrollIntoView();", results[0])
                self.driver.execute_script("arguments[0].scrollIntoView();", results[-1])            
            except:
                self.driver.save_screenshot('screen_before_'+str(i)+'.png')
                sleep(2)

                print 'EXCEPTION<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'
                continue 

            new_driver_html = self.driver.page_source
            if new_driver_html == old_driver_html:
                print 'END OF PAGE'
                break
            old_driver_html = new_driver_html

            wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, 'div.flightbox'), len(results)))
        sleep(10)

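To detect when the page is fully loaded, I compare the previous copy of the HTML with the new one. That is probably not how I should do it, but with Firefox it is good enough.

This is the PhantomJS screen at the point where loading stops: [screenshot]

With Firefox, it loads more and more results, but with PhantomJS it gets stuck at 10 results.

Any ideas? What is the difference between these two drivers?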

1 answer:

Answer 0 (score: 2)

Two key things helped me solve the problem:

  • do not use the custom wait I gave you earlier
  • first set window.document.body.scrollTop to 0, then immediately set it to document.body.scrollHeight

Working code:

results = []
while len(results) < 200:
    results = driver.find_elements_by_css_selector("div.flightbox")

    print len(results)

    # scroll
    driver.execute_script("arguments[0].scrollIntoView();", results[0])
    driver.execute_script("window.document.body.scrollTop = 0;")
    driver.execute_script("window.document.body.scrollTop = document.body.scrollHeight;")
    driver.execute_script("arguments[0].scrollIntoView();", results[-1])
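
The scroll step in the loop above can be pulled into a small helper. This is only a sketch: scroll_top_then_bottom is a hypothetical name, not part of Selenium, and it assumes driver is any object exposing Selenium's execute_script(script, *args) method. Unlike the loop above, it skips the scrollIntoView calls when the result list is empty, so it does not fail with an IndexError before the first results appear.

```python
def scroll_top_then_bottom(driver, results):
    """Scroll to the top, then to the bottom, to trigger lazy loading.

    Hypothetical helper (not a Selenium API). `driver` needs only
    Selenium's execute_script(script, *args); `results` is the list
    returned by a find_elements call and may be empty.
    """
    if results:
        driver.execute_script("arguments[0].scrollIntoView();", results[0])
    # Reset to the top first, then jump to the bottom of the document.
    driver.execute_script("window.document.body.scrollTop = 0;")
    driver.execute_script(
        "window.document.body.scrollTop = document.body.scrollHeight;")
    if results:
        driver.execute_script("arguments[0].scrollIntoView();", results[-1])
```

Inside the loop it would be called as scroll_top_then_bottom(driver, results) right after the find_elements call.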

Version 2 (infinite loop; stops when scrolling loads nothing more):

from selenium.common.exceptions import StaleElementReferenceException, TimeoutException

results = []
while True:
    try:
        wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, "div.flightbox"), len(results)))
    except TimeoutException:
        break

    results = self.driver.find_elements_by_css_selector("div.flightbox")
    print len(results)

    # scroll
    for _ in xrange(5):
        try:
            self.driver.execute_script("""
                arguments[0].scrollIntoView();
                window.document.body.scrollTop = 0;
                window.document.body.scrollTop = document.body.scrollHeight;
                arguments[1].scrollIntoView();
            """, results[0], results[-1])
        except StaleElementReferenceException:
            break  # here it means more results were loaded

print "DONE. Result count: %d" % len(results)

Note that I changed the comparison in the wait_for_more_than_n_elements expected condition. Instead of:

return count >= self.count

use:

return count > self.count
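
The class itself is not shown in this thread. A minimal sketch of such an expected condition, with the stricter comparison, might look like the following. The body is a reconstruction, not the original code; it assumes only that driver exposes Selenium's find_elements(by, value) and relies on the fact that WebDriverWait.until accepts any callable that takes the driver.

```python
class wait_for_more_than_n_elements(object):
    """Expected condition: true once MORE than `count` elements match `locator`.

    Reconstructed sketch; the original class body is not shown in the thread.
    """

    def __init__(self, locator, count):
        # locator is a (by, value) tuple, e.g. (By.CSS_SELECTOR, "div.flightbox")
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        count = len(driver.find_elements(*self.locator))
        # Strict comparison: an unchanged count must not satisfy the wait.
        return count > self.count
```

With >=, the wait returns immediately when the count has not changed, so the loop never times out and never detects the end of the page.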

Version 3 (scroll from the header to the footer several times):

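from time import sleep

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

header = wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'header')))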
footer = wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'footer')))

results = []
while True:
    try:
        wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, "div.flightbox"), len(results)))
    except TimeoutException:
        break

    results = self.driver.find_elements_by_css_selector("div.flightbox")
    print len(results)

    # scroll
    for _ in xrange(5):
        self.driver.execute_script("""
            arguments[0].scrollIntoView();
            arguments[1].scrollIntoView();
        """, header, footer)
        sleep(1)
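
For comparison, the stop-when-nothing-loads idea from Version 2 can also be written as a plain polling loop, without WebDriverWait or the custom expected condition. This is a sketch under assumptions, not the answer's method: load_all_results and SCROLL_JS are hypothetical names, the timeout and poll values are arbitrary, and driver is assumed to expose Selenium's find_elements(by, value) and execute_script(script).

```python
import time

# Same two statements the answer uses: jump to the top, then to the bottom.
SCROLL_JS = """
    window.document.body.scrollTop = 0;
    window.document.body.scrollTop = document.body.scrollHeight;
"""


def load_all_results(driver, selector="div.flightbox", timeout=60.0, poll=0.5):
    """Scroll until no new elements matching `selector` appear within `timeout`.

    Hypothetical helper. Returns the final number of matching elements.
    """
    count = 0
    while True:
        deadline = time.time() + timeout
        new_count = len(driver.find_elements("css selector", selector))
        # Keep scrolling and polling until the count grows or time runs out.
        while new_count <= count and time.time() < deadline:
            driver.execute_script(SCROLL_JS)
            time.sleep(poll)
            new_count = len(driver.find_elements("css selector", selector))
        if new_count <= count:
            return count  # nothing new loaded: assume the end of the page
        count = new_count
```

The string "css selector" is the value of By.CSS_SELECTOR, so the sketch does not need to import Selenium itself.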