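I'm working on some code that uses Selenium WebDriver with Firefox. Most things seem to work, but when I switch the browser to PhantomJS, it starts behaving differently.
The page I'm working with has to be scrolled slowly so that it keeps loading more results, and that is probably where the problem lies.
Here is the code that works with the Firefox webdriver but not with PhantomJS: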
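from time import sleep

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def get_url(destination, start_date, end_date):  # dates look like %Y-%m-%d
    return "https://www.pelikan.sk/sk/flights/list?dfc=%s&dtc=C%s&rfc=C%s&rtc=%s&dd=%s&rd=%s&px=1000&ns=0&prc=&rng=0&rbd=0&ct=0&view=list" % ('CVIE%20BUD%20BTS', destination, destination, 'CVIE%20BUD%20BTS', start_date, end_date)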
def load_whole_page(self, destination, start_date, end_date):
    # deb()  # helper not shown in the question
    url = get_url(destination, start_date, end_date)
    self.driver.maximize_window()
    self.driver.get(url)
    wait = WebDriverWait(self.driver, 60)
    # wait until the loading indicators are gone
    wait.until(EC.invisibility_of_element_located((By.XPATH, '//img[contains(@src, "loading")]')))
    wait.until(EC.invisibility_of_element_located((By.XPATH,
        u'//div[. = "Poprosíme o trpezlivosť, hľadáme pre Vás ešte viac letov"]/preceding-sibling::img')))
    i = 0
    old_driver_html = ''
    while True:
        i += 1
        results = self.driver.find_elements_by_css_selector("div.flightbox")
        print(len(results))
        if len(results) >= __THRESHOLD__:  # for testing purposes; default value: 999
            break
        try:
            self.driver.execute_script("arguments[0].scrollIntoView();", results[0])
            self.driver.execute_script("arguments[0].scrollIntoView();", results[-1])
        except Exception:
            self.driver.save_screenshot('screen_before_' + str(i) + '.png')
            sleep(2)
            print('EXCEPTION<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<')
            continue
        # stop when scrolling no longer changes the page
        new_driver_html = self.driver.page_source
        if new_driver_html == old_driver_html:
            print('END OF PAGE')
            break
        old_driver_html = new_driver_html
        wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, 'div.flightbox'), len(results)))
        sleep(10)
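To detect when the page is fully loaded, I compare the previous copy of the HTML with the new one. This is probably not the right way to do it, but with Firefox it is good enough.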
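A minimal sketch of a lighter stop condition: compare how many results are loaded instead of the whole page_source, and cap the number of rounds. The function name scroll_until_stable, the max_rounds cap, and the get_count/scroll callables are illustrative assumptions, not part of the question's code. One caveat applies to either approach: a single unchanged snapshot can also mean the next batch has not arrived yet, so the scroll callable should include a wait.

```python
def scroll_until_stable(get_count, scroll, max_rounds=100):
    """Keep scrolling until the number of loaded results stops changing.

    get_count: hypothetical callable returning how many results are on the page now.
    scroll: hypothetical callable that scrolls and then gives the page time to load.
    max_rounds: safety cap (an assumption) so a page that never settles cannot loop forever.
    Returns the last observed count.
    """
    previous = None
    current = get_count()
    rounds = 0
    # Stop as soon as one scroll round adds nothing, or when the cap is reached.
    while current != previous and rounds < max_rounds:
        previous = current
        scroll()
        current = get_count()
        rounds += 1
    return current
```

With the Selenium 3 API used here, get_count could be lambda: len(driver.find_elements_by_css_selector("div.flightbox")), and scroll could run one of the scroll scripts followed by a short sleep.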
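Here is what PhantomJS shows at the point where loading stops:
With Firefox, the page keeps loading more and more results, but with PhantomJS it gets stuck at 10 results.
Any ideas? What is the difference between these two drivers?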
Answer (score: 2)
Two key things helped me solve the problem: setting window.document.body.scrollTop to 0 and then, immediately afterwards, to document.body.scrollHeight.
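The scroll trick above can be wrapped in a small helper. This is a sketch: scroll_top_then_bottom is a hypothetical name, and driver is assumed to be any object with a Selenium-style execute_script(script, *args) method. As a swapped-in variation, the script scrolls document.scrollingElement when the browser provides it and falls back to document.body otherwise, because in standards mode the CSSOM View spec makes the root html element, not body, the scrolling element; engines that lack scrollingElement take the body branch.

```python
# Jump to the top of the page, then straight to the bottom.
# Uses document.scrollingElement when available, else document.body.
SCROLL_TOP_THEN_BOTTOM_JS = """
var el = document.scrollingElement || document.body;
el.scrollTop = 0;
el.scrollTop = el.scrollHeight;
"""


def scroll_top_then_bottom(driver, times=1):
    """Run the top-then-bottom scroll `times` times.

    `driver` is assumed to expose a Selenium-style execute_script(script, *args).
    """
    for _ in range(times):
        driver.execute_script(SCROLL_TOP_THEN_BOTTOM_JS)
```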
Working code:
results = []
while len(results) < 200:
    results = driver.find_elements_by_css_selector("div.flightbox")
    print(len(results))
    # scroll
    driver.execute_script("arguments[0].scrollIntoView();", results[0])
    driver.execute_script("window.document.body.scrollTop = 0;")
    driver.execute_script("window.document.body.scrollTop = document.body.scrollHeight;")
    driver.execute_script("arguments[0].scrollIntoView();", results[-1])
Version 2 (loops indefinitely and stops once scrolling loads nothing new):
from selenium.common.exceptions import StaleElementReferenceException, TimeoutException

results = []
while True:
    try:
        wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, "div.flightbox"), len(results)))
    except TimeoutException:
        break  # no new results appeared within the wait timeout

    results = self.driver.find_elements_by_css_selector("div.flightbox")
    print(len(results))

    # scroll
    for _ in range(5):
        try:
            self.driver.execute_script("""
                arguments[0].scrollIntoView();
                window.document.body.scrollTop = 0;
                window.document.body.scrollTop = document.body.scrollHeight;
                arguments[1].scrollIntoView();
            """, results[0], results[-1])
        except StaleElementReferenceException:
            break  # here it means more results were loaded

print("DONE. Result count: %d" % len(results))
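Note that I changed the comparison in the wait_for_more_than_n_elements expected condition, so the wait only succeeds when new results actually appear and otherwise times out. Replace: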
return count >= self.count
with:
return count > self.count
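The question never shows the wait_for_more_than_n_elements helper, so the class below is a reconstruction under assumptions: it takes a (by, value) locator tuple and a count, and counts matches with driver.find_elements(by, value). WebDriverWait.until() calls it with the driver until it returns something truthy or the timeout expires.

```python
class wait_for_more_than_n_elements(object):
    """Expected condition: true once more than `count` elements match `locator`.

    A sketch, not the original helper. `locator` is a (by, value) tuple such as
    (By.CSS_SELECTOR, "div.flightbox").
    """

    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        # WebDriverWait.until() calls this repeatedly with the driver.
        count = len(driver.find_elements(*self.locator))
        # Strictly greater: with ">=", the count just measured already satisfies
        # the condition, so the wait would return at once and never time out.
        return count > self.count
```

Used as in Version 2: wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, "div.flightbox"), len(results))). If nothing new appears within the timeout, until() raises TimeoutException, which ends the loop.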
Version 3 (scrolls from the header to the footer several times):
header = wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'header')))
footer = wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'footer')))

results = []
while True:
    try:
        wait.until(wait_for_more_than_n_elements((By.CSS_SELECTOR, "div.flightbox"), len(results)))
    except TimeoutException:
        break

    results = self.driver.find_elements_by_css_selector("div.flightbox")
    print(len(results))

    # scroll
    for _ in range(5):
        self.driver.execute_script("""
            arguments[0].scrollIntoView();
            arguments[1].scrollIntoView();
        """, header, footer)
        sleep(1)