Selenium scrolling problem

Time: 2018-06-20 11:05:07

Tags: python selenium-webdriver

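This is the only code I could find that scrolls down to the end of the page; nothing else I tried worked. The problem is that the while True loop never finishes: even after it reaches the bottom it keeps trying to scroll down, so the print step that follows never runs. How can I end the while True loop and print the results? Thanks.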

    from selenium import webdriver

    url = 'http://www.tradingview.com/screener'
    driver = webdriver.Firefox()
    driver.get(url)

    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # will give a list of all tickers
    tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol') 

    for index in range(len(tickers)):
       print("Row " + tickers[index].text + " ") 

Errors I'm receiving


>>> from selenium import webdriver
>>> url = 'http://www.tradingview.com/screener'
>>> driver = webdriver.Firefox()
>>> driver.get(url)
>>>
>>> # Get scroll height
... last_height = driver.execute_script("return document.body.scrollHeight")
>>>
>>> selector = '.js-field-total.tv-screener-table__field-value--total'
>>> matches = driver.find_element_by_css_selector(selector)
>>> matches = int(matches.text.split()[0])
>>>
>>> visible_rows = 0
>>> scrolls = 0
>>>
>>> while visible_rows < matches:
...
  File "<stdin>", line 2

    ^
IndentationError: expected an indented block
>>>     driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "<stdin>", line 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    ^
IndentationError: unexpected indent
>>>
>>>     # Wait 10 scrolls before updating row information
...     if scrolls == 10:
  File "<stdin>", line 2
    if scrolls == 10:
    ^
IndentationError: unexpected indent
>>>         table = driver.find_elements_by_class_name('tv-data-table__tbody')
  File "<stdin>", line 1
    table = driver.find_elements_by_class_name('tv-data-table__tbody')
    ^
IndentationError: unexpected indent
>>>         visible_rows = len(table[1].find_elements_by_tag_name('tr'))
  File "<stdin>", line 1
    visible_rows = len(table[1].find_elements_by_tag_name('tr'))
    ^
IndentationError: unexpected indent
>>>         scrolls = 0
  File "<stdin>", line 1
    scrolls = 0
    ^
IndentationError: unexpected indent
>>>
>>>     scrolls += 1
  File "<stdin>", line 1
    scrolls += 1
    ^
IndentationError: unexpected indent
>>>
>>> # will give a list of all tickers
... tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')
>>>
>>> for index in range(len(tickers)):
...    print("Row " + tickers[index].text + " ")
...

1 answer:

Answer 0 (score: 0)

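The bar at the top of the table tells you how many rows (matches) the table contains. So one option is to compare the number of visible rows with the total number of rows, and exit the loop once the visible count reaches that total.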

from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

selector = '.js-field-total.tv-screener-table__field-value--total'
matches = driver.find_element_by_css_selector(selector)
matches = int(matches.text.split()[0])

visible_rows = 0
scrolls = 0

while visible_rows < matches:

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait 10 scrolls before updating row information 
    if scrolls == 10:
        table = driver.find_elements_by_class_name('tv-data-table__tbody')
        visible_rows = len(table[1].find_elements_by_tag_name('tr'))
        scrolls = 0

    scrolls += 1

# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol') 

for index in range(len(tickers)):
   print("Row " + tickers[index].text + " ") 
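
The IndentationError output in the question is what the interactive interpreter produces when a pasted block contains blank lines: at the >>> prompt a blank line ends the while block, so the indented lines that follow are rejected. Saving the code to a file and running it as a script avoids that. As a rough sketch only, the same row-count idea can also be written as a function with no blank lines inside the loop body and with a cap on the number of scrolls, so that it cannot spin forever if the total is never reached. The names scroll_until_rows, scroll_once, count_rows and max_scrolls are made up for this illustration; the two callables are assumed to wrap the Selenium calls from the code above.

```python
import time


def scroll_until_rows(scroll_once, count_rows, target, max_scrolls=500, pause=0.0):
    """Scroll until count_rows() reaches target or max_scrolls is hit.

    scroll_once and count_rows are hypothetical callables. With Selenium they
    would wrap the driver.execute_script(...) scroll call and the table row
    lookup from the answer. Returns the last row count that was observed.
    """
    visible = count_rows()
    scrolls = 0
    while visible < target and scrolls < max_scrolls:
        scroll_once()
        scrolls += 1
        if pause:
            time.sleep(pause)  # give the page time to load more rows
        visible = count_rows()
    return visible
```

If the returned count is still below the target, the cap was reached first, which usually means the page needs a longer pause between scrolls.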

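Edit: Since your setup doesn't seem to allow the previous solution, you can try a different approach. The page loads 150 rows at a time. So instead of counting the visible rows, we can take the expected total number of matches/rows (for example, 4894) and divide it by 150 to get the number of scrolls required. If we scroll at least that many times, then in theory all rows should be visible and we can continue with the rest of the code.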
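
To make the arithmetic concrete, here is a small sketch of the scroll count on its own, using the same formula as the script that follows. The function name scrolls_needed and its parameter names are invented for this example; the batch size of 150 and the 5 extra scrolls come from the answer.

```python
def scrolls_needed(total_rows, rows_per_load=150, extra=5):
    """Rows divided by the batch size, plus a safety margin, truncated to an int."""
    return int(total_rows / rows_per_load + extra)


# 4894 / 150 is about 32.6; adding 5 and truncating gives 37
print(scrolls_needed(4894))
```

The extra scrolls more than cover the fractional batch that int() truncates away.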

from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

url = 'http://www.tradingview.com/screener'
driver = webdriver.Chrome('./chromedriver')
driver.get(url)

try:

    selector = '.js-field-total.tv-screener-table__field-value--total'
    condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    matches = WebDriverWait(driver, 10).until(condition)
    matches = int(matches.text.split()[0])

except (TimeoutException, Exception):
    print('Problem finding matches, setting default...')
    matches = 4895 # Set default

# The page loads 150 rows at a time; divide matches by
# 150 to determine the number of times we need to scroll;
# add 5 extra scrolls just to be sure
num_loops = int(matches / 150 + 5)

for _ in range(num_loops):

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(2) # Pause briefly to allow loading time

# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol') 

n_tickers = len(tickers)

msg = 'Correct ' if n_tickers == matches else 'Incorrect '
msg += 'number of tickers ({}) found'
print(msg.format(n_tickers))

for index in range(n_tickers):
    print("Row " + tickers[index].text + " ")
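
The question's code reads last_height but never uses it. A common way to end such a loop, which is not the approach taken in the answer above, is to compare the page height before and after each scroll and stop once it no longer grows. The following is a sketch under assumptions: the function name scroll_to_bottom and the pause and max_rounds parameters are invented here, and it assumes that document.body.scrollHeight actually increases whenever this page loads more rows, which has not been verified for this site.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=200):
    """Scroll until document.body.scrollHeight stops growing.

    driver is anything with an execute_script(script) method, such as a
    Selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # wait for lazily loaded rows to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # height unchanged, so nothing new was loaded
        last_height = new_height
    return max_rounds
```

If the pause is too short for the connection, the height check can fire before new rows arrive and stop early, so comparing the final ticker count against the total shown on the page remains a useful sanity check.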