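This is the only code I could find that scrolls down to the end of the page; nothing else I tried works. The problem is that the while True loop never finishes: even after it hits the bottom, it keeps trying to scroll down, so it never moves on to the printing step. How do I end the while True loop and print the results? Thanks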
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom (nothing ever breaks out of this loop)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ")
Errors I'm receiving
>>> from selenium import webdriver
>>> url = 'http://www.tradingview.com/screener'
>>> driver = webdriver.Firefox()
>>> driver.get(url)
>>>
>>> # Get scroll height
... last_height = driver.execute_script("return document.body.scrollHeight")
>>>
>>> selector = '.js-field-total.tv-screener-table__field-value--total'
>>> matches = driver.find_element_by_css_selector(selector)
>>> matches = int(matches.text.split()[0])
>>>
>>> visible_rows = 0
>>> scrolls = 0
>>>
>>> while visible_rows < matches:
...
  File "<stdin>", line 2
    ^
IndentationError: expected an indented block
>>>     driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "<stdin>", line 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    ^
IndentationError: unexpected indent
>>>
>>> # Wait 10 scrolls before updating row information
...     if scrolls == 10:
  File "<stdin>", line 2
    if scrolls == 10:
    ^
IndentationError: unexpected indent
>>>         table = driver.find_elements_by_class_name('tv-data-table__tbody')
  File "<stdin>", line 1
    table = driver.find_elements_by_class_name('tv-data-table__tbody')
    ^
IndentationError: unexpected indent
>>>         visible_rows = len(table[1].find_elements_by_tag_name('tr'))
  File "<stdin>", line 1
    visible_rows = len(table[1].find_elements_by_tag_name('tr'))
    ^
IndentationError: unexpected indent
>>>         scrolls = 0
  File "<stdin>", line 1
    scrolls = 0
    ^
IndentationError: unexpected indent
>>>
>>>     scrolls += 1
  File "<stdin>", line 1
    scrolls += 1
    ^
IndentationError: unexpected indent
>>>
>>> # will give a list of all tickers
... tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')
>>>
>>> for index in range(len(tickers)):
...     print("Row " + tickers[index].text + " ")
...
Answer
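The bar at the top of the table tells you how many rows (matches) it contains. So one option is to compare the number of visible rows with that total and exit the loop once the visible count reaches it.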
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)
# Total number of matching rows, as shown in the bar above the table
selector = '.js-field-total.tv-screener-table__field-value--total'
matches = driver.find_element(By.CSS_SELECTOR, selector)
matches = int(matches.text.split()[0])
visible_rows = 0
scrolls = 0
while visible_rows < matches:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait 10 scrolls before updating row information
    if scrolls == 10:
        # The ticker rows are read from the second tbody on the page
        table = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
        visible_rows = len(table[1].find_elements(By.TAG_NAME, 'tr'))
        scrolls = 0
    scrolls += 1
# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
for ticker in tickers:
    print("Row " + ticker.text)
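The stop condition above can be pulled out into a small function so its logic can be checked without a browser. This is a minimal sketch, not part of the original answer: `scroll_until`, `check_every` and `max_scrolls` are hypothetical names, and the `max_scrolls` cap is an added assumption so the loop still ends if the visible-row count never reaches the header's total (for example if the page stops rendering new rows).

```python
from typing import Callable, Tuple


def scroll_until(
    scroll: Callable[[], None],
    count_visible: Callable[[], int],
    total: int,
    check_every: int = 10,
    max_scrolls: int = 1000,
) -> Tuple[int, int]:
    """Scroll until at least `total` rows are visible or `max_scrolls` is hit.

    The row count is only re-read every `check_every` scrolls, mirroring the
    "wait 10 scrolls" idea in the answer. Returns (visible_rows, scrolls_done),
    where visible_rows is re-counted once more after the loop ends.
    """
    visible_rows = 0
    scrolls_done = 0
    while visible_rows < total and scrolls_done < max_scrolls:
        scroll()
        scrolls_done += 1
        if scrolls_done % check_every == 0:
            visible_rows = count_visible()
    # Final recount, in case the cap was hit between two checks
    return count_visible(), scrolls_done
```

With Selenium, `scroll` would be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `count_visible` would wrap the answer's `len(table[1].find_elements(By.TAG_NAME, 'tr'))` lookup. Passing them in as callables keeps the termination logic separate from the page structure, which is the part most likely to change.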
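Edit: Since your setup doesn't seem to allow the previous solution, you can try another approach. The page loads 150 rows at a time, so instead of counting the visible rows, we can take the expected total number of matches/rows (e.g. 4894) and divide it by 150 to get the number of times we need to scroll. If we scroll at least that many times, then in theory all rows should be visible and we can continue with the rest of the code.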
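The scroll count described in this edit can be written out as a tiny helper. This is a hedged sketch rather than part of the answer: `scrolls_needed` is a hypothetical name, the 150-rows-per-load figure and the 5 spare scrolls come from the answer, and `math.ceil` replaces the answer's `int(matches / 150 + 5)` so that a partial final batch of rows is always counted as one more scroll.

```python
import math


def scrolls_needed(matches: int, rows_per_load: int = 150, extra: int = 5) -> int:
    """Scrolls required to load `matches` rows in batches, plus `extra` spares."""
    # Round up: 4894 rows need 33 batches of 150, not 32
    return math.ceil(matches / rows_per_load) + extra


print(scrolls_needed(4894))  # 38
```

For the 4894 matches mentioned above this gives 38, one more than the 37 produced by `int(4894 / 150 + 5)`; because of the 5 spare scrolls, either value covers the 33 batches in theory.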
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
url = 'http://www.tradingview.com/screener'
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(url)
try:
    # Wait up to 10 s for the bar that shows the total number of matches
    selector = '.js-field-total.tv-screener-table__field-value--total'
    condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    matches = WebDriverWait(driver, 10).until(condition)
    matches = int(matches.text.split()[0])
except (TimeoutException, ValueError, IndexError):
    # Element never appeared, or its text did not start with a number
    print('Problem finding matches, setting default...')
    matches = 4895  # Set default
# The page loads 150 rows at a time; divide matches by
# 150 to determine the number of times we need to scroll;
# add 5 extra scrolls just to be sure
num_loops = matches // 150 + 5
for _ in range(num_loops):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(2)  # Pause briefly to allow loading time
# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
n_tickers = len(tickers)
status = 'Correct' if n_tickers == matches else 'Incorrect'
print('{} number of tickers ({}) found'.format(status, n_tickers))
for ticker in tickers:
    print("Row " + ticker.text)
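The `last_height` value that the question computes but never uses points to another common way to end a while True scroll loop: scroll, wait, re-read the page height, and stop once it no longer grows. This is a swapped-in technique, not the answer's method, sketched under the assumption that `document.body.scrollHeight` grows as rows load (if the screener scrolls an inner container instead, the height never changes and this stops after one round). `scroll_to_bottom`, `wait` and `max_rounds` are hypothetical names.

```python
import time
from typing import Callable


def scroll_to_bottom(
    get_height: Callable[[], int],
    scroll: Callable[[], None],
    wait: Callable[[], None] = lambda: time.sleep(2),
    max_rounds: int = 100,
) -> int:
    """Scroll until the page height stops growing; return the rounds used.

    `max_rounds` guarantees termination even if the height keeps changing.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        wait()  # give new rows time to load before measuring again
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new loaded, so we are at the bottom
        last_height = new_height
    return max_rounds
```

With Selenium, pass `lambda: driver.execute_script("return document.body.scrollHeight")` as `get_height` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` as `scroll`, then collect the tickers once the function returns. The fixed 2-second wait is a guess; a slow connection may need longer, or the height check can stop too early.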