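I am using Selenium with Python and trying to scroll down a Twitter page, but it does not scroll to the end of the page. It stops partway, and Twitter shows a "back to top" message. It does not even show all of the page's posts from the last month. Here is my code: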
users = ['BBCWorld']

username = browser.find_element_by_class_name("js-username-field")
username.send_keys("username")
password = browser.find_element_by_class_name("js-password-field")
password.send_keys("password")

signin_click = WebDriverWait(browser, 500000).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="page-container"]/div/div[1]/form/div[2]/button'))
)
signin_click.click()

for user in users:
    # User's profile
    browser.get('https://twitter.com/' + user)
    time.sleep(0.5)
    SCROLL_PAUSE_TIME = 0.5

    # Get scroll height
    last_height = browser.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)
        # Calculate new scroll height and compare with last scroll height
        new_height = browser.execute_script("return document.body.scrollHeight")

# Quit browser
browser.quit()
Answer 0 (score: 1)
Your loop is missing the break condition and the update of last_height. You forgot this part:
while True:
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    # break condition
    if new_height == last_height:
        break
    last_height = new_height
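You also have SCROLL_PAUSE_TIME = 0.5, which is not much, and Twitter gets slower as the number of posts to load grows. If the new posts have not loaded by the time the pause ends, the height is unchanged and the loop exits early, so you need to increase this pause. I would try SCROLL_PAUSE_TIME = 2.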
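PS: Using a hard-coded pause is not very efficient. Instead, you could try to find the spinner (or whatever element Twitter shows while it loads new content) and wait for that spinner to disappear. That would be more elegant.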
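I don't know the spinner's selector, so here is only a rough sketch of the same idea (wait on a condition instead of a fixed sleep). It polls the page height after each scroll and moves on as soon as the height grows, giving up once nothing new appears within a timeout. The function name scroll_to_end and the parameters timeout, poll_interval and max_rounds are my own names, not anything from Selenium; the only Selenium call it relies on is browser.execute_script.

```python
import time


def scroll_to_end(browser, timeout=10.0, poll_interval=0.25, max_rounds=1000):
    """Scroll until the page height stops growing and return the final height.

    `browser` is assumed to be a Selenium WebDriver (anything that exposes
    `execute_script` works). All parameter names here are hypothetical.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")

    for _ in range(max_rounds):  # hard cap so the loop can never run forever
        # Scroll down to bottom
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

        # Poll for new content instead of sleeping for a fixed time
        deadline = time.monotonic() + timeout
        new_height = last_height
        while time.monotonic() < deadline:
            new_height = browser.execute_script("return document.body.scrollHeight")
            if new_height > last_height:
                break  # new posts loaded, scroll again right away
            time.sleep(poll_interval)

        # break condition: nothing changed within the timeout
        if new_height == last_height:
            break
        last_height = new_height

    return last_height
```

You would call it as scroll_to_end(browser) right after browser.get(...). A fast page only costs you one poll interval per scroll, while a slow page gets up to timeout seconds before the loop decides it has reached the end. The price is that the final round always waits the full timeout.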