Scrolling down through all posts on Twitter with Selenium and Python

Time: 2018-08-07 00:05:36

Tags: python selenium

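I am using Selenium with Python and trying to scroll down a Twitter page, but it does not scroll to the end of the page. It stops partway and Twitter shows a "back to top" message. It does not even show all the posts from the last month. Here is my code: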

users = ['BBCWorld']

username = browser.find_element_by_class_name("js-username-field")
username.send_keys("username")
password = browser.find_element_by_class_name("js-password-field")
password.send_keys("password")

signin_click = WebDriverWait(browser, 500000).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="page-container"]/div/div[1]/form/div[2]/button'))
)
signin_click.click()

for user in users:
    # User's profile
    browser.get('https://twitter.com/' + user)

    time.sleep(0.5)

    SCROLL_PAUSE_TIME = 0.5

    # Get scroll height
    last_height = browser.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = browser.execute_script("return document.body.scrollHeight")

    # Quit browser
    browser.quit()

1 answer:

Answer 0 (score: 1)

You forgot the break condition at the end of the loop:

while True:
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)


    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")

    # break condition
    if new_height == last_height:
        break
    last_height = new_height

You also have SCROLL_PAUSE_TIME = 0.5, which is not much, and Twitter gets slower as the number of posts to load grows. You need to increase this pause. I would try SCROLL_PAUSE_TIME = 2.
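Even with a longer pause, a single slow batch of tweets can make the height comparison succeed too early. One way to soften that, sketched here under assumptions, is to give up only after the height has stayed unchanged for several checks in a row. The function name `scroll_to_bottom` and the parameters `pause` and `max_stalls` are made up for this illustration; only `execute_script` is a real Selenium call.

```python
import time


def scroll_to_bottom(browser, pause=2.0, max_stalls=3):
    """Scroll until the page height stops growing and return the final height.

    `browser` is any object exposing Selenium's execute_script() method.
    `max_stalls` (hypothetical knob) is the number of consecutive unchanged
    height checks tolerated before the loop stops.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    stalls = 0
    while stalls < max_stalls:
        # Scroll down to the current bottom of the page
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        # Give the page time to load more posts
        time.sleep(pause)
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            stalls += 1  # nothing new yet; retry before giving up
        else:
            stalls = 0
            last_height = new_height
    return last_height
```

Called as `scroll_to_bottom(browser, pause=2)`, it would replace the whole `while True` loop. With `max_stalls=1` it behaves like the loop above.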

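PS: Hard-coded pauses are not very efficient. Instead, you could look for the spinner (or whatever Twitter displays while it loads new content) and wait for it to disappear. That would be more elegant.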
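The spinner idea could look roughly like the sketch below. Treat it as an illustration only: the CSS selector `.timeline-spinner` is hypothetical (inspect the page to find the real loading indicator), and the helper `wait_for_spinner_gone` is not a Selenium function. It polls `find_elements` by hand so the example has no dependencies; in a real script, Selenium's `WebDriverWait` together with `expected_conditions.invisibility_of_element_located` should do the same job.

```python
import time

CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def wait_for_spinner_gone(browser, selector=".timeline-spinner", timeout=10.0, poll=0.2):
    """Block until no element matching `selector` is displayed.

    `selector` is a hypothetical placeholder for the real loading indicator.
    Returns True once the spinner is gone, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        spinners = browser.find_elements(CSS, selector)
        if not any(s.is_displayed() for s in spinners):
            return True  # no visible spinner: loading has finished
        if time.monotonic() >= deadline:
            return False  # still loading after the timeout
        time.sleep(poll)
```

In the scroll loop, `wait_for_spinner_gone(browser)` would take the place of `time.sleep(SCROLL_PAUSE_TIME)`. A short sleep before the call may still be needed so the spinner has time to appear, and a real page may also require handling elements that go stale between the lookup and the visibility check.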