Scrolling a modal window with Selenium in Python

Time: 2017-05-25 01:49:14

Tags: python selenium selenium-webdriver beautifulsoup

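I'm trying to scrape the links to the song pages for certain artists on genius.com, but I've run into a problem because the links to the individual song pages are shown in a popup modal window.

The modal window doesn't load all the links at once. Instead, it loads more content via AJAX when you scroll down to the bottom of the modal.

I tried using the following code to scroll to the bottom of the page, but unfortunately it only scrolls the window behind the modal, not the modal itself: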

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

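Then I tried selecting the last element in the modal and scrolling to it (intending to repeat this until all the song pages were loaded), but it doesn't scroll far enough to make the site load more content: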

last_element = driver.find_elements_by_xpath('//div[@class="mini_card-metadata"]')[-1]
last_element.location_once_scrolled_into_view
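
For reference, one common way to scroll an inner container instead of the window is to set its `scrollTop` from JavaScript. The sketch below is only an illustration under that assumption: the helper name `scroll_to_bottom` is hypothetical, and it presumes the element that owns the modal's scrollbar has already been located (which element that is on genius.com is not stated here).

```python
def scroll_to_bottom(driver, container):
    """Scroll a scrollable element (not the window) to its bottom.

    `driver` is a Selenium WebDriver and `container` is the WebElement
    inside the modal that actually owns the scrollbar (an assumption:
    it has to be found by inspecting the page).
    """
    driver.execute_script(
        "arguments[0].scrollTop = arguments[0].scrollHeight;", container
    )
```

Calling it repeatedly, with a short wait in between, would be the equivalent of dragging the modal's scrollbar to the end each time new items arrive.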

This is my code so far:

import os
from bs4 import BeautifulSoup
from selenium import webdriver

chrome_driver = "/Applications/chromedriver"
os.environ["webdriver.chrome.driver"] = chrome_driver
driver = webdriver.Chrome(chrome_driver)

base_url = 'https://genius.com/artists/Stormzy'
driver.get(base_url)

xpath_str = '//div[contains(text(),"Show all songs by Stormzy")]'
driver.find_element_by_xpath(xpath_str).click()

Is there a way to extract all of an artist's song page links?

2 answers:

Answer 0: (score: 1)

Try the following code to get the required output:

from selenium import webdriver as web
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException

driver = web.Chrome()
base_url = 'https://genius.com/artists/Stormzy'
driver.get(base_url)

# Open modal
driver.find_element_by_xpath('//div[normalize-space()="Show all songs by Stormzy"]').click()
song_locator = By.CSS_SELECTOR, 'a.mini_card.mini_card--small'
# Wait for the first XHR to complete
wait(driver, 10).until(EC.visibility_of_element_located(song_locator))
# Get current length of songs list
current_len = len(driver.find_elements(*song_locator))

while True:
    # Press END on the first song link to scroll the modal and trigger the next XHR
    driver.find_element(*song_locator).send_keys(Keys.END)
    try:
        # Wait up to 3 seconds for the number of songs to grow
        wait(driver, 3).until(lambda x: len(driver.find_elements(*song_locator)) > current_len)
        current_len = len(driver.find_elements(*song_locator))
    except TimeoutException:
        # No new songs appeared in time: collect the full list of links and stop
        songs_list = [song.get_attribute('href') for song in driver.find_elements(*song_locator)]
        break

print(songs_list)

This should let you keep requesting new XHRs until the length of the song list stops changing, and then return the full list of links.
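
The "repeat until the count stops growing" idea can also be pulled out into a small helper that is independent of Selenium. This is a minimal sketch, not part of the answer's code: the name `load_until_stable` and its parameters are hypothetical, and the two callables stand in for "count the songs" and "trigger the next load".

```python
import time


def load_until_stable(count_items, trigger_load, timeout=3.0, poll=0.2):
    """Call trigger_load() repeatedly until count_items() stops growing.

    A round ends as soon as the count increases. Loading is treated as
    finished when `timeout` seconds pass without any increase.
    Returns the final item count.
    """
    current = count_items()
    while True:
        trigger_load()
        deadline = time.monotonic() + timeout
        grew = False
        while time.monotonic() < deadline:
            latest = count_items()
            if latest > current:
                current = latest
                grew = True
                break
            time.sleep(poll)
        if not grew:
            return current
```

With the names from the code above, `count_items` could be `lambda: len(driver.find_elements(*song_locator))` and `trigger_load` could be `lambda: driver.find_element(*song_locator).send_keys(Keys.END)`.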

Answer 1: (score: 0)

When you scroll to the bottom of the modal dialog, it calls

$scrollable_data_ctrl.load_next();

As an option, you can try executing it repeatedly until no new results appear in the modal

driver.execute_script("$scrollable_data_ctrl.load_next();")
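
A rough sketch of such a loop is shown below. It assumes that `$scrollable_data_ctrl` is reachable from the page's global scope, which may not hold if it only exists inside the site's own framework scope. The helper name `load_all_via_js`, the fixed `pause`, and the `max_rounds` cap are hypothetical choices for illustration.

```python
import time


def load_all_via_js(driver, item_selector, pause=1.0, max_rounds=50):
    """Invoke the page's loader until the number of matching nodes stops growing.

    Assumes `$scrollable_data_ctrl.load_next()` can be called from the
    global scope of the page (an assumption, not verified here).
    Returns the final number of elements matching `item_selector`.
    """
    count_js = "return document.querySelectorAll(arguments[0]).length;"
    count = driver.execute_script(count_js, item_selector)
    for _ in range(max_rounds):
        driver.execute_script("$scrollable_data_ctrl.load_next();")
        time.sleep(pause)  # crude fixed wait for the AJAX response
        new_count = driver.execute_script(count_js, item_selector)
        if new_count <= count:
            break  # nothing new was added, so stop
        count = new_count
    return count
```

For example, `load_all_via_js(driver, 'a.mini_card.mini_card--small')` reuses the song selector from the other answer. A fixed sleep is the simplest choice here; an explicit wait on the element count would be more robust on a slow connection.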