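I'm trying to scrape the links to the song pages of certain artists on genius.com, but I've run into a problem: the links to the individual song pages are shown in a popup modal window.
The modal doesn't load all the links at once; instead it loads more content via AJAX as you scroll down to the bottom of the modal.
I tried using code to scroll to the bottom of the page, but unfortunately that only scrolls the window behind the modal rather than the modal itself: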
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
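Then I tried selecting the last element in the modal and scrolling to it (intending to repeat this a few times until all the song pages were loaded), but it doesn't scroll far enough to make the site load more content: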
last_element = driver.find_elements_by_xpath('//div[@class="mini_card-metadata"]')[-1]
last_element.location_once_scrolled_into_view
Here's my code so far:
import os
from bs4 import BeautifulSoup
from selenium import webdriver
chrome_driver = "/Applications/chromedriver"
os.environ["webdriver.chrome.driver"] = chrome_driver
driver = webdriver.Chrome(chrome_driver)
base_url = 'https://genius.com/artists/Stormzy'
driver.get(base_url)
xpath_str = '//div[contains(text(),"Show all songs by Stormzy")]'
driver.find_element_by_xpath(xpath_str).click()
Is there a way to extract all of an artist's song page links?
Answer 0 (score: 1)
Try the following code to get the desired output:
from selenium import webdriver as web
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException

driver = web.Chrome()
base_url = 'https://genius.com/artists/Stormzy'
driver.get(base_url)

# Open the modal
driver.find_element(By.XPATH, '//div[normalize-space()="Show all songs by Stormzy"]').click()
song_locator = By.CSS_SELECTOR, 'a.mini_card.mini_card--small'

# Wait for the first XHR to complete
wait(driver, 10).until(EC.visibility_of_element_located(song_locator))

# Get the current length of the song list
current_len = len(driver.find_elements(*song_locator))

while True:
    # Press END to trigger the next XHR for as long as more songs keep loading
    driver.find_element(*song_locator).send_keys(Keys.END)
    try:
        wait(driver, 3).until(lambda d: len(d.find_elements(*song_locator)) > current_len)
        current_len = len(driver.find_elements(*song_locator))
    except TimeoutException:
        # No new songs appeared within 3 seconds: return the full list of songs
        songs_list = [song.get_attribute('href') for song in driver.find_elements(*song_locator)]
        break

print(songs_list)
This should let you keep triggering new XHRs until the length of the song list stops changing, and then return the list of links.
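The loop above boils down to "trigger a load, wait for the item count to grow, stop once it no longer does". As a rough sketch, that pattern could be pulled out into a small helper that knows nothing about Selenium. The names `load_until_stable`, `trigger`, `count`, `poll` and `max_rounds` are made up for this illustration and are not part of Selenium or of the code above.

```python
import time


def load_until_stable(trigger, count, timeout=3.0, poll=0.2, max_rounds=1000):
    """Call trigger() repeatedly until count() stops growing.

    trigger: hypothetical callable that asks the page for more items.
    count:   hypothetical callable that returns how many items are loaded.
    Returns the final item count.
    """
    current = count()
    for _ in range(max_rounds):
        trigger()
        deadline = time.monotonic() + timeout
        # Poll until the count grows or the timeout expires.
        while count() <= current:
            if time.monotonic() >= deadline:
                # Nothing new arrived in time: assume the list is complete.
                return current
            time.sleep(poll)
        current = count()
    return current
```

With the code from this answer it could be called as load_until_stable(lambda: driver.find_element(*song_locator).send_keys(Keys.END), lambda: len(driver.find_elements(*song_locator))), after which the hrefs are collected exactly as before. The `max_rounds` cap is only a safety net against a page that never stops loading.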
Answer 1 (score: 0)
When you scroll to the bottom of the modal dialog, it calls
$scrollable_data_ctrl.load_next();
As an option, you can try executing it yourself for as long as new results keep appearing in the modal:
driver.execute_script("$scrollable_data_ctrl.load_next();")
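A minimal sketch of what "executing it for as long as new results appear" might look like. It assumes that `$scrollable_data_ctrl` is reachable from the page's global scope (not verified here) and that the song links match the selector `a.mini_card.mini_card--small` borrowed from the other answer. The function name `collect_links` and the `pause` and `max_rounds` parameters are hypothetical.

```python
import time

# The call quoted above; assumed to be callable from the page's global scope.
LOAD_NEXT_JS = "$scrollable_data_ctrl.load_next();"

# Returns the href of every element matching the selector passed as arguments[0].
HREFS_JS = (
    "return Array.from(document.querySelectorAll(arguments[0]))"
    ".map(function (a) { return a.href; });"
)


def collect_links(driver, selector="a.mini_card.mini_card--small",
                  pause=1.0, max_rounds=200):
    """Trigger load_next() until the number of matching links stops growing."""
    links = driver.execute_script(HREFS_JS, selector)
    for _ in range(max_rounds):
        driver.execute_script(LOAD_NEXT_JS)
        time.sleep(pause)  # crude fixed wait for the AJAX response
        updated = driver.execute_script(HREFS_JS, selector)
        if len(updated) <= len(links):
            break  # no new links arrived: assume everything is loaded
        links = updated
    return links
```

It would be called as collect_links(driver) once the modal is open. The fixed `pause` is the weak point: on a slow connection a response may arrive after the check and the loop would stop early, so an explicit wait is likely more robust.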