I want to scrape an infinite-scroll page that has a button to load more content; here is my code

Date: 2019-07-28 19:21:18

Tags: python web-scraping

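The problem is that when I run the script I never get the page_source: Selenium stops clicking, the script breaks, and no links are extracted from page_source.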

from selenium import webdriver
from bs4 import BeautifulSoup 
from selenium.webdriver.support import ui 
import time


#url = ''

driver = webdriver.Chrome(executable_path='C:/Users/yacerpc/Desktop/chrome/chromedriver')
driver.get('https://www.white-river-gems.com/shop')

while driver.find_element_by_class_name("dn9KO"):
    wait = ui.WebDriverWait(driver, 10)
    button = wait.until(lambda driver: driver.find_element_by_class_name("dn9KO"))
    button.click()
    print("clicked")
    html = driver.page_source


soup = BeautifulSoup(html, 'html.parser')
page = soup.find('div',{'class':'_1hM3_ jw2qu'})
find_links = page.find_all('li')

for url in find_links:
    link =  url.find('a',{'class':'_2zTHN _2AHc6'}).get('href')
    print(link)

I expect the output to be the links extracted from page_source.

1 answer:

Answer 0: (score: 0)

Try it like this:

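# Give the async script up to 120 seconds before Selenium reports a timeout.
driver.set_script_timeout(120)
driver.execute_async_script("""
  // Selenium passes its completion callback as the last argument;
  // no other arguments are passed here, so that is arguments[0].
  var done = arguments[0];
  var interval = setInterval(() => {
    var button = document.querySelector('[data-hook="load-more-button"]');
    if (button) {
      button.click();            // more items remain: load the next batch
    } else {
      clearInterval(interval);   // button is gone: stop polling
      done();                    // hand control back to Python
    }
  }, 5000);  // check every 5 seconds
""")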

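To get the links the question asks for, page_source has to be read once, after the script above has finished. Here is a minimal sketch of that order of operations. The function name `load_all_and_get_html` and the parameter `timeout_s` are made up for illustration, and `driver` is assumed to be a Selenium WebDriver that has already opened the shop page with driver.get(...).

```python
# The click loop from the answer, kept as a module-level string.
LOAD_MORE_JS = """
var done = arguments[arguments.length - 1];  // Selenium's completion callback
var interval = setInterval(function () {
  var button = document.querySelector('[data-hook="load-more-button"]');
  if (button) {
    button.click();
  } else {
    clearInterval(interval);
    done();
  }
}, 5000);
"""


def load_all_and_get_html(driver, timeout_s=120):
    """Run the click loop in the browser, then return the loaded HTML.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper,
    not part of Selenium itself).
    """
    driver.set_script_timeout(timeout_s)
    # Blocks until the script calls done() or the script timeout expires.
    driver.execute_async_script(LOAD_MORE_JS)
    # Read the HTML only now, after all items have been loaded.
    return driver.page_source
```

With a real driver, `html = load_all_and_get_html(driver)` can then be passed to BeautifulSoup exactly as in the question. If the page needs more than 120 seconds to load everything, Selenium raises a timeout error, so the limit may need to be raised for large catalogues.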
Note that you should select [data-hook="load-more-button"], because a class name like dn9KO looks generated and will probably change with the next deployment.
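The same selector can also be used from the Python side. The loop in the question breaks because find_element_by_class_name raises NoSuchElementException when the button is missing, rather than returning something falsy, so the while condition never ends cleanly. find_elements returns an empty list instead. Below is a rough sketch built on that; `click_until_gone`, `pause_s` and `max_clicks` are hypothetical names, and the fixed pause is an assumption about how long the site needs to load each batch.

```python
import time

LOAD_MORE_SELECTOR = '[data-hook="load-more-button"]'


def click_until_gone(driver, selector=LOAD_MORE_SELECTOR,
                     pause_s=5.0, max_clicks=200):
    """Click the matching button until it disappears; return the click count.

    `driver` is assumed to be a Selenium WebDriver. `max_clicks` is a
    safety cap so the loop cannot run forever.
    """
    clicks = 0
    while clicks < max_clicks:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        buttons = driver.find_elements("css selector", selector)
        if not buttons:
            break  # empty list: the button is gone, everything is loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause_s)  # give the page time to append the next batch
    return clicks
```

After `click_until_gone(driver)` returns, driver.page_source holds the fully expanded page. A fixed sleep is the simplest choice here; an explicit wait on the item count would be more robust, but it depends on markup that is not shown in the question.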