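I am trying to get all the href links that point to specific products from https://search.yhd.com/c0-0-1003817/. My code runs, but it only returns 30 links, and I don't know why. Can you help me?
I have been using Selenium (Python 3.7). Before that I also tried fetching the page with Beautiful Soup, and that did not work either.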
from selenium import webdriver
import time
import requests
import pandas as pd

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.get(link)
    time.sleep(3)
    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")
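I should get 60 links, but my code only gets 30.
Answer 0 (score: 2)
On initial load, the page contains only 30 images/links. The remaining items are loaded only when you scroll down, which brings the total to 60. You need to scroll before collecting the links, like this: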
def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")
print(len(imported))  # Output: 60
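If two fixed scrolls turn out not to be enough on a longer page, one possible variation is to keep scrolling until the page height stops growing. The following is only a sketch: `scroll_until_stable`, `pause`, and `max_rounds` are hypothetical names that are not part of the original answer, and it assumes the page grows `document.body.scrollHeight` whenever it lazy-loads more items.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom until the page height stops changing.

    `driver` is any object with a Selenium-style execute_script() method.
    Returns the number of scrolls performed. (Hypothetical helper.)
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded items time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change after this scroll: assume nothing more loads
            return rounds
        last_height = new_height
    # Gave up after max_rounds scrolls; the page may still be growing
    return max_rounds
```

In `getListingLinks` this could replace the two hard-coded `execute_script`/`time.sleep` pairs, called as `scroll_until_stable(driver)` right before the links are collected. The `max_rounds` cap is there so that an endlessly growing page cannot make the loop run forever.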