Selenium does not retrieve all href links from a web page

Posted: 2019-04-18 02:01:52

Tags: python-3.x selenium

I am trying to collect all the href links from https://search.yhd.com/c0-0-1003817/ (each one points to a specific product). My code runs, but it only retrieves 30 links and I don't understand why. Can you help me?

I have been using Selenium (Python 3.7), but before that I also tried scraping the page with Beautiful Soup. That did not work either.
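
For reference, a requests + Beautiful Soup attempt along these lines only returns the links present in the initial server-rendered HTML, since the remaining products are injected by JavaScript; this is a minimal sketch, and the a.img selector simply mirrors the XPath in the Selenium code below:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://search.yhd.com/c0-0-1003817/",
                    headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(page.text, "html.parser")

# Only anchors present in the initial HTML are found here; items added
# later by JavaScript never show up in this response.
static_links = [a.get("href") for a in soup.select("a.img")]
print(len(static_links))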

from selenium import webdriver 
import time
import requests
import pandas as pd

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.get(link)
    time.sleep(3)

    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

I should be getting 60 links, but my code only retrieves 30.

1 Answer:

Answer 0 (score: 2):

On the initial load, the page only contains 30 images/links. The remaining items are loaded lazily, so all 60 only appear after you scroll down. You need to do something like the following:

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

print(len(imported))  # Output: 60
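
If the number of items is not known in advance, a variation that keeps scrolling until the link count stops growing can replace the fixed number of scrolls and sleeps. This is a rough sketch, not part of the original answer; the function name and parameters are made up, and it reuses the same //a[@class="img"] selector and the Selenium API style used above:

def getListingLinksUntilStable(link, pause=2, max_scrolls=10):
    # Sketch: scroll repeatedly and stop once no new product links appear.
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)

    previous_count = 0
    links = []
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        links = driver.find_elements_by_xpath('//a[@class="img"]')
        if len(links) == previous_count:
            break  # the last scroll loaded nothing new
        previous_count = len(links)

    listing_links = [l.get_attribute('href') for l in links]
    driver.quit()
    return listing_links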