Selenium only finds a small fraction of the href links

Date: 2019-05-26 18:19:09

Tags: python-3.x selenium web-scraping

I am trying to get the URLs of all the products on this web page, but I only get a small fraction of them.

My first attempt was to scrape the page with BeautifulSoup, but I then realized Selenium would be a better fit because I need to click the "Show more" button repeatedly. I also added code to scroll down the page in case that was the problem, but the result did not change.

import time   
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome(executable_path="")
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    listing_links = []  

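    # Repeatedly click the "Show more" button; collect the visible links after every click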
    while True:
        try:
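            # Scroll the "Show more" button into view, then click it via JavaScript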
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="main-content"]/div[2]/div[2]/div[4]/button'))))
            driver.execute_script("arguments[0].click();", WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button"))))
            print("Button clicked")
            links = driver.find_elements_by_class_name('fop-contentWrapper')
            for link in links:
                algo=link.find_element_by_css_selector('.fop-contentWrapper a').get_attribute('href')
                print(algo)
                listing_links.append(str(algo))
        except:
            print("No more Buttons")
            break

    driver.close()
    return listing_links 

fresh_food = getListingLinks("https://www.ocado.com/browse/fresh-20002")

print(len(fresh_food))  ## Output: 228

As you can see, I get 228 URLs, whereas I want to obtain 5605 links, which according to Ocado is the actual number of products on the page. I believe there is something wrong with the order of operations in my code, but I cannot find the right order. Any help would be greatly appreciated.
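For reference, this is the kind of reordering I have in mind: a rough, untested sketch that would replace the while loop inside getListingLinks (it reuses the same driver, the WebDriverWait/EC/By imports, the same selectors, and the listing_links list from above), though I am not sure it is the right fix:

    # Untested sketch: click "Show more" until it disappears, THEN collect links once
    while True:
        try:
            button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
                (By.CSS_SELECTOR, "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button")))
            driver.execute_script("arguments[0].scrollIntoView(true);", button)
            driver.execute_script("arguments[0].click();", button)
            print("Button clicked")
        except Exception:
            print("No more Buttons")
            break

    # Single pass over all product cards loaded after the last click
    for card in driver.find_elements_by_class_name('fop-contentWrapper'):
        href = card.find_element_by_css_selector('a').get_attribute('href')
        listing_links.append(str(href))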

0 Answers:

There are no answers yet.