I am trying to get the URLs of all the products on this webpage, but I only obtained a small fraction of them.

My first attempt was to scrape the page with BeautifulSoup, but then I realized Selenium would be a better fit, since I need to click the "Show more" button repeatedly. I also added code to scroll down the page, in case that was the problem, but the result did not change.
import time
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome(executable_path="")
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)

    # Scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    listing_links = []

    while True:
        try:
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="main-content"]/div[2]/div[2]/div[4]/button'))))
            driver.execute_script("arguments[0].click();", WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button"))))
            print("Button clicked")
            links = driver.find_elements_by_class_name('fop-contentWrapper')
            for link in links:
                algo = link.find_element_by_css_selector('.fop-contentWrapper a').get_attribute('href')
                print(algo)
                listing_links.append(str(algo))
        except:
            print("No more Buttons")
            break

    driver.close()
    return listing_links

fresh_food = getListingLinks("https://www.ocado.com/browse/fresh-20002")
print(len(fresh_food))  ## Output: 228
As you can see, I get 228 URLs, while I would like to get 5605 links, which according to Ocado is the actual number of products on the page. I believe something is wrong with the order of operations in my code, but I cannot find the right one. I would sincerely appreciate your help.
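For what it's worth, since the collection step inside the loop re-reads cards that are already on the page after every click, I also planned to deduplicate the returned list before counting; here is a browser-free sketch of that post-processing step (the URLs are made up purely for illustration):

```python
def dedupe_preserving_order(urls):
    """Drop repeated URLs, keeping the order of first appearance."""
    seen = set()      # tracks every URL encountered so far
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

# Made-up hrefs standing in for the scraped links: the same product
# appears twice because each click re-reads the earlier product cards.
scraped = [
    "https://www.ocado.com/products/example-a-1",
    "https://www.ocado.com/products/example-b-2",
    "https://www.ocado.com/products/example-a-1",
]
print(len(dedupe_preserving_order(scraped)))  # 2
```

Applying this to my real output does not change the underlying problem, of course; it only makes the final count reflect distinct products.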