Scraping a website with a "load more" button in Python

Asked: 2021-07-18 07:42:35

Tags: python selenium web-scraping

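I want to scrape product links (675 products) from a website. The first page shows only 24 products, followed by a "SHOW NEXT 24" button. I tried two approaches to load more products so that I could collect their links.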

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)

driver.get('https://www.3m.com.au/3M/en_AU/p/c/medical')

while True:
    try:
        more_button = wait.until(EC.visibility_of_element_located((
            By.CLASS_NAME,
            'MMM--btn MMM--btn_tertiary MMM--btn_noAnimation js-pageLoader '
            'wt-link wtLoaded mix-MMM--btn_allCaps'))).click()
    except TimeoutException:
        break

I also tried:

more_button = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="pageContent"]/div[3]/div/div/div[3]/div[5]/div[2]/div[3]/div/div[2]/div[2]/a'))).click()

But neither approach manages to click the "SHOW NEXT 24" button. I believe a 403 Forbidden error is what prevents more products from loading.

Here is a screenshot of the tag: [screenshot not included]

Any hints or solutions would be much appreciated. Thanks in advance.

1 answer:

Answer 0 (score: 3)

import requests
import pandas as pd

params = {
    'ort': 'cp',
    'rt': 'cart',
    'cartridgeId': 'root/content/contents[0]/Results[0]'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
}


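def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        allin = []
        # The endpoint pages by record offset ('No'): 0, 24, 48, ... up to the 675 products.
        for num in range(0, 675, 24):
            params['No'] = num
            r = req.get(url, params=params)
            r.raise_for_status()  # stop early on an HTTP error such as 403
            # Each record carries the product name and its detail-page URL.
            for item in r.json()['Results'][0]['records']:
                allin.append([item.get('name', 'N/A'), item['detailsUrl']])

        df = pd.DataFrame(allin, columns=["Title", "Url"])
        print(df)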


main(
    'https://www.3m.com.au/wps/PA_Snaps286/AjaxServlet/portlet286/prod/en_AU/https/www.3m.com.au/3M/en_AU/p/c/medical/')
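The loop above hard-codes the total of 675 products. If the count changes, one option is to keep requesting pages until one comes back empty. The following is only a sketch under assumptions: it assumes the endpoint returns an empty or missing `records` list once the offset passes the last product (not verified here), and the names `collect_records`, `make_fetcher` and `max_pages` are hypothetical helpers, not part of the site's API or of `requests`.

```python
from typing import Callable, List


def collect_records(fetch_page: Callable[[int], List[dict]],
                    page_size: int = 24,
                    max_pages: int = 1000) -> List[dict]:
    """Call fetch_page(offset) for offsets 0, page_size, 2*page_size, ...
    and stop at the first empty page (or after max_pages, as a safety cap)."""
    records: List[dict] = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size)
        if not batch:
            break
        records.extend(batch)
    return records


def make_fetcher(session, url: str, base_params: dict) -> Callable[[int], List[dict]]:
    """Build a fetch_page function around a requests-style session."""
    def fetch_page(offset: int) -> List[dict]:
        # Copy the fixed parameters instead of mutating a shared dict.
        r = session.get(url, params={**base_params, 'No': offset}, timeout=30)
        r.raise_for_status()
        # Assumption: an exhausted listing yields no 'records' (or no 'Results').
        results = r.json().get('Results') or [{}]
        return results[0].get('records') or []
    return fetch_page


def main(url: str, base_params: dict, headers: dict) -> List[list]:
    # Imported here so the helpers above can be used without requests installed.
    import requests

    with requests.Session() as session:
        session.headers.update(headers)
        records = collect_records(make_fetcher(session, url, base_params))
    return [[item.get('name', 'N/A'), item['detailsUrl']] for item in records]
```

Calling `main(url, params, headers)` with the URL, `params` and `headers` defined in the answer should return the same title/URL pairs, ready to pass to `pd.DataFrame(..., columns=["Title", "Url"])`. The output listed next comes from the original fixed-range loop, not from this variant.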

Output:

                                                 Title                                               Url
0             3M™ Littmann® Cardiology IV™ Stethoscope     https://www.3m.com.au/3M/en_AU/p/d/b00037563/
1               3M™ Littmann® Classic III™ Stethoscope     https://www.3m.com.au/3M/en_AU/p/d/b00037556/
2    3M™ Coban™ Self-Adherent Wrap 1581, Tan, 25mm ...    https://www.3m.com.au/3M/en_AU/p/d/v000106081/
3    3M™ Coban™ Self-Adherent Wrap 1581B, Blue, 25m...    https://www.3m.com.au/3M/en_AU/p/d/v000106085/
4    3M™ Coban™ Self-Adherent Wrap 1582, Tan, 50mm ...    https://www.3m.com.au/3M/en_AU/p/d/v000077505/
..                                                 ...                                               ...
670  3M™ Littmann® Master Classic II Veterinary Ste...  https://www.3m.com.au/3M/en_AU/p/d/v101112000/1/
671          3M™ Synthetic Cast Stockinet MS02, 1RL/BX  https://www.3m.com.au/3M/en_AU/p/d/v000199505/1/
672  3M™ Red Dot™ Repositionable Monitoring Electro...  https://www.3m.com.au/3M/en_AU/p/d/v000154357/1/
673  3M™ Bair Hugger™ Warming Blanket, 55501, Paedi...  https://www.3m.com.au/3M/en_AU/p/d/v000253003/1/
674  3M™ Red Dot™ Repositionable Monitoring Electro...    https://www.3m.com.au/3M/en_AU/p/d/v000154308/

[675 rows x 2 columns]
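As a side note on the Selenium attempt in the question: `By.CLASS_NAME` expects a single class name, so passing the whole space-separated `class` attribute will not match the button as intended. A CSS selector that chains the classes is the usual alternative. The sketch below is an assumption-laden illustration rather than a verified fix: `classes_to_css` and `click_until_gone` are hypothetical helper names, the choice of `MMM--btn js-pageLoader` as a stable subset of the button's classes is a guess, and none of this addresses the 403 response mentioned in the question.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a CSS selector
    that requires every listed class, e.g. 'a b' -> '.a.b'."""
    return ''.join('.' + name for name in class_attr.split())


def click_until_gone(driver, css_selector: str, timeout: int = 10,
                     max_clicks: int = 100) -> int:
    """Click the element matching css_selector repeatedly until it no longer
    becomes clickable within `timeout` seconds; return the number of clicks."""
    # Imported lazily so classes_to_css can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    clicks = 0
    while clicks < max_clicks:  # safety cap against an endless loop
        try:
            button = wait.until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector)))
        except TimeoutException:
            break  # button did not reappear: assume everything is loaded
        button.click()
        clicks += 1
    return clicks


# Assumed subset of the button's classes taken from the question.
selector = classes_to_css('MMM--btn js-pageLoader')
```

With a live driver this would be used as `click_until_gone(driver, selector)`. If the site still answers the "load more" request with 403, the `requests` approach in the answer remains the more direct route.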