Scraping search results from Sportchek with Beautiful Soup 4 to find prices

Time: 2021-04-13 20:22:59

Tags: python python-3.x web-scraping beautifulsoup python-requests

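I'm trying to use BS4 to scrape search results from Sportchek, specifically this link: "https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1". I want to get the prices of the shoes listed there and put them all into a system for sorting, but to do that I first need to get the prices, and I can't find a way to do so. In the HTML, the class is product-price-text, but I can't collect any information from it. At this point, getting the price of even one pair of shoes would be fine. I just need help scraping anything class-related with BS4, because none of it works. I tried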

print(soup.find_all("span", class_="product-price-text"))

and even that doesn't work, so please help.

2 Answers:

Answer 0 (score: 1)

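The data is loaded dynamically via JavaScript, so it is not present in the HTML that requests receives for the category page. You can use the requests module to load it directly from the endpoint the page calls: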

import json
import requests

url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&lastVisibleProductNumber=12&x1=ast-id-level-3&q1=men%3A%3Ashoes-footwear%3A%3Abasketball&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&count=24"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}

data = requests.get(url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))

Prints:

332799300  83.97      Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black
333323940  180.0      Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold
333107663  134.99     Nike Men's Mamba Fury Basketball Shoes - Black/Smoke Grey/White
333003748  134.99     Nike Men's Lebron Witness IV Basketball Shoes - White
333003606  104.99     Nike Men's Kyrie Flytrap III Basketball Shoes - Black/Uni Red/Bright Crimson
333003543  94.99      Nike Men's Precision III Basketball Shoes - Black/White
333107554  94.99      Nike Men's Precision IV Basketball Shoes - Black/Mtlc Gold/Dk Smoke Grey
333107404  215.0      Nike Men's LeBron XVII Low Basketball Shoes - Black/White/Multicolor
333107617  119.99     Nike Men's KD Trey 5 VIII Basketball Shoes - Black/White-aurora Green/Smoke Grey
333166326  125.98     Nike Men's KD13 Basketball Shoes - Black/White-wolf Grey
333166731  138.98     Nike Men's LeBron XVII Low Basketball Shoes - Particle Grey/White-lt Smoke Grey-black
333183810  129.99     adidas Men's D.O.N 2 Basketball Shoes - Gold/Black/Gold
333206770  111.97     Under Armour Men's Embid Basketball Shoes - Red/White
333181809  165.0      Nike Men's Air Jordan React Elevation Basketball Shoes - Black/White-lt Smoke Grey-volt
333307276  104.99     adidas Men's Harden Stepback 2 Basketball Shoes - White/Blackwhite/Black
333017256  89.99      Under Armour Men's Jet Mid Sneaker - Black/Halo Grey
332912833  134.99     Nike Men's Zoom LeBron Witness IV Running Shoes - Black/Gym Red/University Red
332799162  79.88      Under Armour Men's Curry 7 "Quiet Eye" Basketball Shoes - Black - Black
333276525  119.99     Nike Men's Kyrie Flytrap IV Basketball Shoes - Black/White-metallic Silver
333106290  145.97     Nike Men's KD13 Basketball Shoes - Black/White/Wolf Grey
333181345  144.99     Nike Men's PG 4 TB Basketball Shoes - Black/White-pure Platinum
333241817  149.99     PUMA Men's Clyde All-Pro Basketball Shoes - Puma White/Blue Atolpuma White/Blue Atol
333186052  77.97      adidas Men's Harden Stepback Basketball Shoes - Black/Gold/White
333316063  245.0      Nike Men's Air Jordan 13 Retro Basketball Shoes - White/Blackwhite/Starfish-black
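
Since the original goal was to sort the shoes by price, here is a minimal sketch of that step. It assumes each product is a dict with `code`, `price`, and `title` keys, as used in the loop above, and that `price` is numeric. The three sample rows are copied from the printed output (storing `code` as a string is an assumption), and `sort_by_price` is a hypothetical helper name, not part of any library.

```python
from operator import itemgetter

# Sample rows copied from the output above; with the live API this
# list would be data["products"]. Storing "code" as str is an assumption.
products = [
    {"code": "333323940", "price": 180.0,
     "title": "Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold"},
    {"code": "332799300", "price": 83.97,
     "title": "Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black"},
    {"code": "333107663", "price": 134.99,
     "title": "Nike Men's Mamba Fury Basketball Shoes - Black/Smoke Grey/White"},
]


def sort_by_price(items, descending=False):
    """Return a new list of product dicts ordered by their "price" field."""
    return sorted(items, key=itemgetter("price"), reverse=descending)


# Print the products from cheapest to most expensive.
for p in sort_by_price(products):
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
```

Passing `descending=True` would list the most expensive shoes first.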

Edit: Extracting the API URL:

import re
import json
import requests

# your URL:
url = "https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1"

api_url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&x1=ast-id-level-3&q1={cat}&count=24"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}
html_text = requests.get(url, headers=headers).text
cat = re.search(r"br_data\.cat_id=\'(.*?)';", html_text).group(1)

data = requests.get(api_url.format(cat=cat), headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
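
The script above calls `.group(1)` directly on the result of `re.search`, which raises `AttributeError` if the page no longer contains the expected `br_data.cat_id` assignment. A slightly more defensive sketch of the extraction step is shown below. The `sample_html` fragment is hypothetical (the real page is much larger, and the `br_data.ptype` field is invented for illustration), the sample category id is the decoded `q1` value from the first URL, and `extract_cat_id` is an assumed helper name.

```python
import re

# Hypothetical fragment of the category page's inline script.
sample_html = (
    "<script>br_data.cat_id='men::shoes-footwear::basketball';"
    "br_data.ptype='category';</script>"
)

API_TEMPLATE = (
    "https://www.sportchek.ca/services/sportchek/search-and-promote/products"
    "?page=1&x1=ast-id-level-3&q1={cat}&count=24"
)


def extract_cat_id(html_text):
    """Return the category id embedded in the page source, or None if absent."""
    match = re.search(r"br_data\.cat_id='(.*?)';", html_text)
    return match.group(1) if match else None


cat = extract_cat_id(sample_html)
if cat is None:
    print("category id not found; the page layout may have changed")
else:
    print(API_TEMPLATE.format(cat=cat))
```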

Answer 1 (score: 0)

Use Selenium:

from selenium import webdriver
from bs4 import BeautifulSoup
import time


browser = webdriver.Chrome('/home/cam/Downloads/chromedriver')
url='https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1'
browser.get(url)
time.sleep(10)
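html = browser.page_source
browser.quit()  # close the browser once the rendered HTML has been captured
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning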

def get_data():
    links = soup.find_all('span', attrs={'class':"product-price-text"})
    for i in set(links):
        print(i.text)
    
get_data()

Output:

$245.00
$215.00 
$144.99
$165.00
$129.99
$104.99
$149.99
$195.00 
$180.00
$119.99
$134.99
$89.99
$94.99
$215.00
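
The prices above are strings with a leading "$" and, in some cases, trailing whitespace, so they need to be converted to numbers before they can be sorted. A minimal sketch of that conversion follows. The sample strings are copied from the output above, `parse_price` is a hypothetical helper name, and the comma handling is an assumption for prices of 1,000 or more, which do not appear in this output.

```python
from decimal import Decimal

# Sample strings copied from the output above, including one with trailing whitespace.
raw_prices = ["$245.00", "$215.00 ", "$144.99", "$89.99"]


def parse_price(text):
    """Convert a string such as "$215.00 " to a Decimal.

    Strips surrounding whitespace, the leading "$", and any thousands
    separators (the comma handling is an assumption).
    """
    return Decimal(text.strip().lstrip("$").replace(",", ""))


# Sort numerically, lowest price first.
prices = sorted(parse_price(t) for t in raw_prices)
for price in prices:
    print(price)
```

`Decimal` is used here rather than `float` so that the currency values keep their exact two-decimal representation.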