Question

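I'm trying to scrape an e-commerce site to find out which items are on sale in each category. The code goes through 30 pages, each containing 30 products. The code below gives the same answer, 76, for every category, which is incorrect. I'm not entirely sure why it keeps adding 2 each time it loops through the pages, or how to fix it. I suspect the mistake is a small one, but I can't seem to find the culprit.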

Products that are on sale can be identified by the .price-standard class.

import re

import requests
from bs4 import BeautifulSoup

urls = {
    "Charms": "https://us.pandora.net/en/charms/?sz=30&start={}&format=page-element",
    "Bracelets": "https://us.pandora.net/en/bracelets/?sz=30&start={}&format=page-element",
    "Rings": "https://us.pandora.net/en/rings/?sz=30&start={}&format=page-element",
    "Necklaces": "https://us.pandora.net/en/necklaces/?sz=30&start={}&format=page-element",
    "Earrings": "https://us.pandora.net/en/earrings/?sz=30&start={}&format=page-element"
}

#checks each item for whether it's on sale - which is classed by .price-standard
def fetch_items(link,page):
    Total_items = 0 
    while page<=900:
        #print("current page no: ",page)
        res = requests.get(link.format(page),headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"})
        soup = BeautifulSoup(res.text,"lxml")
        list_total = soup.select('.grid-tile .price-standard') #this is where the information can be found  
        Total_items += len(list_total)
        #print(Total_items)
        page+=30
    return Total_items


if __name__ == "__main__":
    page = 0
    total_items = fetch_items(url,page)

    #I try to make it print the Total for each category (charms, bracelets, rings, necklaces, earrings)    
    for category, url in urls.items():
        print("Total {}: {}".format(category, total_items))

Edit: It works, guys! Here are the results.

Total Charms: 295
Total Bracelets: 47
Total Rings: 174
Total Necklaces: 132
Total Earrings: 76

Answer 1

I think you need to move total_items = fetch_items(url,page) inside the for loop.

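As written, the code fetches only once, before the loop starts, so every category prints the same total. The url variable used in that call also appears to be defined somewhere else, because the script shown never assigns it before the call.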
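
A minimal sketch of that fix, under a few assumptions: the helper names count_sale_items, download and totals_by_category are not from the question, the get_html parameter is added only so the loop can be tried without network access, and "html.parser" is used in place of "lxml" to avoid an extra dependency. The offsets match the question's while loop (0, 30, ..., 900).

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a shortened User-Agent; the question sends a full Firefox string.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def count_sale_items(html):
    """Count .price-standard elements inside .grid-tile elements on one page."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(".grid-tile .price-standard"))


def download(url):
    """Fetch one page and return its HTML text."""
    return requests.get(url, headers=HEADERS, timeout=30).text


def fetch_items(link, get_html=download, last_offset=900, step=30):
    """Sum the sale items over every page offset of one category."""
    total = 0
    for offset in range(0, last_offset + 1, step):
        total += count_sale_items(get_html(link.format(offset)))
    return total


def totals_by_category(urls, get_html=download):
    """Call fetch_items once per category, inside the loop over urls."""
    totals = {}
    for category, link in urls.items():
        # The fetch happens here, so each category is counted from its own URL.
        totals[category] = fetch_items(link, get_html)
    return totals
```

With the urls dictionary from the question, looping over totals_by_category(urls).items() and printing "Total {}: {}".format(category, total) for each pair should give one total per category instead of the same number five times.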
