Code not looping as expected: same result for every URL

Time: 2019-07-07 19:18:29

Tags: python python-3.x web-scraping

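I'm trying to scrape an e-commerce site to find out which products are on sale in each category. The code goes through 30 pages, each containing 30 products. The code below gives the same answer, 76, for every category, which is incorrect. I'm not entirely sure why it keeps adding 2 each time it loops through the pages, or how to fix it. I feel like it's something small, but I can't seem to find the culprit.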

Products that are on sale can be identified by the .price-standard class.

import re

import requests
from bs4 import BeautifulSoup

urls = {
    "Charms": "https://us.pandora.net/en/charms/?sz=30&start={}&format=page-element",
    "Bracelets": "https://us.pandora.net/en/bracelets/?sz=30&start={}&format=page-element",
    "Rings": "https://us.pandora.net/en/rings/?sz=30&start={}&format=page-element",
    "Necklaces": "https://us.pandora.net/en/necklaces/?sz=30&start={}&format=page-element",
    "Earrings": "https://us.pandora.net/en/earrings/?sz=30&start={}&format=page-element"
}

#checks each item for whether it's on sale - which is classed by .price-standard
def fetch_items(link,page):
    Total_items = 0 
    while page<=900:
        #print("current page no: ",page)
        res = requests.get(link.format(page),headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"})
        soup = BeautifulSoup(res.text,"lxml")
        list_total = soup.select('.grid-tile .price-standard') #this is where the information can be found  
        Total_items += len(list_total)
        #print(Total_items)
        page+=30
    return Total_items


if __name__ == "__main__":
    page = 0
    total_items = fetch_items(url,page)

    #I try to make it print the Total for each category (charms, bracelets, rings, necklaces, earrings)    
    for category, url in urls.items():
        print("Total {}: {}".format(category, total_items))

Edit: It works, guys! Here are the results.

Total Charms: 295
Total Bracelets: 47
Total Rings: 174
Total Necklaces: 132
Total Earrings: 76

1 answer:

Answer 0 (score: 0)

I think you need to put total_items = fetch_items(url, page) inside the loop.

This code fetches only once, and the url variable appears to be defined somewhere else.
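
To make the suggestion concrete, here is a minimal sketch of moving the call inside the loop so that each category is fetched with its own URL. The helper name count_sale_items_per_category and the stand-in fake_fetch_items are hypothetical and exist only so the sketch runs without network access; in the real script you would pass the question's fetch_items and urls instead. A single result of 76 for every category would be consistent with a leftover url pointing at the Earrings address, though that is an inference from the posted totals.

```python
def count_sale_items_per_category(urls, fetch_items, start=0):
    """Call fetch_items once per category and collect the totals.

    urls maps a category name to its URL template; fetch_items is any
    callable taking (link, page) and returning a count.
    """
    totals = {}
    for category, url in urls.items():
        # The call sits inside the loop, so `url` is the current
        # category's URL rather than a value left over from elsewhere.
        totals[category] = fetch_items(url, start)
    return totals


# Hypothetical stand-in for the real fetch_items, so this runs offline.
# It returns the length of the link purely to give each URL a distinct value.
def fake_fetch_items(link, page):
    return len(link)


if __name__ == "__main__":
    demo_urls = {"Charms": "charms-url", "Rings": "rings"}
    totals = count_sale_items_per_category(demo_urls, fake_fetch_items)
    for category, total in totals.items():
        print("Total {}: {}".format(category, total))
```

With the question's code, the equivalent change is to move total_items = fetch_items(url, page) to be the first statement inside the for category, url in urls.items(): loop, directly above the print call.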