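I'm trying to scrape an e-commerce website to find out how many items are on sale in each category. The code goes through 30 pages, each containing 30 products. The code below gives the same answer, 76, for every category, which is incorrect. I'm not entirely sure why it keeps adding 2 each time it loops through the pages, or how to fix this. I suspect it's something small, but I can't seem to find the culprit.
Products on sale can be identified by the .price-standard class.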
import re
import requests
from bs4 import BeautifulSoup

urls = {
    "Charms": "https://us.pandora.net/en/charms/?sz=30&start={}&format=page-element",
    "Bracelets": "https://us.pandora.net/en/bracelets/?sz=30&start={}&format=page-element",
    "Rings": "https://us.pandora.net/en/rings/?sz=30&start={}&format=page-element",
    "Necklaces": "https://us.pandora.net/en/necklaces/?sz=30&start={}&format=page-element",
    "Earrings": "https://us.pandora.net/en/earrings/?sz=30&start={}&format=page-element"
}

#checks each item for whether it's on sale - which is classed by .price-standard
def fetch_items(link,page):
    Total_items = 0
    while page<=900:
        #print("current page no: ",page)
        res = requests.get(link.format(page),headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"})
        soup = BeautifulSoup(res.text,"lxml")
        list_total = soup.select('.grid-tile .price-standard') #this is where the information can be found
        Total_items += len(list_total)
        #print(Total_items)
        page+=30
    return Total_items

if __name__ == "__main__":
    page = 0
    total_items = fetch_items(url,page)
    #I try to make it print the Total for each category (charms, bracelets, rings, necklaces, earrings)
    for category, url in urls.items():
        print("Total {}: {}".format(category, total_items))
Edit: It works, guys! Here are the results.
Total Charms: 295
Total Bracelets: 47
Total Rings: 174
Total Necklaces: 132
Total Earrings: 76
Answer 0 (score: 0)
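I think you need to put total_items = fetch_items(url, page) inside the loop.
As written, the code fetches only once, before the loop starts, so every category prints the same total. It also seems the url variable used in that call is defined somewhere else, since the loop has not assigned it yet at that point.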
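A minimal sketch of that change, assuming the rest of the question's code stays as it is. The names collect_totals, fake_fetch, fake_counts, and demo_urls are hypothetical and used only for illustration; the stub stands in for the question's fetch_items (with made-up counts and example.com URLs) so the snippet runs without network access.

```python
def collect_totals(urls, fetch_items, start=0):
    """Call fetch_items once per category and return {category: total}."""
    totals = {}
    for category, url in urls.items():
        # The call sits inside the loop, so every category URL is fetched
        # and each category gets its own total.
        totals[category] = fetch_items(url, start)
    return totals


if __name__ == "__main__":
    # Hypothetical stand-in for the question's fetch_items; counts are made up.
    fake_counts = {"charms": 3, "rings": 5}

    def fake_fetch(link, page):
        # Return the made-up count whose category name appears in the URL.
        return next(n for name, n in fake_counts.items() if name in link)

    demo_urls = {
        "Charms": "https://example.com/charms/?start={}",
        "Rings": "https://example.com/rings/?start={}",
    }
    for category, total in collect_totals(demo_urls, fake_fetch).items():
        print("Total {}: {}".format(category, total))
```

With the question's own code, passing the real urls dictionary and fetch_items function to collect_totals should give one total per category instead of the same number repeated.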