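I am trying to scrape a website with Python using a web driver and Beautiful Soup 4. I am testing the code on decathlon.fr. The problem is that each product is contained in an "article" element. When I run the program it does nothing, apparently because I need that element to be a "div" or an "a".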
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome("chromedriver.exe")
products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get("https://www.decathlon.fr/search?Ntt=t+shirt+anti+uv+b%C3%A9b%C3%A9")
content = driver.page_source
soup = BeautifulSoup(content)
for a in soup.findAll('article', attrs={'class':'dkt-product.js-product-slider-init.product-printed'}):
    name=a.find('a', attrs={'class':'dkt-product__title__wrapper'})
    price=a.find('div', attrs={'class':'dkt-product__price'})
    #rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
    rating=a.find('span', attrs={'itemprop':'name'})
    products.append(name)
    prices.append(price)
    ratings.append(rating)
print(products)
print(prices)
print(ratings)
Answer 0 (score: 0)
You need to make a few simple changes to your code:
soup = BeautifulSoup(content, 'lxml') # here add 'lxml'
for a in soup.findAll('article', attrs={'class':'dkt-product js-product-slider-init product-printed'}): # here, removed dots
    name= a.find('a', attrs={'class':'dkt-product__title__wrapper'})
    price=a.find('div', attrs={'class':'dkt-price__cartridge'}) # here
    #rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
    rating=a.find('span', attrs={'itemprop':'ratingValue'}) # here
    products.append(name)
    prices.append(price)
    ratings.append(rating)
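
The dotted form from the question is CSS selector syntax, so it may be easier to keep it and pass it to select() instead of find_all(). The sketch below is only an illustration under assumptions: SAMPLE_HTML is made-up markup that reuses the class names from this thread (the real decathlon.fr page may look different), the helper names text_or_none and extract_products are hypothetical, and the built-in 'html.parser' is used so that lxml is not required. It also stores the text of each tag rather than the tag object, and records None when a product has no rating.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the live page may be structured differently.
SAMPLE_HTML = """
<article class="dkt-product js-product-slider-init product-printed">
  <a class="dkt-product__title__wrapper">Baby UV T-shirt</a>
  <div class="dkt-price__cartridge">9,00 €</div>
  <span itemprop="ratingValue">4.5</span>
</article>
<article class="product-printed dkt-product">
  <a class="dkt-product__title__wrapper">Kids UV top</a>
  <div class="dkt-price__cartridge">12,00 €</div>
</article>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_products(html):
    """Collect name, price and rating for every matching <article>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # In a CSS selector the dots mean "has this class", in any order.
    for article in soup.select("article.dkt-product.product-printed"):
        rows.append({
            "name": text_or_none(article.select_one("a.dkt-product__title__wrapper")),
            "price": text_or_none(article.select_one("div.dkt-price__cartridge")),
            "rating": text_or_none(article.select_one('span[itemprop="ratingValue"]')),
        })
    return rows


rows = extract_products(SAMPLE_HTML)

# The same dotted string given to find_all() is compared literally and matches nothing.
dotted_matches = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all(
    "article", attrs={"class": "dkt-product.js-product-slider-init.product-printed"}
)

print(rows)
print(len(dotted_matches))
```

With the live page, the same function could be called as extract_products(driver.page_source), assuming the page has finished rendering the product list by then.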