Python web scraping with Beautiful Soup: returning all product details from a page

Time: 2019-02-09 13:00:29

Tags: python web-scraping beautifulsoup

Apologies for the newbie question, but I have only just started my Python journey and have begun learning web scraping.

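I have written some code to scrape a fashion website and return some product information. What I really want to do is scrape the main category page and extract all product names and prices. I think I need a for loop, and I have tried various approaches found on this site, but I can't get any of them to work.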

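I want to extract the product name and price of every item on the page so that I can export them afterwards. The code below returns the first item on the page just fine, but I'm not sure how to add a loop to get the rest.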

import requests
from bs4 import BeautifulSoup
url = 'https://www.riverisland.com/c/men/seasonal-offers?icid=mhp/winter-treats/m/seasonal-offers/cat'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

data_item = []
  for item in name_box, price_box:
  data_item.append()

  name_box = soup.find('div', attrs={'class':'product__title ui-body-text'})
  price_box = soup.find('div', attrs={'class':'product-price__headline-product-price__headline--sale'})

  name = name_box.text.strip()
  price = price_box.text.strip()

3 answers:

Answer 0: (score: 1)

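You need to get all the products on the page. find only returns the first product; use find_all to get every product on the page. You can then loop over the results and print them.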

import requests
from bs4 import BeautifulSoup
url = 'https://www.riverisland.com/c/men/seasonal-offers?icid=mhp/winter-treats/m/seasonal-offers/cat'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

name_box = soup.find_all('div', attrs={'class':'product__title ui-body-text'})
price_box = soup.find_all('div', attrs={'class':'product-price__headline product-price__headline--sale'})

for product in zip(name_box,price_box):
    name,price=product
    name_proper=name.text.strip()
    price_proper=price.text.strip()
    print(name_proper,'-',price_proper)

Output

Bellfield navy three-in-one mac coat - £50.00
Black rib muscle fit short sleeve T-shirt - £12.00
Criminal Damage black colour block zip jacket - £50.00
Jack & Jones Premium green puffer gilet - £30.00
Jack & Jones red faux fur bomber jacket - £50.00
Jack & Jones black parka jacket - £70.00
Light grey ribbed muscle fit T-shirt - £12.00
Navy satin velour panel slim fit T-shirt - £12.00
Pepe Jeans light blue denim jacket - £90.00
Navy slim fit tape crew neck T-shirt - £12.00
Superdry green camo parka jacket - £90.00
Superdry green double zip Fuji padded jacket - £60.00
Superdry green hooded parka jacket - £80.00
Superdry navy hooded quilted jacket - £80.00
Superdry navy triple zip funnel neck jacket - £60.00
Superdry red zip funnel neck puffer jacket - £60.00
Superdry yellow lightweight hooded jacket - £70.00
Superdry black camo funnel neck coat - £70.00
Superdry black double zip Fuji padded jacket - £60.00
Superdry black funnel neck puffer jacket - £60.00
Superdry blue lightweight hooded jacket - £70.00
Superdry green army jacket - £60.00
Only & Sons black hooded puffer jacket - £40.00
Pepe Jeans dark blue denim jacket - £90.00
Red waffle slim fit short sleeve T-shirt - £12.00
Selected Homme black stripe long sleeve top - £50.00
White waffle slim fit short sleeve T-shirt - £12.00
Big and Tall R96 burgundy muscle fit T-shirt - £12.00
Black Dean straight leg jeans - £20.00
Black R96 muscle fit long sleeve T-shirt - £12.00
Black R96 pique muscle fit long sleeve shirt - £15.00
Black ribbed crew neck long sleeve top - £12.00
Black velour R96 slim fit piped joggers - £20.00
Blue Dylan slim fit distressed jeans - £25.00
Dark blue straight leg jeans - £20.00
Dark blue straight leg jeans - £20.00
Dark blue straight leg manhattan jeans - £20.00
Dark blue ripped super skinny jeans - £25.00
Dark blue Dean straight leg jeans - £20.00
Dark blue Dylan slim fit jeans - £25.00
Dark grey R96 muscle fit grandad shirt - £15.00
Burgundy slim fit colour block sleeve hoodie - £20.00
Burgundy R96 muscle fit grandad shirt - £15.00
Dark red R95 muscle fit raglan T-shirt - £12.00
Dark red R96 muscle fit long sleeve T-shirt - £12.00
Dark red wasp embroidered Oxford shirt - £15.00
Green poplin muscle fit long sleeve shirt - £15.00
Grey check button down long sleeve shirt - £20.00
Light blue long sleeve flannel shirt - £20.00
R96 black velour slim fit hoodie - £20.00
Pink R96 muscle fit button-down shirt - £15.00
White ribbed crew neck long sleeve top - £12.00
Khaki slim fit tape sleeve hoodie - £20.00
Stone pique muscle fit long sleeve shirt - £15.00
Black lace up chukka boot - £25.00
Black 'Prolific' padded puffer coat - £45.00
Black muscle fit rib crew neck jumper - £20.00
Black hooded borg lined jacket - £45.00
Black longline faux fur hooded parka jacket - £45.00
Black zip front funnel neck puffer jacket - £25.00
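
Since the question mentions exporting the results afterwards, here is a minimal sketch of collecting the name/price pairs into a list and turning them into CSV text. The inline SAMPLE_HTML snippet and the rows_to_csv helper are hypothetical stand-ins so the example runs without a network request; the class names are the ones used in the code above, and the real page's markup may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content; the real markup may differ.
SAMPLE_HTML = """
<div class="product__title ui-body-text"> Black Dean straight leg jeans </div>
<div class="product-price__headline product-price__headline--sale">£20.00</div>
<div class="product__title ui-body-text">Black lace up chukka boot</div>
<div class="product-price__headline product-price__headline--sale"> £25.00 </div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

name_boxes = soup.find_all('div', attrs={'class': 'product__title ui-body-text'})
price_boxes = soup.find_all('div', attrs={'class': 'product-price__headline product-price__headline--sale'})

# Pair each name with the price at the same position and keep the cleaned text.
data_items = [
    (name.text.strip(), price.text.strip())
    for name, price in zip(name_boxes, price_boxes)
]


def rows_to_csv(rows):
    """Return the rows as CSV text with a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['name', 'price'])
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(data_items))
```

Note that zip pairs names and prices purely by position, so if a product has no element with the sale-price class, later names and prices would be paired incorrectly.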

Answer 1: (score: 1)

OK, you have made a small mistake. With find you are only fetching a single product name. Instead, you need to use find_all to get all the products.
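
As a rough illustration of the difference, the sketch below runs find and find_all on a small inline HTML snippet. The snippet and its product names are hypothetical stand-ins for the real page; only the class name is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = """
<div class="product__title ui-body-text">Green army jacket</div>
<div class="product__title ui-body-text">Navy hooded jacket</div>
"""
soup = BeautifulSoup(html, 'html.parser')
attrs = {'class': 'product__title ui-body-text'}

# find returns a single Tag: the first match, or None if nothing matches.
first = soup.find('div', attrs=attrs)

# find_all returns a list-like ResultSet containing every matching Tag.
all_titles = soup.find_all('div', attrs=attrs)

print(first.text.strip())
print([tag.text.strip() for tag in all_titles])
```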

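The other thing is that the value you search for in your price lookup is actually two classes. They should be joined with . (as in a CSS selector) rather than -; inside an attrs dictionary they are written separated by a space.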
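
To illustrate the point about the two classes, here is a small sketch using select, Beautiful Soup's CSS selector method, where classes are chained with a dot. The inline HTML and its prices are hypothetical stand-ins for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = """
<div class="product-price__headline product-price__headline--sale">£12.00</div>
<div class="product-price__headline">£30.00</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# The hyphen-joined string from the question is one long class name that no element has.
wrong = soup.find_all(
    'div',
    attrs={'class': 'product-price__headline-product-price__headline--sale'},
)

# In a CSS selector, chaining classes with "." requires the element to carry both.
sale_prices = soup.select('div.product-price__headline.product-price__headline--sale')

print(len(wrong))
print([tag.text.strip() for tag in sale_prices])
```

The selector form matches the two classes regardless of the order in which they appear in the class attribute, whereas the space-separated string in attrs is compared against the attribute value as written.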

Answer 2: (score: 0)

I will try to find a solution for you, but for now try using the

soup.find_all('div', attrs={'your attributes'})

function.