How to use Beautiful Soup

Time: 2018-02-04 19:36:01

Tags: python python-3.x beautifulsoup scraper

I have a basic bs4 question here, but I have been trying for hours!

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.currys.co.uk/gbuk/search-keywords/xx_xx_xx_xx_xx/acer/xx-criteria.html'
r = urllib.request.urlopen(url).read()
soup = BeautifulSoup(r, 'lxml')

price = soup.find_all("div", class_="productPrices")

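How do I now extract the price? In this case it is in the <strong class="price" data-product="price"> tag.

I would also like to be able to get the product SKU: "productSKU":"200341"

I want to be able to loop over all the pages that match my search (in this case, just "acer") and store all the SKUs and prices matching that search in a DataFrame.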

1 Answer:

Answer 0: (score: 0)

You can try this:

import re
from collections import namedtuple

import requests
from bs4 import BeautifulSoup

product = namedtuple('product', ['name', 'price', 'sku'])
url = 'https://www.currys.co.uk/gbuk/search-keywords/xx_xx_xx_xx_xx/acer/xx-criteria.html'
page_data = requests.get(url).text
parsed = BeautifulSoup(page_data, 'html.parser')  # parse once and reuse

# Names and prices sit in elements carrying data-product attributes; strip whitespace from prices.
names = [i.text for i in parsed.find_all('span', {'data-product': 'name'})]
prices = [re.sub(r'\s+', '', i.text) for i in parsed.find_all('strong', {'data-product': 'price'})]
# SKUs come from the JSON embedded in the page, keyed by product name minus its last character.
skus = {b[:-1]: a for a, b in re.findall('"productSKU":"(.*?)","productName":"(.*?)"', page_data)}
# Pair each listing with the first SKU whose key appears in its name; the SKU stays a string.
final_product_data = [product(a, b, [h for c, h in skus.items() if c in a][0]) for a, b in zip(names, prices)]
print([(i.name, i.price, i.sku) for i in final_product_data])

Output:

[('KG221Q Full HD 21.5" LED Monitor - Black', '£99.99', '201795'), ('C22-760 21.5" All-in-One PC - Silver', '£399.97', '200341'), ('KG271 Full HD 27" LED Gaming Monitor - Black', '£179.99', '201797'), ('S242HLDBID Full HD 24" LED Monitor', '£119.99', '156512'), ('CB3-431 14" Full HD Chromebook - Silver', '£299.99', '169493'), ('CB3-431 14" Full HD Chromebook - Gold', '£299.99', '169493'), ('Iconia One 10 B3-A40 10.1" Tablet - 16 GB, White', '£139.99', '214589'), ('14 CB3-431 Chromebook - Silver', '£249.99', '183981'), ('ED242QRwi Full HD 24" Curved LCD Monitor - White', '£119.99', '224620'), ('Aspire E15 15.6" Laptop - Black', '£699.99', '204284'), ('11 CB3-131 Chromebook - White', '£199.97', '165016'), ('R241Ybmid Full HD 23.8" LED Monitor', '£134.99', '164002'), ('CB3-131 11.6" Chromebook - Blue', '£199.99', '214340'), ('15 CB3-532 Full HD Chromebook - Iron', '£279.99', '191983'), ('Chromebook R 13 CB5-312T 2-in-1 - Silver', '£399.99', '180082'), ('14 CB3-431 Chromebook - Gold', '£249.99', '183980'), ('Iconia One 10 B3-A40 10.1" Tablet - 32 GB, Black', '£149.99', '214589'), ('Swift 3 SF314-52 14" Laptop - Silver', '£649.99', '205493'), ('C24-760 23.8" All-in-One PC - Silver', '£599.99', '200448'), ('Chromebook R 11 CB5-132T 2-in-1 - White', '£279.99', '183985')]

Your data is now stored as a list of namedtuple objects, so each field can be accessed by name.
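
As a rough sketch of what "easy access" might look like in practice, the snippet below reads fields by name and turns the namedtuples into plain dict rows, then writes them out as CSV. It uses two rows copied from the output above in place of a live request. The `price_to_float` helper and the `price_gbp` column are hypothetical names introduced only for this illustration. A list of dicts like `rows` is also a form that `pandas.DataFrame` accepts, which may be the simplest route to the DataFrame the question asks for.

```python
import csv
import io
from collections import namedtuple

product = namedtuple('product', ['name', 'price', 'sku'])

# Two rows copied from the output above, standing in for final_product_data.
final_product_data = [
    product('KG221Q Full HD 21.5" LED Monitor - Black', '£99.99', '201795'),
    product('C22-760 21.5" All-in-One PC - Silver', '£399.97', '200341'),
]

# Fields are available by attribute name.
first = final_product_data[0]
print(first.name, first.price, first.sku)

# _asdict() turns each namedtuple into a dict keyed by field name.
rows = [p._asdict() for p in final_product_data]


def price_to_float(price):
    """Hypothetical helper: drop the pound sign so the price can be used as a number."""
    return float(price.replace('£', '').replace(',', ''))


for row in rows:
    row['price_gbp'] = price_to_float(row['price'])

# Write the rows as CSV into an in-memory buffer; a file opened with newline='' works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'price', 'sku', 'price_gbp'])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the original price string alongside the numeric column means nothing is lost if a price ever arrives in a format the helper does not expect.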