Getting a <span> value with Python web scraping

Date: 2018-12-05 02:11:19

Tags: python web-scraping beautifulsoup

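I am trying to get product prices using BeautifulSoup in Python. However, no matter what I try, I get errors.

Screenshot of the site I am trying to scrape

I want to get the value 19,90. I have already written the code that gets all the product names, and now I need their prices.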

import requests
from bs4 import BeautifulSoup

url = 'https://www.zattini.com.br/busca?nsCat=Natural&q=amaro&searchTermCapitalized=Amaro&page=1'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')

price = soup.find('span', itemprop_='price')

print(price)

2 Answers:

Answer 0 (score: 1)

A less ideal approach is to parse the JSON embedded in the page, which contains the prices.

import requests
import json
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.zattini.com.br/busca?nsCat=Natural&q=amaro&searchTermCapitalized=Amaro&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
# Keep only the inline script that defines the `freedom` object holding the product data
scripts = [script.text for script in soup.select('script') if 'var freedom = freedom ||' in script.text]
# Cut out the "items" array; split(']') drops the closing bracket, so add it back
pricesJson = scripts[0].split('"items":')[1].split(']')[0] + ']'
prices = [item['price'] for item in json.loads(pricesJson)]
names = [name.text for name in soup.select('#item-list [itemprop=name]')]
# Pair each product name with its price
results = list(zip(names, prices))

df = pd.DataFrame(results)
print(df)

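The string-splitting step above can be tried offline. The following is a minimal sketch, assuming a script shaped roughly like the `var freedom = freedom ||` snippet; `SAMPLE_SCRIPT`, its field names, and the helper `extract_items` are hypothetical and only illustrate the idea. It swaps `split(']')` for `json.JSONDecoder().raw_decode`, which keeps working if an item happens to contain a nested array.

```python
import json

# Hypothetical script text written for illustration; the real page may differ.
SAMPLE_SCRIPT = '''
var freedom = freedom || {};
freedom.data = {"page": "search", "items": [
  {"sku": "A1", "price": 19.9, "sizes": ["P", "M"]},
  {"sku": "B2", "price": 134.89, "sizes": []}
], "total": 2};
'''


def extract_items(script_text, key='"items":'):
    """Decode the JSON array that follows `key` inside JavaScript source text."""
    # Locate the first '[' after the key; that is where the array starts.
    start = script_text.index('[', script_text.index(key))
    # raw_decode parses exactly one JSON value from `start` and ignores the rest.
    items, _end = json.JSONDecoder().raw_decode(script_text, start)
    return items


prices = [item['price'] for item in extract_items(SAMPLE_SCRIPT)]
print(prices)  # [19.9, 134.89]
```

With `split(']')[0]`, the sample above would be cut at the first `]` inside `"sizes"`, so `json.loads` would fail; whether the real data has such nested arrays is not known from the answer.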

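Answer 1 (score: 0)

span[itemprop='price'] is generated by JavaScript, so it is not present in the HTML that requests downloads. The raw value is stored in div[data-final-price] as a number such as 1990, and you can use a regex to format it as 19,90.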

import re

...
soup = BeautifulSoup(page.text, 'html.parser')
prices = soup.select('div[data-final-price]')
for price in prices:
    price = re.sub(r'(\d\d$)', r',\1', price['data-final-price'])
    print(price)

Result:

19,90
134,89
29,90
119,90
104,90
59,90
....
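The attribute lookup and the regex formatting can be checked without hitting the site. This is a minimal sketch against hand-written markup: `SAMPLE_HTML`, `format_price`, and `extract_prices` are hypothetical names, and the real page structure may differ. It also shows why the `itemprop_='price'` call in the question finds nothing even on static markup: BeautifulSoup special-cases only `class_`, so `itemprop_` is treated as an attribute literally named `itemprop_`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup written for illustration; the real page may differ.
SAMPLE_HTML = """
<div id="item-list">
  <div class="item" data-final-price="1990"><span itemprop="name">Shirt</span></div>
  <div class="item" data-final-price="13489"><span itemprop="name">Dress</span></div>
</div>
"""


def format_price(raw):
    """Insert a decimal comma before the last two digits, e.g. '1990' -> '19,90'."""
    return re.sub(r'(\d\d)$', r',\1', raw)


def extract_prices(html):
    """Return the formatted price of every div carrying a data-final-price attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    return [format_price(div['data-final-price'])
            for div in soup.select('div[data-final-price]')]


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# Searches for an attribute literally called "itemprop_", which does not exist.
print(soup.find('span', itemprop_='name'))                 # None
# Passing the attribute through attrs matches the real "itemprop" attribute.
print(soup.find('span', attrs={'itemprop': 'name'}).text)  # Shirt
print(extract_prices(SAMPLE_HTML))                         # ['19,90', '134,89']
```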