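I am trying to get a product's price using BeautifulSoup in Python, but whatever I try, I get an error.
[Image: the site I am trying to scrape]
I want to get the value 19,90. I have already written the code that gets all the product names; now I need their prices.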
import requests
from bs4 import BeautifulSoup
url = 'https://www.zattini.com.br/busca?nsCat=Natural&q=amaro&searchTermCapitalized=Amaro&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
price = soup.find('span', itemprop_='price')
print(price)
Answer 0 (score: 1)
A less ideal option is to parse the JSON that contains the prices:
import requests
import json
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.zattini.com.br/busca?nsCat=Natural&q=amaro&searchTermCapitalized=Amaro&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
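# Keep only the inline <script> that defines the `freedom` object holding the product data
scripts = [script.text for script in soup.select('script') if 'var freedom = freedom ||' in script.text]
# Cut out the array that follows "items": (this assumes the first ']' is the one that closes the array)
pricesJson = scripts[0].split('"items":')[1].split(']')[0] + ']'
prices = [item['price'] for item in json.loads(pricesJson)]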
names = [name.text for name in soup.select('#item-list [itemprop=name]')]
results = list(zip(names,prices))
df = pd.DataFrame(results)
print(df)
Sample output:
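A side note on the `split(']')` step in the code above: it stops at the first `]`, so it would cut the array short if any item contained a nested list. The sketch below shows one possible alternative using `json.JSONDecoder().raw_decode`, which parses one complete JSON value starting at a given index. The helper name `extract_items` and the contents of `sample_script` are made up for illustration; the real script on the page may be shaped differently.

```python
import json


def extract_items(script_text, key='"items":'):
    """Return the JSON array that follows `key` inside a script body."""
    start = script_text.index(key) + len(key)
    # raw_decode must start exactly on the JSON value, so jump to the '['.
    start = script_text.index('[', start)
    items, _end = json.JSONDecoder().raw_decode(script_text, start)
    return items


# Hypothetical script body: the field names are assumptions, not taken
# from the real page. Note the nested "sizes" lists.
sample_script = (
    'var freedom = freedom || {}; freedom.data = '
    '{"items": [{"name": "Blusa", "price": 19.9, "sizes": ["P", "M"]}, '
    '{"name": "Vestido", "price": 134.89, "sizes": []}]};'
)

prices = [item['price'] for item in extract_items(sample_script)]
print(prices)
```

With the real page, `scripts[0]` from the answer's code would take the place of `sample_script`.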
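Answer 1 (score: 0)
span[itemprop='price'] is generated by JavaScript, so it is not present in the HTML that requests downloads. Use div[data-final-price] instead: it stores the raw value, such as 1990, which you can format as 19,90 with a regex.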
import re
...
soup = BeautifulSoup(page.text, 'html.parser')
prices = soup.select('div[data-final-price]')
for price in prices:
    price = re.sub(r'(\d\d$)', r',\1', price['data-final-price'])
    print(price)
Result:
19,90
134,89
29,90
119,90
104,90
59,90
....
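The regex above inserts a comma before the last two digits, which works for values with three or more digits. Below is a small sketch comparing it with an integer-based variant, under the assumption that `data-final-price` always holds the price in cents as a string of digits. The function names are hypothetical.

```python
import re


def format_cents_regex(raw):
    # Same idea as the answer: put a comma before the last two digits.
    return re.sub(r'(\d\d)$', r',\1', raw)


def format_cents(raw):
    # Assumption: `raw` is the price in cents, e.g. '1990' means 19,90.
    units, cents = divmod(int(raw), 100)
    return f'{units},{cents:02d}'


for raw in ['1990', '13489', '90', '5']:
    print(raw, format_cents_regex(raw), format_cents(raw))
```

The two agree for values like 1990, but differ for prices under one unit: the regex turns '90' into ',90' and leaves '5' unchanged, while the integer version pads with zeros.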