BS4: get info from a class with a weird name

Time: 2016-02-16 17:09:16

Tags: python bs4

I have this weird HTML from the Steam Community market search:

<span class=\"normal_price\">$2.69 USD<\/span>

How do I extract the data with bs4? This doesn't work:

soup.find("span", attrs={"class": "\"normal_price\""})

1 answer:

Answer 0 (score: 1)

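You have HTML embedded in a JSON string, which is why the quotes are escaped. The backslashes belong to the JSON encoding, not to the class name. Rather than extracting that data by hand, parse the JSON first: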

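import json

# json_data is the raw response body, as a string
data = json.loads(json_data)
# json.loads has already unescaped the quotes and slashes in the HTML
html = data['results_html']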
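
To make the unescaping step concrete, here is a minimal sketch using only the standard library. The `json_data` string below is a hypothetical, cut-down payload built from the snippet in the question; the real response has more keys and much more HTML.

```python
import json

# Hypothetical, cut-down payload (an assumption for illustration).
# The raw string keeps the backslashes exactly as they appear on the wire.
json_data = r'{"results_html": "<span class=\"normal_price\">$2.69 USD<\/span>"}'

data = json.loads(json_data)
html = data['results_html']

# The \" and \/ escapes are gone: this is ordinary HTML now.
print(html)
```

Once decoded, the class is just `normal_price`, so there is no need to search for a class name with literal quotes in it.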

If you are using the requests library, it can decode the response for you:

import requests

response = requests.get('http://steamcommunity.com/market/search/render/?query=appid:730&start=0&count=3&currency=3&l=english&cc=pt')
html = response.json()['results_html']
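
If you want something a bit more defensive, the request could be wrapped roughly like this. This is only a sketch: the function name `fetch_results_html`, the `get` parameter (there so the HTTP call can be swapped out in tests) and the 10-second timeout are my own choices, not anything Steam or requests prescribes. The query parameters are the ones from the URL above.

```python
import requests

SEARCH_URL = 'http://steamcommunity.com/market/search/render/'


def fetch_results_html(query='appid:730', count=3, get=requests.get):
    """Return the HTML fragment stored under 'results_html' (hypothetical helper)."""
    # Same parameters as the URL above, passed as a dict so requests encodes them.
    params = {
        'query': query,
        'start': 0,
        'count': count,
        'currency': 3,
        'l': 'english',
        'cc': 'pt',
    }
    response = get(SEARCH_URL, params=params, timeout=10)
    # Raise on 4xx/5xx instead of trying to decode an error page as JSON.
    response.raise_for_status()
    return response.json()['results_html']
```

Calling `fetch_results_html()` then gives you the same `html` string as the two lines above, ready to hand to BeautifulSoup.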

After that you can parse it with BeautifulSoup:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> html = requests.get('http://steamcommunity.com/market/search/render/?query=appid:730&start=0&count=3&currency=3&l=english&cc=pt').json()['results_html']
>>> BeautifulSoup(html, 'lxml').find('span', class_='normal_price').span
<span class="normal_price">$2.69 USD</span>
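
If you need every price rather than just the first one, `find_all` with `get_text` is one way to do it. A small sketch follows; the `html` fragment is made up for illustration (the real `results_html` is larger and structured differently), and I use the built-in `html.parser` here instead of `lxml` only so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for data['results_html'] (an assumption).
html = (
    '<div><span class="normal_price">$2.69 USD</span></div>'
    '<div><span class="normal_price">$0.03 USD</span></div>'
)

soup = BeautifulSoup(html, 'html.parser')

# class_ matches any element whose class list contains 'normal_price'.
prices = [
    span.get_text(strip=True)
    for span in soup.find_all('span', class_='normal_price')
]
print(prices)
```

Note that if the real markup nests one `normal_price` span inside another, `find_all` will return both the outer and the inner element, so you may need to filter the results.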