Storing data from a tag in Python using BeautifulSoup4

Time: 2016-09-16 14:48:43

Tags: python json beautifulsoup

Using BeautifulSoup4, I can isolate:

<a href="#" data-nutrition="{
    &quot;serving-name&quot;:&quot;Milk, 2%&quot;,
    &quot;serving-size&quot;:&quot;16 FL OZ&quot;,
    &quot;calories&quot;:&quot;267&quot;}">
Milk, 2%
<i class="icon-leaf icon-hidden-text">Meatless</i>
</a>

by running:

for i in soup('a', attrs={'data-nutrition' : True}):
    sample = i
    break
print(sample)
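The for/break loop above only keeps the first matching tag. A minimal sketch of the same lookup with find(), which returns the first match or None, assuming the html.parser backend and a trimmed copy of the markup:

```python
from bs4 import BeautifulSoup

# Trimmed sample of the markup from the question
html = """
<a href="#" data-nutrition="{&quot;serving-name&quot;:&quot;Milk, 2%&quot;}">
Milk, 2%
</a>"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first <a> that has a data-nutrition attribute, or None
sample = soup.find("a", attrs={"data-nutrition": True})
if sample is not None:
    # The parser has already turned &quot; into plain double quotes
    print(sample["data-nutrition"])
```

Checking for None matters: indexing a missing match with sample["data-nutrition"] would raise a TypeError.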

I need to create the dictionary:

my_dict = {
    'serving-name': 'Milk, 2%',
    'serving-size': '16 FL OZ',
    'calories': '267'
}

How can I do this in Python using BeautifulSoup4?

1 answer:

Answer 0 (score: 1)

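Find the element and use json.loads() to load the value of its data-nutrition attribute into a Python dictionary. The parser has already decoded the &quot; entities into plain double quotes, so the attribute value is valid JSON: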

import json
from bs4 import BeautifulSoup


data = """
<a href="#" data-nutrition="{
    &quot;serving-name&quot;:&quot;Milk, 2%&quot;,
    &quot;serving-size&quot;:&quot;16 FL OZ&quot;,
    &quot;calories&quot;:&quot;267&quot;}">
Milk, 2%
<i class="icon-leaf icon-hidden-text">Meatless</i>
</a>"""
soup = BeautifulSoup(data, "html.parser")

a = soup.select_one("a[data-nutrition]")
nutrition = json.loads(a["data-nutrition"])
print(nutrition)

Prints:

{'serving-name': 'Milk, 2%', 'serving-size': '16 FL OZ', 'calories': '267'}
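select_one() returns None when nothing matches, so a["data-nutrition"] would then fail with a TypeError. If a page lists several items, a sketch like the following collects one dictionary per tag using select(), which simply returns an empty list when there are no matches. The second and third tags are hypothetical sample markup, and skipping attributes that are not valid JSON is an assumed policy, not part of the original answer:

```python
import json
from bs4 import BeautifulSoup

# First tag mirrors the question; the other two are hypothetical samples
html = """
<a href="#" data-nutrition="{&quot;serving-name&quot;:&quot;Milk, 2%&quot;,&quot;calories&quot;:&quot;267&quot;}">Milk, 2%</a>
<a href="#" data-nutrition="{&quot;serving-name&quot;:&quot;Sample item&quot;,&quot;calories&quot;:&quot;100&quot;}">Sample item</a>
<a href="#" data-nutrition="not json">Broken</a>
"""
soup = BeautifulSoup(html, "html.parser")


def extract_nutrition(soup):
    """Return a list of dicts parsed from every a[data-nutrition] tag."""
    items = []
    for tag in soup.select("a[data-nutrition]"):
        try:
            items.append(json.loads(tag["data-nutrition"]))
        except json.JSONDecodeError:
            # Assumed policy: silently skip malformed attribute values
            continue
    return items


nutrition = extract_nutrition(soup)
print(nutrition)
```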