I have the following sample:
[<div class="options__list">
<a href="/link1">
<div class="options__list__item" option-message="closed" data-option='{"id":1,"is_active":true,"name":"Fran","city":{"id":32,"name":"Paris","is_top":null,"url_key":"paris","main_area":{"id":null,"name":null,"url_key":null}}}'></div>
</a><a href="/link2">
<div class="options__list__item" option-message="closed" data-option='{"id":2,"is_active":true,"name":"Fran2","city":{"id":32,"name":"Paris","is_top":null,"url_key":"paris","main_area":{"id":null,"name":null,"url_key":null}}}'></div>
</a>]
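I want to extract the href of each link and the data-option dictionary of the item inside it.
What is the best way to do this? Also, suppose I only want to extract specific keys from the data-option dictionary; how would I do that?
Thanks a lot in advance.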
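Answer 0: (score: 3)
The idea is to iterate over the links, get the href attribute value of each one, then find the inner options list item and use json.loads() to load its data-option value into a Python dictionary: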
import json
from bs4 import BeautifulSoup
data = """
<div>
<div class="options__list">
<a href="/link1">
<div class="options__list__item" option-message="closed" data-option='{"id":1,"is_active":true,"name":"Fran","city":{"id":32,"name":"Paris","is_top":null,"url_key":"paris","main_area":{"id":null,"name":null,"url_key":null}}}'></div>
</a>
<a href="/link2">
<div class="options__list__item" option-message="closed" data-option='{"id":2,"is_active":true,"name":"Fran2","city":{"id":32,"name":"Paris","is_top":null,"url_key":"paris","main_area":{"id":null,"name":null,"url_key":null}}}'></div>
</a>
</div>
</div>
"""
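soup = BeautifulSoup(data, "html.parser")
# Each direct <a> child of the options list wraps one option item
for link in soup.select(".options__list > a"):
    href = link['href']
    # data-option holds a JSON string; parse it into a dict
    data_option = json.loads(link.select_one("div.options__list__item")["data-option"])
    print(href, data_option['id'])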
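Under Python 2 this prints the following (the href value and the option id are printed for demonstration purposes; under Python 3 the same call prints /link1 1 and /link2 2 without the tuple formatting):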
(u'/link1', 1)
(u'/link2', 2)
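As for extracting only specific keys: once json.loads() has run, the result is an ordinary Python dict, so normal dict lookups apply. Below is a minimal sketch, assuming the data-option string of the first item in the sample; pick_keys is a hypothetical helper written for this illustration, not part of json or BeautifulSoup.

```python
import json

# Raw attribute value, copied from the first item in the sample.
raw = ('{"id":1,"is_active":true,"name":"Fran","city":{"id":32,"name":"Paris",'
       '"is_top":null,"url_key":"paris","main_area":{"id":null,"name":null,"url_key":null}}}')


def pick_keys(option, keys):
    """Return a new dict with only the requested top-level keys (hypothetical helper)."""
    # Keys that are absent are skipped instead of raising KeyError.
    return {key: option[key] for key in keys if key in option}


option = json.loads(raw)

# Top-level keys
subset = pick_keys(option, ["id", "name"])

# Nested keys: chained lookups; JSON null is loaded as None
city_name = option["city"]["name"]
area_name = option.get("city", {}).get("main_area", {}).get("name")

print(subset, city_name, area_name)
```

Inside the loop above, the same lookups can be applied to data_option instead of option. The chained .get() calls are only worth using if some items might lack the city or main_area key; otherwise plain indexing is simpler.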