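I am trying to extract the attributes of a house and their corresponding values. I would like to end up with something like {key: {Property type: Commercial property, Purchase price: CHF 475,000, etc.}}.
I can extract the values one at a time, but not in a loop that updates a dictionary.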
<dl class="row xsmall-up-2 medium-up-3 large-up-4 attributes-grid">
<div class="column">
<dt class="label-text">
Property type
</dt>
<dd>
Commercial property </dd>
</div>
<div class="column">
<dt class="label-text">
Purchase price
</dt>
<dd>
CHF 475,000 </dd>
</div>
<div class="column">
<dt class="label-text">
Floor space
</dt>
<dd>
114 m² </dd>
</div>
<div class="column">
<dt class="label-text">
Floor
</dt>
<dd>
1. floor </dd>
</div>
<div class="column">
<dt class="label-text">
Year of construction
</dt>
<dd>
1989 </dd>
</div>
<div class="column">
<dt class="label-text">
Balcony/ies
</dt>
<dd>
<i class="fa fa-check text-green" aria-hidden="true"></i>
</dd>
</div>
<div class="column">
<dt class="label-text">
Indoor parking
</dt>
<dd>
<i class="fa fa-check text-green" aria-hidden="true"></i>
</dd>
</div>
<div class="column">
<dt class="label-text">
Outdoor parking
</dt>
<dd>
<i class="fa fa-check text-green" aria-hidden="true"></i>
</dd>
</div>
<div class="column">
<dt class="label-text">
Lift
</dt>
<dd>
<i class="fa fa-check text-green" aria-hidden="true"></i>
</dd>
</div>
<div class="column">
<dt class="label-text">
Cable TV
</dt>
<dd>
<i class="fa fa-check text-green" aria-hidden="true"></i>
</dd>
</div>
<div class="column">
<dt class="label-text">
Public transport stop
</dt>
<dd>
150 m </dd>
</div>
<div class="column">
<dt class="label-text">
Motorway
</dt>
<dd>
500 m </dd>
</div>
<div class="column">
<dt class="label-text">
Shops
</dt>
<dd>
300 m </dd>
</div>
</dl>
Answer 0 (score: 1)
Take the HTML you provided and assume it is stored as a string in table_text.
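from bs4 import BeautifulSoup

# table_text holds the HTML from the question as a string
soup = BeautifulSoup(table_text, "lxml")
temp_dict = {}
# each <div class="column"> wraps one <dt> label and its <dd> value
for d in soup.find_all("div", {"class": "column"}):
    temp_dict[d.find("dt").text.strip()] = d.find("dd").text.strip()
print(temp_dict)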
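Note that several of the <dd> elements in your HTML (Balcony/ies, Lift, Cable TV and so on) contain only a check icon, so their stripped text is an empty string. Here is a minimal sketch of one way to handle that, assuming you want those entries stored as True. The helper name parse_attributes and the shortened sample_html are made up for illustration, and "html.parser" is used only so the example does not need lxml installed.

```python
from bs4 import BeautifulSoup

# Shortened sample in the same shape as the markup in the question.
sample_html = """
<dl class="row attributes-grid">
  <div class="column">
    <dt class="label-text">Purchase price</dt>
    <dd>CHF 475,000 </dd>
  </div>
  <div class="column">
    <dt class="label-text">Lift</dt>
    <dd><i class="fa fa-check text-green" aria-hidden="true"></i></dd>
  </div>
</dl>
"""

def parse_attributes(html):
    """Return {label: value}. An icon-only <dd> becomes True (an assumption)."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for column in soup.find_all("div", class_="column"):
        label = column.find("dt")
        value = column.find("dd")
        if label is None or value is None:
            continue  # skip columns that lack a label or a value
        key = label.get_text(strip=True)
        text = value.get_text(strip=True)
        if not text and value.find("i", class_="fa-check"):
            result[key] = True  # check icon with no text
        else:
            result[key] = text
    return result

attributes = parse_attributes(sample_html)
print(attributes)
```

Whether True is the right stand-in for the icon depends on what you do with the data afterwards; a string such as "yes" would work just as well.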
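I am guessing the HTML you provided covers only one row of the table. If you want all of the rows, loop over them and keep a parent dictionary: on each iteration, add an entry with the row as the key and temp_dict as the value. That will give you the structure you are after.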
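A rough sketch of that outer loop, under assumptions I cannot check against your real page: that each listing has its own <dl class="... attributes-grid"> block, and that a running index is an acceptable key. The second listing in page_html is invented purely to show the loop, and all_listings is a name I made up.

```python
from bs4 import BeautifulSoup

# Two attribute grids; the second one is invented sample data.
page_html = """
<dl class="row attributes-grid">
  <div class="column"><dt>Property type</dt><dd>Commercial property </dd></div>
  <div class="column"><dt>Purchase price</dt><dd>CHF 475,000 </dd></div>
</dl>
<dl class="row attributes-grid">
  <div class="column"><dt>Property type</dt><dd>Apartment </dd></div>
</dl>
"""

soup = BeautifulSoup(page_html, "html.parser")
all_listings = {}  # parent dictionary: {key: {label: value}}

# Assumption: one <dl class="attributes-grid"> per listing.
for index, grid in enumerate(soup.find_all("dl", class_="attributes-grid")):
    temp_dict = {}  # fresh dictionary for every listing
    for d in grid.find_all("div", class_="column"):
        temp_dict[d.find("dt").get_text(strip=True)] = d.find("dd").get_text(strip=True)
    all_listings[index] = temp_dict

print(all_listings)
```

If each listing has a natural identifier (a URL or a listing id), using that as the key instead of the index is probably more useful.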