I'm new to Python and am following some video tutorials to scrape data from a listing website.
HTML:
<div class="listing-info">
<h3>
<a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
</a>
</h3>
<ul class="listing-features">
<li class="lst-details">
<span class="lst-ptype">Semi-Detached House </span>
<span class="lst-tenure">Freehold</span>
</li>
</ul>
</div>
Under the ul element with class listing-features, I want to extract 'Semi-Detached House' as the property type and 'Freehold' as the tenure.
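I parsed the HTML above into the variable listing_info.
My attempt:
listing_info.li.text
Here, I am able to get Semi-Detached House.
Knowing that the list item contains two span classes, I tried this:
listing_info.find('span',class_='1st-ptype')
listing_info.find('span',class_='1st-tenure')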
Both calls return nothing. Can anyone enlighten me on this?
Thanks in advance.
Answer 0 (score: 0)
You can search for the housing types directly:
import re
from bs4 import BeautifulSoup as soup
s = """
<div class="listing-info">
<h3>
<a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
</a>
</h3>
<ul class="listing-features">
<li class="lst-details">
<span class="lst-ptype">Semi-Detached House </span>
<span class="lst-tenure">Freehold</span>
</li>
</ul>
</div>
"""
s = soup(s, 'lxml')
housing_types = [i.text for i in s.find_all('span', {'class':re.compile('lst-ptype|lst-tenure')})]
Output:
[u'Semi-Detached House ', u'Freehold']
Answer 1 (score: 0)
Your problem is that you misread the class name "lst-ptype", which starts with the lowercase letter l, and typed "1st-ptype", with the digit 1, instead.
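To make the fix concrete, here is a minimal sketch of the corrected lookups. It assumes the snippet from the question (trimmed to the relevant list) and uses the standard-library "html.parser" backend rather than lxml; the variable names ptype, tenure, and missing are only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question.
html = """
<div class="listing-info">
<ul class="listing-features">
<li class="lst-details">
<span class="lst-ptype">Semi-Detached House </span>
<span class="lst-tenure">Freehold</span>
</li>
</ul>
</div>
"""

# "html.parser" ships with Python; "lxml" should also work if it is installed.
listing_info = BeautifulSoup(html, "html.parser")

# The class names start with the letter "l" (lst-...), not the digit "1".
ptype = listing_info.find("span", class_="lst-ptype")
tenure = listing_info.find("span", class_="lst-tenure")

# find() returns None when nothing matches, which is what the misspelled name gives.
missing = listing_info.find("span", class_="1st-ptype")

# get_text(strip=True) removes the trailing space in "Semi-Detached House ".
property_type = ptype.get_text(strip=True)
tenure_type = tenure.get_text(strip=True)
print(property_type, "|", tenure_type)
```

Because find() gives back None on a miss, it is worth checking the result before calling .text or .get_text() on it when scraping real pages.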
Answer 2 (score: 0)
A simple modification of @Ajax's code:
import re
from bs4 import BeautifulSoup as soup
s = """
<div class="listing-info">
<h3>
<a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
</a>
</h3>
<ul class="listing-features">
<li class="lst-details">
<span class="lst-ptype">Semi-Detached House </span>
<span class="lst-tenure">Freehold</span>
</li>
</ul>
</div>
"""
s = soup(s, 'lxml')
housing_types = [i.text for i in s.find_all('span', {'class':re.compile('lst-[a-z]*')})]
print(housing_types)
Output:
['Semi-Detached House ', 'Freehold']
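As an alternative to the regular expression, the same two spans can probably be reached with a CSS selector. This is only a sketch under the assumption that every listing follows the structure shown in the question; the names doc and details are made up for the example, and "html.parser" is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question.
html = """
<div class="listing-info">
<ul class="listing-features">
<li class="lst-details">
<span class="lst-ptype">Semi-Detached House </span>
<span class="lst-tenure">Freehold</span>
</li>
</ul>
</div>
"""

doc = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector: every span that is a direct child of li.lst-details.
spans = doc.select("li.lst-details > span")

# Map each span's first class name to its text, with surrounding whitespace stripped.
details = {span["class"][0]: span.get_text(strip=True) for span in spans}
print(details)
```

Keying the result by class name keeps the property type and the tenure distinguishable even if the order of the spans changes.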