Question

I'm new to Python and am following some video tutorials to scrape data from a property listings website.

HTML:

<div class="listing-info">
  <h3>
    <a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
    </a>
  </h3>
  <ul class="listing-features">
    <li class="lst-details">
      <span class="lst-ptype">Semi-Detached House </span>
      <span class="lst-tenure">Freehold</span>
    </li>
  </ul>
</div>

Under the ul element with the class listing-features, I want to extract 'Semi-Detached House' as the property type and 'Freehold' as the tenure.

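I parsed the HTML above into a variable named listing_info.

My attempt:

listing_info.li.text

With this, I am able to get Semi-Detached House.

Since there are two span classes inside the list item, I then tried:

listing_info.find('span',class_='1st-ptype')
listing_info.find('span',class_='1st-tenure')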

Both calls return nothing. Can anyone enlighten me on this?

Thanks in advance.

Answer 1

You can search for the housing types directly:

import re
from bs4 import BeautifulSoup as soup
s = """
<div class="listing-info">
  <h3>
    <a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
     </a>
  </h3>
  <ul class="listing-features">
    <li class="lst-details">
      <span class="lst-ptype">Semi-Detached House </span>
      <span class="lst-tenure">Freehold</span>
    </li>
  </ul>
</div>
"""
s = soup(s, 'lxml')
housing_types = [i.text for i in s.find_all('span', {'class':re.compile('lst-ptype|lst-tenure')})]

Output:

[u'Semi-Detached House ', u'Freehold']
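
As an alternative to the regular expression, the two spans can also be picked out with CSS selectors. The following is only a minimal sketch: the helper name extract_details and the dictionary keys are made up for illustration, and it uses the html.parser backend that ships with Python instead of lxml. Passing strip=True to get_text() also drops the trailing space seen in the output above.

```python
from bs4 import BeautifulSoup

html = """
<ul class="listing-features">
  <li class="lst-details">
    <span class="lst-ptype">Semi-Detached House </span>
    <span class="lst-tenure">Freehold</span>
  </li>
</ul>
"""


def extract_details(fragment):
    """Hypothetical helper: return property type and tenure from a fragment."""
    soup = BeautifulSoup(fragment, "html.parser")
    # select_one() returns the first match, or None when nothing matches.
    ptype = soup.select_one("span.lst-ptype")
    tenure = soup.select_one("span.lst-tenure")
    return {
        "property_type": ptype.get_text(strip=True) if ptype else None,
        "tenure": tenure.get_text(strip=True) if tenure else None,
    }


details = extract_details(html)
print(details)
```

Checking for None before calling get_text() keeps the helper from raising AttributeError on listings where one of the spans is missing.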

Answer 2

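Your problem is that you misread the class name "lst-ptype" (which starts with a lowercase letter L) and typed "1st-ptype" (which starts with the digit one) instead.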
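
A minimal sketch of the corrected calls, assuming the fragment from the question is parsed into listing_info (the name comes from the question; the html.parser backend is chosen here only because it needs no extra install):

```python
from bs4 import BeautifulSoup

html = """
<div class="listing-info">
  <ul class="listing-features">
    <li class="lst-details">
      <span class="lst-ptype">Semi-Detached House </span>
      <span class="lst-tenure">Freehold</span>
    </li>
  </ul>
</div>
"""

listing_info = BeautifulSoup(html, "html.parser")

# Digit one: no element has this class, so find() returns None.
wrong = listing_info.find("span", class_="1st-ptype")

# Lowercase letter L: this matches the class used in the markup.
ptype = listing_info.find("span", class_="lst-ptype")
tenure = listing_info.find("span", class_="lst-tenure")

property_type = ptype.get_text(strip=True)
tenure_text = tenure.get_text(strip=True)
print(wrong, property_type, tenure_text)
```

Copying the class name straight from the page source, rather than retyping it, avoids this kind of mix-up.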

Answer 3

A simple modification of @Ajax's code:

import re
from bs4 import BeautifulSoup as soup
s = """
<div class="listing-info">
<h3>
<a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
</a>
</h3>
<ul class="listing-features">
<li class="lst-details">
   <span class="lst-ptype">Semi-Detached House </span>
   <span class="lst-tenure">Freehold</span>
 </li>
</ul>
</div>
"""
s = soup(s, 'lxml')
housing_types = [i.text for i in s.find_all('span', {'class':re.compile('lst-[a-z]*')})]
print(housing_types)

Output:

['Semi-Detached House ', 'Freehold']
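
Beautiful Soup applies a compiled pattern with its search() method, so an unanchored pattern such as 'lst-[a-z]*' matches any class that merely contains "lst-". If only the two known classes are wanted, the pattern can be anchored. The sketch below is one possible variation, not part of the original answer; the fields dictionary and its keys are illustrative choices.

```python
import re

from bs4 import BeautifulSoup

html = """
<ul class="listing-features">
  <li class="lst-details">
    <span class="lst-ptype">Semi-Detached House </span>
    <span class="lst-tenure">Freehold</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchored so that only the exact class names lst-ptype and lst-tenure match.
pattern = re.compile(r"^lst-(ptype|tenure)$")

# Map each matched class name to its stripped text.
fields = {
    tag["class"][0]: tag.get_text(strip=True)
    for tag in soup.find_all("span", class_=pattern)
}
print(fields)
```

Keying the result by class name makes it clear which value is the property type and which is the tenure, instead of relying on list order.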

python web scraping .find() returns nothing

3 answers: