Python web scraping: .find() returns nothing

Asked: 2018-01-31 01:57:31

Tags: python web-scraping

I am new to Python and am following some video tutorials to scrape data from a property listing website.

HTML:

<div class="listing-info">
  <h3>
    <a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
    </a>
  </h3>
  <ul class="listing-features">
    <li class="lst-details">
      <span class="lst-ptype">Semi-Detached House </span>
      <span class="lst-tenure">Freehold</span>
    </li>
  </ul>
</div>

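Under the ul element with class listing-features, I want to extract 'Semi-Detached House' as the property type and 'Freehold' as the tenure.

I parsed the HTML above into a variable, listing-info.

My attempt: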

listing-info.li.text

With this I was able to get 'Semi-Detached House'.

Knowing that the li contains two spans with different classes, I then tried:

listing-info.find('span',class_='1st-ptype')
listing-info.find('span',class_='1st-tenure')

Both calls return nothing. Can anyone enlighten me on this?

Thanks in advance.

3 Answers:

Answer 0: (score: 0)

You can search for the housing types directly:

import re
from bs4 import BeautifulSoup as soup
s = """
<div class="listing-info">
  <h3>
    <a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
     </a>
  </h3>
  <ul class="listing-features">
    <li class="lst-details">
      <span class="lst-ptype">Semi-Detached House </span>
      <span class="lst-tenure">Freehold</span>
    </li>
  </ul>
</div>
"""
s = soup(s, 'lxml')
housing_types = [i.text for i in s.find_all('span', {'class':re.compile('lst-ptype|lst-tenure')})]

Output:

[u'Semi-Detached House ', u'Freehold']

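Answer 1: (score: 0)

Your problem is that you misread the class name "lst-ptype", which starts with a lowercase letter L, and typed "1st-ptype", which starts with the digit one. The same applies to "lst-tenure".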
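
A minimal sketch of the corrected lookups, assuming the question's HTML (trimmed here to the relevant part) is parsed with the standard-library `html.parser` backend. The variable name `listing_info` is an assumption: a hyphenated name such as `listing-info` is not a valid Python identifier.

```python
from bs4 import BeautifulSoup

html = """
<div class="listing-info">
  <ul class="listing-features">
    <li class="lst-details">
      <span class="lst-ptype">Semi-Detached House </span>
      <span class="lst-tenure">Freehold</span>
    </li>
  </ul>
</div>
"""

# listing_info is a hypothetical name for the parsed <div class="listing-info">
listing_info = BeautifulSoup(html, "html.parser").find("div", class_="listing-info")

# Typo from the question (digit one): no span has this class, so find() returns None
wrong = listing_info.find("span", class_="1st-ptype")

# Corrected class names (lowercase letter L)
ptype = listing_info.find("span", class_="lst-ptype")
tenure = listing_info.find("span", class_="lst-tenure")

print(wrong)
print(ptype.text.strip())
print(tenure.text)
```

When no element matches, `find()` returns `None` rather than raising an error, which is why the original calls appeared to return nothing.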

Answer 2: (score: 0)

A simple modification of @Ajax's code:

import re
from bs4 import BeautifulSoup as soup
s = """
<div class="listing-info">
<h3>
<a href="/property-listing/english-townhouse-residence" itemprop="url" title = "For Sale English Townhouse"><span itemprop="name">English Townhouse</span>
</a>
</h3>
<ul class="listing-features">
<li class="lst-details">
   <span class="lst-ptype">Semi-Detached House </span>
   <span class="lst-tenure">Freehold</span>
 </li>
</ul>
</div>
"""
s = soup(s, 'lxml')
housing_types = [i.text for i in s.find_all('span', {'class':re.compile('lst-[a-z]*')})]
print(housing_types)

Output:

['Semi-Detached House ', 'Freehold']
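
The first entry in that output keeps the trailing space from the HTML. If that is unwanted, one option is `get_text(strip=True)`, sketched below on a trimmed copy of the same markup. This is a variation on the answer, not part of it, and it assumes the `html.parser` backend and that every span inside the `lst-details` item is wanted.

```python
from bs4 import BeautifulSoup

html = """
<ul class="listing-features">
  <li class="lst-details">
    <span class="lst-ptype">Semi-Detached House </span>
    <span class="lst-tenure">Freehold</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Narrow the search to the details item, then collect each span's text
details = soup.find("li", class_="lst-details")

# get_text(strip=True) removes leading and trailing whitespace from the text
housing_types = [span.get_text(strip=True) for span in details.find_all("span")]
print(housing_types)
```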