Question

I am using BeautifulSoup to extract data from a website. I am trying to find all div tags with the class name simpleList, but no data is collected. It just prints an empty list.

              
         </div>
         <div class="clear"></div>
         <div id="locationSearchResults" class="simpleList">
             
               
               <div class="result ">
                  <span class="cell cellBorder normalWidth" onclick="document.location='/real-estate/rock-spring-ga/LCGAROCKSPRING/';">
                     <a onclick="Track.doEvent('Location Search Results', 'Select Listings', 'Rock Spring, GA');" tabindex="2" title="Listings in Rock Spring, GA" class="suggestCollapse" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/"><b>Rock Spring, GA</b></a>
                  </span>
                  
                  
                     <span class="cell cellBorder normalWidth"><a onclick="Track.doEvent('Location Search Results', 'Select Homes for Sale', 'Rock Spring, GA');" title="Homes for Sale in Rock Spring, GA" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/">56 Listings</a></span>
                  
                  
                  
                  
                  
                     <span class="cell cellBorder normalWidth disabled"><a onclick="return false;" title="Rentals in Rock Spring, GA" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/?ty=3">0 Rentals</a></span>
                  
                  
                  
                  
                     <span class="cell cellBorder normalWidth disabled"><a onclick="return false;" title="Agents in Rock Spring, GA" href="/real-estate-agents/rock-spring-ga/LCGAROCKSPRING/">0 Agents</a></span>

import requests
from bs4 import BeautifulSoup

# The URL must be a single string literal; it cannot be split across lines.
r = requests.get("http://www.century21.com/locationsearch.c21?q=Rock+Spring&v=0#r=10&l=Rock+Spring&c=1")
c = r.content
soup = BeautifulSoup(c, "html.parser")
print(soup)
all = soup.find_all("div", {"class": "simpleList"})
print(all)

What am I doing wrong?

Answer 1

Try this:

from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen("http://your_site.com")
soup = BeautifulSoup(html, 'lxml')
all = soup.find_all('div', {'class': 'simpleList'})
print(all)
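
For reference, the same class lookup can be written in a few equivalent ways. The sketch below is only illustrative: it runs against a small inline HTML string (made up for this example and loosely modeled on the snippet in the question, not the real page) and uses the built-in html.parser so that it needs no extra parser library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on the snippet in the question.
SAMPLE_HTML = """
<div class="clear"></div>
<div id="locationSearchResults" class="simpleList">
  <div class="result "><b>Rock Spring, GA</b></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Three equivalent ways to match <div> tags whose class list contains "simpleList".
by_attrs = soup.find_all("div", {"class": "simpleList"})
by_keyword = soup.find_all("div", class_="simpleList")
by_selector = soup.select("div.simpleList")

print(len(by_attrs), len(by_keyword), len(by_selector))
```

If all three counts are zero on the real page, the lookup syntax is probably not the cause, and the parser or the downloaded markup is worth checking next.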

Answer 2

The problem is with the HTML parser you are using.

Use lxml or html5lib instead.

I used html5lib and it worked fine:

soup = BeautifulSoup(c, 'html5lib')  # c is the response content from the question
all = soup.find('div', {'class': 'simpleList'}).find_all('div')
print(len(all))

It gave me 12.
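
One way to see whether the parser makes a difference is to run the same markup through every parser that happens to be installed and compare the counts. This is a minimal sketch, not a definitive diagnosis: `count_results`, `compare_parsers`, and `SAMPLE_HTML` are hypothetical names introduced here, and the sample markup is well-formed, so the parsers should agree on it. On a real page, pass the downloaded content instead.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_results(html, parser):
    """Count <div> tags nested inside the first div.simpleList.

    Returns None when no div with class "simpleList" is found.
    """
    soup = BeautifulSoup(html, parser)
    container = soup.find("div", class_="simpleList")
    if container is None:
        return None
    return len(container.find_all("div"))


def compare_parsers(html, parsers=("html.parser", "lxml", "html5lib")):
    """Run count_results with each parser, noting parsers that are not installed."""
    results = {}
    for parser in parsers:
        try:
            results[parser] = count_results(html, parser)
        except FeatureNotFound:
            # Beautiful Soup raises this when the requested parser is unavailable.
            results[parser] = "not installed"
    return results


# Hypothetical, well-formed sample markup (assumption: not the real page).
SAMPLE_HTML = (
    '<div class="simpleList">'
    '<div class="result">a</div><div class="result">b</div>'
    "</div>"
)

print(compare_parsers(SAMPLE_HTML))
```

A parser that reports None or a lower count than the others on the real page is the one that is dropping or restructuring the container.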

Edit

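The Beautiful Soup documentation includes a table summarizing the advantages and disadvantages of each parser library:

Source: https://www.crummy.com/software/BeautifulSoup/bs4/doc/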

BeautifulSoup.find_all does not retrieve the elements of a web page
