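I am using BeautifulSoup to extract data from a website. I am trying to find all div tags with the class name simpleList, but no data is collected: find_all just returns an empty list. Here is part of the page's HTML, followed by my code: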
</div>
<div class="clear"></div>
<div id="locationSearchResults" class="simpleList">
<div class="result ">
<span class="cell cellBorder normalWidth" onclick="document.location='/real-estate/rock-spring-ga/LCGAROCKSPRING/';">
<a onclick="Track.doEvent('Location Search Results', 'Select Listings', 'Rock Spring, GA');" tabindex="2" title="Listings in Rock Spring, GA" class="suggestCollapse" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/"><b>Rock Spring, GA</b></a>
</span>
<span class="cell cellBorder normalWidth"><a onclick="Track.doEvent('Location Search Results', 'Select Homes for Sale', 'Rock Spring, GA');" title="Homes for Sale in Rock Spring, GA" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/">56 Listings</a></span>
<span class="cell cellBorder normalWidth disabled"><a onclick="return false;" title="Rentals in Rock Spring, GA" href="/real-estate/rock-spring-ga/LCGAROCKSPRING/?ty=3">0 Rentals</a></span>
<span class="cell cellBorder normalWidth disabled"><a onclick="return false;" title="Agents in Rock Spring, GA" href="/real-estate-agents/rock-spring-ga/LCGAROCKSPRING/">0 Agents</a></span>
import requests
from bs4 import BeautifulSoup
r=requests.get("http://www.century21.com/locationsearch.c21?q=Rock+Spring&v=0#r=10&l=Rock+Spring&c=1")
c=r.content
soup=BeautifulSoup(c,"html.parser")
print(soup)
all=soup.find_all("div",{"class":"simpleList"})
print(all)
What am I doing wrong?
Answer 0: (score: 0)
Try this:
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("http://your_site.com")
soup = BeautifulSoup(html, 'lxml')
all = soup.find_all('div', {'class': 'simpleList'})
print(all)
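Note that 'lxml' is a separate package and BeautifulSoup raises bs4.FeatureNotFound if it is not installed. Below is a minimal sketch of falling back to whichever parser is available. The helper name make_soup and the preference order are my own assumptions, not part of the answer above.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Hypothetical helper: returns (soup, parser_name) so the caller can see
    which parser was actually used.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is installed")


if __name__ == "__main__":
    html = '<div class="simpleList"><div class="result">x</div></div>'
    soup, used = make_soup(html)
    print(used, len(soup.find_all("div", {"class": "simpleList"})))
```

html.parser ships with Python, so keeping it last in the list means the helper should always find something to parse with.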
Answer 1: (score: 0)
The problem is with the HTML parser you are using. Use lxml or html5lib instead of html.parser.
I used html5lib and it worked fine:
all = soup.find('div', {'class': 'simpleList'}).findAll('div')
print(len(all))
It gave me 12.
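If you want to check which parser works for your page, something like the following sketch may help. It runs the same lookup with each parser and guards against find() returning None when the container is missing. The SAMPLE markup is a made-up, shortened version of the HTML in the question, and count_results is a hypothetical helper name; for the real page you would pass r.content instead.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up, shortened stand-in for the page in the question.
SAMPLE = """
<div id="locationSearchResults" class="simpleList">
  <div class="result"><span class="cell">Rock Spring, GA</span></div>
  <div class="result"><span class="cell">56 Listings</span></div>
</div>
"""


def count_results(markup, parser):
    """Return the number of divs nested inside the simpleList container.

    Returns 0 when the container is not found, instead of failing with
    AttributeError on None.
    """
    soup = BeautifulSoup(markup, parser)
    container = soup.find("div", {"class": "simpleList"})
    if container is None:
        return 0
    return len(container.find_all("div"))


if __name__ == "__main__":
    for parser in ("html.parser", "lxml", "html5lib"):
        try:
            print(parser, count_results(SAMPLE, parser))
        except FeatureNotFound:
            print(parser, "is not installed")
```

On well-formed markup like SAMPLE the parsers should agree; a difference on the real page would point to markup that the stricter parser cannot recover from.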
Edit:
This table summarizes the advantages and disadvantages of each parser library: