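I believe the following code should return the same result in both cases. Before I file a bug report, I'd like a second pair of eyes on it. Do you agree this is a bug, or am I misunderstanding something?
I'm using Python 2.7.6 and BeautifulSoup 4.3.2. Thanks!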
import requests
from bs4 import BeautifulSoup, SoupStrainer

def print_em(list_name):
    list_ = eval(list_name)
    print ('len(%s):' % list_name), len(list_)
    for item in list_:
        print item
    print

if __name__ == '__main__':
    url = 'http://www.zzstream.li/2013/12/'
    url += 'sons-of-anarchy-s6-e13-a-mothers-work.html'
    body = requests.get(url).text

    args = ['div']
    kwargs = {'class_': 'postTabs_divs'}

    # these should be identical, right?
    # it seems the SoupStrainer class_ keyword only matches
    # when the whole class attribute is identical, while the
    # find_all class_ keyword also matches elements that have
    # that class among several others
    found = BeautifulSoup(body).find_all(*args, **kwargs)
    strained = BeautifulSoup(body,
                             parse_only=SoupStrainer(*args, **kwargs)
                             ).find_all(*args, **kwargs)

    print_em('found')
    print_em('strained')
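Here are the results. Note that `found` contains 9 divs and its first one has class "postTabs_divs postTabs_curr_div", while `strained` contains only 8: its first one has class "postTabs_divs", and the div with the two classes (id postTabs_0_46840) is missing entirely: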
len(found): 9
<div class="postTabs_divs postTabs_curr_div" id="postTabs_0_46840">
<span class="postTabs_titles"><b>AllMyV</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://allmyvideos.net/embed-89r88fy6j96b-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_1_46840">
<span class="postTabs_titles"><b>VSpot</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidspot.net/embed-yrbdku7edj40.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_2_46840">
<span class="postTabs_titles"><b>Vidbull</b></span><br/>
<iframe frameborder="0" height="338" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidbull.com/embed-wjk6s7t57h90-540x318.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_3_46840">
<span class="postTabs_titles"><b>VK-Mobile</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=-62728793&id=167008843&hash=9fb3811a70148272&hd=1" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_4_46840">
<span class="postTabs_titles"><b>VK-Mob</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=167280541&id=166731105&hash=29efdfc219ebfb8c" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_5_46840">
<span class="postTabs_titles"><b>Vidto</b></span><br/>
<iframe allowfullscreen="" frameborder="0" height="330" src="http://vidto.me/embed-3dmmkakzx611-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_6_46840">
<span class="postTabs_titles"><b>Played</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://played.to/embed-923k56hkxvuh-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_7_46840">
<span class="postTabs_titles"><b>IShar</b></span><br/>
<iframe scrolling="no" src="http://ishared.eu/embed/bHPE4Ca76ZQGUbFHO_6NUd0DI7ZN5ojCJcK5QNpGuYY?width=540&height=330?width=540&height=330" style="overflow: hidden; b
540px; height: 330px;"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_8_46840">
<span class="postTabs_titles"><b>YouWa</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://youwatch.org/embed-75hw5d2podx5-540x330.html" width="540"></iframe>
</div>
len(strained): 8
<div class="postTabs_divs" id="postTabs_1_46840">
<span class="postTabs_titles"><b>VSpot</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidspot.net/embed-yrbdku7edj40.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_2_46840">
<span class="postTabs_titles"><b>Vidbull</b></span><br/>
<iframe frameborder="0" height="338" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidbull.com/embed-wjk6s7t57h90-540x318.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_3_46840">
<span class="postTabs_titles"><b>VK-Mobile</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=-62728793&id=167008843&hash=9fb3811a70148272&hd=1" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_4_46840">
<span class="postTabs_titles"><b>VK-Mob</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=167280541&id=166731105&hash=29efdfc219ebfb8c" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_5_46840">
<span class="postTabs_titles"><b>Vidto</b></span><br/>
<iframe allowfullscreen="" frameborder="0" height="330" src="http://vidto.me/embed-3dmmkakzx611-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_6_46840">
<span class="postTabs_titles"><b>Played</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://played.to/embed-923k56hkxvuh-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_7_46840">
<span class="postTabs_titles"><b>IShar</b></span><br/>
<iframe scrolling="no" src="http://ishared.eu/embed/bHPE4Ca76ZQGUbFHO_6NUd0DI7ZN5ojCJcK5QNpGuYY?width=540&height=330?width=540&height=330" style="overflow: hidden; b
540px; height: 330px;"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_8_46840">
<span class="postTabs_titles"><b>YouWa</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://youwatch.org/embed-75hw5d2podx5-540x330.html" width="540"></iframe>
</div>
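If the comment in the code is right and the strainer is comparing against the raw, unsplit class attribute, one possible workaround (a sketch, untested against 4.3.2 specifically) is to pass SoupStrainer a function instead of a string, since BeautifulSoup accepts a callable as an attribute filter. The block below shows only the matching predicate; `has_class_token` and `make_class_filter` are hypothetical helper names, not part of bs4. The predicate accepts the value as None, a raw string, or an already-split list, on the assumption that which form the filter receives may depend on the BeautifulSoup version.

```python
def has_class_token(value, wanted):
    """Hypothetical helper: True if `wanted` is one of the
    whitespace-separated class names in `value`.

    `value` may be None (tag has no class attribute), a raw string
    such as "postTabs_divs postTabs_curr_div", or a list of names.
    """
    if value is None:
        return False
    if isinstance(value, (list, tuple)):
        tokens = value
    else:
        tokens = value.split()
    return wanted in tokens


def make_class_filter(wanted):
    """Hypothetical helper: build a one-argument predicate that could
    be passed as class_= to SoupStrainer or find_all."""
    return lambda value: has_class_token(value, wanted)


raw = "postTabs_divs postTabs_curr_div"
# An exact string comparison rejects the multi-class div...
print(raw == "postTabs_divs")                         # False
# ...while token matching accepts it, as a raw string or a list.
print(has_class_token(raw, "postTabs_divs"))          # True
print(has_class_token(raw.split(), "postTabs_divs"))  # True
print(has_class_token(None, "postTabs_divs"))         # False
```

It would be used as `SoupStrainer('div', class_=make_class_filter('postTabs_divs'))`. Splitting on whitespace inside the predicate is what lets "postTabs_divs postTabs_curr_div" match, which an exact string comparison rejects.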