BeautifulSoup SoupStrainer bug?

Posted: 2014-03-21 20:39:22

Tags: python beautifulsoup

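I believe the following code should return the same results in both cases. Before I file a bug report, I'd like a second pair of eyes on it. Do you agree that this is a bug, or am I misunderstanding something?

I'm using Python 2.7.6 and BeautifulSoup 4.3.2. Thanks!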

import requests
from bs4 import BeautifulSoup, SoupStrainer

def print_em(list_name):
    # Look up the list by its variable name, then print
    # its length followed by each matched tag
    list_ = eval(list_name)
    print ('len(%s):' % list_name), len(list_)
    for item in list_:
        print item
    print

if __name__ == '__main__':
    url = 'http://www.zzstream.li/2013/12/'
    url += 'sons-of-anarchy-s6-e13-a-mothers-work.html'
    body = requests.get(url).text

    args = ['div']
    kwargs = {'class_': 'postTabs_divs'}

    # these should be identical, right?
    # it seems the SoupStrainer class_ keyword
    # only works for identical class matches
    # while the find_all class_ keyword accepts
    # partial matches
    found = BeautifulSoup(body).find_all(*args, **kwargs)
    strained = BeautifulSoup(body,
        parse_only=SoupStrainer(*args, **kwargs)
    ).find_all(*args, **kwargs)

    print_em('found')
    print_em('strained')

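Here are the results. Note that found contains 9 divs while strained contains only 8. The first div in found has class "postTabs_divs postTabs_curr_div" and is missing from strained, whose first div has class "postTabs_divs":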

len(found): 9
<div class="postTabs_divs postTabs_curr_div" id="postTabs_0_46840">
<span class="postTabs_titles"><b>AllMyV</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://allmyvideos.net/embed-89r88fy6j96b-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_1_46840">
<span class="postTabs_titles"><b>VSpot</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidspot.net/embed-yrbdku7edj40.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_2_46840">
<span class="postTabs_titles"><b>Vidbull</b></span><br/>
<iframe frameborder="0" height="338" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidbull.com/embed-wjk6s7t57h90-540x318.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_3_46840">
<span class="postTabs_titles"><b>VK-Mobile</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=-62728793&amp;id=167008843&amp;hash=9fb3811a70148272&amp;hd=1" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_4_46840">
<span class="postTabs_titles"><b>VK-Mob</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=167280541&amp;id=166731105&amp;hash=29efdfc219ebfb8c" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_5_46840">
<span class="postTabs_titles"><b>Vidto</b></span><br/>
<iframe allowfullscreen="" frameborder="0" height="330" src="http://vidto.me/embed-3dmmkakzx611-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_6_46840">
<span class="postTabs_titles"><b>Played</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://played.to/embed-923k56hkxvuh-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_7_46840">
<span class="postTabs_titles"><b>IShar</b></span><br/>
<iframe scrolling="no" src="http://ishared.eu/embed/bHPE4Ca76ZQGUbFHO_6NUd0DI7ZN5ojCJcK5QNpGuYY?width=540&amp;height=330?width=540&amp;height=330" style="overflow: hidden; b
540px; height: 330px;"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_8_46840">
<span class="postTabs_titles"><b>YouWa</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://youwatch.org/embed-75hw5d2podx5-540x330.html" width="540"></iframe>
</div>

len(strained): 8
<div class="postTabs_divs" id="postTabs_1_46840">
<span class="postTabs_titles"><b>VSpot</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidspot.net/embed-yrbdku7edj40.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_2_46840">
<span class="postTabs_titles"><b>Vidbull</b></span><br/>
<iframe frameborder="0" height="338" marginheight="0" marginwidth="0" scrolling="NO" src="http://vidbull.com/embed-wjk6s7t57h90-540x318.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_3_46840">
<span class="postTabs_titles"><b>VK-Mobile</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=-62728793&amp;id=167008843&amp;hash=9fb3811a70148272&amp;hd=1" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_4_46840">
<span class="postTabs_titles"><b>VK-Mob</b></span><br/>
<iframe frameborder="0" height="330" src="http://vk.com/video_ext.php?oid=167280541&amp;id=166731105&amp;hash=29efdfc219ebfb8c" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_5_46840">
<span class="postTabs_titles"><b>Vidto</b></span><br/>
<iframe allowfullscreen="" frameborder="0" height="330" src="http://vidto.me/embed-3dmmkakzx611-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_6_46840">
<span class="postTabs_titles"><b>Played</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://played.to/embed-923k56hkxvuh-540x330.html" width="540"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_7_46840">
<span class="postTabs_titles"><b>IShar</b></span><br/>
<iframe scrolling="no" src="http://ishared.eu/embed/bHPE4Ca76ZQGUbFHO_6NUd0DI7ZN5ojCJcK5QNpGuYY?width=540&amp;height=330?width=540&amp;height=330" style="overflow: hidden; b
540px; height: 330px;"></iframe><br/>
</div>
<div class="postTabs_divs" id="postTabs_8_46840">
<span class="postTabs_titles"><b>YouWa</b></span><br/>
<iframe frameborder="0" height="330" marginheight="0" marginwidth="0" scrolling="NO" src="http://youwatch.org/embed-75hw5d2podx5-540x330.html" width="540"></iframe>
</div>
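The hypothesis in the code comments can be checked without bs4. The following is a minimal sketch, assuming that the strainer compares the whole raw class string while find_all compares against each whitespace-separated class name. The helpers exact_match and token_match are hypothetical and are not BeautifulSoup's actual code. The ids and class values are copied from the nine divs in the output above.

```python
# Hypothetical model of the two matching behaviours (not bs4 internals).

def exact_match(class_attr, wanted):
    # Assumed strainer behaviour: compare the whole raw attribute string
    return class_attr == wanted

def token_match(class_attr, wanted):
    # Assumed find_all behaviour: treat class as a multi-valued attribute
    # and match if any whitespace-separated class name equals `wanted`
    return wanted in class_attr.split()

# (id, class) pairs of the nine divs in the output above
divs = [("postTabs_0_46840", "postTabs_divs postTabs_curr_div")] + [
    ("postTabs_%d_46840" % i, "postTabs_divs") for i in range(1, 9)
]

token = [d for d, cls in divs if token_match(cls, "postTabs_divs")]
exact = [d for d, cls in divs if exact_match(cls, "postTabs_divs")]
dropped = [d for d in token if d not in exact]

print("token match (like found): %d" % len(token))       # 9
print("exact match (like strained): %d" % len(exact))    # 8
print("dropped: %s" % ", ".join(dropped))                 # postTabs_0_46840
```

Under that assumption, the model reproduces the 9 vs 8 difference, and the only div dropped is postTabs_0_46840, the one with two classes.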

0 answers