BeautifulSoup: parsing span titles

Time: 2014-03-02 00:25:50

Tags: python html-parsing beautifulsoup

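I am trying to parse an HTML page. I have successfully reached a sub-region of the HTML DOM tree, but I am stuck at a point where there are nested span tags.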

Example: I initially parse the page as follows:

        user_url = base_url + str(user_id) + "/" + display_name
        user_page = urllib2.urlopen(user_url)
        souping_page = bs(user_page)
        badges = souping_page.body.find('div', attrs={'class': 'badges'})

badges then gives me the following:

<span><span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span><span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span><span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span></span>

What I want is to traverse the DOM structure and extract <span title="3 gold badges"> along with all the other span title attributes. How can I do this in BeautifulSoup?

1 answer:

Answer 0 (score: 3)

You can do it like this:

>>> badges.span.span
<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>
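
badges.span.span only reaches the first titled span. To collect every title attribute, one option is to filter on the attribute itself. The following is a minimal sketch, not taken from the original answer: it assumes the bs4 package (BeautifulSoup 4) with Python's built-in "html.parser", and it uses an inline copy of the markup from the question, wrapped in a div with class "badges", in place of the downloaded page.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup from the question, wrapped in the div that
# the question locates with find('div', attrs={'class': 'badges'}).
html = (
    '<div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span>'
    '<span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span>'
    '<span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span>'
    '<span class="badgecount">43</span></span>'
    '</span></div>'
)

soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "badges"})

# title=True matches only the <span> tags that carry a title attribute,
# so the untitled wrapper and the badge1/badgecount spans are skipped.
titles = [span["title"] for span in badges.find_all("span", title=True)]
print(titles)
```

Each match is a Tag, so a single attribute can also be read directly, for example badges.span.span["title"].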