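Cannot access attributes after using BeautifulSoup's findAll

Asked: 2013-03-18 18:25:18

Tags: python-2.7 beautifulsoup

I'm trying to scrape pages on the BBC website, such as this one, to pull out the relevant parts of the programme listing. I've only just started using BeautifulSoup for this.

The parts I'm interested in start with tags such as: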

<li about="/programmes/p013zzsl#segment" class="segment track" id="segmentevent-p013zzsm" typeof="po:MusicSegment">

<li about="/programmes/p014003v#segment" class="segment speech alt" id="segmentevent_p014003w" typeof="po:SpeechSegment">

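What I've done so far is load the HTML into soup and then call soup.findAll(typeof=['po:MusicSegment', 'po:SpeechSegment']), which gives me a ResultSet of the sections I'm interested in, in the order they appear.

What I then want to do is check whether a given section is a po:MusicSegment or a po:SpeechSegment, where the HTML looks like this: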

<li about="/programmes/p01400m9#segment" class="segment track" id="segmentevent-p01400mb" typeof="po:MusicSegment"> <span class="artist-image"> <span class="depiction" rel="foaf:depiction"><img alt="" height="63" src="http://static.bbci.co.uk/programmes/2.54.3/img/thumbnail/artists_default.jpg" width="112"/></span> </span> <script type="text/javascript"> window.programme_data.tracklist.push({ segment_event_pid : "p01400mb", segment_pid : "p01400m9", playlist : "http://www.bbc.co.uk/programmes/p01400m9.emp" }); </script> <h3> <span rel="mo:performer"> <span class="artist no-image" property="foaf:name" typeof="mo:MusicArtist">Mala</span> </span> <span class="title" property="dc:title">Calle F</span> </h3></li>

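I want to access the typeof attribute of the <li>, but if this chunk of HTML (as a BS4 Tag) is called section and I enter section.li, it returns None.

Note that if I enter section.img instead, I do get a result: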

<img alt="" height="63" src="http://static.bbci.co.uk/programmes/2.54.3/img/thumbnail/artists_default.jpg" width="112"/>
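I can then do, for example, section.img['height'], which returns u'63'.

What I want is something similar for the <li> itself, so that section.li['typeof'] gives me po:MusicSegment or po:SpeechSegment.

Of course, I could simply convert each result to text and do a plain string search, but searching by attribute seems more elegant.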

1 Answer:

Answer 0: (score: 2)

I would iterate over the list returned by findAll and index each element directly:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<li about="/programmes/p013zzsl#segment" class="segment track" id="segmentevent-p013zzsm" typeof="po:MusicSegment"><li about="/programmes/p014003v#segment" class="segment speech alt" id="segmentevent_p014003w" typeof="po:SpeechSegment">', 'html.parser')

# Each elem is the matched <li> Tag itself, so its attributes can be indexed directly.
for elem in soup.findAll(typeof=['po:MusicSegment', 'po:SpeechSegment']):
    print(elem['typeof'])

This prints:

po:MusicSegment
po:SpeechSegment
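
The reason indexing works here, while section.li in the question returned None, appears to be that each item in the ResultSet is already the matched <li> Tag, and dotted access such as .li only searches inside a tag. The following is a minimal sketch of that behaviour, assuming bs4 with the built-in html.parser and a trimmed copy of the markup from the question (the second segment's title is a made-up placeholder):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; "Placeholder talk" is invented for illustration.
html = (
    '<ul>'
    '<li class="segment track" typeof="po:MusicSegment">'
    '<img alt="" height="63" width="112"/>'
    '<span class="title" property="dc:title">Calle F</span>'
    '</li>'
    '<li class="segment speech alt" typeof="po:SpeechSegment">'
    '<span class="title" property="dc:title">Placeholder talk</span>'
    '</li>'
    '</ul>'
)

soup = BeautifulSoup(html, "html.parser")
# find_all is the current spelling of findAll in bs4.
sections = soup.find_all(typeof=["po:MusicSegment", "po:SpeechSegment"])

section = sections[0]
print(section.name)           # the matched element is itself the <li>
print(section.li)             # None: no <li> is nested inside this <li>
print(section["typeof"])      # attribute read straight from the matched tag
print(section.img["height"])  # .img works because <img> is a descendant
```

In other words, section['typeof'] is the counterpart of section.img['height'] that the question was looking for.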

You can then perform other tasks conditionally:

if elem['typeof'] == 'po:MusicSegment':
    do.something()
elif elem['typeof'] == 'po:SpeechSegment':
    do.something_else()
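
As one possible way to fill in the placeholders above, the branch can be expressed as a lookup from the typeof value to a handler function. This is only a sketch: describe_music, describe_speech, and the handlers dict are hypothetical names, the markup is a trimmed copy of the question's HTML, and the speech segment's title is invented.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; "Placeholder talk" is invented for illustration.
html = (
    '<ul>'
    '<li class="segment track" typeof="po:MusicSegment">'
    '<span class="artist no-image" property="foaf:name" typeof="mo:MusicArtist">Mala</span>'
    '<span class="title" property="dc:title">Calle F</span>'
    '</li>'
    '<li class="segment speech alt" typeof="po:SpeechSegment">'
    '<span class="title" property="dc:title">Placeholder talk</span>'
    '</li>'
    '</ul>'
)


def describe_music(tag):
    # Hypothetical handler: read artist and title from inside the segment.
    artist = tag.find(attrs={"property": "foaf:name"}).get_text()
    title = tag.find(attrs={"property": "dc:title"}).get_text()
    return "music: %s - %s" % (artist, title)


def describe_speech(tag):
    # Hypothetical handler: speech segments only carry a title here.
    return "speech: %s" % tag.find(attrs={"property": "dc:title"}).get_text()


# Map each typeof value to the function that should process it.
handlers = {
    "po:MusicSegment": describe_music,
    "po:SpeechSegment": describe_speech,
}

soup = BeautifulSoup(html, "html.parser")
results = [
    handlers[elem["typeof"]](elem)
    for elem in soup.find_all(typeof=list(handlers))
]
for line in results:
    print(line)
```

The inner artist span also has a typeof attribute, but its value (mo:MusicArtist) is not in the list passed to find_all, so only the two <li> segments are matched.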