beautifulsoup - How do I find tags that start with a certain attribute?

Time: 2014-03-02 05:30:35

Tags: python html-parsing beautifulsoup

For example, I have:

<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>

How can I get:

['<a href="http://example2.com" class="banana"><img ... /></a>','<a href="http://google.com">link3</a>']

1 answer:

Answer 0: (score: 0)

You can use the CSS selector a[href] to get the a tags that have an href attribute:

h = '''
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
'''

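from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(h, 'html.parser')
# a[href] matches every <a> tag that has an href attribute
print(soup.select('a[href]'))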

Output:

[<a class="banana" href="http://example.com">link1</a>,
 <a class="banana" href="http://example2.com"><img ...=""/></a>,
 <a href="http://google.com">link3</a>]
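
Note that a[href] matches every link that has an href, so link1 is included as well. If the goal is only the links whose first attribute in the source is href, one possible approach is to filter on the order of the keys in tag.attrs. This is a minimal sketch, not a documented BeautifulSoup feature: it assumes the html.parser builder on Python 3.7+, where the attrs dict keeps the attributes in source order. The helper name links_with_href_first is made up for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
'''

def links_with_href_first(markup):
    """Return the <a> tags whose first attribute in the source is href.

    Hypothetical helper, not part of bs4.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    # Assumption: tag.attrs preserves source order of the attributes
    # (html.parser builder, Python 3.7+ dict ordering).
    return [a for a in soup.select('a[href]')
            if next(iter(a.attrs)) == 'href']

# Print only the href values of the matching links
for a in links_with_href_first(html):
    print(a['href'])
```

For the sample HTML this should keep the second and fourth links. The example prints only the href values because the serialized form of a tag may list its attributes in a different order than the source, as in the output above.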