For example, I have:
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
How can I get:
['<a href="http://example2.com" class="banana"><img ... /></a>','<a href="http://google.com">link3</a>']
Answer 0 (score: 0)
You can use the CSS selector a[href] to get the a tags that have an href attribute:
h = '''
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(h, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
print(soup.select('a[href]'))
Output:
[<a class="banana" href="http://example.com">link1</a>,
<a class="banana" href="http://example2.com"><img ...=""/></a>,
<a href="http://google.com">link3</a>]
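Note that a[href] returns all three links that carry an href, while the list in the question leaves out link1. The sketch below first shows the equivalent find_all('a', href=True) call, then narrows the result with a hypothetical helper, matches_expected. The rule in that helper (keep a link if it lacks the banana class or wraps an img) is only an assumption inferred from the expected list, not something the question states, so adjust it to the real criterion.

```python
from bs4 import BeautifulSoup

h = '''
<a class="banana" href="http://example.com">link1</a>
<a href="http://example2.com" class="banana"><img ... /></a>
<a class="banana">link2</a>
<a href="http://google.com">link3</a>
'''

soup = BeautifulSoup(h, 'html.parser')

# href=True keeps only <a> tags that have an href attribute,
# which selects the same tags as soup.select('a[href]')
links_with_href = soup.find_all('a', href=True)
hrefs = [a['href'] for a in links_with_href]
print(hrefs)
# ['http://example.com', 'http://example2.com', 'http://google.com']


def matches_expected(a):
    # ASSUMPTION (hypothetical rule, inferred from the question's expected list):
    # keep the link if it has no "banana" class or if it contains an <img>
    return 'banana' not in a.get('class', []) or a.find('img') is not None


narrowed = [a for a in links_with_href if matches_expected(a)]
narrowed_hrefs = [a['href'] for a in narrowed]
print(narrowed_hrefs)
# ['http://example2.com', 'http://google.com']
```

To get strings rather than Tag objects, as in the question's list, call str(a) on each element of narrowed.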