Parsing identical tags with the same class in BS4

Time: 2018-08-03 15:20:30

Tags: python beautifulsoup

Hi everyone,

from bs4 import BeautifulSoup as b


data = """
<div class="hello1">
<span class="string1">This is string 1</span>
<span class="string2">This is string 2</span>
</div>
<div class="hello2">
<span class="string1">Another String 1</span>
</div>"""

bsObj = b(data, 'html.parser')
print(bsObj.find('span', 'string1'))

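I only want to parse "Another String 1", but when I run the code the result is "This is string 1".
If I change find to findAll, it prints string1 from both div.hello1 and div.hello2, but I only want the span inside div.hello2.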

2 answers:

Answer 0: (score: 0)

You have to tell BS where to search for the span:

bsObj.find('div','hello2').find('span','string1')
#<span class="string1">Another String 1</span>
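
Note that find() returns None when nothing matches, so the chained call above raises AttributeError if the outer div is absent. A minimal sketch of a guarded version follows; the helper name find_span_in_div is hypothetical (it is not part of bs4), and the data is the sample from the question:

```python
from bs4 import BeautifulSoup

data = """
<div class="hello1">
<span class="string1">This is string 1</span>
<span class="string2">This is string 2</span>
</div>
<div class="hello2">
<span class="string1">Another String 1</span>
</div>"""


def find_span_in_div(soup, div_class, span_class):
    """Hypothetical helper: return the span's text, or None if either tag is missing."""
    # find() returns None when nothing matches, so check before chaining.
    div = soup.find('div', class_=div_class)
    if div is None:
        return None
    # Search only inside the matched div, not the whole document.
    span = div.find('span', class_=span_class)
    return span.get_text() if span is not None else None


bsObj = BeautifulSoup(data, 'html.parser')
print(find_span_in_div(bsObj, 'hello2', 'string1'))   # Another String 1
print(find_span_in_div(bsObj, 'missing', 'string1'))  # None
```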

Answer 1: (score: 0)

You can use CSS selectors with the select() / select_one() methods to target tags. The selector div.hello2 span matches <span> tags nested inside a <div> tag with class hello2:

from bs4 import BeautifulSoup as b

data = """
<div class="hello1">
<span class="string1">This is string 1</span>
<span class="string2">This is string 2</span>
</div>
<div class="hello2">
<span class="string1">Another String 1</span>
</div>"""

bsObj = b(data, 'html.parser')

print(bsObj.select_one('div.hello2 span').text)

Prints:

Another String 1
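
If the target div could contain several spans, the selector can be narrowed further. The sketch below is one possible variation, not something from the answer itself: it assumes the same sample data and uses select(), which returns a list of all matches, with a child combinator and a class on the span:

```python
from bs4 import BeautifulSoup

data = """
<div class="hello1">
<span class="string1">This is string 1</span>
<span class="string2">This is string 2</span>
</div>
<div class="hello2">
<span class="string1">Another String 1</span>
</div>"""

bsObj = BeautifulSoup(data, 'html.parser')

# "div.hello2 > span.string1" matches only spans with class string1
# that are direct children of a div with class hello2.
scoped = [span.get_text() for span in bsObj.select('div.hello2 > span.string1')]
print(scoped)      # ['Another String 1']

# Without the div part, the selector matches string1 spans in both divs,
# which is the behaviour the question ran into with findAll.
unscoped = [span.get_text() for span in bsObj.select('span.string1')]
print(unscoped)    # ['This is string 1', 'Another String 1']
```

select_one() behaves like select() but returns only the first match, or None when there is none.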