Question

I have the following HTML, and I want to get the text that comes right after <b>Name in Thai</b>, namely ": this is what i want".

content = """
<html><body><b>Name of Bangkok Bus station:</b>
<span itemprop="name">Victory Monument</span>
<meta content="http://www.transitbangkok.com/stations/Bangkok%20Bus/Victory%20Monument" itemprop="url"/>
<meta content="http://www.transitbangkok.com/stations/Bangkok%20Bus/Victory%20Monument" itemprop="map"/>
<br/><b>Name in Thai</b>: this is what i want<br/>
</body></html>
"""

I tried the following approach using next_sibling:

from bs4 import BeautifulSoup

soup = BeautifulSoup(content, "lxml")
soup.find('b').next_sibling

However, I get \n as the output. Is there a way to get the text after a specific tag? (An explanation would be great!)

Answer 1

However, I get \n as the output.

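That is because find("b") returns the first <b> tag it encounters, and the next sibling of the first <b> tag in content is only a newline character: the text node between </b> and the following <span>.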
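To make that newline visible, here is a minimal sketch that inspects the first <b> tag and its next sibling. It assumes a trimmed copy of the question's HTML (the <meta> lines are left out) and uses the "html.parser" backend rather than "lxml", purely so that no extra parser has to be installed; the variable names first_b and first_sibling are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (assumption: <meta> tags omitted).
content = """
<html><body><b>Name of Bangkok Bus station:</b>
<span itemprop="name">Victory Monument</span>
<br/><b>Name in Thai</b>: this is what i want<br/>
</body></html>
"""

# "html.parser" ships with Python; the question used "lxml" instead.
soup = BeautifulSoup(content, "html.parser")

# find() stops at the first matching tag in document order.
first_b = soup.find("b")
first_sibling = first_b.next_sibling

# repr() exposes whitespace that print() alone would hide.
print(repr(first_b.get_text()))
print(repr(str(first_sibling)))
```

The sibling here is a text node that holds nothing but the line break sitting between </b> and <span>, which is why the original call appears to return "nothing".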

If you instead iterate over all <b> tags, you will see that next_sibling does give you what you want:

for tag in soup.find_all("b"):
    print(tag.text)
    print(tag.next_sibling)

Output:

Name of Bangkok Bus station:


Name in Thai
: this is what i want

You can iterate over them and keep only the tag whose next_sibling still contains something other than whitespace after calling strip() on it.

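for tag in soup.find_all("b"):
    after = tag.next_sibling
    # next_sibling can be None or another tag; only text nodes can be stripped.
    if isinstance(after, str) and after.strip():
        print(after.strip())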
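If the label text is known in advance, another option is to look up that specific <b> tag directly instead of scanning every <b>. The sketch below is one possible way to do it, not the approach from the answer above: it assumes a trimmed copy of the question's HTML, the "html.parser" backend, and a Beautiful Soup version that accepts the string= argument of find() (older releases call it text=). The names label, raw, and value are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (assumption: <meta> tags omitted).
content = """
<html><body><b>Name of Bangkok Bus station:</b>
<span itemprop="name">Victory Monument</span>
<br/><b>Name in Thai</b>: this is what i want<br/>
</body></html>
"""

soup = BeautifulSoup(content, "html.parser")

# Match the <b> tag whose text is exactly "Name in Thai".
label = soup.find("b", string="Name in Thai")

# The text node right after </b> runs up to the next tag (<br/>).
raw = label.next_sibling

# Optional cleanup: drop the leading colon and surrounding whitespace.
value = raw.lstrip(":").strip()
print(value)
```

Note that find() returns None when no tag matches, so real code would want to check label before reading next_sibling.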
