Why can a tag's siblings in BeautifulSoup4 be strings?

Date: 2019-07-20 11:44:09

Tags: python beautifulsoup

At first glance, I naturally assumed that .next_sibling and .previous_sibling would return sibling tags. But when I played with them today, they produced a NavigableString such as "\n".

After checking its documentation carefully, I found that it states:

In real documents, the .next_sibling or .previous_sibling of a tag will usually be a string containing whitespace. Going back to the “three sisters” document:

<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
You might think that the .next_sibling of the first <a> tag would be the second <a> tag. But actually, it’s a string: the comma and newline that separate the first <a> tag from the second:

link = soup.a
link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

link.next_sibling
# u',\n'
The second <a> tag is actually the .next_sibling of the comma:

link.next_sibling.next_sibling
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>

Why is that?

2 answers:

Answer 0 (score: 1)

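The .next_sibling property is meant for fine-grained navigation of an HTML document, which is something CSS selectors cannot do: they can select tags, but not the strings between tags. For example, in <p>some text</p>SELECT THIS<p>some text</p> you cannot select the string SELECT THIS with a CSS selector.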
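
As a minimal sketch of that point, assuming the html.parser backend and reusing the snippet from the paragraph above (the variable names are only illustrative), the text between the two <p> tags can be reached through .next_sibling, while a CSS selector returns only the tags:

```python
from bs4 import BeautifulSoup, NavigableString

# The snippet from the paragraph above: a bare string sits between two <p> tags.
html = "<p>some text</p>SELECT THIS<p>some text</p>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p
between = first_p.next_sibling  # the text node directly after the first <p>

print(repr(between))                         # 'SELECT THIS'
print(isinstance(between, NavigableString))  # True
print(len(soup.select("p")))                 # 2 (the selector matches only the tags)
```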

If you want to search for sibling tags, use the find_next_sibling() method. You can also simulate the behavior of .next_sibling by passing the text=True parameter to find_next_sibling():

data = '''
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'''


from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

link = soup.a
print(link)                                     # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
print(type(link.next_sibling))                  # <class 'bs4.element.NavigableString'>
print(link.find_next_sibling())                 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
print(type(link.find_next_sibling(text=True)))  # <class 'bs4.element.NavigableString'>
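
If you need every following sibling rather than just the next one, the same distinction applies. The sketch below, again assuming html.parser and the same three-link data, compares the .next_siblings generator (which yields the whitespace strings too) with find_next_siblings() (which, called without arguments, returns only tags):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

data = '''
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'''

soup = BeautifulSoup(data, "html.parser")
link = soup.a

# .next_siblings walks every following node, including the "\n" strings.
all_siblings = list(link.next_siblings)

# Keep only real tags by filtering on the node type.
tag_siblings = [s for s in all_siblings if isinstance(s, Tag)]
sibling_ids = [t["id"] for t in tag_siblings]

# find_next_siblings() with no arguments skips the strings for you.
found_ids = [t["id"] for t in link.find_next_siblings()]

print(len(all_siblings))  # 4: '\n', <a id="link2">, '\n', <a id="link3">
print(sibling_ids)        # ['link2', 'link3']
print(found_ids)          # ['link2', 'link3']
```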

Answer 1 (score: 0)


See page 16 of the documentation.

I hope I answered your question.