Question

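I read that .contents returns a tag's direct children, and that if we want to iterate over those children we should use .children. But I have tried both and got the same output.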

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p></body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")
title_tag = soup.title

# Iterate with the .children iterator
for child in title_tag.children:
    print(child)
# Iterate over the .contents list directly
for child in title_tag.contents:
    print(child)

Answer 1

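The documentation is a bit more subtle than that. It says:

Instead of getting them as a list, you can iterate over a tag's children using the .children generator

But you can iterate over a list directly in a for loop, and you can get an iterator from it by calling iter(), so having a .children property at all seems pointless. Looking more closely, here is how children is implemented.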

#Generator methods
@property
def children(self):
    # return iter() to make the purpose of the method clear
    return iter(self.contents)  # XXX This seems to be untested.

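So yes, it is completely pointless. The two code snippets are equivalent; the only difference is that for child in title_tag.contents obtains an iterator from the list, while for child in title_tag.children uses the iterator it has already been handed.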
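To make that concrete, here is a minimal sketch that mirrors the quoted implementation. The Node class is a hypothetical stand-in written for illustration, not the real bs4 class; it only assumes that children returns iter(self.contents), as in the snippet above.

```python
class Node:
    """Hypothetical stand-in for a parsed tag; not the real bs4 class."""

    def __init__(self, contents):
        self.contents = list(contents)  # direct children, stored as a list

    @property
    def children(self):
        # Same idea as the quoted implementation: an iterator over the list
        return iter(self.contents)


node = Node(["The Dormouse's story"])

# Both spellings visit the same objects in the same order.
via_contents = [child for child in node.contents]
via_children = [child for child in node.children]
same_output = via_contents == via_children

# The list supports len() and indexing; the iterator does not.
first_child = node.contents[0]
child_count = len(node.contents)

# An iterator is consumed once; a second pass over the same object is empty.
it = node.children
first_pass = list(it)
second_pass = list(it)

print(same_output, child_count, first_pass, second_pass)
```

The practical differences are small: the list can be indexed, measured, and looped over repeatedly, while each iterator returned by children can only be walked once.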

Answer 2

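Assuming you are talking about BeautifulSoup (you should have given us some context!)...

As stated here, the main difference is that with .contents you get a list, while with .children you get a generator.

It may seem to make no difference, since you can iterate over both of them, but when you are working with a large amount of data you should always prefer a generator, to spare your computer's memory.

Imagine this: you have a 10k text file and you need to process it one line at a time. With a list (e.g. with open('t.txt') as f: lines = f.readlines()), you fill your memory with data you are not going to work on right away, and it just sits there taking up space (not to mention that, depending on your environment, you might not have enough memory...). With a generator, you get one line at a time, as you need it, without the memory consumption...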
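The file example can be sketched as follows. This is only an illustration of the general list-versus-generator point: the temporary file, the 10,000-line size, and the iter_lines helper are assumptions made up for the sketch. It does not measure memory, and per the implementation quoted in Answer 1, .children iterates over a list that already exists.

```python
import os
import tempfile

# Create a small sample file so the sketch is self-contained
# (10,000 lines is an assumed size, chosen for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(10_000)))
    path = tmp.name

# List approach: readlines() loads every line into memory at once.
with open(path) as f:
    lines = f.readlines()
list_total = sum(len(line) for line in lines)


def iter_lines(file_path):
    """Hypothetical helper: yield one line at a time from the file."""
    with open(file_path) as f:
        for line in f:
            yield line


# Lazy approach: only the current line is held while summing.
lazy_total = sum(len(line) for line in iter_lines(path))

os.remove(path)
print(len(lines), list_total == lazy_total)
```

Both approaches compute the same total; the difference is that the list version keeps all lines alive at once, while the generator version holds one line at a time.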

Difference between .contents and .children
