Question

I'm looking for a way to find the maximum nesting level of a document parsed with BeautifulSoup.

For example, I need a magic_function such that:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://example.com")
soup = BeautifulSoup(r.text, "html.parser")
depth = magic_function(soup)

For this document, for example, it would return 4:

<html>
    <body>
        <p>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
        </p>
        <p>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
            <strong>Some Text.</strong>
        </p>
    </body>
</html>

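I've had a few thoughts:

Is there a function for this in BeautifulSoup? Reading the documentation and searching Google turned up nothing.
Is there another library that would let me do this? Again, Googling didn't get me anywhere, but I may simply not know what to search for.
Should I walk the tree with a function of my own? I'd rather not, but I can certainly do it.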

Answer 1

Traversing the tree with your own magic_function() isn't difficult. You can use a simple recursive function such as:

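def magic_function(soup):
    # Tags (and the soup object itself) expose .contents; text nodes do not.
    if hasattr(soup, "contents") and soup.contents:
        # One level for this node plus the deepest level among its children.
        return max(magic_function(child) for child in soup.contents) + 1
    else:
        # Text nodes and empty tags end the recursion.
        return 0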

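You will probably want to call the function on the document's top-level html tag, so that the soup object itself, which wraps the whole document, isn't counted as an extra nesting level.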
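To make the difference concrete, here is a minimal sketch that calls the same function on both the soup object and its html tag. The shortened HTML string, the variable names, and the choice of the "html.parser" backend are assumptions made for illustration; other parsers may add or rearrange elements.

```python
from bs4 import BeautifulSoup


def magic_function(node):
    # Same recursion as above: one level per node that has children.
    if hasattr(node, "contents") and node.contents:
        return max(magic_function(child) for child in node.contents) + 1
    return 0


# Assumed sample input: a shortened version of the document in the question.
html = "<html><body><p><strong>Some Text.</strong></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

depth_from_soup = magic_function(soup)       # the soup wrapper adds one level
depth_from_html = magic_function(soup.html)  # starts counting at <html>
print(depth_from_soup, depth_from_html)
```

With this input the first call should come out one higher than the second, because the BeautifulSoup object is itself the parent of the html tag.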

With the document from the question, this call returns 4:

>>> magic_function(soup.html)
4
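If you would rather avoid recursion, a possible alternative is to count each tag's ancestors. This is only a sketch: max_tag_depth is a hypothetical helper name, not part of BeautifulSoup, and it relies on find_all(True) returning every tag and on .parents walking up to and including the BeautifulSoup object. It counts tags only, so an empty tag such as <br/> still adds a level, whereas the recursive function above returns 0 for a tag without children.

```python
from bs4 import BeautifulSoup


def max_tag_depth(soup):
    """Hypothetical helper: deepest tag level, with the root tag counted as 1."""
    # The root tag has exactly one ancestor (the BeautifulSoup object),
    # so the number of ancestors equals the tag's nesting level.
    return max((len(list(tag.parents)) for tag in soup.find_all(True)), default=0)


# Assumed sample input, parsed with the standard-library backend.
html = "<html><body><p><strong>Some Text.</strong><br/></p></body></html>"
soup = BeautifulSoup(html, "html.parser")
depth = max_tag_depth(soup)
print(depth)
```

Walking the ancestors of every tag costs more than a single traversal on very deep documents, but it will not hit Python's recursion limit.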
