Question

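It seems my BeautifulSoup parser ignores the path to the element I request and returns the first tag it finds with the name of the final element in the path, regardless of the path leading up to it.

XML: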

<root>
    <firstcategory>
        <subcategory>
            <id>123</id>
            <name>SubcategX</name>
        </subcategory>
        <id>789</id>
        <name>Category1</name>
    </firstCategory>
</root>

Python code:

from bs4 import BeautifulSoup

testXML = "<root><firstcategory><subcategory><id>123</id><name>SubcategX</name></subcategory><id>789</id><name>Category1</name></firstCategory></root>"

soup = BeautifulSoup(testXML)
#below should be 789
categID = soup.root.firstcategory.id
#this prints 123, which corresponds to the path root.firstcategory.subcategory.id, not root.firstcategory.id
print("categID = %s" % categID)

Why does BeautifulSoup just find the first id tag in the hierarchy without regard to the specified path?

Answer 1

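When you use the dot syntax, BeautifulSoup searches recursively through all descendants of the tag, not just its direct children. It happens to find the <id> inside <subcategory> first.

To prevent the recursion, you can do this: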

categID = soup.root.firstcategory.find("id", recursive=False)

See the BeautifulSoup docs for the recursive argument.
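A minimal sketch of the difference, assuming the `html.parser` backend (the question does not name a parser) and a copy of the question's XML with the closing tag's case normalized. The variable names `first_category`, `nested_id`, and `direct_id` are illustrative only. Dot access behaves like `find()` over all descendants, while `find(..., recursive=False)` looks only at direct children:

```python
from bs4 import BeautifulSoup

# Same structure as the XML in the question (closing tag lowercased).
test_xml = (
    "<root><firstcategory>"
    "<subcategory><id>123</id><name>SubcategX</name></subcategory>"
    "<id>789</id><name>Category1</name>"
    "</firstcategory></root>"
)

# Assumption: html.parser is used; the question does not specify a parser.
soup = BeautifulSoup(test_xml, "html.parser")
first_category = soup.root.firstcategory

# Dot access searches every descendant in document order,
# so it reaches the <id> nested inside <subcategory> first.
nested_id = first_category.id

# recursive=False limits the search to direct children of <firstcategory>.
direct_id = first_category.find("id", recursive=False)

print(nested_id.string)  # 123
print(direct_id.string)  # 789
```

The same `recursive=False` argument is accepted by `find_all()` if more than one direct child needs to be collected.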
