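I would like to use the findParent() method in BeautifulSoup to find the parent of a particular tag that has an id attribute. For example, consider the following sample XML: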
<monograph>
<section id="1234">
<head>Test Heading</head>
<p>Here's a paragraph with some text in it.</p>
</section>
</monograph>
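Assuming I have matched some content inside the paragraph, I would like to use findParent to find the first parent up the tree that has an id attribute, whatever its tag name. Something like: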
for hit in monograph(text="paragraph with"):
containername = hit.findParent(re.compile([A-Za-z]+), {id}).name
However, the preceding code does not return any matches.
Answer 0: (score: 2)
Use id=True to match elements that have an id attribute, regardless of the attribute's value:
hit.find_parent(id=True)
Conversely, id=False finds the first parent element that does not have an id attribute.
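The attribute test can also be written as an explicit filter function, which may make the intent easier to read. This is a minimal sketch, not the answer's own code: it assumes the built-in 'html.parser' backend (so it runs without lxml) and passes a function to find_parent that inspects each ancestor with Tag.has_attr. The names with_id and without_id are illustrative.

```python
from bs4 import BeautifulSoup

sample = """\
<monograph>
<section id="1234">
<head>Test Heading</head>
<p>Here's a paragraph with some text in it.</p>
</section>
</monograph>
"""

# Assumption: 'html.parser' is used here only to avoid the lxml dependency.
soup = BeautifulSoup(sample, "html.parser")
p = soup.find("p")

# Nearest ancestor that carries an id attribute (any value).
with_id = p.find_parent(lambda tag: tag.has_attr("id"))

# Nearest ancestor that carries no id attribute at all.
without_id = p.find_parent(lambda tag: not tag.has_attr("id"))

print(with_id.name, without_id.name)
```

A function filter is more verbose than the keyword form, but it extends naturally to conditions that a single keyword argument cannot express, such as "has an id and is not a section".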
Note that you should use the lower_case_with_underscores style for BeautifulSoup methods; findParent is the BeautifulSoup 3 spelling, which has been deprecated.
Demo:
>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <monograph>
... <section id="1234">
... <head>Test Heading</head>
... <p>Here's a paragraph with some text in it.</p>
... </section>
... </monograph>
... '''
>>> soup = BeautifulSoup(sample, 'xml')
>>> str(soup.p)
"<p>Here's a paragraph with some text in it.</p>"
>>> print(soup.p.find_parent(id=True).prettify())
<section id="1234">
 <head>
  Test Heading
 </head>
 <p>
  Here's a paragraph with some text in it.
 </p>
</section>
>>> print(soup.p.find_parent(id=False).prettify())
<monograph>
 <section id="1234">
  <head>
   Test Heading
  </head>
  <p>
   Here's a paragraph with some text in it.
  </p>
 </section>
</monograph>
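
Applied to the loop from the question, the same idea might look like the sketch below. It is not the asker's original code: it assumes the built-in 'html.parser' backend, uses string= (the current name for the older text= argument), and wraps the search text in re.compile, because a plain string there matches only a text node's entire content, not a substring. The variable container_names is illustrative.

```python
import re
from bs4 import BeautifulSoup

sample = """\
<monograph>
<section id="1234">
<head>Test Heading</head>
<p>Here's a paragraph with some text in it.</p>
</section>
</monograph>
"""

# Assumption: 'html.parser' is used here only to avoid the lxml dependency.
soup = BeautifulSoup(sample, "html.parser")

container_names = []
# A compiled regex matches substrings of each text node.
for hit in soup.find_all(string=re.compile("paragraph with")):
    # Walk up from the matched text node to the first ancestor with an id.
    container = hit.find_parent(id=True)
    if container is not None:
        container_names.append((container.name, container["id"]))

print(container_names)
```

The check for None guards against matches that have no ancestor with an id, in which case find_parent returns None and accessing .name would raise an AttributeError.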