Python - BeautifulSoup findParent by attribute

Date: 2015-02-16 16:30:53

Tags: python python-3.x beautifulsoup

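I want to use the findParent() method in BeautifulSoup to find the parent of a particular tag, where that parent has an id attribute. For example, consider the following sample XML: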

<monograph>
    <section id="1234">
        <head>Test Heading</head>
        <p>Here's a paragraph with some text in it.</p>
    </section>
</monograph>

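Suppose I have already matched some content inside the paragraph. I want to use findParent to walk up the tree and find the first parent that has an id attribute, whatever its tag name. Something like: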

 for hit in monograph(text="paragraph with"):
     containername = hit.findParent(re.compile([A-Za-z]+), {id}).name

However, the code above does not return any matches.

1 answer:

Answer 0 (score: 2)

Use id=True to match elements that have an id attribute, regardless of the attribute's value:

hit.find_parent(id=True)

Conversely, id=False finds the first parent element that does not have an id attribute.

Note that you should use the lower_case_with_underscores style for BeautifulSoup methods; findParent is the BeautifulSoup 3 spelling, which has been deprecated.

Demo:

>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <monograph>
...     <section id="1234">
...         <head>Test Heading</head>
...         <p>Here's a paragraph with some text in it.</p>
...     </section>
... </monograph>
... '''
>>> soup = BeautifulSoup(sample, 'xml')
>>> str(soup.p)
"<p>Here's a paragraph with some text in it.</p>"
>>> print(soup.p.find_parent(id=True).prettify())
<section id="1234">
 <head>
  Test Heading
 </head>
 <p>
  Here's a paragraph with some text in it.
 </p>
</section>

>>> print(soup.p.find_parent(id=False).prettify())
<monograph>
 <section id="1234">
  <head>
   Test Heading
  </head>
  <p>
   Here's a paragraph with some text in it.
  </p>
 </section>
</monograph>
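
To connect this back to the loop in the question, here is a minimal sketch of how the search might look with find_parent(id=True). It makes a few assumptions that are not in the original post: it uses the built-in 'html.parser' rather than 'xml' (so that lxml is not required), it matches the text with a compiled regular expression because a plain string would only match a text node in full, and the container_names list is a hypothetical name used for illustration.

```python
import re

from bs4 import BeautifulSoup

sample = """\
<monograph>
    <section id="1234">
        <head>Test Heading</head>
        <p>Here's a paragraph with some text in it.</p>
    </section>
</monograph>
"""

# Assumption: 'html.parser' ships with Python; the demo above uses 'xml',
# which needs lxml to be installed.
soup = BeautifulSoup(sample, "html.parser")

container_names = []  # hypothetical name, used only for this sketch

# A compiled pattern matches text nodes that contain the phrase; a plain
# string would have to equal the whole text node.
for hit in soup.find_all(string=re.compile("paragraph with")):
    # Walk up from the matched text node to the nearest ancestor with an id.
    container = hit.find_parent(id=True)
    if container is not None:  # no ancestor may carry an id at all
        container_names.append(container.name)

print(container_names)
```

The None check matters because find_parent returns None when no ancestor has an id, in which case accessing .name directly would raise an AttributeError.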