Surround a sequence of siblings in a new tag

Time: 2020-04-02 09:17:53

Tags: python beautifulsoup

I have an HTML document like the following:

import bs4 as bs
content = '''\
<p class="foo">Introductory text</p>
  <p class="bar">quoted material p1</p>
  <p class="bar">quoted material p2</p>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
  <p class="bar">short quote</p>
<p class="foo">discussion of quoted material</p>'''
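# Name the parser explicitly; lxml matches the <html><body> wrapper in the output shown later.
soup = bs.BeautifulSoup(content, 'lxml')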

I'd like to surround the items of class bar in <blockquote> tags. However, when multiple <p class="bar"> elements occur in sequence, I'd like them to be treated as a single blockquote. Desired output:

<p class="foo">Introductory text</p>
<blockquote>
  <p class="bar">quoted material p1</p>
  <p class="bar">quoted material p2</p>
</blockquote>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
<blockquote>
  <p class="bar">short quote</p>
</blockquote>
<p class="foo">discussion of quoted material</p>

The following solution is unacceptable, because it wraps every paragraph individually:

for bar in soup.find_all('p', {'class': 'bar'}):
    bar.wrap(soup.new_tag('blockquote'))

It yields:

<html><body><p class="foo">Introductory text</p>
<blockquote><p class="bar">quoted material p1</p></blockquote>
<blockquote><p class="bar">quoted material p2</p></blockquote>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
<blockquote><p class="bar">short quote</p></blockquote>
<p class="foo">discussion of quoted material</p></body></html>

One idea is to use the .find_next() and insert_before() / insert_after() methods, but I can't get bs4 to insert each "half" of the <blockquote> tag separately (related
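
For illustration, here is a minimal sketch of one way the grouping might be done. It is not a posted answer, only an approach under stated assumptions: rather than inserting the two "halves" of a tag, it wraps the first paragraph of each run with wrap() and then moves the remaining siblings of that run into the same wrapper with append(). The helper names is_bar and wrap_runs are hypothetical, and the sketch uses the built-in html.parser so that it runs without extra dependencies.

```python
import bs4 as bs

content = '''\
<p class="foo">Introductory text</p>
  <p class="bar">quoted material p1</p>
  <p class="bar">quoted material p2</p>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
  <p class="bar">short quote</p>
<p class="foo">discussion of quoted material</p>'''


def is_bar(node):
    """Hypothetical helper: True for a <p> tag whose class list contains 'bar'."""
    return (isinstance(node, bs.Tag) and node.name == 'p'
            and 'bar' in node.get('class', []))


def wrap_runs(soup):
    """Hypothetical helper: wrap each run of adjacent bar paragraphs in one <blockquote>."""
    for first in soup.find_all('p', class_='bar'):
        # Skip paragraphs that an earlier iteration already moved into a blockquote.
        if first.parent is not None and first.parent.name == 'blockquote':
            continue
        run = [first]
        pending = []  # whitespace-only strings seen since the last bar paragraph
        node = first.next_sibling
        while node is not None:
            if is_bar(node):
                run.extend(pending)
                run.append(node)
                pending = []
            elif isinstance(node, bs.NavigableString) and not node.strip():
                pending.append(node)
            else:
                break  # any other tag or non-blank text ends the run
            node = node.next_sibling
        # wrap() returns the new wrapper; append() moves the rest of the run into it.
        wrapper = first.wrap(soup.new_tag('blockquote'))
        for item in run[1:]:
            wrapper.append(item)
    return soup


soup = wrap_runs(bs.BeautifulSoup(content, 'html.parser'))
print(soup)
```

The siblings are collected before the tree is modified, so the next_sibling walk never sees a half-edited document. Whitespace between two bar paragraphs travels into the blockquote with them, while trailing whitespace after a run stays where it was.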

0 answers:

No answers yet