I have an HTML document, as follows:
import bs4 as bs
content = '''\
<p class="foo">Introductory text</p>
<p class="bar">quoted material p1</p>
<p class="bar">quoted material p2</p>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
<p class="bar">short quote</p>
<p class="foo">discussion of quoted material</p>'''
soup = bs.BeautifulSoup(content)
I want to wrap the items of class bar in a <blockquote> tag. However, when several <p class="bar"> elements occur in sequence, I want them treated as a single blockquote. Desired output:
<p class="foo">Introductory text</p>
<blockquote>
<p class="bar">quoted material p1</p>
<p class="bar">quoted material p2</p>
</blockquote>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
<blockquote>
<p class="bar">short quote</p>
</blockquote>
<p class="foo">discussion of quoted material</p>
The following solution is not acceptable:
for bar in soup.find_all('p', {'class': 'bar'}):
    bar.wrap(soup.new_tag('blockquote'))
because it wraps each paragraph individually and yields
<html><body><p class="foo">Introductory text</p>
<blockquote><p class="bar">quoted material p1</p></blockquote>
<blockquote><p class="bar">quoted material p2</p></blockquote>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
<blockquote><p class="bar">short quote</p></blockquote>
<p class="foo">discussion of quoted material</p></body></html>
One idea is to use the .find_next(), insert_before(), and insert_after() methods, but I can't get bs4 to insert each "half" of the <blockquote> tag separately (related).
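For reference, here is a rough sketch of the grouping I am after, written without inserting half-tags. It is not a confirmed solution. It wraps the first paragraph of each run and then moves the following siblings into that wrapper. The helper names is_bar and wrap_bar_runs are hypothetical, the parser is set to html.parser as an assumption, and the sketch assumes that only whitespace separates consecutive bar paragraphs and that the input contains no blockquote of its own.

```python
import bs4 as bs

content = '''\
<p class="foo">Introductory text</p>
<p class="bar">quoted material p1</p>
<p class="bar">quoted material p2</p>
<p class="foo">discussion of quoted material</p>
<p class="foo">more text</p>
<p class="bar">short quote</p>
<p class="foo">discussion of quoted material</p>'''

# Assumption: html.parser is used here; other parsers may add <html>/<body>.
soup = bs.BeautifulSoup(content, 'html.parser')


def is_bar(node):
    """Hypothetical helper: True for a <p> tag whose class list contains 'bar'."""
    return (isinstance(node, bs.Tag)
            and node.name == 'p'
            and 'bar' in node.get('class', []))


def wrap_bar_runs(soup):
    """Hypothetical helper: wrap each run of adjacent p.bar tags in one blockquote."""
    # find_all returns a list built up front, so the tree can be edited in the loop.
    for bar in soup.find_all('p', class_='bar'):
        if bar.parent is not None and bar.parent.name == 'blockquote':
            continue  # already absorbed into an earlier run

        # Collect the following siblings that belong to the same run.
        run, pending = [], []
        node = bar.next_sibling
        while node is not None:
            if is_bar(node):
                run.extend(pending)  # keep whitespace between run members
                run.append(node)
                pending = []
            elif isinstance(node, bs.NavigableString) and not node.strip():
                pending.append(node)  # whitespace: part of the run only if a bar follows
            else:
                break  # any other tag or text ends the run
            node = node.next_sibling

        # wrap() returns the new wrapper; append() moves existing nodes into it.
        quote = bar.wrap(soup.new_tag('blockquote'))
        for member in run:
            quote.append(member)


wrap_bar_runs(soup)
print(soup)
```

Each run ends up in a single blockquote because the wrapper is created once per run and the remaining paragraphs are moved into it, so no opening or closing tag ever has to be inserted on its own.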