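I want to automatically insert an HTML tag between two paragraphs, across thousands of similar pages. The snippet looks like this (the new tag must be inserted right after the paragraph that contains the header-class title):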
<p align="center"><span class="header">My Title</span></p>
{insert new tag <article> here}
<p align="center">bla-bla-bla</p>
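I am using Python and Beautiful Soup. My difficulty is finding the position to insert at, and how to insert between the two paragraphs. Here is my code so far, which does not work. Any help?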
soup = BeautifulSoup(page, 'html.parser')
cells = soup.findAll('p', attrs={"class":"header"})
index=str(cells).index('</p><p>') # search location between two paragraphs
output_line = cells[:index] + '<article> ' + cells[index:]
Answer 0 (score: 1)
Try this:
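from bs4 import BeautifulSoup
soup = BeautifulSoup(page, 'html.parser')
# The header span's parent is the enclosing <p>
p = soup.find('span', {'class': 'header'}).parent
# Add the new tag as the next sibling of that <p>
p.insert_after(soup.new_tag('article'))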
A quick look at the BeautifulSoup documentation will turn up many useful helper methods for this kind of thing.
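One caveat with the snippet above: `find` returns `None` when no matching span exists, so `.parent` would raise `AttributeError` on a page without a header. Below is a minimal sketch of a guarded version. The helper name `add_article_after_title` is hypothetical, and the sketch assumes the header span always sits directly inside a `<p>`, as in the question.

```python
from bs4 import BeautifulSoup


def add_article_after_title(page):
    """Insert an empty <article> after the first header paragraph, if any."""
    soup = BeautifulSoup(page, "html.parser")
    span = soup.find("span", class_="header")
    if span is None:
        # No header span on this page: return the markup unchanged.
        return page
    # span.parent is the enclosing <p>; the new tag becomes its next sibling.
    span.parent.insert_after(soup.new_tag("article"))
    return str(soup)


html = (
    '<p align="center"><span class="header">My Title</span></p>'
    '<p align="center">bla-bla-bla</p>'
)
result = add_article_after_title(html)
print(result)
```

Only the first header is handled here, which matches the answer's use of `find`; pages with several headers need `find_all` instead.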
Answer 1 (score: 1)
from bs4 import BeautifulSoup
page = """
<p align="center"><span class="header">My Title1</span></p>
<p align="center">bla-bla-bla</p>
<p align="center"><span class="header">My Title2</span></p>
<p align="center">bla-bla-bla</p>
<p align="center"><span class="header">My Title3</span></p>
<p align="center">bla-bla-bla</p>
"""
soup = BeautifulSoup(page, "html.parser")
for header in soup.find_all('span', class_='header'):
    article = soup.new_tag('article')
    article.string = 'article content'
    # Insert after the enclosing <p>, not after the span inside it
    header.parent.insert_after(article)
print(soup.prettify())
Output:
<p align="center">
 <span class="header">
  My Title1
 </span>
</p>
<article>
 article content
</article>
<p align="center">
 bla-bla-bla
</p>
<p align="center">
 <span class="header">
  My Title2
 </span>
</p>
<article>
 article content
</article>
<p align="center">
 bla-bla-bla
</p>
<p align="center">
 <span class="header">
  My Title3
 </span>
</p>
<article>
 article content
</article>
<p align="center">
 bla-bla-bla
</p>
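Since the question mentions thousands of similar pages, the same loop could be wrapped in a small batch script. The following is only a sketch under stated assumptions: the function names `insert_articles` and `process_directory`, the `*.html` pattern, and the UTF-8 encoding are all hypothetical choices, not something given in the question. It also skips headers that are already followed by an `<article>`, so that running it twice does not insert duplicates.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def insert_articles(html):
    """Return (new_html, count) with an empty <article> after each header <p>."""
    soup = BeautifulSoup(html, "html.parser")
    count = 0
    for header in soup.find_all("span", class_="header"):
        paragraph = header.parent
        # Next sibling that is a tag (whitespace text nodes are skipped).
        following = paragraph.find_next_sibling(True)
        if following is not None and following.name == "article":
            continue  # already processed on an earlier run
        paragraph.insert_after(soup.new_tag("article"))
        count += 1
    return str(soup), count


def process_directory(folder, pattern="*.html", encoding="utf-8"):
    """Rewrite matching files in place; return the number of files modified."""
    modified = 0
    for path in sorted(Path(folder).glob(pattern)):
        updated, count = insert_articles(path.read_text(encoding=encoding))
        if count:
            # Only touch files where something was actually inserted.
            path.write_text(updated, encoding=encoding)
            modified += 1
    return modified
```

A call such as `process_directory("pages")` would rewrite the files in place, so it is worth trying it on a copy of the pages first.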