Question

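Currently, I have a tag in which a header and its content are run together. I need to separate the header from the content by keeping the header in its own paragraph tag.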

block_tag = <p>1.1 <u>Header Information</u>.  Content of the header with multiple lines</p>

type(block_tag)
<class 'bs4.element.Tag'>

The header text is wrapped in an inline tag (<u> in the example above).

Expected result:

block_tag
<p>1.1 <u>Header Information</u>.</p><p>  Content of the header with multiple lines</p>

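So far, I have tried the following.

Adding a paragraph tag

new_tag("p") creates <p></p>. I need the reverse: a closing </p> followed by an opening <p>.

Method 1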

para_tag = soup.new_tag("p")
block_tag.insert(2,para_tag)
block_tag
<p>1.1 <u>Header Information</u>. <p></p> Content of the header with multiple lines</p>

Method 2

block_tag.insert(2,"<\p><p>")
block_tag
<p>1.1 <u>Header Information</u>&lt;\p&gt;&lt;p&gt;.  Content of the header with multiple lines</p>

Thanks

Answer 1

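You can take the rest of the content after the header and wrap it in a new p tag. Then extract that new tag from the original tag and insert_after the original tag.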

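from bs4 import BeautifulSoup

html = """
<p>1.1 <u>Header Information</u>.  Content of the header with multiple lines</p>
"""
soup = BeautifulSoup(html, 'html.parser')
block_tag = soup.find('p')
# The last child is the text node that follows the <u> header.
remaining = block_tag.contents[-1]
# wrap() puts that text node inside a new <p> and returns the new tag.
new_tag = remaining.wrap(soup.new_tag("p"))
# Detach the new <p> from the original tag and place it right after it.
block_tag.insert_after(new_tag.extract())
print(soup)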

Output:

<p>1.1 <u>Header Information</u></p><p>.  Content of the header with multiple lines</p>

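Almost perfect, except for the full stop: it ends up at the start of the second paragraph instead of at the end of the header.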

Note: I am not sure what "Content of the header with multiple lines" looks like in your real data, so do not treat this as an exact answer. You may need to improvise.
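
One possible way to keep the full stop with the header is to split the trailing text node yourself instead of wrapping it whole. This is only a sketch: it assumes the header always ends at the first period of the text that follows the <u> tag, and the names head, sep, body and content_tag are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

html = "<p>1.1 <u>Header Information</u>.  Content of the header with multiple lines</p>"
soup = BeautifulSoup(html, "html.parser")
block_tag = soup.find("p")

# The last child is the text node that follows the <u> header.
remaining = block_tag.contents[-1]

# Assumption: the header ends at the first period of that text node.
head, sep, body = str(remaining).partition(".")

# Keep everything up to and including the period in the original <p>.
remaining.replace_with(NavigableString(head + sep))

# Put the rest of the text in a new <p> placed right after the original one.
content_tag = soup.new_tag("p")
content_tag.string = body
block_tag.insert_after(content_tag)

print(soup)
# <p>1.1 <u>Header Information</u>.</p><p>  Content of the header with multiple lines</p>
```

If the real content can contain a period before the header ends (for example inside the section number), the split rule would need to be adjusted.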

BeautifulSoup4: need to add a reverse paragraph tag to split a field into two paragraphs
