How to insert <button class="accordion"> </button> repeatedly
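I have the following code, and I want to use the wrap function from the BeautifulSoup library to add a button around each heading.
I tried iterating over the h2 objects, walking up three parent levels from each heading, and wrapping that element in a button tag. However, this logic does not work with the wrap function: a single button ends up around both headings, and the structure of the markup changes.
Can someone explain how the wrap function works, or correct the logic used here?
Input: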
<html>
<body>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Header-1">
Header 1
<a class="anchor-link" href="#Header-1">
</a>
</h2>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p TEXT_1
</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Header-2">
Header 2
<a class="anchor-link" href="#Header-1">
</a>
</h2>
</div>
</div>
</div>
</body>
</html>
Code:
from bs4 import BeautifulSoup
soup_2 = BeautifulSoup(open('snippet_test.html'), 'html.parser')
h2s = soup_2.find_all("h2")
wrapper = soup_2.new_tag('button', **{"class": "accordion"})
for h_2 in h2s:
    h_2.parent.parent.wrap(wrapper)
html = soup_2.prettify("utf-8")
with open("snippet.html", "wb") as file:
    file.write(html)
Output (only one button, placed incorrectly, and the markup was changed unexpectedly):
<html>
<body>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p <="" p="" text_1="">
</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<button class="accordion">
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Header-1">
Header 1
<a class="anchor-link" href="#Header-1">
</a>
</h2>
</div>
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Header-2">
Header 2
<a class="anchor-link" href="#Header-1">
</a>
</h2>
</div>
</div>
</button>
</div>
</body>
</html>
Desired output:
<html>
<body>
<button class="accordion" >
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Header-1">
Header 1
<a class="anchor-link" href="#Header-1">
</a>
</h2>
</div>
</div>
</div>
</button>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p TEXT_1
</p>
</div>
</div>
</div>
<button class="accordion" >
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Header-2">
Header 2
<a class="anchor-link" href="#Header-1">
</a>
</h2>
</div>
</div>
</div>
</button>
</body>
</html>
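Answer 0 (score: 1)
You create only one wrapper and use it twice. A tag object can sit in only one place in the tree, so the second call to wrap moves the same button instead of adding a new one, which is why it ends up around both headings. You need to create two objects, one per heading. I also added one more ".parent" so that the whole cell div is wrapped.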
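To illustrate the point, here is a minimal sketch on a reduced, made-up document (the two-div HTML string and the names shared, reused_count and fresh_count are only for this example, not part of the original snippet). It compares reusing one tag object with creating a new tag for each heading:

```python
from bs4 import BeautifulSoup

# Reduced stand-in for the real file (an assumption for illustration).
html = "<body><div id='a'><h2>One</h2></div><div id='b'><h2>Two</h2></div></body>"

# Case 1: one tag object reused. The second wrap() moves the same button.
soup = BeautifulSoup(html, "html.parser")
shared = soup.new_tag("button", attrs={"class": "accordion"})
for h2 in soup.find_all("h2"):
    h2.parent.wrap(shared)
reused_count = len(soup.find_all("button"))

# Case 2: a fresh tag object per heading. Each div gets its own button.
soup = BeautifulSoup(html, "html.parser")
for h2 in soup.find_all("h2"):
    h2.parent.wrap(soup.new_tag("button", attrs={"class": "accordion"}))
fresh_count = len(soup.find_all("button"))

print(reused_count, fresh_count)  # 1 2
```

In the first case the single button ends up holding both divs, which mirrors the unexpected output in the question.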
My code:
from bs4 import BeautifulSoup
soup_2 = BeautifulSoup(open('snippet_test.html'), 'html.parser')
h2s = soup_2.find_all("h2")
for h_2 in h2s:
    # Create a new button for every heading; one tag cannot be in two places.
    wrapper = soup_2.new_tag('button', **{"class": "accordion"})
    # Go up three levels so the outer "cell" div is the element being wrapped.
    h_2.parent.parent.parent.wrap(wrapper)
html = soup_2.prettify("utf-8")
with open("snippet.html", "wb") as file:
    file.write(html)
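
As a possible variation, not something the original code does, the chain of ".parent" lookups could be replaced by find_parent, which searches upward for a matching ancestor. The sketch below assumes every heading sits inside a div that carries the class "cell", as in the posted input; the shortened HTML string and the names cell and buttons are made up for the example:

```python
from bs4 import BeautifulSoup

# Shortened version of the posted input (an assumption for illustration).
html = """
<body>
<div class="cell text_cell"><div class="inner_cell"><div class="rendered_html"><h2>Header 1</h2></div></div></div>
<div class="cell text_cell"><div class="inner_cell"><div class="rendered_html"><p>TEXT_1</p></div></div></div>
<div class="cell text_cell"><div class="inner_cell"><div class="rendered_html"><h2>Header 2</h2></div></div></div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
for h2 in soup.find_all("h2"):
    # Look upward for the nearest div whose class list contains "cell".
    cell = h2.find_parent("div", class_="cell")
    if cell is None:
        continue  # heading outside a cell: leave it alone
    # A new button per heading, for the same reason as above.
    cell.wrap(soup.new_tag("button", attrs={"class": "accordion"}))

buttons = soup.find_all("button")
print(len(buttons))
```

This keeps working if the nesting depth between the heading and the cell div changes, at the cost of depending on the class name.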