python - BeautifulSoup wrap() - adding an element around multiple sections

Time: 2018-11-13 15:19:45

Tags: python html beautifulsoup

How can I insert <button class="accordion"> </button> repeatedly?

I have the following code, and I want to use the wrap() function from the BeautifulSoup library to add a button around each heading.

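I tried iterating over the h2 objects, going up 3 parent levels from each heading, and then inserting the button tag. However, this logic does not work with wrap(): a single button ends up around both headings, and the structure of the markup changes.
Can someone explain how wrap() works, or correct the logic I am using?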

Input:

<html>
<body>    
    <div class="cell border-box-sizing text_cell rendered">
     <div class="prompt input_prompt">
     </div>
     <div class="inner_cell">
      <div class="text_cell_render border-box-sizing rendered_html">
       <h2 id="Header-1">
        Header 1
        <a class="anchor-link" href="#Header-1">
        </a>
       </h2>
      </div>
     </div>
    </div>
    <div class="cell border-box-sizing text_cell rendered">
     <div class="prompt input_prompt">
     </div>
     <div class="inner_cell">
      <div class="text_cell_render border-box-sizing rendered_html">
       <p TEXT_1
       </p>
      </div>
     </div>
    </div>  
     <div class="cell border-box-sizing text_cell rendered">
     <div class="prompt input_prompt">
     </div>
     <div class="inner_cell">
      <div class="text_cell_render border-box-sizing rendered_html">
       <h2 id="Header-2">
        Header 2
        <a class="anchor-link" href="#Header-1">
        </a>
       </h2>
      </div>
     </div>
    </div> 
</body>
</html>

Code:

from bs4 import BeautifulSoup

soup_2 = BeautifulSoup(open('snippet_test.html'), 'html.parser')
h2s = soup_2.find_all("h2")
wrapper = soup_2.new_tag('button', **{"class": "accordion"})

for h_2 in h2s:    
     h_2.parent.parent.wrap(wrapper)

html = soup_2.prettify("utf-8")
with open("snippet.html", "wb") as file:
    file.write(html)

Output (only one button, placed incorrectly, and the markup changed unexpectedly):

<html>
 <body>
  <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
  </div>
  <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
    <div class="text_cell_render border-box-sizing rendered_html">
     <p <="" p="" text_1="">
     </p>
    </div>
   </div>
  </div>
  <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <button class="accordion">
    <div class="inner_cell">
     <div class="text_cell_render border-box-sizing rendered_html">
      <h2 id="Header-1">
       Header 1
       <a class="anchor-link" href="#Header-1">
       </a>
      </h2>
     </div>
    </div>
    <div class="inner_cell">
     <div class="text_cell_render border-box-sizing rendered_html">
      <h2 id="Header-2">
       Header 2
       <a class="anchor-link" href="#Header-1">
       </a>
      </h2>
     </div>
    </div>
   </button>
  </div>
 </body>
</html>

Desired output:

<html>
<body>
    <button class="accordion" >    
    <div class="cell border-box-sizing text_cell rendered">
     <div class="prompt input_prompt">
     </div>
     <div class="inner_cell">
      <div class="text_cell_render border-box-sizing rendered_html">
       <h2 id="Header-1">
        Header 1
        <a class="anchor-link" href="#Header-1">
        </a>
       </h2>
      </div>
     </div>
    </div>
    </button>
    <div class="cell border-box-sizing text_cell rendered">
     <div class="prompt input_prompt">
     </div>
     <div class="inner_cell">
      <div class="text_cell_render border-box-sizing rendered_html">
       <p TEXT_1
       </p>
      </div>
     </div>
    </div>
    <button class="accordion" >     
     <div class="cell border-box-sizing text_cell rendered">
     <div class="prompt input_prompt">
     </div>
     <div class="inner_cell">
      <div class="text_cell_render border-box-sizing rendered_html">
       <h2 id="Header-2">
        Header 2
        <a class="anchor-link" href="#Header-1">
        </a>
       </h2>
      </div>
     </div>
    </div> 
    </button>
</body>
</html>

1 answer:

Answer 0 (score: 1)

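You create only one wrapper, but you use it twice. A tag object can occupy only one place in the tree, so the second wrap() call moves the existing button, together with what it already contains, to the new location. You need to create two objects, one for each heading. I also added one more `.parent`, so that the whole cell div gets wrapped.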
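To illustrate the point about reusing a single wrapper, here is a minimal sketch on a made-up two-paragraph document (the markup and the variable names `shared`, `reused_count`, and `fresh_count` are only for illustration, not taken from the question). It compares wrapping with one shared tag against wrapping with a fresh tag per element:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, much smaller than the notebook HTML in the question.
html = "<div><p id='a'>A</p><p id='b'>B</p></div>"

# Case 1: one tag object reused for every wrap() call.
# The second wrap() moves the same <span>, with its contents, to the new spot.
soup = BeautifulSoup(html, "html.parser")
shared = soup.new_tag("span")
for p in soup.find_all("p"):
    p.wrap(shared)
reused_count = len(soup.find_all("span"))

# Case 2: a new tag object created inside the loop.
# Each <p> ends up inside its own <span>.
soup = BeautifulSoup(html, "html.parser")
for p in soup.find_all("p"):
    p.wrap(soup.new_tag("span"))
fresh_count = len(soup.find_all("span"))

print(reused_count, fresh_count)
```

With this input, the first loop should leave a single span holding both paragraphs, while the second produces two spans. That is the same effect seen in the question's output, where one button collects both headings.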

My code:

from bs4 import BeautifulSoup

soup_2 = BeautifulSoup(open('snippet_test.html'), 'html.parser')
h2s = soup_2.find_all("h2")

for h_2 in h2s:
    wrapper = soup_2.new_tag('button', **{"class": "accordion"})
    h_2.parent.parent.parent.wrap(wrapper)

html = soup_2.prettify("utf-8")
with open("snippet.html", "wb") as file:
    file.write(html)
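As a possible variation on the code above, counting `.parent` hops can be avoided by looking up the enclosing cell by its class with `find_parent()`. This is only a sketch under the assumption that every heading sits inside a div carrying the `cell` class, as in the question's input; the inline markup is a shortened, hypothetical stand-in for the real file:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for snippet_test.html (assumed structure).
html = """
<body>
<div class="cell border-box-sizing text_cell rendered">
 <div class="inner_cell"><div class="rendered_html">
  <h2 id="Header-1">Header 1</h2>
 </div></div>
</div>
<div class="cell border-box-sizing text_cell rendered">
 <div class="inner_cell"><div class="rendered_html">
  <p>TEXT_1</p>
 </div></div>
</div>
<div class="cell border-box-sizing text_cell rendered">
 <div class="inner_cell"><div class="rendered_html">
  <h2 id="Header-2">Header 2</h2>
 </div></div>
</div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

for h2 in soup.find_all("h2"):
    # Nearest ancestor <div> whose class list contains "cell"
    # ("inner_cell" is a different class and does not match).
    cell = h2.find_parent("div", class_="cell")
    if cell is None:
        continue  # heading outside any cell: leave it untouched
    # A new button per heading, as in the answer's loop.
    cell.wrap(soup.new_tag("button", **{"class": "accordion"}))

buttons = soup.find_all("button", class_="accordion")
print(len(buttons))
```

The lookup by class keeps working if the nesting depth between the heading and its cell changes, which a fixed chain of `.parent` calls would not.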