Beautiful Soup: a tag inside a tag

Time: 2018-02-07 19:53:06

Tags: python-3.x beautifulsoup

I am trying to add a new link as an unordered-list element.

But I cannot add a tag inside another tag with Beautiful Soup.

from bs4 import BeautifulSoup

with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

a = soup.select_one("id[class=pr]")
ntag1 = soup.new_tag("a", href="hm/test")
ntag1.string = 'TEST'
... (part with problem)
a.insert_after(ntag2)

ntag1 must end up inside an "<li>", so I tried

   ntag2 = ntag1.new_tag('li')  
   TypeError: 'NoneType' object is not callable

and with wrap():

   ntag2 = ntag1.wrap('li')
   ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree.

Original HTML

<id class="pr">
    </id>
    <li>
     <a href="pr/protocol">
      protocol
     </a>

Desired HTML output

<id class="pr">
</id>
<li>
 <a href="hm/test">
  TEST
 </a>
</li>
<li>
 <a href="pr/protocol">
  protocol
 </a>
</li>

1 answer:

Answer 0 (score: 2)

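You get the NoneType error because ntag2 = ntag1.new_tag('li') tries to call a method that Tag objects do not have. Attribute access on a Tag looks up a child tag of that name, so ntag1.new_tag evaluates to None, and calling None raises the TypeError. new_tag() is a method of the BeautifulSoup object.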
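To make the lookup behaviour concrete, here is a minimal sketch. The one-paragraph document and the variable names lookup and error_message are only for illustration, and html.parser is used so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>hello</p>", "html.parser")
ntag1 = soup.new_tag("a", href="hm/test")
ntag1.string = "TEST"

# Attribute access on a Tag searches for a child tag with that name.
# ntag1 has no <new_tag> child, so the lookup yields None.
lookup = ntag1.new_tag
print(lookup)

# Calling that None is what raises the TypeError from the question.
error_message = ""
try:
    ntag1.new_tag("li")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# new_tag() lives on the BeautifulSoup object, so create the <li> there.
li = soup.new_tag("li")
print(li.name)
```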

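The "Cannot replace one element with another when the element to be replaced is not part of a tree" error occurs because you created a tag that is not attached to the tree. It has no parent, and wrap() has to replace the tag inside its parent, which is impossible when there is no parent.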
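If you do want to use wrap(), one option is to attach the new tag to the tree first and wrap it afterwards. The following is only a sketch under that assumption: the HTML is a condensed, well-formed version of the question's snippet, anchor_point is an illustrative name, and html.parser is used instead of lxml.

```python
from bs4 import BeautifulSoup

html = '<div class="pr"></div><li><a href="pr/protocol">protocol</a></li>'
soup = BeautifulSoup(html, "html.parser")

anchor_point = soup.select_one("div[class=pr]")
ntag1 = soup.new_tag("a", href="hm/test")
ntag1.string = "TEST"

# A freshly created tag is not part of the tree yet, so it has no parent.
parent_before = ntag1.parent
print(parent_before)

# Attach it first, then wrap it in a new <li> created from the soup.
anchor_point.insert_after(ntag1)
ntag2 = ntag1.wrap(soup.new_tag("li"))

# Collect the hrefs in document order to check where the new <li> landed.
hrefs = [li.a["href"] for li in soup.find_all("li")]
print(hrefs)
```

Note that wrap() is given a Tag here rather than the string 'li' used in the question.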

It makes more sense to create the parent li and simply append the anchor child to it:

from bs4 import BeautifulSoup

html = """<div class="pr">
</div>
<li>
 <a href="pr/protocol">
  protocol
 </a>
 </li>"""

soup = BeautifulSoup(html, "lxml")

a = soup.select_one("div[class=pr]")

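# Li parent
# Note: new_tag() uses keyword names verbatim, so class_ becomes a literal
# class_ attribute (not class), as the printed output shows.
parent = soup.new_tag("li", class_="parent")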
# Child anchor
child = soup.new_tag("a", href="hm/test", class_="child")
child.string = 'TEST'
# Append child to parent
parent.append(child)
# Insert parent
a.insert_after(parent)
print(soup.prettify())

That gives you the output you want, apart from the HTML itself being invalid.
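One detail about the snippet above: new_tag() takes keyword arguments as attribute names verbatim, so class_="parent" produces an attribute literally called class_. If a real class attribute is wanted, a possible variation is to pass a dict through the attrs parameter. This is a sketch with a condensed version of the HTML, html.parser instead of lxml, and the illustrative name literal.

```python
from bs4 import BeautifulSoup

html = '<div class="pr"></div><li><a href="pr/protocol">protocol</a></li>'
soup = BeautifulSoup(html, "html.parser")

# Keyword arguments are used as attribute names exactly as written.
literal = soup.new_tag("li", class_="parent")
print(literal)  # <li class_="parent"></li>

# A dict passed via attrs can use the reserved word "class" as a key.
parent = soup.new_tag("li", attrs={"class": "parent"})
child = soup.new_tag("a", href="hm/test", attrs={"class": "child"})
child.string = "TEST"
parent.append(child)

# Insert the new <li> right after the div, as in the answer.
soup.select_one("div[class=pr]").insert_after(parent)
print(soup.prettify())
```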

If you have an actual ul that the new element should go into, for example:

html = """<div class="pr">
    </div>
    <ul>
        <li>
          <a href="pr/protocol">
          protocol
          </a>
         </li>
     </ul>
     """

Set the CSS selector to "div[class=pr] + ul" and insert the parent:

a = soup.select_one("div[class=pr] + ul")
.....
a.insert(0, parent)
print(soup.prettify())

Which gives you:

<html>
 <body>
  <div class="pr">
  </div>
  <ul>
   <li class_="parent">
    <a class_="child" href="hm/test">
     TEST
    </a>
   </li>
   <li>
    <a href="pr/protocol">
     protocol
    </a>
   </li>
  </ul>
 </body>
</html>
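For reference, the ul case can be written as one self-contained script. This is a sketch, not the answer's exact code: it assumes the elided part builds parent and child the same way as the first snippet (without the class_ keywords), uses html.parser so the html and body wrappers are not added, and the names ul and hrefs are illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="pr"></div>
<ul>
  <li><a href="pr/protocol">protocol</a></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

# Build the new list item with its anchor child.
parent = soup.new_tag("li")
child = soup.new_tag("a", href="hm/test")
child.string = "TEST"
parent.append(child)

# "+" selects the <ul> that immediately follows the div.
ul = soup.select_one("div[class=pr] + ul")
# insert(0, ...) places the new <li> before every existing child of the <ul>.
ul.insert(0, parent)

# Collect the hrefs in document order to confirm the new item comes first.
hrefs = [a["href"] for a in ul.find_all("a")]
print(hrefs)
```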

If you want to wrap an existing tag:

from bs4 import BeautifulSoup, Tag

html = """<div class="pr">
    </div>
     <a href="pr/protocol">
          protocol
     """

soup = BeautifulSoup(html, "lxml")

a = soup.select_one("div[class=pr] + a")
a.wrap(Tag(name="div"))
print(soup.prettify())

Which wraps the existing anchor:

<html>
 <body>
  <div class="pr">
  </div>
  <div>
   <a href="pr/protocol">
    protocol
   </a>
  </div>
 </body>
</html>