Append a markup string to a tag in BeautifulSoup

Time: 2014-11-18 01:09:15

Tags: python string beautifulsoup html-parsing markup

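Is it possible to set markup as a tag's content (similar to setting innerHTML in JavaScript)?

For the sake of example, let's say I want to add 10 <a> elements to a <div>, separated by commas: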

soup = BeautifulSoup(<<some document here>>)

a_tags = ["<a>1</a>", "<a>2</a>", ...] # list of strings
div = soup.new_tag("div")
a_str = ",".join(a_tags)

Using div.append(a_str) escapes < and > to &lt; and &gt;, so I end up with

<div> &lt;a&gt; 1 &lt;/a&gt; ... </div>

BeautifulSoup(a_str) wraps this in <html>, and pulling the subtree back out of it strikes me as an inelegant hack.

What should I do?

1 answer:

Answer 0: (score: 6)

You need to create a BeautifulSoup object from the HTML string containing the links:

from bs4 import BeautifulSoup

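# empty document; naming the parser explicitly avoids the "no parser specified" warning
soup = BeautifulSoup('', 'html.parser')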
div = soup.new_tag('div')

a_tags = ["<a>1</a>", "<a>2</a>", "<a>3</a>", "<a>4</a>", "<a>5</a>"]
a_str = ",".join(a_tags)

div.append(BeautifulSoup(a_str, 'html.parser'))

soup.append(div)
print(soup)

Prints:

<div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
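
If you want something closer to innerHTML, the same idea can be wrapped in a small helper. This is only a sketch: set_inner_html is a hypothetical name, not part of bs4, and it assumes the built-in 'html.parser', which parses the fragment as-is instead of wrapping it in <html>/<body>. It empties the tag and then moves the parsed nodes into it:

```python
from bs4 import BeautifulSoup


def set_inner_html(tag, markup):
    """Replace the children of `tag` with the nodes parsed from `markup`.

    Hypothetical helper for illustration; bs4 does not ship this function.
    """
    # html.parser keeps the fragment as-is (no <html>/<body> wrapper).
    fragment = BeautifulSoup(markup, "html.parser")
    # Drop whatever the tag contained before.
    tag.clear()
    # Copy the child list first: appending a node moves it out of `fragment`.
    for node in list(fragment.contents):
        tag.append(node)


soup = BeautifulSoup("<div>old</div>", "html.parser")
div = soup.div
set_inner_html(div, "<a>1</a>,<a>2</a>")
result = str(div)
print(result)
```

Unlike appending the whole BeautifulSoup object, this leaves the <a> tags and the comma text as direct children of the <div>.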

Alternative solution:

For each link, create a Tag and append it to the div. Also, append a comma after every link except the last:

from bs4 import BeautifulSoup

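# empty document; naming the parser explicitly avoids the "no parser specified" warning
soup = BeautifulSoup('', 'html.parser')
div = soup.new_tag('div')

last = 5
for x in range(1, last + 1):
    link = soup.new_tag('a')
    link.string = str(x)
    div.append(link)

    # do not append a comma after the last element
    if x != last:
        div.append(",")

soup.append(div)

print(soup)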

Prints:

<div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
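
The loop above depends on knowing the index of the last element. A variant that avoids this, sketched here under the assumption that the link texts come from an arbitrary iterable, puts the comma before every link except the first. build_link_div is a hypothetical name used only for illustration:

```python
from bs4 import BeautifulSoup


def build_link_div(labels):
    """Return a <div> holding one <a> per label, separated by commas."""
    soup = BeautifulSoup("", "html.parser")
    div = soup.new_tag("div")
    for index, label in enumerate(labels):
        # Separator goes before every link except the first,
        # so the total number of labels never needs to be known.
        if index > 0:
            div.append(",")
        link = soup.new_tag("a")
        link.string = str(label)
        div.append(link)
    return div


html = str(build_link_div(range(1, 6)))
print(html)
```

Because each label is assigned through link.string, it is treated as text rather than markup, so any < or > in a label is escaped on output.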